Extracting citations from a BibTeX file using the Linux terminal

I had a large BibTeX file (around 40 entries) with the references of some papers I had studied, and I wanted to extract the citation keys in the form used for citing in LaTeX (\cite{AuthorYear}). I had just read some tutorials about awk that day, so I thought “Let’s use it!”.

An example BibTeX file:

@article{Kotselidis2010,
author = {Kotselidis, Christos and Lujan, Mikel and Ansari, Mohammad and Malakasis,
    Konstantinos and Kahn, Behram and Kirkham, Chris and Watson, Ian},
doi = {10.1109/IPDPS.2010.5470460},
isbn = {978-1-4244-6442-5},
journal = {2010 IEEE International Symposium on Parallel \&
    Distributed Processing (IPDPS)},
pages = {1--12},
publisher = {Ieee},
title = {{Clustering JVMs with software transactional memory support}},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5470460},
year = {2010}
}
@phdthesis{Zhang2009c,
author = {Zhang, Bo},
keywords = {cache-coherence,contention manager,distributed transactional memory},
title = {{On the Design of Contention Managers and Cache-Coherence Protocols for
    Distributed Transactional Memory}},
year = {2009}
}

Solution

awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib

For the example file above, this prints:

\cite{Kotselidis2010}
\cite{Zhang2009c}
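To see why $2 holds the citation key, it can help to print every field awk produces for a single entry line when the separator is "{" or ",". This is only a small sketch; the helper name show_fields is made up for illustration.

```shell
# Hypothetical helper: print each field awk sees when splitting on "{" or ","
show_fields() {
    printf '%s\n' "$1" |
        awk 'BEGIN{FS="[{,]"} {for (i = 1; i <= NF; i++) print i ": [" $i "]"}'
}

show_fields '@article{Kotselidis2010,'
# 1: [@article]
# 2: [Kotselidis2010]
# 3: []
```

The empty third field comes from the trailing comma; the key always lands in field 2.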

In order to save the output in a file named cites.txt:

awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib > cites.txt

Hint: Use “>>” if you want to append the output instead. A single “>” creates the file if it does not exist, or empties the existing one before writing the new content.
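The difference between the two redirections can be checked with a quick experiment. The sketch below works in a temporary directory, and the citation keys in it are placeholders, not entries from the example file.

```shell
# Work in a throwaway directory so no real files are touched
tmpdir=$(mktemp -d)
out="$tmpdir/cites.txt"

printf '%s\n' '\cite{First}'  >  "$out"   # creates the file
printf '%s\n' '\cite{Second}' >> "$out"   # appends: the file now has 2 lines
lines_after_append=$(wc -l < "$out" | tr -d ' ')

printf '%s\n' '\cite{Third}'  >  "$out"   # truncates: the file has 1 line again
lines_after_overwrite=$(wc -l < "$out" | tr -d ' ')

echo "after >>: $lines_after_append line(s); after >: $lines_after_overwrite line(s)"
rm -r "$tmpdir"
```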

If you want to know my “implementation” process, continue reading 😉

Implementation Steps

I went through the following steps:

  1. Extracted the lines that contain the citation key (AuthorYear) with grep:
    grep @ filename.bib

    got:

    @article{Kotselidis2010,
    @phdthesis{Zhang2009c,
  2. Pipe it to sed in order to replace the ‘{’ with a space and remove the ‘,’:
    grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g

    got:

    @article Kotselidis2010
    @phdthesis Zhang2009c
  3. Pipe it to awk to print the final result:
    grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g |
        awk '{ print "\\cite{"$2"}" }'

    got:

    \cite{Kotselidis2010}
    \cite{Zhang2009c}
  4. Redirect output to a file:
    grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g |
        awk '{ print "\\cite{"$2"}" }' > cites.txt

Done!

Then I thought about the cut command that can be used to remove sections from each line of input. With cut, instead of sed:

grep @ filename.bib | cut -d{ -f2 | cut -d, -f1 |
    awk '{print "\\cite{"$1"}"}' > cites.txt

where -d sets the delimiter that cut uses to split each input line and -f selects which field (column) to keep.
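The two cut calls can be followed on a single line from the example file. This is just an illustration of the intermediate values; the variable names are made up.

```shell
line='@phdthesis{Zhang2009c,'

# Field 2 when splitting on "{" is everything after the brace
after_brace=$(printf '%s\n' "$line" | cut -d'{' -f2)

# Field 1 when splitting on "," drops the trailing comma
key=$(printf '%s\n' "$after_brace" | cut -d, -f1)

echo "$after_brace"   # Zhang2009c,
echo "$key"           # Zhang2009c
```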

Update: I just found out about the -F option for awk, which sets the field separator. Using it:

grep @ filename.bib | cut -d{ -f2 | awk -F, '{print "\\cite{"$1"}"}' > cites.txt

And, of course, instead of having two different sed calls, we can use a single one with a bracket expression that matches either character:

grep @ filename.bib | sed 's/[{,]/ /g' | awk '{print "\\cite{"$2"}"}' > cites.txt

Finally, the shortest way I could find is by using the following awk script:

# Split each line on "{" or ","
BEGIN {
    FS = "[{,]"
}
# On entry lines such as "@article{Key," the key is field 2
/@/ { print "\\cite{" $2 "}" }

If you save it as bibtex.awk, you can run it with:

awk -f bibtex.awk filename.bib

Of course, you can still use it without saving it to a file:

awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib
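One caveat: the pattern /@/ matches any line that contains an “@”, not only the lines that start an entry, so a field holding an e-mail address would also produce a \cite line. A possible, more defensive variant is sketched below. It assumes every entry starts on its own line, and both the function name extract_cites and the sample input are made up for illustration.

```shell
# Hypothetical helper: like the one-liner, but only for lines that start an entry
extract_cites() {
    awk 'BEGIN { FS = "[{,]" }
         # Entry lines begin with "@"; skip the non-reference BibTeX blocks
         /^[ \t]*@/ && tolower($1) !~ /@(comment|string|preamble)/ {
             gsub(/[ \t]/, "", $2)          # tolerate spaces around the key
             print "\\cite{" $2 "}"
         }' "$@"
}

# Made-up input: a @string block, one real entry, and a field containing "@"
printf '%s\n' \
    '@string{ipdps = "IPDPS"}' \
    '@article{Kotselidis2010,' \
    'note = {contact: someone@example.org},' \
    '}' | extract_cites
# \cite{Kotselidis2010}
```

It can be used in the same way as the one-liner: extract_cites filename.bib > cites.txt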

4 Responses to “Extracting citations from a BibTex file using Linux terminal”

  • Nice 🙂 … Do you also know a nice script that extracts from a huge .bib file only the references actually used in a particular .tex file, to generate a shorter .bib file you can share with, e.g., a publisher? 🙂

    • Hej. I did not find an easy way to fully automate the process you describe. I only got as far as finding the lines in the bib file where the references you want to keep are located. You can do this using the following script:

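      #!/bin/sh

      # Collect the distinct keys cited in the tex file
      KEYS=$(grep -o "cite{[a-zA-Z0-9]*" "$1" | cut -d'{' -f2 | sort -u)

      for key in $KEYS
      do
          # Print the numbered line of the bib entry that defines this key
          grep -n "{$key," "$2"
      done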

      Use it as ./scriptname texfilename bibfilename and it will produce output like:

      15:@article{Lenoski1990a,
      47:@article{Bilir1999,

      indicating the line of the bib file at which each reference used in the tex file resides.

      Whenever I have time I will look at it more closely.

    • PeterP:

      Take a look at bibtool (http://ctan.org/tex-archive/biblio/bibtex/utils/bibtool/). It does exactly what you want.
