Extracting citations from a BibTeX file using the Linux terminal
I had a big (around 40 entries) BibTeX file with the references of some papers I had studied, and I wanted to extract the citation keys in the format used for citing in LaTeX (\cite{AuthorYear}). I had read some tutorials about awk that same day, so I thought “Let’s use it!!”.
An example BibTeX file:
@article{Kotselidis2010,
  author = {Kotselidis, Christos and Lujan, Mikel and Ansari, Mohammad and Malakasis, Konstantinos and Kahn, Behram and Kirkham, Chris and Watson, Ian},
  doi = {10.1109/IPDPS.2010.5470460},
  isbn = {978-1-4244-6442-5},
  journal = {2010 IEEE International Symposium on Parallel \& Distributed Processing (IPDPS)},
  pages = {1--12},
  publisher = {Ieee},
  title = {{Clustering JVMs with software transactional memory support}},
  url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5470460},
  year = {2010}
}
@phdthesis{Zhang2009c,
  author = {Zhang, Bo},
  keywords = {cache-coherence,contention manager,distributed transactional memory},
  title = {{On the Design of Contention Managers and Cache-Coherence Protocols for Distributed Transactional Memory}},
  year = {2009}
}
Solution
awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib
\cite{Kotselidis2010}
\cite{Zhang2009c}
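The one-liner works because FS="[{,]" splits every line on either { or , so on the first line of an entry $1 is the entry type and $2 is the citation key. A slightly stricter sketch is shown below. It assumes that every entry starts at the beginning of a line and that @comment, @string and @preamble blocks should be skipped; the sample content and the variable names are made up for illustration.

```shell
# Scratch file with hypothetical sample content.
bib=$(mktemp)
cat > "$bib" <<'EOF'
@article{Kotselidis2010,
  author = {Kotselidis, Christos},
  note = {contact: someone@example.org},
  year = {2010}
}
@string{ieee = "IEEE"}
@phdthesis{Zhang2009c,
  author = {Zhang, Bo},
  year = {2009}
}
EOF

# /^@/ only matches lines that START with "@", so the e-mail address
# in the note field is ignored; entry types without a key are skipped.
cites=$(awk 'BEGIN { FS = "[{,]" }
  /^@/ && tolower($1) !~ /^@(comment|string|preamble)$/ { print "\\cite{" $2 "}" }' "$bib")
printf '%s\n' "$cites"
```

With the plain /@/ pattern, the note line containing an e-mail address would also match and produce a bogus \cite line, which is why the anchored pattern may be the safer choice for larger files.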
To save the output in a file named cites.txt:
awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib > cites.txt
Hint: Use “>>” if you want to append the output. A single “>” creates the file if it does not exist, or empties an existing one before writing the content.
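A small sketch of the difference between the two redirections, using a scratch file created with mktemp (the variable name out is only for illustration):

```shell
out=$(mktemp)             # scratch file for the demonstration
echo "first"  >  "$out"   # ">" creates or truncates the file, then writes
echo "second" >  "$out"   # truncates again, so "first" is gone
echo "third"  >> "$out"   # ">>" appends to the existing content
cat "$out"
```

After these commands the file holds two lines, second and third.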
If you want to know my “implementation” process, continue reading 😉
Implementation Steps
I went through the following steps:
- Extracted the lines that contain the citation key (AuthorYear) with grep:
grep @ filename.bib
got:
@article{Kotselidis2010,
@phdthesis{Zhang2009c,
- Piped it to sed in order to replace the ‘{’ with a space and remove the ‘,’:
grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g
got:
@article Kotselidis2010
@phdthesis Zhang2009c
- Piped it to awk to print the final result:
grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g | awk '{ print "\\cite{"$2"}" }'
got:
\cite{Kotselidis2010}
\cite{Zhang2009c}
- Redirect output to a file:
grep @ filename.bib | sed s/'{'/' '/g | sed s/','/''/g | awk '{ print "\\cite{"$2"}" }' > cites.txt
Done!
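As an aside, the grep, sed and awk stages can probably be folded into a single sed call. The sketch below assumes that each entry starts at the beginning of a line and that the key is followed by a comma on that same line; the sample content is made up.

```shell
# Scratch file with a shortened, hypothetical version of the example entries.
bib=$(mktemp)
cat > "$bib" <<'EOF'
@article{Kotselidis2010,
  year = {2010}
}
@phdthesis{Zhang2009c,
  year = {2009}
}
EOF

# -n suppresses normal output; the trailing "p" prints only lines where the
# substitution succeeded. \1 is the text captured between "{" and the first ",".
cites=$(sed -n 's/^@[^{]*{\([^,]*\),.*$/\\cite{\1}/p' "$bib")
printf '%s\n' "$cites"
```

Append > cites.txt to the sed command to store the result, exactly as in the last step of the list.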
Then I thought about the cut command, which can be used to remove sections from each line of input. With cut instead of sed:
grep @ filename.bib | cut -d{ -f2 | cut -d, -f1 | awk '{print "\\cite{"$1"}"}' > cites.txt
where -d sets the delimiter used to split the input and -f selects which field (column) to keep.
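To see what each cut stage does, here is a small sketch that runs them one at a time on a single header line (the variable names are only for illustration):

```shell
line='@article{Kotselidis2010,'

# Split on "{" and keep the second field: everything after the brace.
after_brace=$(printf '%s\n' "$line" | cut -d'{' -f2)

# Split that on "," and keep the first field: the bare citation key.
key=$(printf '%s\n' "$after_brace" | cut -d',' -f1)

printf '%s\n' "$after_brace"   # Kotselidis2010,
printf '%s\n' "$key"           # Kotselidis2010
```

Quoting the delimiter, as in -d'{', is optional in most shells here but avoids surprises with characters the shell treats specially.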
Update: Just found out about the -F parameter for awk, which sets the field separator. Using it:
grep @ filename.bib | cut -d{ -f2 | awk -F, '{print "\\cite{"$1"}"}' > cites.txt
And, of course, instead of having two different sed calls, we can use a regular expression (quoted, so that the shell does not try to interpret the brackets):
grep @ filename.bib | sed 's/[{,]/ /g' | awk '{print "\\cite{"$2"}"}' > cites.txt
Finally, the shortest way I could find is the following awk script:
BEGIN { FS="[{,]" }
/@/ { print "\\cite{"$2"}" }
END {}
If you save it as bibtex.awk, you can call it as:
awk -f bibtex.awk filename.bib
Of course, you can still use it without saving it to a file:
awk 'BEGIN{FS="[{,]"} /@/ {print "\\cite{"$2"}"}' filename.bib
Nice 🙂 … Do you also know a nice script to extract from a huge .bib file only those references actually used in a particular .tex file, so as to generate a shorter .bib file you can share with, e.g., a publisher? 🙂
Hej. I did not find an easy way to fully automate the process you describe. I got as far as finding the lines in the bib file where the references you want to keep are located. You can do this using the following script:
Use it as
./scriptname texfilename bibfilename
and it will produce output indicating the line at which each reference (used in the tex file) resides in the bib file.
Whenever I have time I will look at it more closely.
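A rough sketch of that idea follows. It is an illustration under stated assumptions, not the script referred to above: it assumes the .tex file only uses plain \cite{key} or \cite{key1,key2} commands, that keys contain no regular-expression metacharacters, and that every BibTeX entry starts with @type{key, on its own line. The file names paper.tex and refs.bib are hypothetical.

```shell
# Scratch directory with hypothetical sample files.
work=$(mktemp -d)
cat > "$work/paper.tex" <<'EOF'
Distributed STM is discussed in \cite{Zhang2009c}.
EOF
cat > "$work/refs.bib" <<'EOF'
@article{Kotselidis2010,
  year = {2010}
}
@phdthesis{Zhang2009c,
  year = {2009}
}
EOF

# 1. Collect the keys cited in the .tex file, one per line, without duplicates.
keys=$(grep -o '\\cite{[^}]*}' "$work/paper.tex" \
  | sed 's/^\\cite{//; s/}$//' | tr ',' '\n' | tr -d ' ' | sort -u)

# 2. For each key, report the line of the .bib file where its entry starts.
lines=$(for k in $keys; do
  n=$(grep -n "^@[^{]*{$k," "$work/refs.bib" | cut -d: -f1)
  printf '%s: line %s\n' "$k" "$n"
done)
printf '%s\n' "$lines"
```

For the sample files this reports that the Zhang2009c entry starts on line 4 of the bib file; the Kotselidis2010 entry is not cited and is therefore not listed.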
Take a look at bibtool (http://ctan.org/tex-archive/biblio/bibtex/utils/bibtool/). It does exactly what you want.
:-O, great. Thanks for your hint.