{"id":168,"date":"2011-02-14T00:03:38","date_gmt":"2011-02-13T23:03:38","guid":{"rendered":"http:\/\/trigonakis.com\/blog\/?p=168"},"modified":"2011-02-28T17:45:07","modified_gmt":"2011-02-28T16:45:07","slug":"extracting-citations-from-a-bibtex-file-using-linux-terminal","status":"publish","type":"post","link":"http:\/\/trigonakis.com\/blog\/2011\/02\/14\/extracting-citations-from-a-bibtex-file-using-linux-terminal\/","title":{"rendered":"Extracting citations from a BibTex file using Linux terminal"},"content":{"rendered":"<p>I had a big (around 40 entries) BibTex file with the references of some papers I studied and I wanted to extract the citations in the format used for citing in Latex (<code>\\cite{AuthorYear}<\/code>). Just today I read some tutorials about <code>awk<\/code>, so I thought &#8220;Let&#8217;s use it!!&#8221;.<\/p>\n<p>An example BibTex file:<\/p>\n<pre lang=\"latex\">@article{Kotselidis2010,\r\nauthor = {Kotselidis, Christos and Lujan, Mikel and Ansari, Mohammad and Malakasis,\r\n    Konstantinos and Kahn, Behram and Kirkham, Chris and Watson, Ian},\r\ndoi = {10.1109\/IPDPS.2010.5470460},\r\nisbn = {978-1-4244-6442-5},\r\njournal = {2010 IEEE International Symposium on Parallel \\&\r\n    Distributed Processing (IPDPS)},\r\npages = {1--12},\r\npublisher = {Ieee},\r\ntitle = {{Clustering JVMs with software transactional memory support}},\r\nurl = {http:\/\/ieeexplore.ieee.org\/lpdocs\/epic03\/wrapper.htm?arnumber=5470460},\r\nyear = {2010}\r\n}\r\n@phdthesis{Zhang2009c,\r\nauthor = {Zhang, Bo},\r\nkeywords = {cache-coherence,contention manager,distributed transactional memory},\r\ntitle = {{On the Design of Contention Managers and Cache-Coherence Protocols for\r\n    Distributed Transactional Memory}},\r\nyear = {2009}\r\n}\r\n<\/pre>\n<h4>Solution<\/h4>\n<pre lang=\"bash\">\r\nawk 'BEGIN{FS=\"[{,]\"} \/@\/ {print \"\\\\cite{\"$2\"}\"}' filename.bib\r\n<\/pre>\n<p>\\cite{Kotselidis2010}<br \/>\n\\cite{Zhang2009c}<\/p>\n<p>In order to save the 
output in a file named <code>cites.txt<\/code>:<\/p>\n<pre lang=\"bash\">\r\nawk 'BEGIN{FS=\"[{,]\"} \/@\/ {print \"\\\\cite{\"$2\"}\"}' filename.bib > cites.txt\r\n<\/pre>\n<p><strong>Hint<\/strong>: Use &#8220;<strong><code>&gt;&gt;<\/code><\/strong>&#8221; if you want to append the output. A single <code>&gt;<\/code> creates the file if it does not exist, or empties an existing one before writing the output.<\/p>\n<p>If you want to know my &#8220;implementation&#8221; process, continue reading \ud83d\ude09<br \/>\n<!--more--><\/p>\n<h4>Implementation Steps<\/h4>\n<p>I followed these steps:<\/p>\n<ol>\n<li>Extract the lines that contain the citation key (AuthorYear) with <code>grep<\/code>:\n<pre lang=\"bash\">grep @ filename.bib<\/pre>\n<p>got:<\/p>\n<pre lang=\"bash\">@article{Kotselidis2010,\r\n@phdthesis{Zhang2009c,\r\n<\/pre>\n<\/li>\n<li>Pipe it to <code>sed<\/code> in order to replace the &#8216;{&#8217; with a space and remove the &#8216;,&#8217;:\n<pre lang=\"bash\">grep @ filename.bib | sed s\/'{'\/' '\/g | sed s\/','\/''\/g<\/pre>\n<p>got:<\/p>\n<pre lang=\"bash\">@article Kotselidis2010\r\n@phdthesis Zhang2009c\r\n<\/pre>\n<\/li>\n<li>Pipe it to <code>awk<\/code> to print the final result:\n<pre lang=\"bash\">grep @ filename.bib | sed s\/'{'\/' '\/g | sed s\/','\/''\/g |\r\n    awk '{ print \"\\\\cite{\"$2\"}\" }'\r\n<\/pre>\n<p>got:<\/p>\n<pre lang=\"latex\">\\cite{Kotselidis2010}\r\n\\cite{Zhang2009c}\r\n<\/pre>\n<\/li>\n<li>Redirect the output to a file:\n<pre lang=\"bash\">grep @ filename.bib | sed s\/'{'\/' '\/g | sed s\/','\/''\/g |\r\n    awk '{ print \"\\\\cite{\"$2\"}\" }' > cites.txt<\/pre>\n<\/li>\n<\/ol>\n<p><strong>Done!<\/strong><\/p>\n<p>Then I thought about the <code>cut<\/code> command, which extracts selected sections (fields) from each line of input. 
With <code>cut<\/code>, instead of <code>sed<\/code>:<\/p>\n<pre lang=\"bash\">grep @ filename.bib | cut -d{ -f2 | cut -d, -f1 |\r\n    awk '{print \"\\\\cite{\"$1\"}\"}' > cites.txt\r\n<\/pre>\n<p>where <code>-d<\/code> sets the delimiter that <code>cut<\/code> uses to split each line and <code>-f<\/code> selects which field (column) to keep.<\/p>\n<p><strong>Update: <\/strong>Just found out about the <code>-F<\/code> parameter for <code>awk<\/code>, which sets the <em>field separator<\/em>. Using it:<\/p>\n<pre lang=\"bash\">\r\ngrep @ filename.bib | cut -d{ -f2 | awk -F, '{print \"\\\\cite{\"$1\"}\"}' > cites.txt\r\n<\/pre>\n<p>And, of course, instead of having two different <code>sed<\/code> calls, we can use a single one with a regular expression:<\/p>\n<pre lang=\"bash\">\r\ngrep @ filename.bib | sed 's\/[{,]\/ \/g' | awk '{print \"\\\\cite{\"$2\"}\"}' > cites.txt\r\n<\/pre>\n<p>Finally, the shortest way I could find is the following <code>awk<\/code> script:<\/p>\n<pre lang=\"awk\">\r\nBEGIN {\r\n    FS=\"[{,]\"\r\n}\r\n\/@\/ {print \"\\\\cite{\"$2\"}\"}\r\nEND{}\r\n<\/pre>\n<p>If you save it as <i>bibtex.awk<\/i>, you can run it with:<\/p>\n<pre lang=\"bash\">\r\nawk -f bibtex.awk filename.bib\r\n<\/pre>\n<p>Of course, you can still use it without saving it to a file:<\/p>\n<pre lang=\"bash\">\r\nawk 'BEGIN{FS=\"[{,]\"} \/@\/ {print \"\\\\cite{\"$2\"}\"}' filename.bib\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I had a big (around 40 entries) BibTex file with the references of some papers I studied and I wanted to extract the citations in the format used for citing in Latex (\\cite{AuthorYear}). Just today I read some tutorials about awk, so I thought &#8220;Let&#8217;s use it!!&#8221;. 
An example BibTex file: @article{Kotselidis2010, author = {Kotselidis, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[37],"tags":[10,36,9,35],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1ouW6-2I","_links":{"self":[{"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/posts\/168"}],"collection":[{"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/comments?post=168"}],"version-history":[{"count":23,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/posts\/168\/revisions"}],"predecessor-version":[{"id":272,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/posts\/168\/revisions\/272"}],"wp:attachment":[{"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/media?parent=168"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/categories?post=168"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/trigonakis.com\/blog\/wp-json\/wp\/v2\/tags?post=168"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}