Archive for July 28, 2011

Pattern Matching Hyphen-Minus Sign in Bash

I was trying to use the sed command to perform some changes to a text and stepped into an interesting “problem”; pattern matching the minus-hyphen (-) symbol.

Assume we have the following text:

something
SoMeThiNg
some-thing
soMe_thing

and we want to match all the different versions of the word with one expression (one by one).

My initial idea was to use this regular expression:

's/[a-zA-Z\-\_]*/matched/'

Naturally, I tried to escape the – sign. As you can see from the output, this doesn’t work:

$ sed 's/[a-zA-Z\-\_]*/matched/' test 
matched
matched
matched-thing
matched

The minus sign is not matched, because of its special meaning (setting ranges). In order to make the expression work, you need to move the “-” either in the beginning or in the end of the expression:

$ sed 's/[a-zA-Z\_-]*/matched/' test 
matched
matched
matched
matched
$ sed 's/[-a-zA-Z\_]*/matched/' test 
matched
matched
matched
matched

and leave it un-escaped!