Pattern Matching Hyphen-Minus Sign in Bash

I was using sed to make some changes to a text and ran into an interesting “problem”: pattern matching the hyphen-minus (-) symbol.

Assume we have the following text:

something
SoMeThiNg
some-thing
soMe_thing

and we want a single expression that matches every variant of the word, one line at a time.

My initial idea was to use this regular expression:

's/[a-zA-Z\-\_]*/matched/'

Naturally, I tried to escape the - sign. As the output shows, this doesn’t work:

$ sed 's/[a-zA-Z\-\_]*/matched/' test 
matched
matched
matched-thing
matched

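The hyphen is not matched. In sed, a backslash inside a bracket expression does not escape the hyphen, so the “-” keeps its special meaning (setting ranges) and “\-\” is read as a range that starts and ends at the backslash. To make the expression work, move the “-” to either the beginning or the end of the bracket expression: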

$ sed 's/[a-zA-Z\_-]*/matched/' test 
matched
matched
matched
matched
$ sed 's/[-a-zA-Z\_]*/matched/' test 
matched
matched
matched
matched

and leave it unescaped!
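
As a minimal sketch of the working form, the snippet below recreates the four sample lines and applies the bracket expression with the hyphen placed first. The `sample` and `result` variable names and the temporary file are assumptions made for illustration; the post itself uses a file called test. The backslash before the underscore is left out here, since inside a bracket expression it would only add a literal backslash to the set.

```shell
# Write the four sample lines from the post to a scratch file.
sample=$(mktemp)
printf '%s\n' something SoMeThiNg some-thing soMe_thing > "$sample"

# Hyphen first in the bracket expression: it is taken literally, unescaped.
result=$(sed 's/[-a-zA-Z_]*/matched/' "$sample")

# Show what sed produced for each line, then clean up.
printf '%s\n' "$result"
rm -f "$sample"
```

Putting the hyphen last, as in [a-zA-Z_-], should behave the same way.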

3 Responses to “Pattern Matching Hyphen-Minus Sign in Bash”

  • aleph31:

    That looks like a bug in sed to me. Placing the hyphen unescaped at the beginning or end of the square brackets should be optional. In Perl you can escape the hyphen:

    $ perl -ne 's/[a-zA-Z\-\_]*/matched/;print' test
    $ perl -ne 's/[a-zA-Z\_\-]*/matched/;print' test
    $ perl -ne 's/[\-a-zA-Z\_]*/matched/;print' test
    $ perl -ne 's/[-a-zA-Z\_]*/matched/;print' test
    $ perl -ne 's/[a-zA-Z\_-]*/matched/;print' test

    All of the above printed:
    matched
    matched
    matched
    matched

    $ perl -v | head -2 | tail -1
    This is perl 5, version 12, subversion 3 (v5.12.3) built for i686-linux-thread-multi

    • This is a good point. grep has the same issue as sed, though:

      $ grep '^[a-zA-Z\-_]*' test -o
      something
      SoMeThiNg
      some
      soMe_thing

      What’s actually happening is that the regexp tries to match the range from character ‘\’ to character ‘_’.

      $ echo "some\thing" >> test
      $ grep '^[a-zA-Z\-_]*' test -o
      something
      SoMeThiNg
      some
      soMe_thing
      some\thing
      • John Duncan:

        I ran into this bug in GNU sed today when trying to pattern match the hyphen in my Markdown-to-HTML blogging script. I had to put it at the beginning of the bracket expression for it to work! Thanks for the explanation here; it was driving me crazy.
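
The range reading given in the reply above (from ‘\’ to ‘_’) can be checked directly. This is a rough sketch, assuming the class is written as [\-_] and that grep runs in the C locale so the range follows ASCII order; the `report` and `verdict` names are made up for the example.

```shell
# For each candidate character, ask grep whether the class [\-_] accepts it.
# LC_ALL=C keeps the range ordered by ASCII code (an assumption of this sketch).
report=""
for ch in '\' ']' '^' '_' '-'; do
  if printf '%s\n' "$ch" | LC_ALL=C grep -q '^[\-_]$'; then
    verdict=in
  else
    verdict=out
  fi
  report="$report$ch=$verdict "
done

# One line summarising which characters fall inside the class.
printf '%s\n' "$report"
```

Under those assumptions the backslash, ‘]’, ‘^’ and ‘_’ should be reported as inside the class and the hyphen as outside, which is why some-thing is only matched up to “some”.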
