#8: Advanced usage of grep
grep is probably one of the best-known, or rather one of the most-used, command line tools on UNIX systems. It's often used for things like
ps aux | grep someprocess
or something similar. That's very basic usage, but grep is so much more than just a very simple text search command.
First of all, let me make clear that grep is built to work not only with streams but also with files. As stated in Article #4: Cut your use of cat of this Advent series, there is a lot of meaningless use of grep in combination with cat and pipes. That usage is absolute nonsense. So if you work on files, don't use pipes, just write this:
grep expr file
where expr is your search expression and file is your file name. So much by way of introduction, let's come to the more crucial points.
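As a small sketch of the point about cat and pipes, the following creates a throwaway file (the file contents and the variable names are made up for illustration) and searches it both ways. Both variants find the same line, but the second one saves a process and a pipe.

```shell
# Create a throwaway sample file (contents are purely illustrative).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# Needless use of cat and a pipe:
via_pipe=$(cat "$sample" | grep 'gam')

# Same search, but grep reads the file itself:
direct=$(grep 'gam' "$sample")

printf '%s\n' "$direct"
rm -f "$sample"
```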
Are you familiar with regular expressions? You should be. In my opinion, each programmer and system administrator should at least know the basic facts about regular expressions. Personally I love regexps but I know that they cause many people headaches. There is no reason for that, so don't be afraid of regular expressions. Once you've understood the concept they're quite simple.
grep basically understands three types of regular expressions: basic regexp, extended regexp and Perl compatible regexp (PCRE). When you run grep without any further parameter (or with -G), grep will treat your expression as a basic regexp. Basic regular expressions are specified by POSIX (Portable Operating System Interface for UNIX), but in GNU grep they have a few extensions such as the quantifiers \? and \+. And here you already see the basic syntax: special meta characters have to be escaped with backslashes. So if you've ever been bothered by grep not interpreting your regular expression right, it's probably because you haven't escaped the meta characters:
echo 'foooooooooobar' | grep 'fo\+b.\?r'
That'll work fine. The same applies to parentheses and quantifiers in braces, but not to square brackets, which define character classes:
echo 'foofoobar' | grep '\(foo\)\{1,2\}[abr]\{3\}'
Notice the unescaped square brackets. Escaping these would erase their special meaning. The same goes for the meta characters . (any single character, whitespace included) and * (quantifier equal to \{0,\}) and of course the backslash itself \. These only have a special meaning without backslashes, so escaping would make them normal characters.
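The escaping rules for basic regexps are easy to mix up, so here is a minimal sketch that counts matching lines with -c for each case. The sample strings are invented for illustration, and the behaviour of \+ assumes GNU grep, where it is an extension.

```shell
# In basic mode an unescaped + is an ordinary character, so nothing matches.
# (|| true only keeps a "no match" exit status from aborting strict shells.)
unescaped=$(echo 'foooobar' | grep -c 'fo+b' || true)

# Escaped, it becomes a quantifier:
escaped=$(echo 'foooobar' | grep -c 'fo\+b' || true)

# The dot works the other way round: unescaped it matches any character,
# escaped it only matches a literal dot.
any_char=$(echo 'fooXbar' | grep -c 'foo.bar' || true)
literal_dot=$(echo 'fooXbar' | grep -c 'foo\.bar' || true)

echo "$unescaped $escaped $any_char $literal_dot"
```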
This syntax is not very convenient, so there are extended regular expressions where all these meta characters are written without backslashes. To use extended regexps, specify the parameter -E:
echo 'foobar' | grep -E 'fo+b.?r'
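To make the difference more tangible, this sketch runs the grouped pattern from the basic regexp example once in basic syntax and once rewritten as an extended regexp. The variable names are only illustrative; both commands should print the same line.

```shell
# Basic syntax: groups and brace quantifiers need backslashes.
basic=$(echo 'foofoobar' | grep '\(foo\)\{1,2\}[abr]\{3\}')

# The same pattern as an extended regexp, no backslashes needed:
extended=$(echo 'foofoobar' | grep -E '(foo){1,2}[abr]{3}')

echo "$basic"
echo "$extended"
```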
GNU grep also understands a few backslash shorthands like \w (word characters, equal to [_[:alnum:]], i.e. letters, digits and the underscore), \W (non-word characters, i.e. the opposite) and \b (word boundaries). These are GNU extensions and work in basic mode as well. There are some more; for those have a look at the man page.
Extended regular expressions are far more comfortable than basic regular expressions but they also have a few limitations. For instance, escape sequences like \d for digits are not defined (\s for whitespace is accepted by recent versions of GNU grep as an extension, but it is not part of POSIX). These are included in the next level of regexp: the Perl compatible regular expressions. To use PCREs, specify the parameter -P:
echo 'foo2bar blablub' | grep -P '^[^\W]o+\d\w+\s(?:bl[aub]{1,2}){2}$'
If you know Perl or PHP you might have worked with PCREs already. If not, now is the time, it's fun!
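A shorter PCRE sketch, in case the one-liner above is a bit dense: it pulls a number out of a line with \d. The sample text is invented, and the probe at the top is there because -P is only available when grep was built with PCRE support, which is an assumption about your system.

```shell
# -P needs a grep built with PCRE support (an assumption; not every system has it).
if echo 'probe' | grep -P 'probe' >/dev/null 2>&1; then
    # \d+ matches one or more digits; -o prints only the matched part.
    digits=$(echo 'order 66 shipped' | grep -oP '\d+')
    echo "$digits"
else
    echo 'this grep has no -P support'
fi
```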
That's the very, very basic introduction to regular expressions with grep. By the way, if you don't want to use regular expressions at all, set the -F parameter, which tells grep to treat your expression as a fixed string that has to match literally.
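A quick sketch of what -F changes, using two made-up input lines: as a regexp the dot matches any character, as a fixed string it only matches a real dot.

```shell
# As a regexp, the dot matches any character, so both lines are printed:
as_regexp=$(printf 'a.c\nabc\n' | grep 'a.c')

# With -F the pattern is taken literally, so only the line with a real dot matches:
as_fixed=$(printf 'a.c\nabc\n' | grep -F 'a.c')

echo "$as_fixed"
```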
So far we've passed our expression to grep as a single parameter. But you can also provide multiple expressions, in which case a line matches if any one of them matches. For instance:
echo 'foobar' | grep -e foo -e bar
If you have just one expression, -e can be omitted. Another possibility is to load search patterns from a file with -f. Assuming we have a file called mypattern that contains one regexp per line, we can use it to match our string with:
echo 'foobar' | grep -f mypattern
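For completeness, here is a minimal sketch of how such a pattern file could be built and used. It uses a temporary file instead of mypattern, and the two patterns and the input lines are only examples.

```shell
# Write two patterns, one per line, into a temporary pattern file.
patterns=$(mktemp)
printf '^foo\nbaz$\n' > "$patterns"

# A line is printed if it matches at least one of the patterns.
matched=$(printf 'foobar\nquux\nbarbaz\n' | grep -f "$patterns")

printf '%s\n' "$matched"
rm -f "$patterns"
```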
grep has tons of other parameters which are very interesting and useful. I list the most important here:
- -i: make the pattern case-insensitive (so x becomes equal to X).
- -w: only match whole words.
- -c: don't print the matching lines but just the number of lines that matched.
- -m N: stop after N matching lines (N is a number).
- -l: don't print the matches but the names of all files with matches.
- -L: don't print the matches but the names of all files without matches.
- -H: print the file name before each match. This is the default if you specified more than one file to search (if working on STDIN, the file name shown will be (standard input)).
- -h: suppress file names. This is the default if you're searching only one file or operating on STDIN.
- -n: print the line number of each match.
- -o: show only the matched part of each line, not the whole line.
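Here is a rough sketch that tries a few of these parameters on a small throwaway file. The file contents and variable names are invented for illustration. Note that -m and -o are not part of POSIX, so this assumes GNU grep or a compatible implementation.

```shell
# Three illustrative lines to search in.
sample=$(mktemp)
printf 'Foo bar\nfoobar\nbaz foo\n' > "$sample"

count=$(grep -c -i 'foo' "$sample")       # lines containing foo, ignoring case
words=$(grep -c -i -w 'foo' "$sample")    # same, but foo must be a whole word
numbered=$(grep -n 'baz' "$sample")       # prefix the match with its line number
only=$(grep -o 'ba[rz]' "$sample")        # print just the matched parts
first=$(grep -m 1 'foo' "$sample")        # stop after the first matching line

echo "count=$count words=$words numbered=$numbered first=$first"
printf '%s\n' "$only"
rm -f "$sample"
```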
There are many, many other parameters but I can't list them all. All of the above also have long names such as --ignore-case for -i and --word-regexp for -w, but I prefer the short ones. In any case, I advise you to read the manual for grep carefully. There might be a lot you didn't know this tool could do. You can also work on binaries, device files and FIFOs, print surrounding context lines in addition to the matching lines, and much more. There is a lot of hidden functionality that not many people know about. So have fun with it and yeah… become a grep and (reg)expert!
Read more about grep and regular expressions:
- die.net: grep man page
- Wikipedia.org: Regular expressions
- Regular-Expressions.info
- POSIX Basic and Extended Regular Expressions