
Sed, grep and awk 🙊

Back to previous page Loops and More Conditionals 🙉

Before scripting languages like Perl and Python, the UNIX/Linux user had to choose between writing a shell script and writing a small C program to accomplish a task. Many things are not possible with built-in shell operations, whereas writing a C program is not convenient or accessible to every user in every situation. Hence the birth of intermediate tools, including, but not limited to, awk, grep and sed.

  • awk - a full scripting language, but common usage is one-liners
  • grep - globally search a regular expression and print - all about pattern matching.
  • sed - stream editor. Not a full-fledged scripting language, but it has multiple commands for editing files (streams).

Of these three, only awk is useful for computation, because it does real arithmetic, including floating-point numbers (the shell's own arithmetic handles only integers). However, its arcane syntax makes it inferior to modern languages like Python, so we will not bother with it here.
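The one bit of awk worth seeing here is the integer-versus-floating-point difference just mentioned. This is a minimal sketch; the numbers 7 and 2 are arbitrary, and it assumes bash and any standard awk are installed.

```shell
#!/bin/bash
# Bash's arithmetic expansion works on integers only, so 7 / 2 is truncated to 3.
int_result=$((7 / 2))

# awk does its arithmetic in floating point, so the same division gives 3.5.
float_result=$(awk 'BEGIN { print 7 / 2 }')

echo "bash: $int_result"
echo "awk:  $float_result"
```

If you need anything beyond integer math in a shell script, this is the usual point where people reach for awk or, these days, Python.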

The sed and grep commands are useful command line tools, though, so we will explore some common applications of them.

Grep

Grep stands for “globally search a regular expression and print” (the name comes from the g/re/p command of the old ed editor). A “regular” expression, as opposed to something else (irregular?), is a formal syntax for string matching. The regular expression is commonly referred to as a “pattern”.

Pattern-matching

More specifically, grep prints every line of input that matches the given pattern. The classic use of grep is to search a file for lines that contain some particular text.

Try this from your home directory, where .bash_profile lives:

$ grep 'PATH' .bash_profile
PATH=$HOME/bin:$PATH
CLASSPATH=.
######### additions to PATH
PATH=/usr/local/mysql/bin:$PATH
PATH=/Applications/Unity/Unity.app/Contents/Tools:$PATH
PATH=/Applications/LilyPond.app/Contents/Resources/bin:$PATH
CLASSPATH=~/src/java/fop.jar:~/src/java/xalan-j_2_7_2/*:$CLASSPATH
PATH=~/src/apache-ant-1.9.8/bin:$PATH
CLASSPATH=~/src/apache-ant-1.9.8/lib/*:$CLASSPATH
PATH=~/bin/UCSC_userApps:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/opt
PATH=~/bin/tophat-2.1.1.OSX_x86_64:$PATH
PATH=~/bin/sratoolkit.2.8.2-1-mac64/bin:$PATH
PATH=~/bin/bedtools2/bin:$PATH
PATH=~/bin/fastx-bin:$PATH
PATH=~/bin/bowtie-1.2:$PATH
export CLASSPATH
export PATH
# Setting PATH for Python 3.6
PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}"
export PATH
export PYTHONPATH="/Users/david/lib/python2.7/site-packages/"

My output contains a lot of lines because I add each new piece of software's own directory to my PATH rather than pooling everything in a common place… You do you.
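If your own .bash_profile is short or missing (many Linux systems use .bashrc instead), here is a self-contained sketch that writes a few made-up profile lines to a temporary file and greps them. The file contents are hypothetical; the -n option, which prefixes each match with its line number, is a standard grep flag.

```shell
#!/bin/bash
# A small, made-up profile (hypothetical contents) written to a temporary file.
profile=$(mktemp)
printf '%s\n' \
    'PATH=$HOME/bin:$PATH' \
    'CLASSPATH=.' \
    'alias ll="ls -l"' \
    'export PATH' > "$profile"

# Print every line that contains the text PATH; -n prefixes each match
# with its line number in the file.
numbered=$(grep -n 'PATH' "$profile")
echo "$numbered"

rm -f "$profile"
```

Line 3 (the alias) is the only one left out, because it doesn't contain PATH.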

You can also use grep in a pipe. The equivalent pipeline would be:

$ cat .bash_profile | grep PATH

Using cat on a file just puts its contents on stdout, but more often you will be grepping the output of some command, with no file available to search:

$ env | grep PATH
PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/david/bin/bowtie-1.2:/Users/david/bin/fastx-bin:/Users/david/bin/bedtools2/bin:/Users/david/bin/sratoolkit.2.8.2-1-mac64/bin:/Users/david/bin/tophat-2.1.1.OSX_x86_64:/Users/david/bin/UCSC_userApps:/Users/david/src/apache-ant-1.9.8/bin:/Applications/LilyPond.app/Contents/Resources/bin:/Applications/Unity/Unity.app/Contents/Tools:/usr/local/mysql/bin:/Users/david/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin
PYTHONPATH=/Users/david/lib/python2.7/site-packages/
CLASSPATH=/Users/david/src/apache-ant-1.9.8/lib/*:/Users/david/src/java/fop.jar:/Users/david/src/java/xalan-j_2_7_2/*:.

Although we are just grepping for the literal string “PATH”, that string is being treated as a regular expression that lacks any operators or meta-characters. Therefore it matches every line that contains the exact text “PATH” anywhere, which is why CLASSPATH and PYTHONPATH show up too.
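To see the substring behaviour without depending on your own environment, this sketch pipes a few made-up variable lines into grep, standing in for env. The sample lines and the sample_env function name are hypothetical; -c is a standard grep flag that counts matching lines instead of printing them.

```shell
#!/bin/bash
# Made-up environment lines (hypothetical), piped into grep just like env's output.
sample_env() {
    printf '%s\n' \
        'PATH=/usr/local/bin:/usr/bin:/bin' \
        'CLASSPATH=.' \
        'PYTHONPATH=/opt/lib/python' \
        'HOME=/home/user'
}

# PATH is a literal pattern, so it matches anywhere inside a line:
# CLASSPATH and PYTHONPATH are printed too, and only HOME is filtered out.
matches=$(sample_env | grep 'PATH')
echo "$matches"

# -c reports how many lines matched instead of printing them.
count=$(sample_env | grep -c 'PATH')
echo "matching lines: $count"
```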

But let's add some of those special characters into the pattern to refine the matches.

$ env | grep '^PATH'
PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/david/bin/bowtie-1.2:/Users/david/bin/fastx-bin:/Users/david/bin/bedtools2/bin:/Users/david/bin/sratoolkit.2.8.2-1-mac64/bin:/Users/david/bin/tophat-2.1.1.OSX_x86_64:/Users/david/bin/UCSC_userApps:/Users/david/src/apache-ant-1.9.8/bin:/Applications/LilyPond.app/Contents/Resources/bin:/Applications/Unity/Unity.app/Contents/Tools:/usr/local/mysql/bin:/Users/david/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin

This example returns only lines that start with the string “PATH”. The caret “^” means start-of-line: nothing may precede the pattern on the line for it to match. The caret doesn't match any actual character, only a position, so it's kind of like an imaginary number.
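Here is a self-contained sketch of the caret, using the same kind of made-up lines (hypothetical data) in place of env. Quoting the pattern is a good habit, so the shell never gets a chance to interpret special characters before grep sees them.

```shell
#!/bin/bash
# Hypothetical environment lines standing in for env.
sample_env() {
    printf '%s\n' \
        'PATH=/usr/local/bin:/usr/bin:/bin' \
        'CLASSPATH=.' \
        'PYTHONPATH=/opt/lib/python'
}

# ^ anchors the pattern to the start of the line. CLASSPATH and PYTHONPATH
# contain PATH, but not at the very beginning, so only the PATH line survives.
anchored=$(sample_env | grep '^PATH')
echo "$anchored"

# Without the anchor, all three lines match.
unanchored_count=$(sample_env | grep -c 'PATH')
echo "unanchored matches: $unanchored_count"
```

Note that '^PATH' would still match a variable whose name merely starts with PATH; anchoring on '^PATH=' is stricter if that matters.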

Likewise, there is the end-of-line symbol, another freaking dollar sign '$'. When using the end-of-line '$', nothing may come between the given string (or pattern) and the newline.

$ env | grep 'PATH$'

No output. I don't have any lines that match that construct in my environment. Do you?
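Lines ending in PATH are easier to find in a profile file, where statements like export PATH end with that text. This sketch uses a few hypothetical profile-style lines to show '$' on its own and both anchors together. The single quotes matter here: they stop the shell from treating '$' as the start of a variable name.

```shell
#!/bin/bash
# Hypothetical profile-style lines; two of them end with the text PATH.
lines=$(printf '%s\n' \
    'PATH=/usr/local/bin:/usr/bin' \
    'export CLASSPATH' \
    'export PATH' \
    'export PYTHONPATH="/opt/lib/python"')

# $ anchors the pattern to the end of the line.
ends=$(printf '%s\n' "$lines" | grep 'PATH$')
echo "$ends"

# With both anchors the whole line must be exactly PATH. None is, so grep -c
# prints 0 (and grep exits with a non-zero status because nothing matched).
exact=$(printf '%s\n' "$lines" | grep -c '^PATH$')
echo "lines that are exactly PATH: $exact"
```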

Sed

Sed, the stream editor, is similar to grep in that it often operates on user-supplied patterns. However, sed can also alter the content it reads, using the various commands in its syntax.
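To make the grep comparison concrete, here is a small sketch (hypothetical input lines) in which sed reproduces a plain grep and then does something grep can't: edit the stream. The -n option and the p and d commands are standard sed; the script below also uses them.

```shell
#!/bin/bash
# Hypothetical input lines; only one mentions PATH.
input=$(printf '%s\n' 'HOME=/home/user' 'PATH=/usr/bin:/bin' 'SHELL=/bin/bash')

# grep prints the lines that match the pattern.
via_grep=$(printf '%s\n' "$input" | grep 'PATH')

# sed can do the same job: -n turns off sed's automatic printing, and the
# p command prints only the lines selected by the /PATH/ address.
via_sed=$(printf '%s\n' "$input" | sed -n '/PATH/p')
echo "$via_sed"

# Unlike grep, sed can also change the stream: d deletes the matching
# lines and passes everything else through (the same result as grep -v).
without=$(printf '%s\n' "$input" | sed '/PATH/d')
echo "$without"
```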

Exercise

We are going to write a script that creates an input file and then performs some sed operations on it. Save this awesome text in a file such as sed_examples.bash:

#!/bin/bash

# write some lines to a file
: > somelines.txt # create an empty file "somelines.txt". HAZARD! this overwrites any content that used to be in that file.
i=1
while [ "$i" -le 10 ]; do
    echo "This is line $i" >> somelines.txt
    i=$((i + 1))
done

echo "Contents of input file:"
cat somelines.txt
echo

echo "Delete lines 3-5:"
sed '3,5d' somelines.txt
echo

echo "Print out contents of file until the string 'line 5' is found"
sed '/line 5/q' somelines.txt
echo

# Q (quit without printing the current line) is a GNU sed extension;
# it may be missing from other seds, such as the BSD sed on macOS.
echo "Print out contents of file until the string 'line 5' is found, excluding last"
sed '/line 5/Q' somelines.txt
echo

echo "Delete any lines containing 'line 5'"
sed '/line 5/d' somelines.txt
echo

echo "Suppress normal output. Only print the first and last lines:"
sed -n '1p;$p' somelines.txt
echo

Run it.

$ chmod 755 sed_examples.bash
$ ./sed_examples.bash
Contents of input file:
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
 
Delete lines 3-5:
This is line 1
This is line 2
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
 
Print out contents of file until the string 'line 5' is found
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
 
Print out contents of file until the string 'line 5' is found, excluding last
This is line 1
This is line 2
This is line 3
This is line 4
 
Delete any lines containing 'line 5'
This is line 1
This is line 2
This is line 3
This is line 4
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
 
Suppress normal output. Only print the first and last lines:
This is line 1
This is line 10

Study the code underlying each of these statements to see some interesting uses of sed. I actually never use this functionality of sed, but I should!
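Two small variations on those commands are worth trying as well. This sketch rebuilds the same ten-line input as the script above in a temporary file, then prints a numeric range (the mirror image of '3,5d') and a range whose start and end are patterns. Both address forms are standard sed.

```shell
#!/bin/bash
# Rebuild the same ten-line input as the script above, in a temporary file.
input=$(mktemp)
for ((i = 1; i <= 10; i++)); do
    echo "This is line $i"
done > "$input"

# The mirror image of '3,5d': suppress normal output, print only lines 3-5.
middle=$(sed -n '3,5p' "$input")
echo "$middle"

# Addresses can also be patterns: print from the first line matching
# 'line 7' through the next line matching 'line 9'.
by_pattern=$(sed -n '/line 7/,/line 9/p' "$input")
echo "$by_pattern"

rm -f "$input"
```

Be careful with patterns like /line 1/, which also match "This is line 10".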

SUBSTITUTIONS AND REGULAR EXPRESSIONS 🙈

wiki/2018sed_and_grep.txt · Last modified: 2018/08/31 10:47 by david