
Intro to Arguments and Conditionals 🙎

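New terms in this lesson

  • $# - Number of arguments
  • $(cmd) - Save the output of cmd (command substitution)
  • $0 - Script name
  • $1, $2, $3, etc. - Individual positional parameters
  • $? - Exit status of the previous command
  • $@ - All positional parameters
  • [ ] - Conditional test
  • if - Keyword that starts a conditional statement
  • test - Conditional test (equivalent to [ ])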

We have a script that gives the reverse complement of a hard-coded sequence. But we want it to be more flexible so we can run it on different sequences.

Ultimately, we want the script to work on the command line like this:

$ revComp sequence
[reverse complement of sequence]

Positional parameters

From within the script, there has to be a way to access what the user types at the prompt. The technical term for this is positional parameters, because you access them by “position” on the command line, as variables in the script.

$ scriptname first second third etc

Inside the script, each word is available by its position:

#!/bin/bash
 
echo "$1 equals first"
 
echo "$2 equals second"
 
echo "$3 equals third"
 
echo "$4 equals etc"

There are also special variables that are useful:

  • $@ - All arguments; written in double quotes as "$@", it expands to each argument as a separate word
  • $# - The number of arguments
  • $0 - The name of the script
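To see these variables in action without creating a new file, here is a minimal sketch that wraps the idea in a shell function. The function name show_args is hypothetical. Inside a function, $1, $# and "$@" refer to the function's own arguments (while $0 still holds the script name), so it behaves like a tiny script. Note the double quotes around $@: they keep "second word" together as one argument.

```shell
#!/bin/bash

# show_args: hypothetical helper that reports the arguments it was given.
show_args() {
    echo "number of arguments: $#"
    echo "first argument: $1"
    # "$@" (quoted) expands to each argument as a separate word,
    # so an argument containing a space stays in one piece.
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

show_args first "second word" third
```

Running it prints the count (3), the first argument, and then each of the three arguments on its own line, with "second word" kept intact.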

Using command line args in revComp

Let's change revComp to work on command line arguments.

#!/bin/bash
 
# Assign a DNA sequence to a variable (sequence):
#sequence=ACTGTACGGTACAC
sequence=$1
echo "$sequence" | tr 'ACTGactg' 'TGACtgac' | rev

I have commented out the hard-coded declaration of sequence and replaced it with a statement that uses the first command-line argument, $1.

Now test it by running the script with a sequence on the command line.

$ revComp ACTGTACGGTACAC
GTGTACCGTACAGT
$ revComp TAT
ATA
$ revComp GGG
CCC

But what happens if we give it something unanticipated? Like nothing, or something that's not a sequence?

$ revComp
 
$ revComp trash
hstra

It's trash, alright! Backwards! The problem is that the script was never designed for this input, so what it does is unplanned. As programmers, we want things to have a prescribed outcome, even when given incorrect input.

The most common way to handle this is to have your program tell the user how to use it, or what they did wrong.

Intro to Flow Control: if, then, else

In order to respond to different types of input, we need to read variables and react to their values.

What we want:

$ revComp
revComp -- Return the reverse complement of a DNA sequence.
Usage: revComp sequence

We're going to tell the user (that's you, or whoever else runs your script) how to run the script when it is not given any arguments.

There are a few approaches here, but an easy way is to check if the 1st positional parameter is blank. This indicates that there were no arguments supplied on the command line.

Here is the syntax:

if [ -z "$1" ];
then
   # no arguments given:
   # print usage message and exit
   echo "Usage: revComp sequence"
   exit
fi

Example explained

if condition;
then
    statement(s)
elif other condition;
then
    statement(s)
else
    statement(s)
fi

The keyword if must be followed by a condition, such as a [ ] test, ended by a semicolon or a newline. The keyword then marks the beginning of the statements that are executed if the condition resolves to TRUE. Optionally, the construct can also contain one or more elif clauses and a final else clause. The keyword fi (“if” backwards) closes the statement.
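Here is a small, runnable sketch of the full if-elif-else form. The function name check_args is hypothetical; $# is the number of arguments, and -gt ("greater than") compares two numbers.

```shell
#!/bin/bash

# check_args: hypothetical function exercising all three branches.
check_args() {
    if [ -z "$1" ];
    then
        echo "no sequence given"
    elif [ "$#" -gt 1 ];
    then
        echo "too many arguments: $#"
    else
        echo "one sequence: $1"
    fi
}

check_args             # first branch
check_args ACGT TTT    # elif branch
check_args ACGT        # else branch
```

Exactly one branch runs per call: the first condition that is TRUE wins, and else catches everything left over.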

Spaces are important: the single brackets must be separated by spaces from what they enclose, as in [ -z "$1" ]; writing [-z "$1"] produces an error. The brackets signify a simple conditional, which always evaluates to true or false. I use TRUE and FALSE in all caps from here on to signify these special boolean values.

You can think of it as asking a question that is expressed through operators and operands. Is one quantity greater than another? Does a file exist? Is a variable blank?

The last question, “is a variable blank?”, is answered with the -z test operator. More formally, -z tests whether a string is null (has zero length).

Its complement, -n, tests whether a string is not empty (has greater-than-zero length). Let's do an exercise to get a feel for it.

★ Interactive Exercise: conditionals

[ -z "" ] # evaluates to TRUE
[ -n "" ] # evaluates to FALSE
[ -z "hi" ] # evaluates to FALSE
[ -n "hi" ] # evaluates to TRUE 

The test command is an alternate, but equivalent, syntax for [ ]. Try both on the command line:

$ test -z ""
$ [ -z "" ]
$ test -z "nonempty"

Unfortunately, the conditional prints nothing on the command line to show you its outcome. The result is kept behind the scenes.
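If you are curious, you can peek at the hidden outcome with the special variable $?, which holds the exit status of the previous command (covered in more detail under Exit Status). A status of 0 means TRUE and 1 means FALSE. A minimal sketch:

```shell
#!/bin/bash

# Each test sets $? to its exit status; echo it immediately,
# before another command overwrites it.
[ -z "" ]
echo "[ -z \"\" ] gives $?"        # 0 (TRUE)

[ -z "hi" ]
echo "[ -z \"hi\" ] gives $?"      # 1 (FALSE)

test -n "hi"
echo "test -n \"hi\" gives $?"     # 0 (TRUE)
```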

To make use of the hidden outcome, you can use the if construct.

$ if test -z ""; then echo "it is empty"; fi

Up-arrow and change the tested string to something non-null, such as:

$ if test -z "funkadelic"; then echo "it is empty"; fi

Here the conditional evaluates to FALSE, so nothing is printed. Let's add statements for the alternate outcome:

$ if [ -z "funkadelic" ]; then echo "it is empty"; else echo "it is not empty"; fi
it is not empty

Shorthand for one-liners

A convenient shorthand for the above is available for statement blocks that only have one line, like the examples above.

$ [ -z "funkadelic" ] && echo "it is empty" || echo "it is not empty"
it is not empty

There is much less typing in this example, but the operators are a little cryptic. The && is a boolean AND operator, but here you can read it as then, while the || is a boolean OR operator, but here you can read it as else.
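One caution, shown in the sketch below: a && b || c is not exactly the same as if a; then b; else c; fi. The || part runs whenever the command right before it fails, so if b itself fails, c runs too. Here the built-in command false stands in for a "then" statement that fails.

```shell
#!/bin/bash

# With if/then/else, exactly one branch runs.
if [ -z "" ]; then false; else echo "else branch ran"; fi
# prints nothing: the condition is TRUE, so only 'false' runs

# With && and ||, a failing "then" command also triggers the "else" part.
[ -z "" ] && false || echo "else part ran anyway"
# prints: else part ran anyway
```

For one-liners whose "then" part is a simple echo, the shorthand is fine; when that part could fail, the full if construct is safer.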

We'll see more about how this works when we talk about exit statuses in this lesson, and compound conditionals in a future lesson.

So much semicolon

When you type on the command line, semicolons are a substitute for hitting the return key. Try doing the above example, but hitting the return key everywhere there is a semicolon.

$ if test -z "funkadelic"
>

Hold up. See how your prompt has changed to a '>'? That means the shell expects more input. If you screw up here, you'll get a syntax error, so the stakes are high!

$ if test -z "funkadelic"
> fi
-bash: syntax error near unexpected token `fi'

The shell is not having it.

The shell needs a then keyword to tell it that you're done building the conditional. It's just the way it is.

$ if test -z "funkadelic"
> then
> echo "yes"
> else
> echo "no"
> fi
no

If you up-arrow, you'll see the single-line version.

$ if test -z "funkadelic"; then echo "yes"; else echo "no"; fi

Wow!

revComp with usage message

Let's add this decision-making capacity to our script, and also print a usage message:

#!/bin/bash
 
if [ -z "$1" ];
then
    echo "revComp -- Return the reverse complement of a DNA sequence."
    echo "Usage: revComp sequence"
    exit
fi
 
sequence=$1
echo "$sequence" | tr 'ACTGactg' 'TGACtgac' | rev

Try it out!

$ revComp
revComp -- Return the reverse complement of a DNA sequence.
Usage: revComp sequence
$ revComp ATG
CAT

The script now prints a usage message and exits when it is run without a sequence.
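A possible variation, sketched here as an assumption rather than the lesson's method, builds the usage message from $0 (the script's name) so the message stays correct if the file is renamed. basename strips any leading directory such as ./, and the function name usage is hypothetical.

```shell
#!/bin/bash

# usage: hypothetical helper that prints the usage message.
# $0 holds the name used to run the script (e.g. ./revComp);
# basename removes the directory part.
usage() {
    echo "$(basename "$0") -- Return the reverse complement of a DNA sequence."
    echo "Usage: $(basename "$0") sequence"
}

# In revComp, the two echo lines inside the if block would become:
#   if [ -z "$1" ]; then usage; exit; fi
usage
```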

Note: Quoting variables inside a conditional

Why do we use

[ -z "$1" ]

instead of

[ -z $1 ]

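The answer is best practice. In a loosely-typed language such as bash, the programmer has to enforce certain constraints. When a variable is unquoted, bash substitutes its value and then splits it into words: an empty variable vanishes entirely, and a value containing spaces becomes several arguments. Either can give a wrong answer or a syntax error. Quote your variables in comparisons so that each one always reaches [ as exactly one argument.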
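The sketch below shows what can go wrong without quotes; the variable names empty and spaced are made up for the demonstration. An unquoted empty variable turns [ -n $empty ] into [ -n ], which merely checks that the string "-n" is non-empty and so wrongly reports TRUE. An unquoted value with a space hands [ an extra argument it cannot parse.

```shell
#!/bin/bash

# Hypothetical variables for the demonstration.
empty=""
spaced="two words"

# Unquoted: $empty vanishes, leaving [ -n ], a one-argument test of the
# non-empty string "-n", which is TRUE. Wrong answer!
[ -n $empty ] && echo "unquoted -n: TRUE (wrong, the variable is empty)"

# Quoted: [ -n "" ] tests an empty string, which is FALSE. Correct.
[ -n "$empty" ] || echo "quoted -n: FALSE (correct)"

# Unquoted: [ -z two words ] has one argument too many, so [ fails with
# a syntax error (2>/dev/null hides bash's error message).
[ -z $spaced ] 2>/dev/null || echo "unquoted -z: syntax error"

# Quoted: [ -z "two words" ] is a valid test, and it is FALSE. Correct.
[ -z "$spaced" ] || echo "quoted -z: FALSE (correct)"
```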

Expanding functionality: check for valid input

What do we do about the other problem, of getting a non-DNA sequence? This may not seem like a very likely problem, but it will be a useful illustration of how to check the input and respond.

We can build on the tools we've used so far to filter the input sequence.

#!/bin/bash
 
#### check for no arguments ####
if [ -z "$1" ];
then
    echo "revComp -- Return the reverse complement of a DNA sequence."
    echo "Usage: revComp sequence"
    exit
fi
 
sequence=$1
 
#### check for invalid input ####
filtered=$(echo "$sequence" | tr -d 'ACTGactg')
if [ -n "$filtered" ]
then
    echo "Invalid characters detected in '$sequence' = '$filtered'" >&2
    exit 1
fi
 
#### take the reverse complement ####
echo "$sequence" | tr 'ACTGactg' 'TGACtgac' | rev

New concepts

I used some new features to accomplish this last task:

  • Use tr to delete the characters with -d
  • Store the result of a command

And I did some unusual things to respond to the error:

  • Output to stderr
  • Exit status of 1

I used tr to delete the valid characters, leaving only invalid ones. By storing the result of this command, I was able to use it later.

filtered=$(echo "$sequence" | tr -d 'ACGTacgt')

An older formulation uses backquotes (`) instead. It still works, but $(...) is generally preferred because it is easier to read and can be nested.

filtered=`echo "$sequence" | tr -d 'ACGTacgt'`

As always, there can be no spaces surrounding the equal sign in the assignment.

Let's try it out:

$ revComp CACx
Invalid characters detected in 'CACx' = 'x'
$ revComp AAAeCCGTGAwGTGGAwGcgtwAC!
Invalid characters detected in 'AAAeCCGTGAwGTGGAwGcgtwAC!' = 'ewww!'
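To see the filtering step on its own, here is a small sketch; the sample sequence CACxGTz is made up. tr -d deletes every valid base, so only the invalid characters survive, and $(...) stores them in filtered.

```shell
#!/bin/bash

sequence="CACxGTz"

# Delete every valid base; whatever is left over is invalid.
filtered=$(echo "$sequence" | tr -d 'ACGTacgt')
echo "leftover: '$filtered'"      # leftover: 'xz'

# An empty result means the sequence contained only valid bases.
if [ -n "$filtered" ]; then
    echo "invalid characters found"
fi
```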

Error reporting

Standard Error

Conventionally, messages that are intended to be separate from normal output are printed to standard error.

On the command line, you can do the following (use single quotes, because an interactive bash treats ! specially inside double quotes):

$ echo 'BAD!' >&2
BAD!

It looks the same as:

$ echo 'BAD!'
BAD!

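The difference is where the text goes: standard output and standard error are separate streams that both happen to appear on your terminal by default. Sending error messages to standard error keeps them out of the normal program output, for example when that output is redirected to a file or piped to another command.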
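You can watch the two streams separate by redirecting one of them. In this sketch, the hypothetical function report writes one line to each stream; 2>/dev/null discards standard error and >/dev/null discards standard output. The same idea applies to the finished script: revComp CACx 2>/dev/null prints nothing, because its complaint goes to standard error.

```shell
#!/bin/bash

# report: hypothetical function that writes to both streams.
report() {
    echo "normal output"          # standard output
    echo "error message" >&2      # standard error
}

# Keep only standard output (discard standard error):
report 2>/dev/null

# Keep only standard error (discard standard output):
report >/dev/null

# $(...) captures standard output only; the error still reaches the terminal.
captured=$(report)
echo "captured: $captured"
```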

Exit Status

The exit status of a program is used by the shell to see if a job failed. We used exit once before without an argument; in that case the script exits with the status of the last command it ran (a successful echo), which was 0. Any status greater than 0 signals an abnormal completion, usually an error.

You can use the special variable $? to see the exit status of the most recent command.

$ echo "hi"
hi
$ echo $?
0

0 means OK.

$ asdfasdf
-bash: asdfasdf: command not found
$ echo $?
127

127 means command not found.

1 is a general-purpose error status, which is how we ended our script when the input was wrong.

In pipelines, you usually aren't interested in the specific error code. It only makes sense to check whether the exit status of a command is ZERO (or NOT ZERO) to decide whether you should continue.

if [ $? -eq 0 ]; then echo 'OK!'; else echo 'error!'; fi 

Or alternatively,

if [ $? -ne 0 ]; then echo 'error!'; else echo 'OK!'; fi
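Because $? is overwritten by every command, including echo and the [ test itself, check it right away or save it in a variable first. Often you can skip $? entirely, since if runs a command and branches on its exit status directly. A sketch, where the function is_dna is hypothetical and reuses the tr -d filter from the script:

```shell
#!/bin/bash

# is_dna: hypothetical function that succeeds (status 0) only when its
# argument is non-empty and contains nothing but A, C, G, T (any case).
# A function's status is the status of the last command it runs.
is_dna() {
    [ -n "$1" ] || return 1
    [ -z "$(echo "$1" | tr -d 'ACGTacgt')" ]
}

# Save $? immediately, before another command replaces it.
is_dna "ACGT"
status=$?
echo "status of is_dna ACGT: $status"     # 0

# Or let if test the command's exit status directly.
if is_dna "ACxT"; then
    echo 'OK!'
else
    echo 'error!'
fi
```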

Next lesson: Loops and More Conditionals 🙉

wiki/2018functional_revcomp.txt · Last modified: 2018/09/06 10:37 by david