Assignment 3

Due Sept 4, 2018

Compile your answers in a .txt document

Turn in your answers by uploading your .txt document to Canvas

HINT: If the question asks for a command, write the full command as you would write it on the command line.

HINT: You don't need to include the question in your write-up, just the answer.

Question 1

  • Let's download the C. elegans genome. Create a directory called celegans and navigate into it.
  • Download the ce10 version of the C. elegans genome from UCSC Genome browser with one of the following commands:
$ rsync -avzP rsync:// .
$ wget --timestamping '*'
  • Not working? Click here for more help
  • A. What is the md5sum you obtain for the file chrI.fa.gz?
  • B. What command would you execute to decompress all the .fa.gz files (in one command)? Execute the command.
  • C. Now what is the md5sum for the expanded file chrI.fa?
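
A checksum is computed from a file's bytes, so it is a quick way to confirm that a file arrived intact and to see that compressing or decompressing a file changes its contents on disk. The sketch below illustrates this on a small hypothetical file (demo.fa, created in a scratch directory) rather than on the genome files. It assumes GNU md5sum and gzip are installed; on a Mac, md5 would stand in for md5sum.

```shell
# Hypothetical demo file in a scratch directory; not part of the assignment data.
demo_dir=$(mktemp -d)
printf '>demo\nACGTACGT\n' > "$demo_dir/demo.fa"

# Reading from stdin keeps the file name out of md5sum's output,
# so the three results can be compared directly.
sum_plain=$(md5sum < "$demo_dir/demo.fa")

gzip "$demo_dir/demo.fa"            # replaces demo.fa with demo.fa.gz
sum_gz=$(md5sum < "$demo_dir/demo.fa.gz")

gunzip "$demo_dir/demo.fa.gz"       # restores demo.fa
sum_restored=$(md5sum < "$demo_dir/demo.fa")

echo "original:     $sum_plain"
echo "compressed:   $sum_gz"
echo "decompressed: $sum_restored"
```

The first and last checksums should agree, while the compressed file gets a checksum of its own.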

Question 2

If you did Question 1 correctly, you should have a directory containing individual fasta files for each C. elegans chromosome. Now, let's merge these individual chromosome files into one large genome fasta file.

  • A. What command would you execute to concatenate all the fasta files into a genome fasta file called celegans_genome.fa? Execute the command.
  • B. Let's double check that the file celegans_genome.fa contains seven concatenated chromosomes. What (piped) set of commands would you execute to check that the file is composed of seven chromosomes?

Question 3

  • Use the file you made in Question 2, celegans_genome.fa. Assume that the file blerg.jpg doesn't exist. What will be saved in the file output.txt when you execute each of the following commands?
A  $ wc celegans_genome.fa blerg.jpg > output.txt
B  $ wc celegans_genome.fa blerg.jpg 2> output.txt
C  $ wc celegans_genome.fa blerg.jpg &> output.txt
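
One way to explore how the shell treats a command's two output streams is to send each stream to its own file and look at them separately. The sketch below does this with ls and hypothetical file names in a scratch directory, rather than with the wc command from the question; the names exists.txt, missing.txt, stdout.txt, and stderr.txt are made up for the experiment.

```shell
# Hypothetical scratch directory and file names, used only for this experiment.
demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/exists.txt"

# ls lists exists.txt on standard output and complains about missing.txt
# on standard error. ">" captures the first stream, "2>" the second.
# "|| true" only ignores the non-zero exit status ls returns here.
ls "$demo_dir/exists.txt" "$demo_dir/missing.txt" \
    > "$demo_dir/stdout.txt" 2> "$demo_dir/stderr.txt" || true

echo "--- stdout.txt ---"; cat "$demo_dir/stdout.txt"
echo "--- stderr.txt ---"; cat "$demo_dir/stderr.txt"
```

Running the same experiment with wc and your own files is a good way to check your answers.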

Question 4

  • Say your directory has the following contents:
$ ls -1
  • Explain what each step of the following piped command chain does:

# This one is for Mac people:

$ md5 *.fa | tail -n 7 | cut -d ' ' -f 4 

# This one is for Windows people:

$ md5sum-lite *.fa | tail -n 7 | cut -d ' ' -f 1

Question 5

  • Your collaborator wants to write a single command line of code that outputs the line number of the file celegans_genome.fa that contains the annotation line for chrII. However, she erroneously comes up with this output…
  • She accidentally captured chromosome III's line number and annotation line as well.
  • What command line should she execute so she only receives chrII as output and still retains the line numbers?
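
The over-matching described above can be reproduced on a toy file, which is handy for testing candidate fixes before running them on the genome. This is a minimal sketch with made-up sample names in a hypothetical file toy.fa; it shows only the problem, not the fix.

```shell
# Hypothetical toy file; the sample names are made up for illustration.
demo_dir=$(mktemp -d)
printf '>sample1\nACGT\n>sample10\nGGCC\n' > "$demo_dir/toy.fa"

# -n prefixes each matching line with its line number.
# The pattern "sample1" is also a substring of "sample10", so both
# header lines match.
matches=$(grep -n 'sample1' "$demo_dir/toy.fa")
echo "$matches"
```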

Fun Stuff

What's wrong with wc?

  • Use wc to save a file containing word count information for one file.
  • Try to use cut to parse the word count information (in your saved file) into lines, words, and characters. It doesn't work.
  • Open the file in your text editor. Can you figure out why you couldn't parse columns using cut?
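
If you would rather stay on the command line than open a text editor, od -c prints every character of its input, including ones that are normally invisible. The sketch below runs it on a short made-up string; piping your saved word count file through od -c in the same way is one possible route to the answer.

```shell
# Made-up input: two letters separated by a run of three spaces.
# od -c shows each character individually and prints the newline as \n.
dump=$(printf 'a   b\n' | od -c)
echo "$dump"
```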
