HINT: If the question asks for a command, write the full command as you would write it on the command line.
HINT: You don't need to include the question in your write-up, just the answer.
9/7/20 update: Question 5B should be: When you execute
wc ce11_CDS.bed, what is the result? If you already turned it in the other way, that is not a problem.
celegans and navigate into it.
$ rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/ce11/chromosomes/ .
OR
$ wget --timestamping 'ftp://hgdownload.cse.ucsc.edu/goldenPath/ce11/chromosomes/*'
If you did Exercise 1 correctly, you should have a directory containing individual fasta files for each C. elegans chromosome. Now, let's merge these individual chromosome files into one large genome fasta file.
celegans_genome.fa? execute the command
celegans_genome.fa contains seven concatenated chromosomes. What (piped) set of commands would you execute to check that the file is composed of seven chromosomes?
celegans_genome.fa. Assume that the file blerg.jpg doesn't exist. What will be saved in the file output.txt when you execute the following commands?
A $ wc celegans_genome.fa blerg.jpg > output.txt
B $ wc celegans_genome.fa blerg.jpg 2> output.txt
C $ wc celegans_genome.fa blerg.jpg &> output.txt
$ ls -1
README.txt
celegans_genome.fa
chrI.fa
chrII.fa
chrIII.fa
chrIV.fa
chrM.fa
chrV.fa
chrX.fa
md5sum.txt
#This one is for MAC people:
$ md5 *.fa | tail -n 7 | cut -d ' ' -f 4
#This one is for WINDOWS people:
$ md5sum-lite *.fa | tail -n 7 | cut -d ' ' -f 1
Let's make a bed file. Bed files are long lists of genome features in which each row in the file corresponds to a genomic region. The first column of each row lists the chromosome, the second column lists the start site, and the third column lists the stop site. The columns are tab delimited.
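To make the three-column layout concrete, here is a minimal sketch that writes a tiny bed file by hand and inspects it. The coordinates and the temporary file are made up for illustration; they are not taken from the C. elegans annotation, and this is not the command the exercise asks for.

```shell
# Hypothetical toy data: three regions written as chromosome, start, stop.
# printf turns each \t into a real tab, which is what bed files require.
toy_bed=$(mktemp)
printf 'chrI\t100\t250\n'  >  "$toy_bed"
printf 'chrI\t400\t900\n'  >> "$toy_bed"
printf 'chrX\t50\t75\n'    >> "$toy_bed"

# Count rows that do NOT have exactly three tab-separated fields.
bad_rows=$(awk -F '\t' 'NF != 3' "$toy_bed" | wc -l | tr -d ' ')
echo "rows without exactly three columns: $bad_rows"

# cut splits on tabs by default, so -f 1 pulls out the chromosome column.
cut -f 1 "$toy_bed" | sort -u
```

If a file was accidentally written with spaces instead of tabs, the awk check above would report every row as malformed, which is a quick way to catch a wrong delimiter.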
Download a C. elegans gtf file using the following command:
$ wget 'http://22.214.171.124:34/Pangea-Web/onishlab/dsci510/ce11_annotation_ensembl_to_ucsc.gtf.gz'
Just download it here: ce11_annotation_ensembl_to_ucsc.gtf.gz
Create a .bed file called ce11_CDS.bed for the genome locations OF JUST THE CODING SEQUENCES (listed as CDS in column 3 of the GTF file). Your .bed file should look like this if you peek into it using head:
$ head ce11_CDS.bed
chrV	1480	1579
chrV	1691	1782
chrV	2851	3036
chrV	5690	5966
chrV	6024	6508
chrV	7651	7818
chrV	7433	7609
chrV	7158	7384
chrV	6939	7110
chrV	7651	7818
A. What piped command line did you use to generate ce11_CDS.bed?
B. When you execute wc ce11_CDS.bed, what is the result?
wc to save a file containing word count information for one file.
cut to parse the word count information (in your saved file) into lines, words, and characters. It doesn't work.