Let's download a few extra files!

Three other files we'll need to run jobs are:

  1. whole genome fasta file
  2. chromosome sizes file
  3. annotation file

Download these into the directory one level up from your by_chrom directory, which means you should be in the PROJ02_yeastGenome directory:

$ pwd # should be ~/DSCI512_RNAseq/PROJ02_yeastGenome

Get the whole genome fasta file and chromosome sizes file

  • Navigate to the UCSC Yeast genome page again:

  • Under Apr. 2011 (SacCer_Apr2011/sacCer3), click on Full data set
  • Use rsync to download the files chromFa.tar.gz, sacCer3.chrom.sizes, and md5sum.txt
$ pwd # should be ~/DSCI512_RNAseq/PROJ02_yeastGenome
$ rsync -avzP rsync:// .
$ rsync -avzP rsync:// .
$ rsync -avzP rsync:// .
  • Check the md5 sums: the hash printed for chromFa.tar.gz should match its entry in md5sum.txt
$ md5sum chromFa.tar.gz
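Comparing hashes by eye is error-prone, so the check can also be scripted. Here is a minimal sketch, assuming md5sum.txt uses the usual "hash  filename" layout that GNU md5sum reads with its -c option. The helper name check_md5 is hypothetical and not part of the course materials:

```shell
# check_md5 FILE [SUMS]
# Hypothetical helper: verify FILE against its entry in a checksum list.
# SUMS defaults to md5sum.txt in the current directory (an assumption).
check_md5() {
    file="$1"
    sums="${2:-md5sum.txt}"
    # Keep only the line whose filename field is FILE (text or binary mode),
    # then let md5sum recompute the hash and compare it with the listed one.
    awk -v f="$file" '$2 == f || $2 == "*" f' "$sums" | md5sum -c -
}
```

Running check_md5 chromFa.tar.gz should report chromFa.tar.gz: OK when the hashes agree, and the command exits with a non-zero status if they differ or if the file has no entry in the list.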

Get the annotation file

You'll recall from DSCI510/LINUX that genomes are annotated and all the features are stored in either GTF or GFF files. We'll need to download a GTF file for the yeast genome:

  • Under Tools, select Table Browser
    • clade: Other
    • genome: S. cerevisiae
    • assembly: Apr. 2011 (SacCer_Apr2011/sacCer3)
    • group: Genes & Gene Predictions
    • track: Ensembl Genes
    • table: ensGene
    • output format: gtf
    • output file: 181115_Scer_annotation.gtf
    • file type returned: gzip compressed
  • Download the file

It should look like this:
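To inspect your own copy from the command line, here is a minimal sketch. It assumes the browser saved the compressed file as 181115_Scer_annotation.gtf.gz (the .gz suffix is an assumption, so adjust it to match your file). The helper name gtf_feature_counts is hypothetical:

```shell
# gtf_feature_counts FILE
# Hypothetical helper: tally the feature types (column 3) in a gzipped GTF.
gtf_feature_counts() {
    # gzip -dc writes the decompressed text to stdout and leaves FILE as-is.
    # Lines starting with '#' are comments and are skipped.
    gzip -dc "$1" \
        | awk -F'\t' '!/^#/ { n[$3]++ } END { for (f in n) print n[f], f }' \
        | sort -k2
}

# Example usage (file name assumed, see above):
#   gzip -dc 181115_Scer_annotation.gtf.gz | head -n 5
#   gtf_feature_counts 181115_Scer_annotation.gtf.gz
```

A GTF file has nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), so each record printed by head should show those nine fields, and the counts give a quick sense of which feature types the table export contains.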

Assignment 3

Extra Self Study: Syncing

