The purpose of this exercise is to introduce tools for analyzing differential gene expression in RNA-seq data. You will analyze RNA-seq data from a human reference sample (a mix of tissues) and from brain tissue to identify genes whose expression is enriched in the brain. Due to time constraints, the data we will analyze are a subset of a larger RNA-seq dataset, restricted to chromosome 22 (chr22).
1. Move the 6 bowtie index files and the genome sequence file to a new folder called
1. Change the name of the genome sequence file to
2. Align sequences from each of the libraries to the human genome using TopHat2 (you will run tophat 6 times in total):
$ tophat -p 8 -G 'path_to_genome_annotations.gtf' -o 'output_folder' 'path_to_bowtie_index_for_reference_genome/prefix' 'fastq_file_paired_1_1','fastq_file_paired_1_2','fastq_file_unpaired_1_1','fastq_file_unpaired_1_2'
See the TopHat manual for additional details: https://ccb.jhu.edu/software/tophat/manual.shtml
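Because the same command is run six times, it can help to generate the six command lines in a loop and check them before running anything. The following is only a sketch: the library names (other than ref1), the assumed split into three reference and three brain libraries, and the FASTQ file naming pattern are hypothetical placeholders, and the two path variables repeat the placeholders from the command above. The script only prints the commands; it does not run TopHat.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one TopHat2 command per library.
# ASSUMPTIONS (not from the handout): the library names below and the
# "<library>_paired_1.fastq" style file names are hypothetical.

GTF="path_to_genome_annotations.gtf"                       # placeholder path
INDEX="path_to_bowtie_index_for_reference_genome/prefix"   # placeholder path
LIBRARIES="ref1 ref2 ref3 brain1 brain2 brain3"            # hypothetical names

# Print the TopHat command line for a single library.
# The output folder is named after the library (for example ./ref1).
tophat_cmd() {
    lib="$1"
    printf 'tophat -p 8 -G %s -o %s %s %s\n' \
        "$GTF" "$lib" "$INDEX" \
        "${lib}_paired_1.fastq,${lib}_paired_2.fastq,${lib}_unpaired_1.fastq,${lib}_unpaired_2.fastq"
}

# Print all six commands so they can be inspected first.
for lib in $LIBRARIES; do
    tophat_cmd "$lib"
done
```

Once the printed commands look right, they can be copied into the terminal one at a time, with the placeholder paths and file names replaced by the real ones.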
3. Determine what proportion of the reads from each library were aligned:
Use the UNIX more command to open each TopHat summary file in the terminal. The TopHat summary files are named align_summary.txt and are located in the output folder given with the -o option in the TopHat command.
$ more ./ref1/align_summary.txt
Repeat this for the output folder of each of the other libraries.
Identify the read mapping rate for each library and submit the results on Canvas as Exercise 15.
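As an alternative to opening each file by hand, the mapping rates can be collected in one pass. This sketch assumes that each align_summary.txt contains a line with the phrase "overall read mapping rate", as TopHat2 summary files normally do, and that the TopHat output folders sit in the current directory. The function name mapping_rates is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: print "<output folder><TAB><mapping rate line>" for each summary file.
# ASSUMPTION: the summary contains a line with "overall read mapping rate".

mapping_rates() {
    for summary in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched glob).
        [ -f "$summary" ] || continue
        rate=$(grep 'overall read mapping rate' "$summary")
        printf '%s\t%s\n' "$(dirname "$summary")" "$rate"
    done
}

# Summarize every TopHat output folder found in the current directory.
mapping_rates ./*/align_summary.txt
```

It is still worth opening at least one summary file with more, as described above, to confirm that the reported line matches what the file actually contains.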