RNA-seq Data Analysis: part 2

The purpose of this exercise is to introduce tools for analyzing differential gene expression in RNA-seq data. You will analyze RNA-seq data from a human reference sample (a mix of tissues) and from brain tissue to identify genes whose expression is enriched in the brain. Due to time constraints, the data we will analyze is a subset (chr22) of a larger RNA-seq dataset.

Read mapping

1. Move the 6 bowtie index files and the genome sequence file to a new folder called chr22, then change the name of the genome sequence file to chr22.fa.
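
A minimal sketch of step 1 is shown here. The file names are assumptions, not taken from the exercise: it supposes the index files are bowtie2-style files ending in .bt2 and that the genome sequence file is currently called genome.fa. The helper name collect_index is hypothetical. Substitute the names of the files you actually have.

```shell
# Sketch of step 1. ASSUMPTIONS: index files end in .bt2 and the genome
# sequence file is named genome.fa; neither name comes from the exercise.
collect_index() {
    genome="$1"                # current name of the genome sequence file
    mkdir -p chr22
    # Move every index file in the current directory (6 are expected).
    for f in *.bt2; do
        if [ -e "$f" ]; then mv "$f" chr22/; fi
    done
    # Move the genome sequence file and rename it to chr22.fa in one step.
    if [ -e "$genome" ]; then mv "$genome" chr22/chr22.fa; fi
}

collect_index genome.fa        # hypothetical file name; use your own
ls chr22                       # check: 6 index files plus chr22.fa
```

Listing the folder afterwards is a quick way to confirm that all 7 files arrived before starting the alignments.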

2. Align sequences from each of the libraries to the human genome using TopHat2 (you will run tophat 6 times in total):

$ tophat -p 8 -G 'path_to_genome_annotations.gtf' -o 'output_folder' 'path_to_bowtie_index_for_reference_genome/prefix' 'fastq_file_paired_1_1','fastq_file_paired_1_2','fastq_file_unpaired_1_1','fastq_file_unpaired_1_2'
  • NOTE: There are no spaces between the fastq file names.
  • The directory containing the fastq files should be the current working directory.
  • Name the output folders as follows: ref1, ref2, ref3, brain1, brain2, brain3
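
Because the same command is run 6 times, a loop can save typing. The following is only a sketch under stated assumptions: the annotation file name, the index prefix chr22/chr22, and the fastq naming scheme (for example ref1_paired_1.fastq) are hypothetical and must be replaced with your real file names. The tophat options are the ones used in the command above. By default the loop only prints each command so you can check it first.

```shell
# Sketch: one TopHat2 run per library (6 runs in total).
# ASSUMPTIONS: GTF, INDEX and the fastq naming scheme are hypothetical.
GTF="genes_chr22.gtf"          # hypothetical annotation file
INDEX="chr22/chr22"            # hypothetical index folder/prefix

# Build the comma-separated fastq list for one library (no spaces).
reads_for() {
    printf '%s,%s,%s,%s' \
        "$1_paired_1.fastq" "$1_paired_2.fastq" \
        "$1_unpaired_1.fastq" "$1_unpaired_2.fastq"
}

for lib in ref1 ref2 ref3 brain1 brain2 brain3; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command for inspection.
        echo tophat -p 8 -G "$GTF" -o "$lib" "$INDEX" "$(reads_for "$lib")"
    else
        # DRY_RUN=0: actually run the alignment.
        tophat -p 8 -G "$GTF" -o "$lib" "$INDEX" "$(reads_for "$lib")"
    fi
done
```

Once the printed commands look right, set DRY_RUN=0 before running the loop again to perform the alignments.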

See the TopHat manual for additional details.

3. Determine what proportion of the reads from each library were aligned:

Use the UNIX more command to open each TopHat summary file in the terminal. The TopHat summary files are named align_summary.txt and are located in the output folders specified with the -o option in step 2.

$ more ./ref1/align_summary.txt

Identify the read mapping rate for each library and submit the results on Canvas as Exercise 15.
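
As an alternative to opening the 6 files one at a time, the rates can be collected in one pass. This sketch assumes that each align_summary.txt contains a line with the text "overall read mapping rate", which is how recent TopHat2 versions report it. If your files look different, read the rate from the output of more instead. The helper name mapping_rate is hypothetical.

```shell
# Sketch: print the mapping-rate line from each library's summary file.
# ASSUMPTION: the summary has a line containing "overall read mapping rate".
mapping_rate() {
    grep 'overall read mapping rate' "$1"
}

for lib in ref1 ref2 ref3 brain1 brain2 brain3; do
    f="./$lib/align_summary.txt"
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$lib" "$(mapping_rate "$f")"
    else
        printf '%s\t%s\n' "$lib" "align_summary.txt not found"
    fi
done
```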

assignments/ex14.txt · Last modified: 2018/11/29 09:30 by dokuroot