The purpose of this exercise is to introduce tools for analyzing differential gene expression in RNA-seq data. You will analyze RNA-seq data from human reference (a mix of tissues) and brain tissue to identify genes for which expression is enriched in the brain. Due to time constraints, the data we will analyze is a subset (chr22) of a larger RNA-seq dataset.
1. Identify genes differentially regulated between the reference and brain tissue samples using cuffdiff.
Replace output_folder with a folder name of your choice.
The inputs are the accepted_hits.bam output files from TopHat. If RNA-seq_Data is your working directory, give the path to each of these files relative to that directory in the command below:
$ cuffdiff -p 8 -o 'output_folder' -L ref,brain 'path_to_gtf' \
'path_to_tophat_output_library1_replicate1'/accepted_hits.bam,'path_to_tophat_output_library1_replicate2'/accepted_hits.bam,'path_to_tophat_output_library1_replicate3'/accepted_hits.bam \
'path_to_tophat_output_library2_replicate1'/accepted_hits.bam,'path_to_tophat_output_library2_replicate2'/accepted_hits.bam,'path_to_tophat_output_library2_replicate3'/accepted_hits.bam
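NOTE: There are no spaces between the label names (i.e. ref,brain). Likewise, the replicate BAM files of one sample are separated by commas with no spaces, and the two samples are separated from each other by a space.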
Several output files are generated. Explore these on your own. The gene_exp.diff file contains a summary of differential gene expression.
2. Identify which genes are enriched or depleted in brain tissue:
Open the gene_exp.diff file generated by cuffdiff in step 1 using Excel.
Submit an answer to the following question on Canvas for Exercise 16:
How many genes are significantly different between the brain and reference tissue?
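As an alternative to filtering in Excel, the same tally can be sketched on the command line. The function below is a minimal sketch, not part of the exercise: the name summarize_diff is hypothetical, and it assumes gene_exp.diff is tab-separated with a header row containing the columns sample_1, sample_2, value_1, value_2 and significant, and that the samples were labelled ref and brain as in the cuffdiff command above.

```sh
# Count significant genes in a cuffdiff gene_exp.diff file and split them
# by direction. Columns are looked up by header name (assumed names:
# sample_1, sample_2, value_1, value_2, significant).
summarize_diff() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["significant"]) == "yes" {
      total++
      # Work out which expression value belongs to the brain sample.
      if ($(col["sample_2"]) == "brain") {
        brain = $(col["value_2"]); other = $(col["value_1"])
      } else {
        brain = $(col["value_1"]); other = $(col["value_2"])
      }
      if (brain + 0 > other + 0) up++
      else if (brain + 0 < other + 0) down++
    }
    END {
      printf "significant=%d enriched_in_brain=%d depleted_in_brain=%d\n", total, up, down
    }
  ' "$1"
}
```

Calling `summarize_diff output_folder/gene_exp.diff` prints a single summary line. Comparing the two expression values directly, rather than reading the log2 fold change column, avoids having to handle infinite fold changes when one sample has zero expression.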
1. Open the Integrative Genomics Viewer (IGV).
2. Select 'more' from the toolbar (top left), then select 'hg38 human genome' from the popup menu.
3. Take each accepted_hits.bam file generated by TopHat and convert it to tdf format using IGV tools:
For Input file, select one of the accepted_hits.bam files and run the conversion.
Repeat for each of the accepted_hits.bam files.
Close IGV tools when complete.
4. Load each of the tdf formatted files into IGV:
Select Load from File from the File menu.
Select the tdf file that contains sequencing data.
Right-click on each plot to modify its display parameters (e.g. Change Track Height, Autoscale, Change Track Color).
Repeat for each of the tdf formatted RNA-seq files.
5. Examine the genes in IGV that were differentially regulated based on the cuffdiff analysis:
To view all the data, in the search box that says 'go' next to it, enter
In the same search box, enter the ID of a gene that is significantly different in brain tissue to zoom in on that region of the genome.
Repeat for several genes.
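To decide which genes to look up, it can help to list the significant genes together with their genomic coordinates. The following is a minimal sketch under stated assumptions: the function name significant_loci is hypothetical, and gene_exp.diff is assumed to be tab-separated with a header row containing the columns gene, locus and significant.

```sh
# Print "gene<TAB>locus" for every row of gene_exp.diff marked significant.
# Columns are looked up by header name (assumed names: gene, locus, significant).
significant_loci() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["significant"]) == "yes" { print $(col["gene"]) "\t" $(col["locus"]) }
  ' "$1"
}
```

Running `significant_loci output_folder/gene_exp.diff` gives one line per significant gene. Either value can be entered in the IGV search box, since IGV accepts a locus written as chromosome:start-end as well as a gene name.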
See the IGV website for additional information: http://software.broadinstitute.org/software/igv/
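The bam to tdf conversion in step 3 can also be scripted with the igvtools command-line program, whose count command takes an input file, an output file and a genome. The sketch below only prints the commands so they can be reviewed first. It assumes igvtools is installed and on the PATH, that hg38 is accepted as the genome identifier, and that the output name accepted_hits.tdf is acceptable. The function name tdf_commands and the directory names are hypothetical.

```sh
# Print one "igvtools count" command per TopHat output directory.
# Assumes igvtools is on the PATH and accepts "hg38" as the genome identifier.
tdf_commands() {
  for dir in "$@"; do
    printf 'igvtools count %s/accepted_hits.bam %s/accepted_hits.tdf hg38\n' "$dir" "$dir"
  done
}

# Hypothetical directory names; substitute your own TopHat output folders.
tdf_commands ref_rep1 brain_rep1
```

If the printed commands look right and the paths contain no spaces, they can be executed by piping the output to sh, e.g. `tdf_commands ref_rep1 brain_rep1 | sh`.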
1. Open RStudio.
2. Load the CummeRbund package:
library(cummeRbund)
3. Create a CummeRbund database:
data <- readCufflinks('path_to_cuffdiff_output')
4. Draw a volcano plot displaying brain and reference tissues:
csVolcano(genes(data), 'ref', 'brain', alpha=0.05, showSignificant=TRUE)