RNA-seq Data Analysis: part 3

The purpose of this exercise is to introduce tools for analyzing differential gene expression in RNA-seq data. You will analyze RNA-seq data from a human reference sample (a mix of tissues) and from brain tissue to identify genes whose expression is enriched in the brain. Due to time constraints, the data we will analyze are a subset (chr22) of a larger RNA-seq dataset.

Differential gene expression analysis

1. Identify genes differentially regulated between the reference and brain tissue samples using cuffdiff:

  • Provide an output_folder name such as cuffdiff_output
  • You will compare the three reference libraries to the three brain libraries. cuffdiff (part of the Cufflinks suite) takes the accepted_hits.bam output files from TopHat as input. If you are in the RNA-seq_Data directory, the paths to these files are as follows:
    • ./ref1/accepted_hits.bam
    • ./ref2/accepted_hits.bam
    • etc
$ cuffdiff -p 8 -o 'output_folder' -L ref,brain 'path_to_gtf' \
    'path_to_tophat_output_library1_replicate1'/accepted_hits.bam,'path_to_tophat_output_library1_replicate2'/accepted_hits.bam,'path_to_tophat_output_library1_replicate3'/accepted_hits.bam \
    'path_to_tophat_output_library2_replicate1'/accepted_hits.bam,'path_to_tophat_output_library2_replicate2'/accepted_hits.bam,'path_to_tophat_output_library2_replicate3'/accepted_hits.bam

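NOTE: The label names (ref and brain) are separated by a comma with no spaces. The same applies to the replicate accepted_hits.bam files within each condition; a single space separates the list of reference replicates from the list of brain replicates.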
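
Typing six long paths by hand is error-prone, so it can help to assemble the two comma-separated replicate lists in the shell and inspect the command before running it. The following is only a sketch: the directory names ref3 and brain1 through brain3 are assumed by analogy with ./ref1 and ./ref2, and path_to_gtf is still a placeholder for your annotation file.

```shell
# Sketch only: build and print the cuffdiff command, without running it.
# Assumed (not stated in the exercise): directories ref3 and brain1..brain3.
gtf='path_to_gtf'        # placeholder, replace with your annotation file
out='cuffdiff_output'

# bam_list PREFIX: print ./PREFIX1 .. ./PREFIX3 accepted_hits.bam paths,
# joined by commas with no spaces.
bam_list() {
  list=''
  for i in 1 2 3; do
    list="${list:+$list,}./$1$i/accepted_hits.bam"
  done
  printf '%s' "$list"
}

ref_bams=$(bam_list ref)
brain_bams=$(bam_list brain)

# Replicates of one condition are comma-separated; the two conditions are
# separated by a space, in the same order as the labels given to -L.
cuffdiff_cmd="cuffdiff -p 8 -o $out -L ref,brain $gtf $ref_bams $brain_bams"
printf '%s\n' "$cuffdiff_cmd"
```

Once the printed command looks right, copy it to the prompt and run it from the RNA-seq_Data directory.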

Several output files are generated. Explore these on your own. The gene_exp.diff file contains a summary of differential gene expression.

2. Identify which genes are enriched or depleted in brain tissue:

  • Open the gene_exp.diff file from the previous step using Excel.
  • Reverse sort the data in Excel on the significant column so that genes marked yes are listed first.

Submit an answer to the following question on Canvas for Exercise 16:
How many genes are significantly different between the brain and reference tissue?
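
If you would like to cross-check the count you obtain in Excel, the tab-delimited gene_exp.diff file can also be summarized from the command line. This is a minimal sketch: the helper name count_significant is made up for illustration, and the path assumes you used cuffdiff_output as the output folder. It looks up the column named significant in the header rather than relying on a fixed column position.

```shell
# Sketch only: count rows marked "yes" in the "significant" column.
# count_significant is a hypothetical helper, not part of Cufflinks.
count_significant() {
  awk -F '\t' '
    # Header row: remember which column is called "significant".
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; next }
    # Data rows: count those flagged as significant.
    col && $col == "yes" { n++ }
    END { print n + 0 }
  ' "$1"
}

# Assumed output folder name; adjust if you chose a different one.
diff_file='cuffdiff_output/gene_exp.diff'
if [ -f "$diff_file" ]; then
  count_significant "$diff_file"
fi
```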

Visualizing data with IGV

1. Open the Integrative Genome Viewer.

2. Select 'more' from the toolbar (top left) and select from the popup menu 'hg38 human genome'.

3. Convert each accepted_hits.bam file generated by TopHat to tdf format using igvtools:

Tools > Run igvtools

Under Command, select Count

Under Input file, select one of the accepted_hits.bam files and select Run

Repeat for each of the accepted_hits.bam files.

Close IGV tools when complete.
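
The same conversion can be scripted with the command-line igvtools launcher instead of clicking through the dialog for every library. The sketch below only prints the commands so that you can check them first. It assumes that igvtools is on your PATH, that the libraries live in directories named ref1 to ref3 and brain1 to brain3, and that hg38 is accepted as the genome ID; the helper name tdf_commands is made up for illustration.

```shell
# Sketch only: print one "igvtools count" command per library.
# Assumed (not stated in the exercise): directories ref1..3 and brain1..3.
genome='hg38'

# tdf_commands LIB...: print a count command for each library directory,
# writing the tdf file next to the BAM file it was made from.
tdf_commands() {
  for lib in "$@"; do
    bam="./$lib/accepted_hits.bam"
    printf 'igvtools count %s %s %s\n' "$bam" "$bam.tdf" "$genome"
  done
}

tdf_commands ref1 ref2 ref3 brain1 brain2 brain3
# To execute the commands instead of printing them, pipe the output to sh:
#   tdf_commands ref1 ref2 ref3 brain1 brain2 brain3 | sh
```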

4. Load each of the tdf formatted files into IGV:

File > Load from File.
Select the tdf file that contains sequencing data.

Right click on each plot to modify the parameters (e.g. Change Track Height, Autoscale, Track Color etc).

Repeat for each of the tdf formatted RNA-seq files.

5. Examine the genes in IGV that were differentially regulated based on the cuffdiff analysis:

To view all the data, in the search box that says 'go' next to it, enter chr22.

In the same search box, enter the ID of a gene that is significantly different in brain tissue to zoom in on that region of the genome.

Repeat for several genes.

See the IGV website for additional information.

Plotting data in R

1. Open RStudio.

2. Load the CummeRbund package:

library(cummeRbund)

3. Create a CummeRbund database:

data <- readCufflinks('path_to_cuffdiff_output')

4. Draw a volcano plot displaying brain and reference tissues:

csVolcano(genes(data), 'ref', 'brain', alpha=0.05, showSignificant=TRUE)
assignments/ex15.txt · Last modified: 2018/12/03 13:23 by dokuroot