Our goal is to take sequenced samples from an RNA-seq experiment and process those sequences so that we can answer biological questions.

Our strategy is to use a series of computational algorithms to align the reads to the genome, tabulate the number of reads associated with each gene, and use statistical methods to identify transcripts that are significantly differentially abundant.
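To make the "tabulate" step concrete, here is a minimal toy sketch of counting reads per gene once alignment positions are known. Everything in it is an assumption for illustration: the gene names, coordinates, and read positions are made up, and the function `count_reads_per_gene` is hypothetical. Real pipelines use dedicated counting tools that work on alignment files and full annotations, and they handle strand, splicing, and multi-mapping reads, all of which this sketch ignores.

```python
# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chrI", 100, 500),
    "geneB": ("chrI", 800, 1200),
}

# Hypothetical aligned reads: (chromosome, leftmost position of the alignment).
ALIGNED_READS = [
    ("chrI", 150),   # inside geneA
    ("chrI", 420),   # inside geneA
    ("chrI", 900),   # inside geneB
    ("chrI", 650),   # between genes, not counted
    ("chrII", 150),  # different chromosome, not counted
]


def count_reads_per_gene(reads, genes):
    """Count reads whose alignment start falls inside each gene interval."""
    counts = {name: 0 for name in genes}
    for chrom, pos in reads:
        for name, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(count_reads_per_gene(ALIGNED_READS, GENES))
```

The resulting table of counts per gene, one column per sample, is what the statistical step then compares across conditions.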

Here is our strategy map…

Today, we're going to talk about aligning the reads to a reference genome. For today's purposes, we can focus on just one branch of this process here…

Example Data

The dataset we will use is:

Dong Y, Hu J, Fan L, and Chen Q. (2017) RNA-seq-based transcriptomic and metabolomic analysis reveal stress responses and programmed cell death induced by acetic acid in Saccharomyces cerevisiae. Scientific Reports 7:42659. DOI: 10.1038/srep42659.


Here is the structure of the data:

Building HISAT2 genome indices

