Exercise 9


Map High-Throughput Sequencing Data to a Reference Genome

Map high-throughput sequencing data to the E. coli genome using bowtie2.

Document the exercise in your electronic lab notebook. The notebook entry should have the following format:

# Date

# General description of experiment.

# Description of step 1.

commands

# Results

# Description of individual step 2.

commands

# Results
etc.


Quality Control

We will use FastQC to assess the quality of our data.

1. Download high-throughput sequencing data from an E. coli genome sequencing experiment. The data are on the Montgomery lab server. Use Cyberduck or FileZilla to transfer the 'e_coli' directory to your computer. * This step should have been completed last week.

2. Open a terminal window.

3. Change into the 'xbowtie_data' directory within the 'e_coli' directory you downloaded.

4. Open each of the files in FastQC. How many reads are there in total between the two fastq files? How long are the reads? Record this information in your lab notebook.
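As an optional cross-check on the numbers FastQC reports, the read count and read lengths can also be computed in the terminal. The sketch below assumes uncompressed FASTQ files with the standard four-line record layout; the function names are hypothetical helpers, not part of FastQC or any other tool.

```shell
# Hypothetical helper: count the records in an uncompressed FASTQ file.
# Assumes four lines per read (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: print each distinct read length followed by
# the number of reads of that length, sorted by length.
# The sequence is the second line of every four-line record.
fastq_read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

After pasting the functions into the terminal, run `count_fastq_reads` and `fastq_read_lengths` on each of the two fastq files and compare the results with the FastQC report.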

Mapping

We will use Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) to map reads to an E. coli reference genome.

1. The first step in doing a bowtie alignment is to obtain or create a bowtie index.

a. Download the E. coli reference genome from the rna server (/home/student/ecoli.fa) into the 'e_coli' folder in your 'Documents' folder.

b. Open a terminal window and change into the 'e_coli' folder.

c. Make a new directory named 'ecoli_bowtie' and change into it.

d. Build a bowtie index:

$ bowtie2-build --threads number_of_threads path_to_ecoli.fa ecoli

2. Change into the directory containing the fastq files '/Users/bz360/Desktop/e_coli_xbowtie_data' and run bowtie2 to map reads using the fastq files in the folder and the bowtie index from step 1 (run time ~15 min):

$ bowtie2 -p number_of_threads -x path_to_index/prefix_of_index -1 fastq_file_mate1 -2 fastq_file_mate2 -S output_file.sam

* Record the number of reads that mapped in your lab notebook.

3. Use samtools (http://samtools.sourceforge.net) to convert the sam formatted file to binary bam format:

$ samtools view -@ number_of_threads -bS -o output_file.bam input_file.sam

4. Use samtools to sort the bam file and create an index file:

$ samtools sort -@ number_of_threads -o output_file.sorted.bam input_file.bam
$ samtools index input_file.sorted.bam
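Two optional checks can make the mapping steps easier to document. bowtie2-build writes six index files for a small genome such as E. coli (prefix.1.bt2 through prefix.4.bt2, plus prefix.rev.1.bt2 and prefix.rev.2.bt2), and bowtie2 prints its alignment summary to standard error, so adding `2> bowtie2.log` to the end of the mapping command saves it to a file. The sketch below assumes that setup; the function names and the log file name `bowtie2.log` are hypothetical and not part of Bowtie2.

```shell
# Hypothetical helper: verify that all six files of a small Bowtie2 index
# exist and are non-empty. $1 is the index path plus prefix,
# for example ecoli_bowtie/ecoli.
check_bowtie2_index() {
    prefix=$1
    missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$prefix.$suffix" ]; then
            echo "missing or empty: $prefix.$suffix" >&2
            missing=1
        fi
    done
    # Exit status 0 means every index file was found.
    return "$missing"
}

# Hypothetical helper: print the percentage from the
# "overall alignment rate" line of a saved Bowtie2 summary.
# Assumes the summary was captured with: 2> bowtie2.log
overall_alignment_rate() {
    awk '/overall alignment rate/ { print $1 }' "$1"
}
```

For example, `check_bowtie2_index ecoli_bowtie/ecoli && echo "index OK"` confirms the index before mapping, and `overall_alignment_rate bowtie2.log` prints the value to copy into the lab notebook.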


Visualizing Data

We will use the genome browser software IGV to visualize the alignment data from above.

1. Open the Integrative Genomics Viewer (http://www.broadinstitute.org/igv/).

2. Select the E. coli genome to load from the dropdown menu (U00096.2, top left).

3. Convert the sorted bam file to tdf format using igvtools:

Tools > Run igvtools
Under Command, select Count
Under Input file, select the sorted bam file
Close igvtools when complete

4. Load the tdf formatted file:

File > Load from File
Select the tdf file that contains sequencing data
Right click on the plot to modify the parameters


Submit your answers to the following questions on Canvas for grading:

1. Based on the FastQC analysis, how many reads did the library contain (one fastq file contains half the reads)? How long were the reads?

2. Based on the Bowtie alignment, what proportion of the reads aligned to the E. coli genome?

assignments/ex9.txt · Last modified: 2018/10/11 13:20 by dokuroot