
Exercise 10

Small RNA Data Analysis Part 1: quality control and read mapping

Document the exercise in your electronic lab notebook.

NOTE: text in single quotes within commands must be replaced with your own directory paths and file names.

1. Download the small RNA data from the Montgomery lab server using Cyberduck or FileZilla. The path to the folder containing the data is as follows:


2. Change into the prg1 directory.

3. There are two small RNA libraries: 1) wt_fastq.txt.gz and 2) prg-1_fastq.txt.gz. Examine the contents of the two files using zmore.

4. Decompress the files using gzip -d (or gunzip).

5. Examine the fastq files in FastQC. How many reads are there in each library and what is the read length? Record this information in your lab notebook.
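FastQC reports both numbers directly. As an optional command-line cross-check, the sketch below counts the reads in an uncompressed FASTQ file and reports the shortest and longest read. It assumes standard four-line FASTQ records (header, sequence, "+", quality); the function name fastq_stats is hypothetical.

```shell
# Sketch: read count and read-length range for an uncompressed FASTQ file.
# ASSUMPTION: every record is exactly 4 lines (header, sequence, +, quality).
fastq_stats() {
  awk 'NR % 4 == 2 {                        # line 2 of each record is the sequence
         n++
         len = length($0)
         if (n == 1 || len < min) min = len
         if (n == 1 || len > max) max = len
       }
       END { printf "reads=%d min_len=%d max_len=%d\n", n, min, max }' "$1"
}
```

Example, after decompressing in step 4: `fastq_stats wt_fastq.txt`.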

6. Use Trimmomatic to remove adapter sequences and reads shorter than 16 nt:

$ trimmomatic SE -phred33 'input_file' 'output_file' ILLUMINACLIP:/Users/bz360/Documents/Trimmomatic_Files/TruSeq-smallRNA.fa:2:30:10 MINLEN:16

What proportion of reads were dropped?
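Trimmomatic's end-of-run summary should already report surviving and dropped reads. As an independent check, the sketch below compares the number of FASTQ records before and after trimming. It assumes both files are uncompressed, four-line FASTQ; the function name percent_dropped is hypothetical.

```shell
# Sketch: percentage of reads removed by trimming.
# ASSUMPTION: both files are uncompressed, standard 4-line FASTQ.
percent_dropped() {
  local before after
  before=$(( $(wc -l < "$1") / 4 ))         # reads in the original file
  after=$(( $(wc -l < "$2") / 4 ))          # reads in the trimmed file
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.2f\n", 100 * (b - a) / b }'
}
```

Example: `percent_dropped wt_fastq.txt 'output_file'` prints the percentage of wt reads that were dropped.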

7. Examine the contents of the two original files and the two new trimmed files using more. How do the trimmed files differ from the originals?

8. Examine the trimmed fastq files in FastQC. Now how many reads are there in each library and what is the read length? Record this information in your lab notebook.

9. Index the C. elegans genome for Bowtie2 (note that a FASTA-formatted file containing the genome sequence is in the prg1 folder):

$ bowtie2-build 'genome_input_file.fa' 'genome_name'

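10. Create a new folder called bowtie_cel within the prg1 directory using mkdir, then move all the Bowtie2 index files and the FASTA-formatted genome sequence into it using mv:

$ mkdir bowtie_cel
$ mv c* bowtie_cel

The pattern c* assumes that the genome file and the index prefix both begin with "c" and that no other files in prg1 do; adjust it to match your file names.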

11. Map reads from each adapter-trimmed library (wt and prg-1 from step 6) to the C. elegans genome using Bowtie2. While waiting for Bowtie2 to finish, examine the original and trimmed fastq files in FastQC if you haven't done so already.

$ bowtie2 -x 'path_to_bowtie_index/prefix' -U 'fastq_file_name' -S 'strain.sam'

NOTE: record the total number of reads and total mapped reads for each library in your lab notebook.


Total reads:    wt:          prg-1:
Mapped reads:   wt:          prg-1:

Submit your answer to the following question on Canvas:

What proportion of reads in each library was mappable?
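Bowtie2 prints an alignment summary, including the overall alignment rate, to standard error when it finishes. The sketch below recomputes the mapped proportion from a SAM file so you can check your answer. It counts primary records only (FLAG bits 0x100 and 0x800 unset) and treats a record as mapped when FLAG bit 0x4 is unset; bits are tested with integer arithmetic so it runs in any POSIX awk. The function name percent_mapped is hypothetical.

```shell
# Sketch: proportion of reads with an alignment in a single-end SAM file.
percent_mapped() {
  awk -F'\t' '
    /^@/ { next }                                         # skip header lines
    {
      flag = $2
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # secondary/supplementary
      total++
      if (int(flag / 4) % 2 == 0) mapped++                # 0x4 unset: read aligned
    }
    END {
      if (total == 0) { print "no reads found"; exit 1 }
      printf "%d of %d reads mapped (%.2f%%)\n", mapped, total, 100 * mapped / total
    }' "$1"
}
```

Example: `percent_mapped 'strain.sam'`.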

Exercise 11

Small RNA Data Analysis Part 2: obtaining read counts

1. Convert the wt and prg-1 alignment files generated by Bowtie2 in step 11 of Exercise 10 to BAM (binary) format using SAMtools:

$ samtools view -bS -o 'output_file.bam' 'input_file.sam'

2. Sort and index the BAM files:

$ samtools sort 'input_file.bam' -o 'output_file.sorted.bam'
$ samtools index 'input_file.sorted.bam'
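Steps 1 and 2 must be repeated for each library. The sketch below chains the three SAMtools commands above for one library, running each only if the previous one succeeded. It assumes the SAM files are named after the library (for example wt.sam and prg-1.sam, following the 'strain.sam' naming used with bowtie2); the function name sam_to_sorted_bam is hypothetical.

```shell
# Sketch: SAM -> BAM -> sorted BAM -> index, for one library.
# ASSUMPTION: the input file is named <library>.sam.
sam_to_sorted_bam() {
  local lib="$1"
  samtools view -bS -o "${lib}.bam" "${lib}.sam" &&
    samtools sort "${lib}.bam" -o "${lib}.sorted.bam" &&
    samtools index "${lib}.sorted.bam"
}
```

Example: `for lib in wt prg-1; do sam_to_sorted_bam "$lib"; done`.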

3. Obtain small RNA read counts from each of several miRNA, siRNA, and piRNA loci using SAMtools:

$ samtools view -c 'input_file.sorted.bam' 'chr:start-end'

This is best accomplished with a bash script that runs the command for each set of coordinates; such a script could easily be expanded to capture every small RNA locus in the genome. A file containing the coordinates of several small RNAs, smallRNA_Coordinates.xlsx, is in the prg1 directory.
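A minimal sketch of such a script follows. It assumes the coordinates in smallRNA_Coordinates.xlsx have been saved from Excel as a tab-delimited text file with four columns (name, chromosome, start, end) and no header row; that layout and the function name count_loci are hypothetical. For each locus it runs the samtools view -c command above on every sorted, indexed BAM file given and prints one tab-separated row per locus, which can be pasted into the Excel document.

```shell
# Sketch: read counts per small RNA locus for one or more sorted, indexed BAMs.
# ASSUMPTION: coordinates file is tab-delimited: name chromosome start end
count_loci() {
  local coords="$1" tab bam
  shift                                     # remaining arguments are BAM files
  tab="$(printf '\t')"
  # header row: "locus" followed by one column per BAM file
  printf 'locus'
  for bam in "$@"; do printf '\t%s' "$bam"; done
  printf '\n'
  # strip Windows line endings that Excel exports may contain
  tr -d '\r' < "$coords" |
  while IFS="$tab" read -r name chr start end || [ -n "$name" ]; do
    [ -z "$name" ] && continue              # skip blank lines
    printf '%s' "$name"
    for bam in "$@"; do
      # samtools view -c prints the number of alignments overlapping the region
      printf '\t%s' "$(samtools view -c "$bam" "${chr}:${start}-${end}" < /dev/null)"
    done
    printf '\n'
  done
}
```

Example: `count_loci 'coordinates.txt' wt.sorted.bam prg-1.sorted.bam > 'counts.txt'`.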

Copy the results into the Excel document and use Excel to perform the normalization in step 4.

4. Normalize small RNA reads to library size (use the total number of reads reported by Bowtie2; see step 11 of Exercise 10): divide the number of reads at each small RNA locus by the total number of reads in the library, in millions. This gives reads per million total small RNA reads (RPM).
5. Plot the normalized data for each small RNA in
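As a worked example of the step 4 formula, 250 reads at a locus in a library of 5,000,000 total reads gives 250 / 5 = 50 RPM. If you want to check the Excel calculation at the command line, the sketch below applies the same formula to a tab-separated table of raw counts. It assumes the first row is a header, the first column is the locus name, and each remaining column is one library, with the library totals passed in the same column order; the function name to_rpm is hypothetical.

```shell
# Sketch: convert raw counts to reads per million (RPM).
# ASSUMPTION: header row; column 1 = locus; columns 2.. = raw counts per library.
to_rpm() {
  local table="$1"
  shift                                     # remaining arguments are library totals
  awk -F'\t' -v OFS='\t' -v totals="$*" '
    BEGIN { split(totals, tot, " ") }       # tot[1], tot[2], ... = library totals
    NR == 1 { print; next }                 # keep the header row unchanged
    {
      # divide column i by the matching library total, in millions
      for (i = 2; i <= NF; i++)
        $i = sprintf("%.2f", $i / (tot[i - 1] / 1000000))
      print
    }' "$table"
}
```

Example: `to_rpm 'counts.txt' 'wt_total_reads' 'prg1_total_reads'`.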
Submit your answer to the following question on Canvas:

Which classes of small RNAs are depleted in prg-1 mutants? Which small RNA pathway does prg-1 likely function in?

Exercise 12

Small RNA Data Analysis Part 3: data visualization

1. Open IGV and select the C. elegans genome WS220 from the dropdown menu (top left; select More to see additional available genomes).

2. Convert the wild type and prg-1 mutant BAM files to TDF format using igvtools:

Tools > Run igvtools
Under Command, select Count
Under Input File, select the sorted.bam file, then click Run
Close igvtools when complete

3. Load the two TDF files into IGV:

File > Load from File
Select the TDF file containing the sequencing data created in step 2 above

4. Right-click on the plot to modify the display parameters: track height, track color, and data range.

5. Enter the chromosome coordinates for each miRNA, siRNA, and piRNA gene and examine how the small RNA reads differ between wild type and prg-1.

6. Go to the WormBase website and search for prg-1. What family of genes is prg-1 part of, and what is its function? Are your results consistent with the function of prg-1?

Submit your answer to the following question on Canvas:

What were the six major steps in our analysis of the small RNA data in exercises 10-12, starting with 'Quality control using FastQC'?

assignments/ex10.1509132841.txt.gz · Last modified: 2017/10/27 13:34 by dokuroot