Exercise 10

Small RNA data analysis (part 1): quality control and read mapping

Document the exercise in your electronic lab notebook.

NOTE: Text in quotes within commands is a placeholder. Replace it with your own directory and file names.

1. Open a terminal window.

2. Change into the prg1 directory: /Users/bz360/Documents/SmallRNA_data/prg1.

3. There are two small RNA libraries: 1) wt_fastq.txt.gz and 2) prg-1_fastq.txt.gz. Examine the contents of the two files using zmore.

4. Decompress the files using gzip with the -d option (gunzip does the same thing).

5. Examine the fastq files in FastQC. How many reads are there in each library and what is the read length? Record this information in your lab notebook.
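The read count reported by FastQC can be cross-checked on the command line, because each read in a FASTQ file occupies four lines. The following is a minimal sketch, not part of the required exercise: the function name count_reads and the two demo reads are made up for illustration.

```shell
# Count the reads in an uncompressed FASTQ file (4 lines per read).
# count_reads is a hypothetical helper name, not a standard tool.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Tiny made-up FASTQ file, used only to demonstrate the function.
demo=$(mktemp)
printf '@r1\nACGTACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIIIIIII\n' > "$demo"
printf '@r2\nACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIII\n' >> "$demo"

count_reads "$demo"
```

On the real data this would be called as count_reads wt_fastq.txt once the file has been decompressed in step 4.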

6. Use Trimmomatic to remove adapter sequences and to discard low-quality reads (average quality below 30) and reads shorter than 16 nt:

$ trimmomatic SE 'input_file' 'output_file' ILLUMINACLIP:/Users/bz360/Documents/Trimmomatic_Files/TruSeq-smallRNA.fa:2:30:10 MINLEN:16 AVGQUAL:30

What proportion of reads was dropped from each library?
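Trimmomatic reports the number of input, surviving, and dropped reads when it finishes. If you would rather compute the proportion yourself from the read counts before and after trimming, the arithmetic can be sketched as below. The function name percent_dropped and the counts 1000 and 850 are hypothetical and are not results from these libraries.

```shell
# Percentage of reads dropped, given read counts before and after trimming.
# percent_dropped is a hypothetical helper name.
percent_dropped() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.2f\n", 100 * (before - after) / before }'
}

# Hypothetical counts for illustration only: 1000 reads in, 850 reads kept.
percent_dropped 1000 850
```

With these made-up counts, 150 of 1000 reads are dropped, so the function prints 15.00.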

7. Examine the contents of the two original files and the two new files using more. What is different between the trimmed and original files?
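Paging through the files with more shows individual reads. A table of read lengths can make the overall difference between the original and trimmed files easier to see. The following is a minimal sketch under the assumption that the files are uncompressed, four-line-per-read FASTQ. The function name length_table and the three demo reads are made up for illustration.

```shell
# Tabulate read lengths in an uncompressed FASTQ file.
# The sequence is the 2nd line of every 4-line record.
# length_table is a hypothetical helper name.
length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len "\t" n[len] }' "$1" | sort -n
}

# Tiny made-up FASTQ file: one 21-nt read and two 22-nt reads.
demo=$(mktemp)
printf '@r1\nACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIII\n' > "$demo"
printf '@r2\nACGTACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIIIIIII\n' >> "$demo"
printf '@r3\nTGCATGCATGCATGCATGCATG\n+\nIIIIIIIIIIIIIIIIIIIIII\n' >> "$demo"

# Prints one line per read length: length, then number of reads.
length_table "$demo"
```

Running the function on an original file and on its trimmed counterpart gives two tables that can be compared side by side.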

8. Index the C. elegans genome for Bowtie2 (note that a fasta-formatted file containing the genome sequence is in the prg1 folder):

$ bowtie2-build 'genome_input_file.fa' 'genome_name'

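9. Create a new folder called bowtie_cel within the prg1 directory using mkdir. Move all the files related to the Bowtie2 index, along with the fasta-formatted genome sequence, into the bowtie_cel folder using mv. The command below assumes that the genome file name and the index prefix both begin with the letter c; adjust the pattern if yours do not: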

$ mv c* bowtie_cel

10. Map the reads from each adapter-trimmed library (wt and prg-1 from step 6) to the C. elegans genome using Bowtie2. While Bowtie2 is running, examine the original and trimmed fastq files in FastQC if you have not already done so.

$ bowtie2 -x 'path_to_bowtie_index/prefix' -U 'fastq_file_name' -S 'sampleID.sam'

NOTE: Bowtie2 prints an alignment summary to the terminal when it finishes. Record the total number of reads and the total number of mapped reads for each library in your lab notebook:

Total Reads wt:
Total Reads prg-1:
Total Mapped Reads wt:
Total Mapped Reads prg-1:
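The counts above can also be cross-checked directly from the SAM file. In SAM format, header lines begin with @, the second column is the FLAG, and a read is unmapped when the FLAG bit with value 4 is set. The following is a minimal sketch that assumes single-end data with one SAM line per read, which is what the Bowtie2 command in step 10 should produce. The function name sam_mapping_summary and the three demo records are made up for illustration.

```shell
# Summarize mapped reads in a SAM file from single-end data.
# sam_mapping_summary is a hypothetical helper name.
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                        # skip header lines
        { total++ }                          # every remaining line is one read
        int($2 / 4) % 2 == 0 { mapped++ }    # FLAG bit 4 not set: read aligned
        END {
            printf "total=%d mapped=%d percent=%.2f\n",
                   total, mapped, (total ? 100 * mapped / total : 0)
        }
    ' "$1"
}

# Tiny made-up SAM file: two aligned reads (FLAG 0 and 16), one unmapped (FLAG 4).
demo=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$demo"
printf '@SQ\tSN:I\tLN:1000\n' >> "$demo"
printf 'r1\t0\tI\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$demo"
printf 'r2\t16\tI\t200\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$demo"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$demo"

sam_mapping_summary "$demo"
```

For the demo file this prints total=3 mapped=2 percent=66.67. On the real data, pass the 'sampleID.sam' file written in step 10.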

Submit your answer to the following question on Canvas:

What proportion of reads in each library was mappable?

