Document the exercise in your electronic lab notebook.
NOTE: Text in quotes within commands is a placeholder. Replace it with your own directory paths and file names.
1. Open a terminal window.
2. Change into the
3. There are two small RNA libraries: 1) wt_fastq.txt.gz and 2) prg-1_fastq.txt.gz. Examine the contents of the two files using
4. Decompress the files using
5. Examine the fastq files in FastQC. How many reads are there in each library and what is the read length? Record this information in your lab notebook.
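As a command-line cross-check on the numbers FastQC reports, the sketch below counts reads and lists read lengths directly from a fastq file. It assumes standard 4-line fastq records (no wrapped sequences). The demo file and the helper names `count_reads` and `read_lengths` are hypothetical and exist only for illustration.

```shell
# Build a tiny demo fastq file (made-up reads) so the sketch runs anywhere.
demo_dir=$(mktemp -d)
demo_fastq="$demo_dir/demo.fastq"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n' >  "$demo_fastq"
printf '@r2\nTTGACCAT\n+\nIIIIIIII\n' >> "$demo_fastq"
printf '@r3\nGGCATTAC\n+\nIIIIIIII\n' >> "$demo_fastq"

# A fastq record spans 4 lines, so reads = total lines / 4.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# The sequence is line 2 of each record; print each distinct length once.
read_lengths() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -n | uniq
}

n_reads=$(count_reads "$demo_fastq")
lengths=$(read_lengths "$demo_fastq")
echo "reads: $n_reads"
echo "read length(s): $lengths"
```

On the real data, the same functions would be called on the decompressed files from step 4, for example `count_reads wt_fastq.txt`.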
6. Use Trimmomatic to remove adapter sequences and to filter out poor-quality reads and reads shorter than 16 nt:
$ trimmomatic SE 'input_file' 'output_file' ILLUMINACLIP:/Users/bz360/Documents/Trimmomatic_Files/TruSeq-smallRNA.fa:2:30:10 MINLEN:16 AVGQUAL:30
What proportion of reads were dropped?
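The proportion dropped is (input reads - surviving reads) / input reads. Trimmomatic normally prints these counts in its summary when it finishes; the sketch below only does the arithmetic. The function name `proportion_dropped` and the counts passed to it are hypothetical, so substitute the numbers from your own run.

```shell
# Arguments: number of input reads, number of reads that survived trimming.
proportion_dropped() {
    awk -v total="$1" -v kept="$2" 'BEGIN {
        if (total == 0) { print "NA"; exit }   # avoid dividing by zero
        printf "%.4f\n", (total - kept) / total
    }'
}

# Made-up counts for illustration only.
dropped=$(proportion_dropped 1000 850)
echo "proportion dropped: $dropped"
```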
7. Examine the contents of the two original files and the two new files using more. What is different between the trimmed and original files?
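Paging through the files shows individual reads; a read-length tally can make the overall effect of trimming easier to see. The sketch below, again assuming 4-line fastq records, prints each read length and how many reads have it. The demo file and the function name `length_histogram` are hypothetical.

```shell
# Tiny demo fastq file with made-up reads of two different lengths.
hist_dir=$(mktemp -d)
hist_fastq="$hist_dir/demo_trimmed.fastq"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n' >  "$hist_fastq"
printf '@r2\nACGTA\n+\nIIIII\n'       >> "$hist_fastq"
printf '@r3\nTTGACCAT\n+\nIIIIIIII\n' >> "$hist_fastq"

# Print "length count" pairs, sorted by length.
length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

hist=$(length_histogram "$hist_fastq")
echo "$hist"
```

Running the function on an original file and on its trimmed counterpart, then comparing the two tallies, is one way to approach the question in step 7.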
8. Index the C. elegans genome for Bowtie2 (notice that a fasta formatted file containing the genome sequence is in the
$ bowtie2-build 'genome_input_file.fa' 'genome_name'
9. Create a new folder within the prg1 directory called bowtie_cel using mkdir. Move all the files related to the Bowtie2 index and the FASTA-formatted genome sequence into the bowtie_cel folder using:
$ mv c* bowtie_cel
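Because `c*` matches every name in the current directory that starts with "c", it can help to preview the match before moving anything. The sketch below rehearses the step in a throwaway directory. All file names in it are hypothetical stand-ins for a genome FASTA file, Bowtie2 index files, and a read file.

```shell
# Work in a temporary sandbox so nothing in your real prg1 folder is touched.
sandbox=$(mktemp -d)
touch "$sandbox/c_elegans.fa" "$sandbox/c_elegans.1.bt2" \
      "$sandbox/c_elegans.rev.1.bt2" "$sandbox/wt_fastq.txt"

(
    cd "$sandbox" || exit 1
    mkdir bowtie_cel
    echo c*            # preview what the glob expands to
    mv c* bowtie_cel   # only names starting with "c" are moved
)

moved=$(ls "$sandbox/bowtie_cel" | wc -l | tr -d ' ')
echo "files moved: $moved"
echo "left behind:"
ls "$sandbox"
```

If the preview lists a file that should stay where it is, a narrower pattern is the safer choice.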
10. Map reads from each adapter-trimmed library (wt and prg-1 from step 6) to the C. elegans genome using Bowtie2. While waiting on bowtie, if you haven't done so already, examine the original and trimmed fastq files in
$ bowtie2 -x 'path_to_bowtie_index/prefix' -U 'fastq_file_name' -S 'sampleID.sam'
NOTE: record the total number of reads and total mapped reads for each library in your lab notebook:
Total Reads wt:
Total Reads prg-1:
Total Mapped Reads wt:
Total Mapped Reads prg-1:
Submit your answer to the following question on Canvas:
What proportion of reads in each library was mappable?
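Bowtie2 prints an alignment summary when it finishes, and the same numbers can be recovered from the SAM file as a check. The sketch below assumes one alignment line per read (the usual case for unpaired reads with default Bowtie2 settings) and treats a read as mapped when bit 0x4 of the FLAG field (column 2) is unset. The demo SAM records and the function name `mapped_summary` are hypothetical.

```shell
# Tiny demo SAM file: one header line, three mapped reads, one unmapped read.
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n'                              >  "$demo_sam"
printf 'r1\t0\tI\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'    >> "$demo_sam"
printf 'r2\t16\tII\t200\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'  >> "$demo_sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n'        >> "$demo_sam"
printf 'r4\t0\tX\t300\t1\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'     >> "$demo_sam"

# Print: total reads, mapped reads, proportion mapped.
mapped_summary() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        { total++ }
        int($2 / 4) % 2 == 0 { mapped++ }   # FLAG bit 0x4 unset: read aligned
        END {
            if (total == 0) { print "0 0 NA"; exit }
            printf "%d %d %.4f\n", total, mapped, mapped / total
        }' "$1"
}

summary=$(mapped_summary "$demo_sam")
echo "total mapped proportion: $summary"
```

The bit test uses plain arithmetic rather than a bitwise function so that it runs on awk versions that lack one.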