assignments:ex10

Differences

This shows you the differences between two versions of the page.

assignments:ex10 [2017/10/27 13:34]
dokuroot created
assignments:ex10 [2018/11/05 15:14] (current)
dokuroot
Line 4: Line 4:
 ----
  
-===== Small RNA Data Analysis Part 1: quality control and read mapping =====
+===== Small RNA data analysis (part 1): quality control and read mapping =====
  
 \\ Document the exercise in your electronic lab notebook. \\
Line 10: Line 10:
 NOTE: text in quotes within commands must be changed based on your directory and naming of files. \\ \\
  
-**1.** Download the small RNA data from the montgomery lab server using Cyberduck or FileZilla. The path to the folder containing the data is as follows:
+**1.** Open a terminal window. \\
  
-  Documents/SmallRNA_Data/prg1
+**2.** Change into the ''prg1'' directory: ''/Users/bz360/Documents/SmallRNA_data/prg1''. \\
  
-**2.** Change into the **prg1** directory. \\
-
-**3.** There are two small RNA libraries: 1) **wt_fastq.txt.gz** and 2) **prg-1_fastq.txt.gz**. Examine the contents of the two files using ''zmore''. \\
+**3.** There are two small RNA libraries: 1) ''wt_fastq.txt.gz'' and 2) ''prg-1_fastq.txt.gz''. Examine the contents of the two files using ''zmore''. \\
  
 **4.** Decompress the files using ''gzip''. \\
Line 22: Line 20:
 **5.** Examine the fastq files in ''FastQC''. How many reads are there in each library and what is the read length? Record this information in your lab notebook. \\
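
As a cross-check on the read count that ''FastQC'' reports, the number of reads can also be estimated from the line count of a decompressed fastq file. The sketch below is not part of the original exercise: ''count_fastq_reads'' is a hypothetical helper, and it assumes every read occupies exactly four lines (no wrapped sequence lines).

```shell
# Hypothetical helper, not part of the exercise.
# Assumes an uncompressed FASTQ file with exactly four lines per read.
count_fastq_reads() {
  # NR in the END block is the total number of lines read.
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}
```

It would be called with the name of a decompressed library file, e.g. ''count_fastq_reads 'input_file'''. If the result disagrees with ''FastQC'', the four-lines-per-read assumption probably does not hold for that file.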
  
-**6.** Use ''Trimmomatic'' to remove adapter sequences and reads shorter than 16 nt:
+**6.** Use ''Trimmomatic'' to remove adapter sequences and filter poor quality reads and reads shorter than 16 nt:
  
-  $ trimmomatic SE -phred33 'input_file' 'output_file' ILLUMINACLIP:/Users/bz360/Documents/Trimmomatic_Files/TruSeq-smallRNA.fa:2:30:10 MINLEN:16
+  $ trimmomatic SE 'input_file' 'output_file' ILLUMINACLIP:/Users/bz360/Documents/Trimmomatic_Files/TruSeq-smallRNA.fa:2:30:10 MINLEN:16 AVGQUAL:30
  
 What proportion of reads were dropped?
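
The proportion asked for here follows from two numbers: the reads going in and the reads that survive trimming. A minimal sketch, assuming you have read both counts off the Trimmomatic log yourself; ''dropped_percent'' is a hypothetical helper name and not part of the exercise.

```shell
# Hypothetical helper, not part of the exercise.
# Usage: dropped_percent INPUT_READS SURVIVING_READS
# Prints the percentage of reads that were dropped, to two decimals.
dropped_percent() {
  awk -v total="$1" -v kept="$2" 'BEGIN {
    if (total <= 0) { print "total must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", (total - kept) / total * 100
  }'
}
```

For instance, with 1000 input reads and 900 surviving reads, ''dropped_percent 1000 900'' prints ''10.00''.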
  
-**7.** Examine the contents of the two original files and the two new files using ''more''. What is different from the original files? \\
+**7.** Examine the contents of the two original files and the two new files using ''more''. What is different between the trimmed and original files? \\
  
-**8.** Examine the trimmed fastq files in FastQC. Now how many reads are there in each library and what is the read length? Record this information in your lab notebook. \\
-
-**9.** Index the C. elegans genome for Bowtie2 (notice that a fasta formatted file containing the genome sequence is in the **prg1** folder):
+**8.** Index the C. elegans genome for Bowtie2 (notice that a fasta formatted file containing the genome sequence is in the ''prg1'' folder):
  
   $ bowtie2-build 'genome_input_file.fa' 'genome_name'
  
-**10.** Create a new folder within the **prg1** directory called **bowtie_cel** using ''mkdir''. Move all the files related to the bowtie index and the fasta formatted genome sequence into the **bowtie_cel** folder using ''mv'':
+**9.** Create a new folder within the ''prg1'' directory called ''bowtie_cel'' using ''mkdir''. Move all the files related to the bowtie index and the fasta formatted genome sequence into the ''bowtie_cel'' folder using ''mv'':
  
   $ mv c* bowtie_cel
  
-**11.** Map reads from each adapter-trimmed library (wt and prg-1 from step 6) to the C. elegans genome using ''Bowtie2''. While waiting on bowtie, if you haven't done so already, examine the original and trimmed fastq files in ''FastQC''.
+**10.** Map reads from each adapter-trimmed library (wt and prg-1 from step 6) to the C. elegans genome using ''Bowtie2''. While waiting on bowtie, if you haven't done so already, examine the original and trimmed fastq files in ''FastQC''.
  
-  $ bowtie2 -x 'path_to_bowtie_index/prefix' -U 'fastq_file_name' -S 'strain.sam'
+  $ bowtie2 -x 'path_to_bowtie_index/prefix' -U 'fastq_file_name' -S 'sampleID.sam'
    
-NOTE: record the total number of reads and total mapped reads for each library in your lab notebook.\\
-
->TOTAL READS     TOTAL MAPPED READS
->wt:   prg-1:   wt:   prg-1:
+NOTE: record the total number of reads and total mapped reads for each library in your lab notebook.
+\\
+> Total Reads wt:
+> Total Reads prg-1:
+> Total Mapped Reads wt:
+> Total Mapped Reads prg-1:
  
 **Submit your answer to the following question on Canvas:** \\
Line 54: Line 52:
  
    
-~~NOTOC~~ ​ 
-====== Exercise 11 ====== 
- 
----- 
- 
-===== Small RNA Data Analysis Part 2: obtaining read counts ===== 
- 
- 
-**1.** Convert the wt and prg-1 alignment files generated by Bowtie in step 10 above to bam format (binary format) using ''​SAMtools'':​ 
- 
-  $ samtools view -bS -o '​output_file.bam'​ '​input_file.sam'​ 
- 
-**2.** Sort and index Bam files: 
- 
-  $ samtools sort '​input_file.bam'​ -o '​output_file.sorted.bam'​ 
- 
-  $ samtools index '​input_file.sorted.bam'​ 
- 
-**3.** Obtain small RNA read counts from each of several miRNA, siRNA, and piRNA loci using ''​SAMtools'':​ 
- 
-  $ samtools view -c '​input_file.sorted.bam'​ '​chr:​start-end'​ 
- 
-This is best accomplished by writing a bash script to get reads for each set of coordinates and could easily be expanded to capture every small RNA locus in the genome. A file containing the coordinates of several small RNAs, **smallRNA_Coordinates.xlsx**,​ is in the **prg1** directory. \\ 
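
The bash script mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the coordinates are assumed to have been exported from the spreadsheet to a two-column, tab-separated text file (locus name, then ''chr:start-end''), and ''count_loci'' and the ''SAMTOOLS'' override variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper, not part of the exercise.
# Usage: count_loci SORTED_BAM COORDS_TSV
# COORDS_TSV is assumed to hold "name<TAB>chr:start-end" on each line.
# Prints "name<TAB>read_count" for every locus.
count_loci() {
  bam=$1
  coords=$2
  tab=$(printf '\t')
  while IFS="$tab" read -r name region; do
    # Skip blank lines.
    [ -n "$name" ] || continue
    # Same command as in step 3; stdin is redirected so the
    # command cannot consume the coordinates being read by the loop.
    count=$("${SAMTOOLS:-samtools}" view -c "$bam" "$region" < /dev/null)
    printf '%s\t%s\n' "$name" "$count"
  done < "$coords"
}
```

Running it once per library (wt and prg-1) yields two columns of counts that can be pasted side by side for the normalization step.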
- 
-Copy the results into the Excel document and use ''Excel'' to perform the normalization in step 4. \\
- 
-**4.** Normalize small RNA reads based on library size (use the value from bowtie, see step 11 from Exercise 10): divide the number of reads for each small RNA locus by the total number of reads, in millions, in that library (reads per million total small RNA reads, RPM). \\
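
The RPM calculation can also be checked outside of ''Excel''. The sketch below is illustrative only and ''rpm'' is a hypothetical helper name; it takes the read count for one locus and the total read count for the library, as reported by bowtie.

```shell
# Hypothetical helper, not part of the exercise.
# Usage: rpm LOCUS_READS TOTAL_READS
# RPM = locus reads / (total reads / 1,000,000), printed to two decimals.
rpm() {
  awk -v n="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "total must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", n / (total / 1000000)
  }'
}
```

For example, 250 reads at a locus in a library of 5,000,000 reads gives ''rpm 250 5000000'', which prints ''50.00''.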
- 
-**5.** Plot the normalized data for each small RNA in ''​Excel''​. \\  
- 
-**Submit your answer to the following question on Canvas:** \\ \\ 
- 
-Which classes of small RNAs are depleted in prg-1 mutants? ​ Which small RNA pathway does prg-1 likely function in? 
- 
- 
-====== Exercise 12 ====== 
- 
----- 
- 
-===== Small RNA Data Analysis Part 3: data visualization ===== 
- 
-**1.** Open ''IGV'' and select the C. elegans genome WS220 from the dropdown menu (top left, select more to see additional genomes available). \\
- 
-**2.** Reformat the wild type and prg-1 mutant bam files to tdf format using ''​igv tools'':​ 
- 
-  Tools > Run igvtools 
-  Under Command, select Count and Run 
-  Under Input file, select the sorted.bam file 
-  Close igv tools when complete 
- 
-**3.** Load the two tdf formatted files into ''​IGV'':​ 
- 
-  File > Load from File 
-  Select the tdf file that contains sequencing data created in step 2 above
-  Right click on the plot to modify the parameters - change track height, track color, and data range. ​ 
- 
-**5.** Enter the chromosome coordinates for each miRNA, siRNA, and piRNA gene and examine how the small RNA reads differ between wild type and prg-1. \\ 
- 
-**6.** Go to the WormBase website and search for prg-1. What family of genes is prg-1 part of and what is its function? Are the results consistent with the function of prg-1? \\
- 
-**Submit your answer to the following question on Canvas:** \\ 
- 
-What were the six major steps in our analysis of the small RNA data in exercises 10-12, starting with '​Quality control using FastQC'?​ 
assignments/ex10.1509132841.txt.gz · Last modified: 2017/10/27 13:34 by dokuroot