assignments:ex13

Differences

This shows you the differences between two versions of the page.

assignments:ex13 [2017/11/15 12:54]
dokuroot
assignments:ex13 [2018/11/27 12:56]
dokuroot
Line 9: Line 9:
 \\ \\
  
-**Note:** this pipeline is no longer updated and has been replaced with a more efficient and accurate pipeline, however, we will use the original Tuxedo pipeline because there is far more support currently available for it and it has fewer bugs to contend with. The new and improved pipeline consists of a similar suite of tools: HISAT, StringTie, and Ballgown.
+**Note:** this pipeline is no longer updated and has been replaced with a more efficient and accurate pipeline, however, we will use the original Tuxedo pipeline because there is far more support currently available for it and it has fewer bugs to contend with. The new and improved pipeline consists of a similar suite of tools: HISAT, StringTie, and Ballgown. See https://www.nature.com/articles/nprot.2016.095
  
 ====OUTLINE====
Line 24: Line 24:
 ====Quality control and filtering====
  
-**1. ** Open cyberduck and sftp into the montgomery lab server:\\
-''montgomeryserver,biology.colostate.edu''
-\\ \\
 
-**2. ** Download brain and download the ''brain_data'' folder onto your desktop\\
-''genomics/Documents/RNA-seq_data/brain_data''
+**1. ** Open a terminal window and change into the ''brain_data'' directory: ''/Users/bz360/Documents/RNAseq_Files/brain_data'' \\
  
 There are 12 RNA-seq datasets corresponding to paired-end data for 3 replicates from two sample sets (brain and ref). Examine a few lines of one of the files using ''zmore'' or ''zless''. What information is contained in each line?
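
As a minimal sketch of the inspection step, the helper below prints the first four lines (one FASTQ record) of a gzip-compressed file without opening a pager. The function name ''first_record'' and the file name in the usage comment are hypothetical; substitute one of the 12 files in ''brain_data''. ''gzip -dc'' is used instead of ''zcat'' because ''zcat'' on macOS may expect a ''.Z'' suffix.

```shell
# Print the first FASTQ record (the first 4 lines) of a gzip-compressed file.
# Usage (file name is a placeholder): first_record brain1_1.fastq.gz
first_record() {
  # Decompress to stdout and stop after four lines.
  gzip -dc "$1" | head -n 4
}
```

For interactive browsing, ''zless'' remains the more convenient choice; this form is mainly useful when the output is going to be piped into another command.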
Line 47: Line 44:
 \\ \\
  
-**3.** Assess the quality of the data using ''FastQC'':
+**2.** Assess the quality of the data using ''FastQC'':
 \\ \\
  
Line 57: Line 54:
 \\ \\
     ​     ​
-**4.** Trim adapter sequences and quality filter the RNA-seq data (fastq files) using ''Trimmomatic'':
+**3.** Trim adapter sequences and quality filter the RNA-seq data (fastq files) using ''Trimmomatic'':
   ​   ​
 Trim adapter sequences and quality filter each dataset using Trimmomatic (you will run trimmomatic 6 times in total).
  
-  $ trimmomatic PE -phred33 'input_fastq_1' 'input_fastq_2' 'output_fastq_paired_1' 'output_fastq_unpaired_1' 'output_fastq_paired_2' 'output_fastq_unpaired_2' ILLUMINACLIP:/usr/share/trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
+  $ trimmomatic PE 'input_fastq_1' 'input_fastq_2' 'trimmed_P1' 'trimmed_P2' 'trimmed_U1' 'trimmed_U2' ILLUMINACLIP:/Users/bz360/Documents/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
+
+For output file names, use:\\
+trimmed_brain1_P1.fastq.gz\\
+trimmed_brain1_P2.fastq.gz\\
+trimmed_brain1_U1.fastq.gz\\
+trimmed_brain1_U2.fastq.gz\\
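
Because the same command is run six times, it can help to generate the commands rather than type them. The sketch below prints one Trimmomatic command per sample as a dry run. The input file names (''brain1_1.fastq.gz'' and so on) and the function name ''trim_cmd'' are assumptions, as is the reading of ''P''/''U'' as paired/unpaired and ''1''/''2'' as the read number. Trimmomatic's PE mode takes its four outputs in the order read-1 paired, read-1 unpaired, read-2 paired, read-2 unpaired, so the sketch places each ''U'' name directly after the matching ''P'' name, which differs from the order in which the names are listed in the command above.

```shell
# Adapter file path as given in the revised instructions.
ADAPTERS=/Users/bz360/Documents/TruSeq3-PE.fa

# Print (do not run) the Trimmomatic command for one sample.
# Input names like brain1_1.fastq.gz are hypothetical; adjust to the real files.
trim_cmd() {
  sample=$1
  # PE output order: read-1 paired, read-1 unpaired, read-2 paired, read-2 unpaired.
  echo trimmomatic PE \
    "${sample}_1.fastq.gz" "${sample}_2.fastq.gz" \
    "trimmed_${sample}_P1.fastq.gz" "trimmed_${sample}_U1.fastq.gz" \
    "trimmed_${sample}_P2.fastq.gz" "trimmed_${sample}_U2.fastq.gz" \
    "ILLUMINACLIP:${ADAPTERS}:2:30:10" LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}

# One command per library: three brain replicates and three ref replicates.
for sample in brain1 brain2 brain3 ref1 ref2 ref3; do
  trim_cmd "$sample"
done
```

Once the printed commands look right, they can be executed by piping the loop's output into ''sh''.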
  
 //See the Trimmomatic manual for a detailed description of options: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf //
Line 68: Line 71:
 **Submit an answer to the following question on Canvas:**
 \\ \\
-What proportion of the reads were retained?
+What proportion of the reads in each library was retained?
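
Trimmomatic prints input and surviving read-pair counts when it finishes, which is usually enough to answer this. As a cross-check, the sketch below computes the percentage directly from the files. It assumes that "retained" means read pairs where both mates survived, so it compares the trimmed ''P1'' file against the raw read-1 file; the function names and the file names in the usage comment are hypothetical.

```shell
# Count reads in a gzip-compressed FASTQ file (4 lines per read).
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Percentage of read pairs retained after trimming.
# Usage (placeholder names): retained_pct brain1_1.fastq.gz trimmed_brain1_P1.fastq.gz
retained_pct() {
  raw=$(count_reads "$1")    # pairs before trimming
  kept=$(count_reads "$2")   # pairs where both mates survived
  awk -v k="$kept" -v r="$raw" 'BEGIN { printf "%.2f\n", 100 * k / r }'
}
```

If reads that lost their mate should also count as retained, the ''U1'' and ''U2'' files would need to be added to the numerator.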
 \\ \\ \\ \\ \\ \\
  
-**5.** Assess the quality of one of the datasets after quality filtering using ''FastQC'':
+**4.** Assess the quality of one of the datasets after quality filtering using ''FastQC'':
  
 In FastQC:
Line 78: Line 81:
 \\ \\
  
-**6.** Create a ''bowtie index'' for the human chromosome 22 sequence: \\
+**5.** Create a ''bowtie index'' for the human chromosome 22 sequence: \\
  
   $ bowtie2-build 'sequence.fa' 'prefix'
  
-The chr22 sequence is in the ''brain_data'' folder: ''hg38_chr22.fa''\\
-
-For the bowtie prefix, use ''chr22''.\\
-
-\\ \\
-**7.** Move the 6 bowtie index files and the genome sequence file (''hg38_chr22.fa'') to a new folder called ''bowtie_chr22''.
-
-
-//See the bowtie manual for additional details: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml //
-
-\\
-
-**8.** Align sequences from each of the libraries to the human genome using ''TopHat2'' (you will run tophat 6 times in total):
-
-  $ tophat -p 8 -G 'path_to_genome_annotations.gtf' -o 'output_folder' 'path_to_bowtie_index_for_reference_genome/prefix' 'fastq_file_paired_1_1','fastq_file_paired_1_2','fastq_file_unpaired_1_1','fastq_file_unpaired_1_2'
-
-  * NOTE: There are no spaces between the fastq file names.
-  * The directory containing the fastq files should be the current working directory.
-  * Name the output folders as follows: ''ref1'', ''ref2'', ''ref3'', ''brain1'', ''brain2'', ''brain3''
-
-//See the TopHat manual for additional details:https://ccb.jhu.edu/software/tophat/manual.shtml //
+The chr22 sequence should be in the ''brain_data'' folder: ''Homo_sapiens.GRCh38.dna.chromosome.22.fa''.
 \\ \\
+You will need to decompress the file if it has the ''.gz'' extension. \\
+\\
+If you don't have the chromosome sequence, you can download it here:
+\\
+ftp://ftp.ensembl.org/pub/release-94/fasta/homo_sapiens/dna/
  
-**9.** Determine what proportion of the reads from each library were aligned:
 
-Use the UNIX ''more'' command to open each TopHat summary file in the terminal. The TopHat summary files are named ''align_summary.txt'' and are located in the output folder specified in step 5.
-  $ more ./ref1/align_summary.txt
-  etc.
-
-\\
-
-**Submit an answer to the following question on Canvas:**
-\\
-What proportion of reads in each library aligned?
+For the bowtie prefix, use ''chr22''.\\
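
The decompress-then-index step in the revised instructions can be sketched as follows. The helper name ''prepare_fasta'' is hypothetical, and the sketch assumes the compressed copy, if present, is named by appending ''.gz'' to the FASTA file name given above. The ''bowtie2-build'' call is left as a comment so the block can be sourced on a machine without Bowtie 2 installed.

```shell
# FASTA file name as given in the revised instructions.
FASTA=Homo_sapiens.GRCh38.dna.chromosome.22.fa

# Decompress FILE.gz to FILE if only the compressed copy is present.
prepare_fasta() {
  if [ ! -f "$1" ] && [ -f "$1.gz" ]; then
    gzip -dc "$1.gz" > "$1"   # writes FILE and keeps the .gz copy
  fi
}

# Run from inside brain_data, then build the index with the prefix chr22:
#   prepare_fasta "$FASTA" && bowtie2-build "$FASTA" chr22
```

Writing the decompressed output to a new file rather than using ''gunzip'' in place keeps the original download intact in case the step needs to be repeated.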
assignments/ex13.txt · Last modified: 2018/11/27 12:56 by dokuroot