Assignment 4

DSCI512: RNAseq
Due date: December 4, 2016, 10:00am
Submit your assignment on Canvas


Start an RNA-seq analysis project

  • Make a directory for the project on summit called PROJ05_ce_homework
  • Make input, scripts, and output sub-directories.
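A minimal sketch of the directory setup. It assumes the numbered names 01_input and 01_scripts that appear later in this assignment; 02_output is an assumed name for the output sub-directory, so substitute your own if you prefer.

```shell
# Create the project directory and its sub-directories in one step.
# 01_input and 01_scripts match the names used later in this assignment;
# 02_output is an assumed name for the output sub-directory.
project_dir="PROJ05_ce_homework"
mkdir -p "$project_dir/01_input" "$project_dir/01_scripts" "$project_dir/02_output"

# List the result to confirm the layout.
ls -1 "$project_dir"
```

The -p flag creates the parent directory as needed and does not complain if a directory already exists, so the command is safe to re-run.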

Get the input data

  • We are going to re-analyze the data from a study entitled "Effect of the diet type and temperature on the C. elegans transcriptome".
  • I have already uploaded the .fastq files associated with this study to summit.
  • Copy over the following files from my directory to your 01_input directory.
# double-check that you are in your new project directory, in the input sub-directory
$ pwd
~//PROJ05_ce_homework/01_input
 
 
# copy over the input files
$ cp /scratch/summit/erinnish@colostate.edu/DATA_HOMEWORK/* .
 
$ ls -1
metadata_gomezOrte.txt
SRR5832182_1.fastq
SRR5832182_2.fastq
SRR5832183_1.fastq
SRR5832183_2.fastq
SRR5832184_1.fastq
SRR5832184_2.fastq
SRR5832185_1.fastq
SRR5832185_2.fastq
SRR5832186_1.fastq
SRR5832186_2.fastq
SRR5832187_1.fastq
SRR5832187_2.fastq
SRR5832188_1.fastq
SRR5832188_2.fastq
SRR5832189_1.fastq
SRR5832189_2.fastq
SRR5832190_1.fastq
SRR5832190_2.fastq
SRR5832191_1.fastq
SRR5832191_2.fastq
SRR5832192_1.fastq
SRR5832192_2.fastq
SRR5832193_1.fastq
SRR5832193_2.fastq
SRR5832194_1.fastq
SRR5832194_2.fastq
SRR5832195_1.fastq
SRR5832195_2.fastq
SRR5832196_1.fastq
SRR5832196_2.fastq
SRR5832197_1.fastq
SRR5832197_2.fastq
SRR5832198_1.fastq
SRR5832198_2.fastq
SRR5832199_1.fastq
SRR5832199_2.fastq
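Since the data are paired-end, it may be worth confirming that every _1.fastq file arrived with its _2.fastq mate. The following is only a sketch; check_pairs is a hypothetical helper name, not part of any installed tool.

```shell
# Report any *_1.fastq file in a directory that lacks its *_2.fastq mate.
# check_pairs is a hypothetical helper name used for illustration only.
check_pairs() {
    local dir="${1:-.}" missing=0 r1 r2
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_1.fastq}_2.fastq"         # swap the _1 suffix for _2
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1"
            missing=$((missing + 1))
        fi
    done
    echo "unpaired files: $missing"
    [ "$missing" -eq 0 ]                    # non-zero exit if any mate is missing
}

# Usage: run from inside 01_input
check_pairs .
```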

Gather your required resources

  • Ensure that you have the necessary hisat2 files:
    • hisat2 indexes (from the last homework)
    • If you haven't built these yet, return to Assignment 3 and complete steps 1 - 3, but skip step 4.
  • Use the following .gtf file. Don't use the one from the homework assignment. Use this one!!!

Write a basic RNA-seq analysis pipeline for the first TWO samples in this dataset

  • In the 01_scripts directory, start a script called simple_ce11_pipeline.sh.
  • Copy and paste the following header and outline into that script:
#!/usr/bin/env bash
 
#SBATCH --job-name=simple_ce11_analysis
#SBATCH --nodes=1
#SBATCH --ntasks=6      # modify this number to reflect how many cores you want to use (up to 24)
#SBATCH --partition=shas-testing
#SBATCH --qos=testing     # modify this to reflect which queue you want to use. Options are 'normal' and 'testing'
#SBATCH --time=0:29:00   # modify this to reflect how long to let the job go. 
#SBATCH --output=log_pipeline_ce11_%j.txt
 
# Source the bashrc link to install software
 
 
# Quality control with FASTQC
 
 
# Alignment to reference genome with hisat2
 
 
# Tabulation of read counts per gene with featureCounts

Write the script

  • Fill in the script so that it analyzes the following .fastq files, using the following sample names:
SRR5832182_1.fastq    EG01
SRR5832182_2.fastq    EG01
SRR5832183_1.fastq    EG02
SRR5832183_2.fastq    EG02
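One possible shape for the body of the script is sketched below; treat it as a starting point rather than the expected answer. Several things are assumptions not given in this assignment: the index prefix and .gtf path are placeholders, 02_output is a hypothetical output directory name, samtools is assumed to be available for converting SAM to sorted BAM, and run is a hypothetical helper that only prints each command unless DRY_RUN=0.

```shell
# Sketch of the three pipeline steps for the two samples listed above.
# Assumptions (not given in the assignment): the index prefix, the .gtf path,
# the output directory, and that samtools is available for SAM-to-BAM sorting.
INPUT_DIR="${INPUT_DIR:-../01_input}"
OUTPUT_DIR="${OUTPUT_DIR:-../02_output}"                     # hypothetical name
HISAT2_INDEX="${HISAT2_INDEX:-/path/to/hisat2_index/ce11}"   # placeholder prefix
GTF="${GTF:-/path/to/annotation.gtf}"                        # placeholder path
THREADS="${THREADS:-6}"                                      # matches --ntasks=6
DRY_RUN="${DRY_RUN:-1}"                                      # 1 = print commands only

# run: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run mkdir -p "$OUTPUT_DIR"
bams=""

# Each line of the here-document pairs an SRA accession with its sample name.
# File descriptor 3 is used so the tools cannot consume the list from stdin.
while read -r srr sample <&3; do
    r1="$INPUT_DIR/${srr}_1.fastq"
    r2="$INPUT_DIR/${srr}_2.fastq"

    # Quality control with FASTQC
    run fastqc -t "$THREADS" -o "$OUTPUT_DIR" "$r1" "$r2"

    # Alignment to reference genome with hisat2 (paired-end reads)
    run hisat2 -p "$THREADS" -x "$HISAT2_INDEX" -1 "$r1" -2 "$r2" \
        -S "$OUTPUT_DIR/$sample.sam"
    run samtools sort -@ "$THREADS" -o "$OUTPUT_DIR/$sample.bam" "$OUTPUT_DIR/$sample.sam"

    bams="$bams $OUTPUT_DIR/$sample.bam"
done 3<<EOF
SRR5832182 EG01
SRR5832183 EG02
EOF

# Tabulation of read counts per gene with featureCounts (-p = paired-end input).
# $bams is left unquoted on purpose so it splits into one argument per file.
run featureCounts -p -T "$THREADS" -a "$GTF" -o "$OUTPUT_DIR/counts.txt" $bams
```

With the default DRY_RUN=1 the script only prints the commands, which makes it easy to check file names before spending cluster time; set DRY_RUN=0 to execute them. Newer Subread releases may also need --countReadPairs to count fragments rather than individual reads, so check the featureCounts help for your installed version.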

Turn in the following

This can only be submitted as a .txt file or copied and pasted into Canvas as plain text. No Word files, no fancy text.

  1. Include the text of your script simple_ce11_pipeline.sh
  2. Include the text of the summary outputs from featureCounts. Typically these are .txt.summary files.
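Before pasting the summary, a quick sanity check on it can be useful. The sketch below assumes the usual tab-separated layout of a featureCounts .summary file (a Status header row followed by one row per category) and reads only the first sample column; summarize_assigned is a hypothetical helper name.

```shell
# summarize_assigned is a hypothetical helper: for one featureCounts
# .summary file (tab-separated: a "Status" header row, then one row per
# category), print the percentage of reads in the "Assigned" row for the
# first sample column.
summarize_assigned() {
    awk -F'\t' '
        NR > 1 { total += $2; if ($1 == "Assigned") assigned = $2 }
        END {
            if (total > 0) printf "%.1f%% assigned\n", 100 * assigned / total
            else print "no reads found"
        }' "$1"
}

# Usage (the path is an assumption; point it at your own summary file):
#   summarize_assigned ../02_output/counts.txt.summary
```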

Notebook

  • Make notes of what you did for this homework in your computational notebook.
  • Turn this in with your notebook at the end of the semester.

Download before next class

Please download IGV (Integrative Genomics Viewer) before the next class:

assignments/rna2018_assignment4.txt · Last modified: 2018/11/27 06:01 by erin