
Tabulation Using featureCounts

References


What does featureCounts do?

featureCounts counts the number of reads that overlap each “feature” defined in an annotation file.

:!: EXERCISE: How would you design a read-counting program to assign reads to different features? What would you consider to be a “feature”? Would it be an exon, a gene, or a transcript?


By default, featureCounts handles this by counting a read only if it can be assigned unambiguously to a single gene. Reads that overlap more than one gene are not counted.

featureCounts has two levels of feature organization: features are exons, and meta-features are genes. Each gene has one or more exons, and each exon is associated with only one gene.

“We recommend that reads or fragments overlapping more than one gene are not counted for RNA-seq experiments because any single fragment must originate from only one of the target genes but the identity of the true target gene cannot be confidently determined.” (Liao et al., 2014)
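To make this counting rule concrete, here is a minimal toy sketch in shell and awk. It is not how featureCounts is implemented; the gene names, coordinates, and reads are all made up for illustration. A read is counted for a gene only if it overlaps exons from exactly one gene.

```shell
# Toy data, made up for illustration (not from a real annotation).
tmpdir=$(mktemp -d)

# Exon table: gene, exon start, exon end.
cat > "$tmpdir/exons.txt" <<'EOF'
geneA 100 200
geneA 300 400
geneB 350 500
EOF

# Reads: name, start, end.
cat > "$tmpdir/reads.txt" <<'EOF'
r1 120 150
r2 310 340
r3 360 390
r4 450 480
r5 600 650
EOF

counts=$(awk '
  # First file: remember every exon together with its gene.
  NR == FNR { gene[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
  # Second file: for each read, collect the set of genes it overlaps.
  {
    split("", hit); nhit = 0
    for (i = 1; i <= n; i++)
      if ($2 + 0 <= e[i] && $3 + 0 >= s[i] && !(gene[i] in hit)) {
        hit[gene[i]] = 1; nhit++
      }
    if (nhit == 1)      { for (g in hit) count[g]++ }   # exactly one gene: count it
    else if (nhit == 0) { count["no_feature"]++ }       # overlaps no exon
    else                { count["ambiguous"]++ }        # overlaps several genes
  }
  END { for (k in count) print k, count[k] }
' "$tmpdir/exons.txt" "$tmpdir/reads.txt" | sort)

printf '%s\n' "$counts"
rm -rf "$tmpdir"
```

In this toy data, read r3 overlaps an exon of geneA and an exon of geneB, so it is not counted for either gene. Reads r1 and r2 hit different exons of geneA, and both add to the same gene-level count.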


The featureCounts algorithm takes the following input and gives you the following output…

featureCounts input:

  • one or more .bam/.sam files, also known as alignment files
  • a .gtf/.gff annotation file for your genome

featureCounts output:

  • a .txt file containing count data for the entire experiment
  • a summary file for the entire experiment
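Both output files are tab-delimited text, so standard shell tools are enough to explore them. The sketch below builds small mock-ups of the two files so that it can run on its own. The gene names, numbers, and file names are made up, and the mock files include only a few of the rows a real run produces. With real output, point counts_file at the file you passed to -o; the summary is written next to it with a .summary suffix.

```shell
# Mock output files, so that the sketch is self-contained (all values are made up).
tmpdir=$(mktemp -d)
counts_file="$tmpdir/counts.txt"
summary_file="$counts_file.summary"

# Count table: one comment line, one header line, then one row per gene.
printf '# Program:featureCounts (mock header line)\n' > "$counts_file"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.sam\n' >> "$counts_file"
printf 'geneA\tchrI\t100;300\t200;400\t+;+\t202\t2\n' >> "$counts_file"
printf 'geneB\tchrI\t350\t500\t+\t151\t1\n' >> "$counts_file"

# Summary: one row per assignment status, one column per input file.
printf 'Status\tsample.sam\nAssigned\t3\nUnassigned_Ambiguity\t1\nUnassigned_NoFeatures\t1\n' > "$summary_file"

# Keep only the gene ID (column 1) and the count for the first sample (column 7).
gene_counts=$(grep -v '^#' "$counts_file" | cut -f 1,7)
printf '%s\n' "$gene_counts"

# Reads that were assigned to a gene, and the total over all Unassigned_* rows.
assigned=$(awk -F '\t' '$1 == "Assigned" { print $2 }' "$summary_file")
unassigned=$(awk -F '\t' '$1 ~ /^Unassigned/ { total += $2 } END { print total + 0 }' "$summary_file")
echo "assigned=$assigned unassigned=$unassigned"

rm -rf "$tmpdir"
```

When several alignment files are tabulated together, each one adds a count column after column 7 in the count table and a column in the summary.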

featureCounts usage:

featureCounts [options] -a <annotation_file> -o <output_file.txt> input_file1.sam [input_file2.sam] ... 
 
Options
     -p             input reads are paired-end (default: single-end)
     -Q <number>    minimum mapping quality score a read must have to be counted; we'll use 20
     -T <number>    number of threads
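As a minimal sketch of how the usage line turns into a real command, the snippet below fills in the placeholders for single-end data. All file names and the thread count are hypothetical, so replace them with your own. The command is printed first so you can check it, and it only runs if featureCounts and the input files are actually present.

```shell
# Hypothetical file names and settings; change them to match your project.
annotation="my_annotation.gtf"
alignment="my_sample.sam"
output="my_counts.txt"
threads=4   # assumption: match this to the number of cores you requested

# Print the command first so you can check it before running it.
cmd="featureCounts -Q 20 -T $threads -a $annotation -o $output $alignment"
echo "$cmd"

# Run only if featureCounts is installed and both input files exist.
if command -v featureCounts >/dev/null 2>&1 && [ -f "$annotation" ] && [ -f "$alignment" ]; then
    featureCounts -Q 20 -T "$threads" -a "$annotation" -o "$output" "$alignment"
else
    echo "Skipping run: featureCounts or the input files were not found."
fi
```

Keeping the file names in variables at the top of a script makes it easier to reuse the same line for a different sample later.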

featureCounts exercise

:!: Add a line of code to your simple_pipeline.sh script that runs featureCounts on sample01. Set the mapping quality threshold to 20.

:!: Quick tip: You should have downloaded the annotation file during the remaining exercises from Nov 15th. If you weren't able to, you can use the one below. Download it from this site and then upload it to summit.

181115_scer_annotation.gtf.gz

:?: Questions: What is the featureCounts output? Where is it? Explore the output.

:?: Question: How many reads were successfully assigned to a gene? How many were left unassigned?

:!: ANSWER: Here are the lines of code: CLICK HERE FOR simple_pipeline.sh

:!: HOMEWORK HINT: For this exercise, we are only tabulating one sample. For your homework, you'll tabulate two samples. Don't forget to add the second sample to the end of your line of code. You only run featureCounts once for the entire experiment, and each sample is added as an argument in that single line of code. If you have three alignment files, EN01.sam, EN02.sam, and EN03.sam, your code may look something like this…

featureCounts [options] -a <annotation_file> -o <output_file.txt> ../03_output/EN01.sam ../03_output/EN02.sam ../03_output/EN03.sam
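If the sample files share a naming pattern, a shell glob can list them for you instead of typing each path. The sketch below is one possible way to do that, not a required approach. So that it runs anywhere, it creates three empty placeholder files in a temporary directory. In your own pipeline you would set sam_dir to ../03_output and drop the placeholder lines. The annotation and output file names are hypothetical.

```shell
# Placeholder setup so the sketch runs on its own (drop this in a real pipeline).
tmpdir=$(mktemp -d)
touch "$tmpdir/EN01.sam" "$tmpdir/EN02.sam" "$tmpdir/EN03.sam"
sam_dir="$tmpdir"   # in your pipeline: sam_dir="../03_output"

# Expand the glob into the positional parameters, one per sample file.
# Note: this overwrites any arguments that were passed to the script.
set -- "$sam_dir"/EN*.sam

# If nothing matched, $1 still holds the unexpanded pattern, so check it.
if [ ! -e "$1" ]; then
    echo "No .sam files found in $sam_dir" >&2
    set --
fi

n_samples=$#
# Hypothetical annotation and output names; one command covers all samples.
cmd="featureCounts -Q 20 -a my_annotation.gtf -o my_counts.txt $*"
echo "Tabulating $n_samples samples with:"
echo "$cmd"

rm -rf "$tmpdir"
```

The glob expands in sorted order, so the count columns in the output follow the order EN01, EN02, EN03.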

:!: NOTEBOOK: Make some notes in your computational notebook about what you were able to accomplish.

Assignment 4

wiki/tabulation.txt · Last modified: 2018/11/27 05:53 by erin