

Assignment 4

Due date: 10/16/18 at 10 am

Exercise 1

Write a function, fastq_fasta(), that converts a fastq file to a new fasta formatted file.

A sample fastq dataset can be downloaded from here.
A description of fastq files is here.
A description of fasta files is here.

Note that the '@' character and nucleotide characters sometimes appear in the quality score line, so you cannot reliably identify header and sequence lines by their content; use line numbers instead. Each read is represented by 4 lines in a fastq file: the first line contains the ID, the second the sequence, the third a '+' separator, and the fourth the quality scores.

Hint: use the modulus operator (%) to determine if a line is a header or sequence line.
For example, counting lines from 1: if (line_number + 3) % 4 == 0: # header line
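Extending the hint to all four positions gives a small classifier. This is a minimal sketch that assumes line numbers start at 1; the helper name line_role() is hypothetical and not part of the assignment.

```python
def line_role(line_number):
    """Classify a 1-based fastq line number by its position in a 4-line record."""
    if (line_number + 3) % 4 == 0:    # lines 1, 5, 9, ...
        return "header"
    elif (line_number + 2) % 4 == 0:  # lines 2, 6, 10, ...
        return "sequence"
    elif (line_number + 1) % 4 == 0:  # lines 3, 7, 11, ...
        return "plus"
    else:                             # lines 4, 8, 12, ...
        return "quality"

# Print the role of each line in the first two records
for n in range(1, 9):
    print(n, line_role(n))
```

Because only the line position is used, a quality line that happens to start with '@' is still classified correctly.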

The function should have the following attributes:

  • Prompts the user for the input and output file names.
  • Trims off the '@' sign at the beginning of each ID line.
  • Exits gracefully if it can't open the files.
  • Prints to the terminal the number of reads that were processed (e.g. “Reads processed: 12345678”).
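One way to "exit gracefully" is to catch the error raised by open() and stop with a readable message instead of a traceback. This sketch assumes that ending the whole program is acceptable; the helper open_or_exit() is a hypothetical name.

```python
import sys

def open_or_exit(path, mode):
    """Open a file, or print an error message and exit instead of showing a traceback."""
    try:
        return open(path, mode)
    except OSError as err:  # covers FileNotFoundError, PermissionError, etc.
        # sys.exit() with a string prints it to stderr and exits with status 1
        sys.exit(f"Could not open '{path}': {err.strerror}")
```

If you would rather return to the caller (for example, so the "main" segment can go on to test the next function), print the message and return from the function instead of calling sys.exit().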

The input and output files should have the following formats:

Input: a fastq file

@NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
+
@AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Output: a fasta file with header and sequence

>NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
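A possible sketch of the conversion, assuming Python 3. The core work is done by a hypothetical helper, fastq_to_fasta_stream(), that takes already-open file objects so it can be tried on in-memory text; fastq_fasta() only handles the prompting, file opening, and reporting. This is one approach, not the required one.

```python
import io

def fastq_to_fasta_stream(infile, outfile):
    """Write the header and sequence of each 4-line fastq record in fasta format.

    Returns the number of reads processed.
    """
    reads = 0
    for line_number, line in enumerate(infile, start=1):
        line = line.rstrip("\r\n")
        if (line_number + 3) % 4 == 0:    # header line: replace the leading '@' with '>'
            header = line[1:] if line.startswith("@") else line
            outfile.write(">" + header + "\n")
            reads += 1
        elif (line_number + 2) % 4 == 0:  # sequence line: copy unchanged
            outfile.write(line + "\n")
        # the '+' line and the quality line are skipped
    return reads

def fastq_fasta():
    """Prompt for file names, convert fastq to fasta, and report the read count."""
    in_name = input("Input fastq file: ")
    out_name = input("Output fasta file: ")
    try:
        with open(in_name) as infile, open(out_name, "w") as outfile:
            reads = fastq_to_fasta_stream(infile, outfile)
    except OSError as err:
        print(f"Could not open file: {err}")
        return
    print(f"Reads processed: {reads}")

# Usage on an in-memory record whose quality line starts with '@'
sample = io.StringIO("@read1 1:N:0:1\nGCGGGN\n+\n@AAAA#\n")
out = io.StringIO()
print(fastq_to_fasta_stream(sample, out))  # 1
print(out.getvalue(), end="")              # >read1 1:N:0:1 / GCGGGN
```

Keeping the file handling separate from the line processing makes the conversion easy to test without creating files on disk.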


Exercise 2

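Write a function, fastq_trimmer(), that trims a user-specified number of nucleotides from the 5' end and a user-specified number from the 3' end of each read in a fastq file, and writes the results to a new fastq formatted file. The quality score line of each read must be trimmed by the same amounts so that it stays the same length as the sequence.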

The function should have the following attributes:

  • Prompts the user for the input file name, output file name, the number of nucleotides to trim from the 5' end, and the number of nucleotides to trim from the 3' end.
  • Exits gracefully if it can't open the files.
  • Prints to the terminal the number of reads that were processed (e.g. “Reads processed: 12345678”) and the original and resulting read length (e.g. “Original read length: 50 nt; Trimmed read length: 40 nt”), assuming all reads are the same length.
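The trimming can follow the same pattern as Exercise 1. In this sketch (Python 3 assumed), the hypothetical helper trim_fastq_stream() slices both the sequence and quality lines and reports the read lengths from the last read, relying on the assignment's assumption that all reads are the same length. Trim amounts are assumed to be non-negative integers.

```python
import io

def trim_fastq_stream(infile, outfile, trim5, trim3):
    """Trim trim5 nt from the 5' end and trim3 nt from the 3' end of every read.

    Returns (reads, original_length, trimmed_length).
    """
    reads = 0
    original_length = trimmed_length = 0
    for line_number, line in enumerate(infile, start=1):
        line = line.rstrip("\r\n")
        position = line_number % 4        # 1 = header, 2 = sequence, 3 = '+', 0 = quality
        if position in (2, 0):            # sequence and quality lines are trimmed identically
            end = len(line) - trim3       # explicit end index, since line[trim5:-0] would be empty
            trimmed = line[trim5:end] if end > trim5 else ""
            if position == 2:
                reads += 1
                original_length, trimmed_length = len(line), len(trimmed)
            line = trimmed
        outfile.write(line + "\n")
    return reads, original_length, trimmed_length

def fastq_trimmer():
    """Prompt for file names and trim amounts, trim the reads, and report the results."""
    in_name = input("Input fastq file: ")
    out_name = input("Output fastq file: ")
    trim5 = int(input("Nucleotides to trim from the 5' end: "))
    trim3 = int(input("Nucleotides to trim from the 3' end: "))
    try:
        with open(in_name) as infile, open(out_name, "w") as outfile:
            reads, orig_len, new_len = trim_fastq_stream(infile, outfile, trim5, trim3)
    except OSError as err:
        print(f"Could not open file: {err}")
        return
    print(f"Reads processed: {reads}")
    print(f"Original read length: {orig_len} nt; Trimmed read length: {new_len} nt")

# Usage: trim 2 nt from the 5' end and 3 nt from the 3' end of one 10-nt read
sample = io.StringIO("@read1\nGCGGGNTGGA\n+\n@AAAA#EEEE\n")
out = io.StringIO()
print(trim_fastq_stream(sample, out, 2, 3))  # (1, 10, 5)
print(out.getvalue(), end="")                # @read1 / GGGNT / + / AAA#E
```

The explicit end index matters when the 3' trim amount is 0: a negative slice bound of -0 is just 0, which would return an empty string.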


Follow the template shown in class for writing your code. The “main” segment of the module should be used to test each of the functions.

Combine your functions into a single module and submit via Canvas for grading.

assignment4.1539209919.txt.gz · Last modified: 2018/10/10 16:18 by dokuroot