assignment4

assignment4 [2018/10/10 16:21]
dokuroot
assignment4 [2018/10/10 22:02] (current)
dokuroot
Line 4: Line 4:
  
 **Due date:** 10/16/18 at 10 am
 +\\
 +\\
 +Follow the [[ script:template | template]] shown in class for writing your code.  The "main" segment of the module should prompt the user for the arguments, pass them to the functions, and print the return values to the terminal.
  
 ====Exercise 1====

-Write a function, fastq_fasta(), that converts a fastq file to a new fasta formatted file.
+Write a function, ''fastq_fasta(input_file, output_file)'', that converts a fastq file to a new fasta formatted file.
 \\

 The function should have the following attributes:
-  * Prompts the user for the input and output file names.
   * Trims off the '@' sign at the beginning of each ID line.
+  * Writes to the output file only the ID lines and sequence lines, per fasta formatting rules.
+  * Includes a '>' at the beginning of each ID line in the fasta output, per fasta formatting rules.
   * Exits gracefully if it can't open the files.
-  * Prints to the terminal the number of reads that were processed (e.g. "Reads processed: 12345678").
+  * Returns the number of reads that were processed. The return value can be printed from the "main" segment of the module.
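The attributes listed for Exercise 1 could be sketched roughly as follows. This is one possible reading, not the reference solution: it assumes an uncompressed, well-formed fastq file (the sample dataset is gzipped and would need to be decompressed first), 1-based line numbers, and that "exits gracefully" means printing a message and stopping via ''sys.exit''.

```python
import sys


def fastq_fasta(input_file, output_file):
    """Convert a fastq file to fasta and return the number of reads processed."""
    reads = 0
    try:
        with open(input_file) as infile, open(output_file, "w") as outfile:
            for line_number, line in enumerate(infile, start=1):
                text = line.rstrip("\n")
                position = line_number % 4  # 1 = ID, 2 = sequence, 3 = '+', 0 = quality
                if position == 1:
                    # Trim the leading '@' and write a fasta header instead.
                    if text.startswith("@"):
                        text = text[1:]
                    outfile.write(">" + text + "\n")
                    reads += 1
                elif position == 2:
                    outfile.write(text + "\n")
                # Lines 3 and 4 of each record are not written to the fasta file.
    except OSError as err:
        # Assumption: a graceful exit is a short message followed by termination.
        sys.exit(f"Could not open file: {err}")
    return reads
```

Because the function returns the count instead of printing it, the "main" segment can decide how to report it, for example ''print("Reads processed:", fastq_fasta(input_file, output_file))''.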
   ​   ​
-The input and output files should have the following formats:
+The input and output files should have the following formats (excluding the comments):
 \\

 Input: a fastq file
-  @NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
-  GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
-  +
-  @AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
+  @NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1   # line 1: sequence identifier
+  GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA  # line 2: sequence
+  +                                                   # line 3: optional identifier
+  @AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE  # line 4: quality values

 Output: a fasta file with header and sequence
-  >NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
-  GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
+  >NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1   # sequence identifier
+  GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA  # sequence

-A sample fastq dataset can be downloaded from {{ :sample_data.fastq.gz | here}}.\\
+A sample fastq dataset can be downloaded {{ :sample_data.fastq.gz | here}}.\\
 \\
 A description of fastq files is [[https://en.wikipedia.org/wiki/FASTQ_format | here]].\\
Line 35: Line 39:
  
 Note that '@' and nucleotide characters sometimes appear in the quality score line, so you will need to use line numbers to identify the header and sequence lines. Each read is represented by 4 lines within a fastq file. The first and second lines contain the ID and sequence, respectively.\\
+
 **Hint:** use the modulus operator (''%'') to determine if a line is a header or sequence line. \\
 For example: ''if (line_number + 3) % 4 == 0: # header line''
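To make the hint concrete, here is a small sketch that extends the same modulus test to all four lines of a record. The helper name ''line_role'' is hypothetical, and the sketch assumes line numbers start at 1, as the example in the hint implies.

```python
def line_role(line_number):
    """Return the role of a fastq line, given its 1-based line number."""
    if (line_number + 3) % 4 == 0:   # lines 1, 5, 9, ...
        return "header"
    if (line_number + 2) % 4 == 0:   # lines 2, 6, 10, ...
        return "sequence"
    if (line_number + 1) % 4 == 0:   # lines 3, 7, 11, ...
        return "separator"
    return "quality"                 # lines 4, 8, 12, ...


roles = [line_role(n) for n in range(1, 9)]
```

Deciding by position rather than by the first character is what keeps a quality line that begins with '@' from being mistaken for a header.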
Line 41: Line 46:
 ====Exercise 2====

-Write a function, fastq_trimmer(), that trims any number of nucleotides from the 5' end of each read and any number of nucleotides from the 3' end of each read in a fastq file and writes the results to a new fastq formatted file.
+Write a function, ''fastq_trimmer(input_file, output_file, trim_5p, trim_3p)'', that trims any number of nucleotides from the 5' end of each read and any number of nucleotides from the 3' end of each read in a fastq file and writes the results to a new fastq formatted file. The quality score lines should be trimmed exactly as the sequence lines.

 The function should have the following attributes:
-  * Prompts the user for the input file name, output file name, the number of nucleotides to trim from the 5' end, and the number of nucleotides to trim from the 3' end.
-  * Exits gracefully if it can't open the files
-  * Prints to the terminal the number of reads that were processed (e.g. "Reads processed: 12345678") and the original and resulting read length (e.g. "Original read length: 50 nt; Trimmed read length: 40 nt"), assuming all reads are the same length.
-\\
-Follow the [[ script:template | template]] shown in class for writing your code.  The "main" segment of the module should be used to test each of the functions
-\\
+  * Exits gracefully if it can't open the files.
+  * Trims sequence lines and quality score lines as specified in the ''trim_5p'' and ''trim_3p'' arguments.
+  * Returns the number of reads that were processed. The return value can be printed from the "main" segment of the module.
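The trimming described for Exercise 2 might look something like the sketch below. It is an illustration under stated assumptions, not the graded solution: the input is an uncompressed, well-formed fastq file, ''trim_5p'' and ''trim_3p'' are non-negative integers, and a failed file open ends the program with a message via ''sys.exit''.

```python
import sys


def fastq_trimmer(input_file, output_file, trim_5p, trim_3p):
    """Trim both ends of every read and return the number of reads processed."""
    reads = 0
    try:
        with open(input_file) as infile, open(output_file, "w") as outfile:
            for line_number, line in enumerate(infile, start=1):
                text = line.rstrip("\n")
                if line_number % 2 == 0:
                    # Lines 2 and 4 of each record: sequence and quality scores.
                    # Computing the end index explicitly avoids the text[a:-0]
                    # pitfall when trim_3p is 0, and max() yields an empty
                    # string if the trims exceed the read length.
                    end = max(trim_5p, len(text) - trim_3p)
                    text = text[trim_5p:end]
                if line_number % 4 == 1:
                    reads += 1  # count one read per ID line
                outfile.write(text + "\n")
    except OSError as err:
        # Assumption: a graceful exit is a short message followed by termination.
        sys.exit(f"Could not open file: {err}")
    return reads
```

Applying the same slice to the sequence and quality lines keeps the two the same length, which the fastq format requires.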
 \\
 **Combine your functions into a single module and submit via Canvas for grading.**
assignment4.1539210082.txt.gz · Last modified: 2018/10/10 16:21 by dokuroot