# Computational biology at CSU

## DSCI 512: RNA-seq


## Assignment 4

Due date: 10/16/18 at 10 am

Follow the template shown in class for writing your code. The “main” segment of the module should prompt the user for the arguments and pass them to the functions.

### Exercise 1

Write a function, `fastq_fasta(input_file, output_file)`, that converts a fastq file to a new fasta formatted file.

The function should have the following attributes:

• Trims off the '@' sign at the beginning of each ID line.
• Exits gracefully if it can't open the files.
• Prints to the terminal the number of reads that were processed (e.g. “Reads processed: 12345678”).
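One possible way to handle the "exits gracefully" requirement is sketched below. The helper name `open_or_exit` is hypothetical and not part of the assignment; the sketch assumes that printing a short message and stopping the program is an acceptable form of graceful exit.

```python
import sys


def open_or_exit(path, mode):
    """Open a file, or print a short message and exit if that fails.

    Hypothetical helper for illustration; the assignment does not require it.
    """
    try:
        return open(path, mode)
    except OSError as error:
        # Passing a string to sys.exit prints it to stderr and exits with status 1,
        # so the user sees a readable message instead of a traceback.
        sys.exit(f"Could not open {path}: {error}")
```

Both the input and the output file can fail to open (missing file, missing directory, no permission), so a check like this applies to each of them.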

The input and output files should have the following formats:

Input: a fastq file

```
@NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
+
@AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
```

Output: a fasta file with header and sequence

```
>NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1
GCGGGNTGGAAGGTGGAGCACGATCTCGAGTGGGTTGACGTCGTGAGCGA
```
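The only change to the ID line is swapping the leading '@' for '>'. A minimal sketch of that step, using a hypothetical helper name (`fastq_id_to_fasta_header`) that the assignment does not prescribe:

```python
def fastq_id_to_fasta_header(id_line):
    """Turn a fastq ID line into a fasta header line (hypothetical helper)."""
    # Drop the trailing newline, skip the first character ('@'), and prefix '>'.
    # Slicing removes only the first character, so any later '@' is kept.
    return ">" + id_line.rstrip("\n")[1:]


header = fastq_id_to_fasta_header(
    "@NS500697:12:HN75WBGXX:1:11101:19826:1052 1:N:0:1\n"
)
print(header)
```

The sequence line is copied to the output unchanged.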

A description of fastq files is here.
A description of fasta files is here.

Note that '@' and nucleotide characters sometimes appear in the quality score line, so you cannot identify header and sequence lines by their content; you will need to use line numbers. Each read is represented by 4 lines within a fastq file. The first and second lines contain the ID and sequence, respectively.

Hint: use the modulus operator (`%`) to determine if a line is a header or sequence line.
For example, if `line_number` starts at 1 for the first line of the file: `if (line_number + 3) % 4 == 0: # header line`
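The hint can be tried out on a small in-memory example before touching real files. The sketch below assumes line numbers start at 1; the function name `line_kind` and the toy reads are made up for illustration. Testing `line_number % 4 == 1` is equivalent to the `(line_number + 3) % 4 == 0` test in the hint.

```python
def line_kind(line_number):
    """Classify a 1-based fastq line number by its position in the 4-line record."""
    position = line_number % 4
    if position == 1:
        return "header"
    if position == 2:
        return "sequence"
    if position == 3:
        return "separator"
    return "quality"  # position == 0, i.e. every 4th line


# Two toy reads whose quality lines would fool a content-based check.
lines = [
    "@read1",
    "ACGT",
    "+",
    "@AAA",  # quality line that happens to start with '@'
    "@read2",
    "GGCC",
    "+",
    "ACGT",  # quality line made up of nucleotide letters only
]

# enumerate(..., start=1) gives the 1-based numbering the hint relies on.
kinds = [line_kind(number) for number, _ in enumerate(lines, start=1)]
print(kinds)
```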

### Exercise 2

Write a function, `fastq_trimmer(input_file, output_file, trim_5p, trim_3p)`, that trims `trim_5p` nucleotides from the 5' end and `trim_3p` nucleotides from the 3' end of each read in a fastq file and writes the results to a new fastq formatted file. Either value may be any number of nucleotides, including zero.
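The trimming itself can be done with string slicing. The sketch below is one way to do it, not a required approach: `trim_read` is a hypothetical helper, and it assumes the quality line has to be trimmed the same way as the sequence line so that the two stay the same length. Computing the end index explicitly avoids a common slip: `sequence[trim_5p:-trim_3p]` returns an empty string when `trim_3p` is 0.

```python
def trim_read(sequence, quality, trim_5p, trim_3p):
    """Trim one read's sequence and quality strings (hypothetical helper)."""
    # End index counted from the start of the string; clamp at 0 so that
    # over-trimming yields an empty read instead of wrapping around.
    end = max(len(sequence) - trim_3p, 0)
    return sequence[trim_5p:end], quality[trim_5p:end]


# Toy 10-nucleotide read: remove 2 from the 5' end and 3 from the 3' end.
trimmed_seq, trimmed_qual = trim_read("ACGTACGTAC", "IIIIIHHHHH", 2, 3)
print(trimmed_seq, trimmed_qual)

# With trim_3p == 0 the 3' end is left untouched.
untrimmed_seq, untrimmed_qual = trim_read("ACGTACGTAC", "IIIIIHHHHH", 0, 0)
```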

The function should have the following attributes:

• Exits gracefully if it can't open the files.
• Prints to the terminal the number of reads that were processed and the original and resulting read length with the assumption that all reads are the same length. For example:
```
Reads processed: 12345678