ASSIGNMENT 5


Due date: 10/23/18 by 10 am

The exercises can be completed using only material covered so far.

Exercise 1

Write a function, transposer(matrix), that swaps the rows and columns of a matrix and returns the result as a new matrix.

For example:

#Original matrix:
[[1,2],[3,4],[5,6]]
#Transposed matrix:
[[1,3,5],[2,4,6]]


Additional Hints (avoid reading these until you get stuck)

1) Your code will likely have a nested for loop. Use the length of the nested list within the matrix (len(matrix[0])) for the number of iterations in the outer for loop.

2) Use the length of the matrix (len(matrix)) for the number of iterations in the nested for loop.

3) Within the nested for loop, append each element of the original matrix (matrix[n][i], where n and i are the two iteration variables of the for loops - but in which order?) to a new list. Then, in the outer for loop, append that new list to a second new list, which becomes the transposed matrix.
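
The three hints above can be combined into one possible sketch. It assumes the matrix is non-empty and rectangular (every nested list has the same length). The names transposed and new_row are illustrative choices, not part of the assignment.

```python
def transposer(matrix):
    """Return a new matrix whose rows are the columns of `matrix`."""
    transposed = []  # the "second new list" from hint 3
    # Outer loop: one iteration per column of the original matrix.
    for i in range(len(matrix[0])):
        new_row = []  # a fresh list for each row of the result
        # Nested loop: one iteration per row of the original matrix.
        for n in range(len(matrix)):
            new_row.append(matrix[n][i])
        transposed.append(new_row)
    return transposed


print(transposer([[1, 2], [3, 4], [5, 6]]))
```

Because every element is appended to a newly created list, the original matrix is left unchanged.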

Exercise 2

Write a function, miRNA_counter(input_fastq_file, input_fasta_file, output_file), that counts the number of times each miRNA in a fasta file appears in a small RNA high-throughput sequencing library.

fastq file: small_RNAs.fastq
fasta file: c_elegans_miRNAs.fa

The function should:
1. Read in, line by line in a for loop, a fastq file containing the small RNA library data.

2. Create a dictionary in which each key is a sequence and each value is the number of reads for that sequence (you'll need to use the get method, following the approach demonstrated in class).

3. Read in a fasta file containing the miRNA sequences, again line by line.

4. Store the miRNA name as a variable, and then look up the number of reads for the corresponding sequence, using the get method and the dictionary of fastq sequences from above.

5. Write the name of the miRNA and the number of reads to an output file in tab-delimited format.
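
The counting and lookup in steps 2 and 4 both rely on the dictionary get method. Here is a minimal sketch of that idiom, using a short hypothetical list of reads in place of a real fastq file. The names reads and read_counts are made up for illustration, and this may differ from the exact approach shown in class.

```python
# Hypothetical reads standing in for the sequence lines of a fastq file.
reads = ["TGAGG", "TCAAT", "TGAGG"]

read_counts = {}
for sequence in reads:
    # get returns the current count, or 0 if the sequence has not been seen yet.
    read_counts[sequence] = read_counts.get(sequence, 0) + 1

# Looking up a sequence that never appeared gives the default, 0, instead of an error.
missing_count = read_counts.get("AAAAA", 0)
```

The same default value of 0 is what lets a miRNA with no matching reads be reported with a count of 0.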

#input_fastq_file:
@D64TDFP1:248:C50DMACXX:5:1101:1241:2095 1:N:0:ATCACG
TGAGGTAGTAGGTTGTATAGTT
+
CCCFFFFFHHHHHJIJGHJJJJI
@D64TDFP1:248:C50DMACXX:5:1101:1371:2154 1:N:0:ATCACG
TCAATATTTGCATAGGGTATC
+
CCCFFFFFHHHHHJJJJGFHI
#input_fasta_file:
>cel-let-7
TGAGGTAGTAGGTTGTATAGTT
>cel-lin-4
TCCCTGAGACCTCAAGTGTGA
#output_file:
cel-let-7	1
cel-lin-4	0

Additional Hints (avoid reading these until you get stuck)
1) Your first for loop should loop through the fastq file, creating a dictionary with the sequence lines (if (line_number + 3) % 4 == 1, with line_number counted from 1) as the keys and the number of times each sequence appears as the values. Strip the trailing newline from each sequence before using it as a key. You can increment the values using get (fastq_dictionary[sequence] = fastq_dictionary.get(sequence, 0) + 1).

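2) The second for loop should loop through the fasta file, retrieve the number of reads for each sequence, and write the miRNA name and read count to a new file (outfile.write(mirna_name[1:] + '\t' + str(fastq_dictionary.get(line, 0)) + '\n')). Here mirna_name is the header line (beginning with >) and line is the sequence line that follows it, both with the trailing newline stripped.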
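
Putting the two hints together, one possible sketch of the whole function is shown below. It assumes that each fastq record occupies exactly four lines and that each fasta sequence sits on a single line after its header. It also uses enumerate and with, which may or may not have been covered in class. A manual line counter and explicit open/close calls would work equally well.

```python
def miRNA_counter(input_fastq_file, input_fasta_file, output_file):
    """Count the reads in a fastq file that match each miRNA in a fasta file."""
    fastq_dictionary = {}
    with open(input_fastq_file) as fastq:
        # Counting lines from 1, the sequence lines are lines 2, 6, 10, ...
        for line_number, line in enumerate(fastq, start=1):
            if (line_number + 3) % 4 == 1:
                sequence = line.strip()  # drop the trailing newline
                fastq_dictionary[sequence] = fastq_dictionary.get(sequence, 0) + 1

    with open(input_fasta_file) as fasta, open(output_file, "w") as outfile:
        mirna_name = ""
        for line in fasta:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                mirna_name = line  # header line, e.g. ">cel-let-7"
            else:
                # Sequence line: write the name without ">" and its read count.
                outfile.write(mirna_name[1:] + "\t"
                              + str(fastq_dictionary.get(line, 0)) + "\n")
```

A call such as miRNA_counter('small_RNAs.fastq', 'c_elegans_miRNAs.fa', 'miRNA_counts.txt') would then write one tab-delimited line per miRNA. The output file name here is only an example.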

assignment5.txt · Last modified: 2018/10/20 17:37 by dokuroot