Computational biology at CSU

DSCI 512: RNA-seq

Questions?

script:ex9

Exercise 9 Code

exercise9.py
```python
'''
9a) Write a for loop to iterate over each key in the
dictionary below and print each key-value pair in FASTA
format.
'''

fasta = {'let-7': 'TGAGGTAGTAGGTTGTATAGTT',
         'lin-4': 'TCCCTGAGACCTCAAGTGTGA',
         'miR-1': 'TGGAATGTAAAGAAGTATGTA'}

# Print each miRNA as a FASTA record: '>' + id on one line, sequence on the next
for name in fasta:
    print('>' + name + '\n' + fasta[name])

'''
9b) Write a function, revcomp(sequence) that uses
dictionaries and slicing to compute the reverse
complement of a sequence.
'''

def revcomp(sequence):
    # Map each nucleotide to its Watson-Crick complement
    comp_nt = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    comp = ''
    for nt in sequence:
        comp += comp_nt[nt]
    # Reverse the complemented string with slicing
    return comp[::-1]

print(revcomp('ATG'))  # CAT

'''
9c) Write a function, fasta_converter(fasta_file), that
converts FASTA-formatted data containing short sequences,
each on a single line (for example, c_elegans_mirnas.fa.gz),
to tab-delimited format. The function should read in a FASTA
file, store each sequence id and sequence as a key-value pair
in a dictionary, and then write the key-value pairs to a new
file in tab-delimited format
(this is a precursor to the homework assignment).
'''
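def fasta_converter(fasta_file, output_file):
    # Read single-line FASTA records into a dict: {sequence id: sequence}.
    # A missing input file now raises an error instead of being silently ignored.
    fasta_d = {}
    with open(fasta_file) as infile:
        for line in infile:
            line = line.strip()  # drop the trailing newline
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                name = line[1:]
            else:
                fasta_d[name] = line
    # Write each id/sequence pair as one tab-delimited line
    with open(output_file, 'w') as outfile:
        for name in fasta_d:
            outfile.write(name + '\t' + fasta_d[name] + '\n')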

fasta_converter('cel_miRNA.fa', 'test.tab')
```
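In 9a each record is printed with its whole sequence on one line. FASTA files often wrap long sequences at a fixed width instead (60 or 80 characters are common). The sketch below writes a dictionary like `fasta` from 9a to a file, using slicing to split each sequence; the function name `write_fasta` and the default width of 60 are assumptions chosen for illustration.

```python
def write_fasta(records, path, width=60):
    """Write {id: sequence} records to path in FASTA format.

    Hypothetical helper: sequences longer than `width` are split
    across several lines by slicing.
    """
    with open(path, 'w') as out:
        for name, seq in records.items():
            out.write('>' + name + '\n')
            # Step through the sequence in width-sized slices
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + '\n')
```

For example, `write_fasta(fasta, 'mirnas.fa')` would write the three miRNAs from 9a, each on a single sequence line because all are shorter than 60 nucleotides.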
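The loop in 9b builds the complement one character at a time. Python's built-in `str.maketrans` accepts the same kind of dictionary and `str.translate` applies it in a single call, so a shorter variant is possible. This is an alternative sketch, not the approach the exercise asks for; the name `revcomp_translate`, the extra lowercase entries, and the `_rc` header suffix are assumptions for illustration.

```python
# Translation table built from a dict, as in 9b. The lowercase entries are an
# assumption so that soft-masked (lowercase) sequence keeps its case.
COMPLEMENT = str.maketrans({'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
                            'a': 't', 'c': 'g', 'g': 'c', 't': 'a'})


def revcomp_translate(sequence):
    """Complement with str.translate, then reverse with slicing."""
    return sequence.translate(COMPLEMENT)[::-1]


fasta = {'let-7': 'TGAGGTAGTAGGTTGTATAGTT',
         'lin-4': 'TCCCTGAGACCTCAAGTGTGA',
         'miR-1': 'TGGAATGTAAAGAAGTATGTA'}

# Reuse the 9a loop to print each reverse complement in FASTA format;
# the '_rc' suffix is a hypothetical naming convention
for name, seq in fasta.items():
    print('>' + name + '_rc\n' + revcomp_translate(seq))
```

One behavioral difference: the dictionary lookup in 9b raises a `KeyError` on a base it does not know (such as `N`), whereas `translate` passes unmapped characters through unchanged.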
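`fasta_converter` assumes that every sequence sits on a single line and that the input is uncompressed, yet the example file named in 9c, c_elegans_mirnas.fa.gz, is gzip-compressed. The sketch below lifts both limits with the standard-library `gzip` module, assuming that a `.gz` suffix marks gzip input. The names `read_fasta` and `fasta_to_tab` are hypothetical and not part of the exercise.

```python
import gzip


def read_fasta(path):
    """Return {header: sequence} from a FASTA file, plain or gzip-compressed.

    Sequences may span several lines; they are joined into one string.
    """
    # Assumption: a .gz extension means the file is gzip-compressed
    opener = gzip.open if path.endswith('.gz') else open
    records = {}
    name = None
    chunks = []
    with opener(path, 'rt') as handle:  # 'rt' = read as text
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # Store the previous record before starting a new one
                if name is not None:
                    records[name] = ''.join(chunks)
                name = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Store the final record
    if name is not None:
        records[name] = ''.join(chunks)
    return records


def fasta_to_tab(fasta_path, tab_path):
    """Write each FASTA record as 'id<TAB>sequence' on its own line."""
    with open(tab_path, 'w') as out:
        for name, seq in read_fasta(fasta_path).items():
            out.write(name + '\t' + seq + '\n')
```

For example, `fasta_to_tab('c_elegans_mirnas.fa.gz', 'c_elegans_mirnas.tab')` would convert the compressed file directly, assuming it is in the working directory. Collecting sequence lines in a list and joining them once avoids building each sequence by repeated string concatenation.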