assignment6

Last modified: 2018/10/29 09:30 by dokuroot (page created 2018/10/25 08:51 by dokuroot)
===Exercise 1===
  
Write a function, ''fasta_to_csv(input_file, output_file)'', that converts a FASTA file, such as {{ :c_elegans_mirnas.fa.gz | c_elegans_mirnas.fa }}, to a comma-separated values (CSV) file **using regular expressions**.
\\ \\
The function should exit gracefully if either file cannot be opened.
  
  Input File
  
  Output File
  cel-let-7,TGAGGTAGTAGGTTGTATAGTT
  cel-lin-4,TCCCTGAGACCTCAAGTGTGA
  cel-miR-1,TGGAATGTAAAGAAGTATGTA
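Below is one possible sketch in Python. The page does not name a language, so Python is an assumption. The sketch reads the whole FASTA file and uses a single regular expression to capture each record's name together with its sequence lines, which may be wrapped. It then writes one ''name,sequence'' line per record. It assumes the record name is the first whitespace-delimited word of the header line. It also takes "exit gracefully" to mean printing an error to stderr and returning ''False''; calling ''sys.exit()'' would be an equally valid reading. The constant names ''RECORD_RE'' and ''WHITESPACE_RE'' are illustrative only.

```python
import re
import sys

# One FASTA record: a ">" header line, then every line up to the next ">".
# Assumption: the record name is the first whitespace-delimited word of the
# header (e.g. ">cel-let-7 extra text" -> "cel-let-7").
RECORD_RE = re.compile(r"^>(\S+)[^\n]*\n?([^>]*)", re.MULTILINE)
WHITESPACE_RE = re.compile(r"\s+")


def fasta_to_csv(input_file, output_file):
    """Write one 'name,sequence' line per FASTA record; return True on success."""
    try:
        with open(input_file) as fasta:
            text = fasta.read()
    except OSError as err:
        # "Exit gracefully": report the problem instead of showing a traceback.
        print(f"Error: cannot open {input_file}: {err}", file=sys.stderr)
        return False

    try:
        with open(output_file, "w") as csv_file:
            for name, raw_sequence in RECORD_RE.findall(text):
                # Sequences may be wrapped over several lines; remove all whitespace.
                sequence = WHITESPACE_RE.sub("", raw_sequence)
                csv_file.write(f"{name},{sequence}\n")
    except OSError as err:
        print(f"Error: cannot write {output_file}: {err}", file=sys.stderr)
        return False
    return True
```

For example, ''fasta_to_csv("c_elegans_mirnas.fa", "c_elegans_mirnas.csv")'' would write one ''name,sequence'' line per record, after the downloaded ''.gz'' file has been decompressed.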
  
===Exercise 2===
  
Write a function, ''motif_finder(input_file, motif)'', that returns the number of times a sequence motif occurs in a sequence file, such as {{ :c_elegans_chri.fa.gz | c_elegans_chrI.fa }} (note that the sequence in this file is lowercase).
\\ \\
The function should have the following attributes:
\\ \\
1) The motif may contain any number of Ns, where N is a wildcard that matches any single nucleotide (A, C, G, or T), e.g. TGANNNTCA. The user supplies the exact motif, so the number and positions of the Ns are fixed by the motif itself. Some possible motifs: TTTTCNGA, NATAAA, NATNAA.
\\ \\
2) The function should exit gracefully if the file cannot be opened.
\\ \\
3) The function should count motifs that span multiple lines, i.e. a match that is split across a line break in the file still counts.
\\ \\
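Here is a minimal Python sketch. Python is an assumption, as is each of the following design choices. ''motif_to_regex'' is a hypothetical helper that turns each N into the character class ''[ACGT]''. Header lines starting with ''>'' are dropped and the remaining lines are joined, so a motif that crosses a line break is still found. The sequence and the motif are both upper-cased to handle the lowercase file. Overlapping occurrences are counted through a zero-width lookahead. Only the strand as written is searched (no reverse complement). A file that cannot be opened produces an error message and a return value of ''None''.

```python
import re
import sys


def motif_to_regex(motif):
    """Hypothetical helper: translate e.g. 'TGANNNTCA'; N matches one base."""
    motif = motif.upper()
    if not re.fullmatch(r"[ACGTN]+", motif):
        raise ValueError(f"motif may only contain A, C, G, T and N: {motif!r}")
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def motif_finder(input_file, motif):
    """Count occurrences of motif in a FASTA file; return None if unreadable."""
    # Zero-width lookahead so that overlapping matches are all counted.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    try:
        with open(input_file) as handle:
            lines = handle.read().splitlines()
    except OSError as err:
        # "Exit gracefully": report the problem instead of showing a traceback.
        print(f"Error: cannot open {input_file}: {err}", file=sys.stderr)
        return None

    # Drop header lines and join the rest so a motif split across a line
    # break is one contiguous match; upper-case because the file is lowercase.
    sequence = "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()
    return sum(1 for _ in pattern.finditer(sequence))
```

The assignment does not say whether overlapping matches should count. Removing the ''(?=...)'' wrapper switches to non-overlapping counting, which can give a smaller count when matches overlap.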
  
  
**Submit your assignment as a file upload on Canvas.**