Exam - DSCI510

  • Due Tuesday, September 22, 2020, 10:00 am
  • If you need more time to complete this exam, please contact Erin before September 20, 10:00 am to make arrangements to turn the assignment in late.
  • This Exam accounts for 30% of your final grade

This exam will be a project. You will pick a standardized file type in your field and write a bash program that performs some operation on it. Your program should be able to take one or more files of this type as input and perform the operation on them. It should also report a message if the user did not execute the program properly.

Get feedback on your progress as you go: You have the option to turn in Question 1, Question 2, and Question 3 to Erin before Tuesday, September 15th, and receive feedback on your answers. This is a good opportunity to work with Erin and David to come up with a good project.

Grading

The grade for the exam will be structured like so:

5  pts - Question 1
5  pts - Question 2
5  pts - Question 3
25 pts - Question 4
   10 pts - Does the script perform the operation properly? 
   5  pts - If the user does not use the script properly, does the script output a message that describes its proper usage?
   5  pts - Your script should define at least one variable.
   5  pts - General uploading - did you provide me with the script file, an example input file, and an example output file?
5  pts - Question 5
5  pts - Question 6
50 pts - TOTAL

Extra Credit:
5 pts - bonus question #1
5 pts - bonus question #2
5 pts - bonus question #3

Hint:
minus 5 pts - use a suggested file type and script idea

Question 1 (5 pts)

Select a standardized file type in your field. You can use any flat/text file format listed in this Wikipedia page of scientific standardized file types, or you can pick any other file type that is flat.

Answer the following: What file type did you select? Describe the information that is gathered in your file. If your file has columns, what is tabulated in each column?

Question 2 (5 pts)

Answer the following: What is the operation your script will perform? Why is this a useful operation to do? What is the name of the script you propose to write?

Question 3 (5 pts)

Answer the following: In pseudocode (or describing in normal English), explain how your script will work. What steps will it perform and in what order? Where will the loop be?

Question 4 (25 pts)

Write your script. Your script will do an operation on one or more files of the same type. Your script should output information from its operation 1) to the screen, 2) into a re-directed file, or 3) into a set output file (called 'output.txt' or something like that).
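The three output routes can be sketched as follows. This is only an illustration: the operation (counting the lines that do not start with '#') and the function names count_records and write_report are made up for the example and are not part of the assignment.

```shell
# Hypothetical operation: count the data lines (those not starting with '#').
count_records() {
    local input_file="$1"
    grep -vc '^#' "$input_file"
}

# Deliver the result to the screen and to a set output file.
write_report() {
    local input_file="$1"
    local output_file="${2:-output.txt}"   # placeholder default name
    local summary
    summary="records: $(count_records "$input_file")"

    echo "$summary"                    # 1) to the screen
    echo "$summary" > "$output_file"   # 3) into a set output file
}
```

Route 2 needs no extra code inside the script: whatever the script prints to the screen can be re-directed by the user, for example with bash myscript.sh input.txt > results.txt (myscript.sh is a placeholder name).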

10 pts If your script works, you get 10 pts. If it half works or works in some situations, you will get partial credit.

5 pts If the user fails to provide an input file, your script should print to the screen that something is wrong and suggest the proper usage of the script.

Example:

$ bash gtf2bed.sh 
GTF2BED>>> ERROR >>>
  Please provide a proper input file in the form: 
     bash gtf2bed.sh input.gtf
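
One possible way to produce a message like the one above is sketched here. The function names print_usage and check_args are assumptions made for this example, and treating "not a regular file" as a usage error is a design choice, not a requirement of the exam.

```shell
# Print the usage message to standard error.
print_usage() {
    echo "GTF2BED>>> ERROR >>>" >&2
    echo "  Please provide a proper input file in the form:" >&2
    echo "     bash gtf2bed.sh input.gtf" >&2
}

# Return 0 if at least one argument was given and every argument is an
# existing regular file; otherwise print the usage message and return 1.
check_args() {
    if [ "$#" -eq 0 ]; then
        print_usage
        return 1
    fi
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            print_usage
            return 1
        fi
    done
    return 0
}

# In a script, this would typically be wired up near the top as:
#   check_args "$@" || exit 1
```

Sending the message to standard error keeps it out of any re-directed output file, so a failed run does not leave an error message where results were expected.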

5 pts Your script should involve the creation of at least one variable.

5 pts You will need to turn in the proper files associated with Question 4. Turn in the following:

  • Turn in your script as a .sh file
  • Turn in an example input file
  • Turn in an example of the output file you were able to generate
  • You can turn these Question 4 documents in via canvas.

Question 5 (5 pts)

Answer the following: Reflect on the code you wrote. Does it have any bugs you would like to fix before using it? If it works, how could you imagine expanding or improving it?

Question 6 (5 pts)

Answer the following: This is not related to the script, but to your own work and research. How is your data backed up? Explain the strategy you use to ensure your research files are securely backed up and retrievable in case of an emergency. If this is inadequate, what are some ideas you have to improve your strategy? You won't be graded on whether the backup is adequate or not.


Extra Credit

EC 1 (5 pts)

Do the following: Put a conditional statement or a loop in your script (beyond the conditional statement for your usage statement).

Answer the following: Explain how this conditional and/or loop works.
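
As an illustration of what such a loop and conditional might look like, the sketch below walks over every file named on the command line and skips anything that is not a regular file. The per-file operation (counting lines) and the names summarize_file and process_all are placeholders chosen for this example.

```shell
# Hypothetical per-file operation: report how many lines a file has.
summarize_file() {
    local f="$1"
    local n
    n=$(wc -l < "$f")
    echo "$f: $((n)) lines"   # $((n)) drops any padding wc may add
}

# Loop over all arguments; the conditional decides whether to process or skip.
process_all() {
    local f
    for f in "$@"; do
        if [ -f "$f" ]; then
            summarize_file "$f"
        else
            echo "skipping $f: not a regular file" >&2
        fi
    done
}

# In a script, this would be called as:
#   process_all "$@"
```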

EC 2 (5 pts)

Answer the following: What is your dream script? Is there an operation or a process that you imagine needing to perform for your research project, but that you do not yet have the coding ability to write? Explain what your dream script would do.

EC 3 (5 pts)

Do the following: Use date somewhere in your script. To learn more about date go here: linux date

Answer the following: How did you use date in your script?
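
Two common uses of date are sketched here, assuming you want a timestamp in a header line and a dated output file name. The function name make_output_name and the "result" prefix are made up for the example; the format strings use standard date conversion specifiers.

```shell
# Build an output file name that contains today's date,
# for example result_2020-09-22.txt when run on that day.
make_output_name() {
    local prefix="$1"
    echo "${prefix}_$(date +%Y-%m-%d).txt"
}

# Record when the script ran, for use in a header line.
run_stamp="$(date '+%Y-%m-%d %H:%M:%S')"
echo "# generated on ${run_stamp}"
echo "# results would be written to $(make_output_name result)"
```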


Hint

If you just can't figure out what type of a script to write, you can do the following but it will cost you 5 points.

OPTION 1 - gtf2bed.sh

gtf2bed.sh → convert a subset of some type of .gtf feature entry into .bed format. For example, have the script find all the start_codon entries or CDS (coding sequence) entries, and re-format each of those into the .bed file format:

To read more about the .gtf file format go here: GTF file format Wikipedia. For the .bed file format go here: External Link

# chrom chrom-start chrom-end
chr7    127471196    127472363
chr7    127472363    127473530
chr7    127473530    127474697
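
A minimal sketch of the core conversion, under a few assumptions: the input is a tab-separated GTF with the feature type in column 3 and the start and end coordinates in columns 4 and 5, and only the three BED columns shown above are wanted. GTF coordinates are 1-based and inclusive, while BED starts are 0-based, so the start is reduced by one. The function name gtf_to_bed is hypothetical, and a full script would still need its usage check.

```shell
# Print chrom, chrom-start, chrom-end for every GTF line whose feature
# column matches the requested feature (for example CDS or start_codon).
gtf_to_bed() {
    local feature="$1"
    local gtf_file="$2"
    awk -F'\t' -v feat="$feature" '
        BEGIN { OFS = "\t" }
        # Skip comment lines; convert the 1-based start to a 0-based start.
        $0 !~ /^#/ && $3 == feat { print $1, $4 - 1, $5 }
    ' "$gtf_file"
}

# Example call (input.gtf is a placeholder):
#   gtf_to_bed CDS input.gtf > output.bed
```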

OPTION 2 - gtfUCSC2ENSEMBL.sh

gtfUCSC2ENSEMBL.sh → The UCSC genome browser writes chromosome names like so:

chrI, chrII, chrIII, etc.

Whereas the ENSEMBL database writes chromosome names like so:

chromosome1, chromosome2, chromosome3, etc.

Write a script that converts gtf files from UCSC chromosome names to ENSEMBL chromosome names.
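
The renaming step could be approached with a lookup table, as in the sketch below. It assumes a tab-separated GTF with the chromosome name in column 1, follows the naming shown above (chrI becomes chromosome1), and only knows the Roman numerals I through X; names it does not recognize are left unchanged. The function name ucsc_to_ensembl is made up for the example.

```shell
# Rewrite column 1 of a GTF file from chrI, chrII, ... to chromosome1, chromosome2, ...
ucsc_to_ensembl() {
    local gtf_file="$1"
    awk -F'\t' '
        BEGIN {
            OFS = "\t"
            # Lookup table for the first ten Roman numerals only; extend as needed.
            n = split("I II III IV V VI VII VIII IX X", roman, " ")
            for (i = 1; i <= n; i++) name["chr" roman[i]] = "chromosome" i
        }
        /^#/ { print; next }             # pass comment lines through untouched
        ($1 in name) { $1 = name[$1] }   # rename chromosomes found in the table
        { print }
    ' "$gtf_file"
}

# Example call (file names are placeholders):
#   ucsc_to_ensembl ucsc.gtf > ensembl.gtf
```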

HINT

Just a quick tip: Maybe try to avoid using fasta files because they wrap in a weird way that is hard to deal with using Linux/bash. These files will be easier to manipulate using Python.

assignments/exam2020.txt · Last modified: 2021/06/01 15:06 (external edit)