Starting and Organizing Computational Projects

References

A quick review of our analysis strategy


Let's organize a yeast demonstration project

Make a new project directory:

# Log into summit
$ ssh scompile
 
# Navigate to the space where you want to put your directory. I'm putting mine in /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq
 
# Make a new directory & navigate into it:
 
$ mkdir PROJ04_yeastDemo
$ cd PROJ04_yeastDemo

Populate the project directory with relevant sub-directories:

$ mkdir 01_input
$ mkdir 02_scripts
$ mkdir 03_output
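
The same layout can be built in one pass with a loop. This is a minimal sketch, assuming you run it from the parent directory (the one that should contain PROJ04_yeastDemo) instead of the individual mkdir steps above; the variable names `project` and `sub` are only illustrative.

```shell
# Build the project directory and its three sub-directories in one pass.
# mkdir -p creates missing parent directories and does not fail
# if a directory already exists.
project=PROJ04_yeastDemo
for sub in 01_input 02_scripts 03_output; do
    mkdir -p "$project/$sub"
done

# List the result to confirm the layout.
ls "$project"
```

Numbering the sub-directories keeps them sorted in the order the analysis flows: inputs, then scripts, then outputs.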

Copy relevant input files:

  • We need the .fastq files we will analyze (one for each sample since we are working with single-end data)
  • We need a metadata file, a useful file with information on each sequencing file.
  • These will go in 01_input
  • Copy these from my directory:
$ cd 01_input
$ cp /scratch/summit/erinnish@colostate.edu/DATA_DSCI512/*.fastq .
$ cp /scratch/summit/erinnish@colostate.edu/DATA_DSCI512/metadata_aceticAcid_subset.txt .
$ ls

Explore the files.
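
A few commands that may help with exploring. This is a sketch rather than part of the original walkthrough: `count_reads` is a hypothetical helper name, and it assumes standard FASTQ records of exactly four lines each (header, sequence, '+', qualities).

```shell
# count_reads: hypothetical helper that prints the number of reads in a FASTQ file.
# Assumes every read occupies exactly four lines.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Typical usage from inside 01_input:
#   ls -lh                                  # file sizes
#   head metadata_aceticAcid_subset.txt     # first lines of the metadata
#   for f in *.fastq; do
#       head -n 4 "$f"                      # first read of each file
#       echo "$f: $(count_reads "$f") reads"
#   done
```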


Let's update the software and install it:

Navigate to the directory where you originally downloaded David's GitHub repository.

$ cd /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq
$ cd PROJ01_testsummit
$ cd summit-rna-seq-setup

Pull updates from the repository.

# Update the folder
$ git pull

Copy the updated file activate.bashrc higher up so it is easier to access.

$ cp activate.bashrc /scratch/summit/<eID>@colostate.edu/ # Replace <eID> with your eID

Learn how to load the software

  • Anytime you want to load up the software, execute the line:
$ source /scratch/summit/<eID>@colostate.edu/activate.bashrc

:!: Common pitfall: This code needs to be executed from scompile. If something behaves strangely, run ssh scompile and then try it again.

:!: Quick tip: Anytime we want to use any of this software in a script, we'll have to add this source command within the code so that the software is accessible.
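
One way to add the source step to a script is to wrap it in a small check, so the script stops with a clear message if the file is missing. This is a minimal sketch: `load_software` is a hypothetical helper name, not something provided by David's repository.

```shell
# load_software: hypothetical helper that sources an activation file if it
# exists, and reports an error otherwise.
load_software() {
    if [ -f "$1" ]; then
        # "." is the portable spelling of "source"
        . "$1"
    else
        echo "Cannot find $1; check the path and that you are on scompile." >&2
        return 1
    fi
}

# Near the top of a script (replace <eID> with your eID):
#   load_software "/scratch/summit/<eID>@colostate.edu/activate.bashrc" || exit 1
```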

:?: Major question: Does this really install software?

  • Not really. What it does is to link your file space to software that David has already installed in his own /projects/ directory. So for this class, we are piggy-backing off of David's installation hard work. Thank you, David!

A cool trick: aliasing

It gets really tiresome to type squeue -u $USER. Let's shorten it to scheck:

$ alias scheck='squeue -u $USER'

That will let you type scheck anytime during this summit session. Every time you log in, you'll need to re-do the aliasing. Alternatively, if you want to make it permanent, you can add that line of code to the end of a file in your home directory:

/home/<eID>@colostate.edu/.bash_profile

My .bash_profile looks like this at the end:

#Aliases
alias scheck="squeue -u $USER"
 
# Remove this if you don't want to display README at login
if [ -f ~/README.mdwn ]; then
    cat ~/README.mdwn
fi

Before the changes to your .bash_profile take effect, you'll need to either log out and log back in again, or source your new .bash_profile with the following line of code:

$ source ~/.bash_profile

:!: CAUTION! Be very careful when updating your .bash_profile file. Make a backup copy before you alter it. This matters especially on your local computer, where a mistake in this file can mess up your setup.
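
The backup and the edit can be combined so that the backup is never forgotten. This is a sketch under stated assumptions: `add_scheck_alias` is a hypothetical helper name, the backup is written next to the original with a `.bak` suffix, and the alias uses single quotes as in the interactive example.

```shell
# add_scheck_alias: hypothetical helper that appends the scheck alias to a
# profile file, backing the file up first. Does nothing if the alias line
# is already present, so it is safe to run more than once.
add_scheck_alias() {
    profile="$1"
    alias_line="alias scheck='squeue -u \$USER'"

    # Already there? Leave the file (and any earlier backup) untouched.
    if grep -qxF "$alias_line" "$profile" 2>/dev/null; then
        return 0
    fi

    # Keep a copy of the file as it was before this change.
    if [ -f "$profile" ]; then
        cp "$profile" "$profile.bak"
    fi

    printf '%s\n' "$alias_line" >> "$profile"
}

# Usage:
#   add_scheck_alias ~/.bash_profile
```

If something goes wrong afterwards, copying the `.bak` file back over the profile restores the previous state.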

Writing a Basic Pipeline

wiki/computationalprojects.1543333757.txt.gz · Last modified: 2018/11/27 08:49 by erin