
Starting and Organizing Computational Projects


A quick review of our analysis strategy


Let's organize a yeast demonstration project

Make a new project directory:

# After logging in to Summit, move to a compile node
$ ssh scompile

# Navigate to the space where you want to put your directory. I'm putting mine in /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq
$ cd /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq

# Make a new directory & navigate into it:
$ mkdir PROJ04_yeastDemo
$ cd PROJ04_yeastDemo

Populate the project directory with relevant sub-directories:

$ mkdir 01_input
$ mkdir 02_scripts
$ mkdir 03_output
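The directory steps above can be condensed, because mkdir accepts several names at once and `-p` makes it safe to rerun (it creates missing parents and does not complain if a directory already exists). Here is a sketch equivalent to the commands above, run from wherever you keep your projects:

```shell
# Create the project directory (no error if it already exists) and enter it.
mkdir -p PROJ04_yeastDemo
cd PROJ04_yeastDemo

# Create all three sub-directories with a single command.
mkdir -p 01_input 02_scripts 03_output
ls
```

In bash, brace expansion shortens this further: `mkdir -p PROJ04_yeastDemo/{01_input,02_scripts,03_output}`.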

Copy relevant input files:

  • We need the .fastq files we will analyze (two per sample, since we are working with paired-end data: one file of forward reads and one of reverse reads)
  • We also need a metadata file, which describes each sequencing file.
  • These will go in 01_input
  • Copy these from my directory:
$ cd 01_input
$ cp /scratch/summit/erinnish@colostate.edu/DATA_DSCI512/*.fastq .
$ cp /scratch/summit/erinnish@colostate.edu/DATA_DSCI512/metadata_aceticAcid_subset.txt .
$ ls

Explore the files.
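One way to start exploring is to look at the first record of a FASTQ file and count how many reads each file holds. This is a minimal sketch; the function names `peek_fastq` and `count_reads` are hypothetical helpers, not course software. It relies on each FASTQ record spanning exactly four lines:

```shell
# Run from inside 01_input.

# Show the first FASTQ record. Each record is 4 lines:
# @read ID, sequence, a '+' line, per-base quality string.
peek_fastq() {
    head -n 4 "$1"
}

# Count reads: each record is exactly 4 lines, so reads = lines / 4.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Summarize every .fastq file in the current directory.
for f in *.fastq; do
    [ -e "$f" ] || continue   # no matches: skip the unexpanded pattern
    echo "$f: $(count_reads "$f") reads"
done
```

With paired-end data, the forward and reverse files for the same sample should report the same number of reads. The metadata file is plain text, so `head metadata_aceticAcid_subset.txt` or `less` works for it.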


Let's update the software and install it:

Navigate to the directory where you originally cloned David's GitHub repository.

$ cd /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq
$ cd PROJ01_testsummit
$ cd summit-rna-seq-setup

Pull updates from the repository.

# Update the folder
$ git pull

Copy the updated activate.bashrc file up to your top-level scratch directory so it is easier to access.

$ cp activate.bashrc /scratch/summit/<eID>@colostate.edu/ # Replace <eID> with your eID

Learn how to load the software

  • Anytime you want to load up the software, execute the line:
$ source /scratch/summit/<eID>@colostate.edu/activate.bashrc

:!: Common pitfall: This code needs to be executed from scompile. If something behaves strangely, run ssh scompile and then try again.

:!: Quick tip: Whenever you use any of this software in a script, include this source command in the script itself so the software is accessible when the script runs.
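As an illustration of that tip, the sketch below writes a SLURM batch-script template into 02_scripts with the source line near the top. The file name `template.sbatch`, the job name, and the resource limits are hypothetical placeholders, and other options your jobs may need (for example `--partition`) are left out. It also assumes that on Summit `$USER` expands to `<eID>@colostate.edu`, so the path matches the one above:

```shell
# Run from inside PROJ04_yeastDemo.
mkdir -p 02_scripts

# The quoted 'EOF' keeps $USER literal in the file, so it is expanded
# when the job runs, not when the template is written.
cat > 02_scripts/template.sbatch <<'EOF'
#!/bin/bash
# Hypothetical job name and limits; adjust them for real jobs.
#SBATCH --job-name=yeastDemo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# %x = job name, %j = job ID
#SBATCH --output=%x-%j.out

# Load the course software before calling any of it.
# Assumes $USER is <eID>@colostate.edu on Summit.
source /scratch/summit/$USER/activate.bashrc

# Pipeline commands go here.
EOF
```

You would submit a filled-in copy with `sbatch 02_scripts/template.sbatch` and watch it with `squeue -u $USER`.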

:?: Major question: Does this really install software?

  • Not really. It links your file space to software that David has already installed in his own /projects/ directory. So for this class, we are piggy-backing off David's installation work. Thank you, David!
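If you are curious how that linking looks from your side, you can ask the shell where it finds programs. This sketch assumes activate.bashrc works by adding directories to your PATH, which is an assumption since the script's contents are not shown here; `ls` is only a stand-in for a tool from the course software:

```shell
# List the directories your shell searches for programs, one per line.
# Run this before and after sourcing activate.bashrc and compare.
echo "$PATH" | tr ':' '\n'

# Show the full path of the program the shell would run for a given name.
command -v ls
```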

A cool trick: aliasing

It gets really tiresome to type squeue -u $USER every time you check on your jobs. Let's shorten it to scheck:

$ alias scheck='squeue -u $USER'

That lets you type scheck at any point during the current Summit session. The alias is lost when you log out, so you'd have to re-create it every time you log in. To make it permanent, add that line to the end of this file in your home directory:

/home/<eID>@colostate.edu/.bash_profile

My .bash_profile looks like this at the end:

#Aliases
alias scheck="squeue -u $USER"
 
# Remove this if you don't want to display README at login
if [ -f ~/README.mdwn ]; then
    cat ~/README.mdwn
fi

:!: CAUTION! Be very careful when editing your .bash_profile, and make a backup copy before you change it. This matters even more on your local computer, where a broken profile can disrupt your whole shell setup.
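The back-up-first advice and the "add the line to the end of .bash_profile" step can be combined into one short snippet. This is a sketch, not the course's official procedure: the timestamped backup name is an arbitrary choice, and the check only skips appending when the exact same line is already present:

```shell
profile="$HOME/.bash_profile"

# 1. Back up the current file (if there is one) with a timestamp in the name.
if [ -f "$profile" ]; then
    cp "$profile" "$profile.bak.$(date +%Y%m%d-%H%M%S)"
fi

# 2. Append the alias only if that exact line is not already present,
#    so running this twice does not add a duplicate.
line="alias scheck='squeue -u \$USER'"
if ! grep -qxF "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
fi
```

Single quotes keep `$USER` unexpanded until the alias runs, while the double-quoted form shown above expands it once when .bash_profile is read; both work because your username does not change. A double-quoted line already in the file will not match this check, so you would end up with two definitions, and the later one simply overrides the earlier.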

Changes to .bash_profile take effect only after you log out and back in, or after you source the updated file with the following line of code:

$ source ~/.bash_profile

Writing a Basic Pipeline

wiki/computationalprojects.txt · Last modified: 2018/11/27 08:49 by erin