Automating RNA-seq pipelines

Last time, we wrote out each line of an RNA-seq analysis pipeline by hand. This works fine if you have a few samples and if you make no errors. As projects get bigger, the task becomes more cumbersome.

To guard against errors and to streamline large projects, we need an automation strategy. This is the heart of pipeline building.

The key is to let the metadata file drive the pipeline: each row describes one sample, and the fields in that row supply the values that each command needs.

Let's take a look at the metadata file for our yeast demo project:

SRR3567551_1.fastq	SRR3567551_2.fastq	sample01	CK45-1	untreated	45min	1
SRR3567552_1.fastq	SRR3567552_2.fastq	sample02	CK45-2	untreated	45min	2
SRR3567554_1.fastq	SRR3567554_2.fastq	sample03	Ac45-1	aceticAcidTreated	45min	1
SRR3567555_1.fastq	SRR3567555_2.fastq	sample04	Ac45-2	aceticAcidTreated	45min	2
SRR3567674_1.fastq	SRR3567674_2.fastq	sample09	CK200-1	untreated	200min	1
SRR3567676_1.fastq	SRR3567676_2.fastq	sample10	CK200-2	untreated	200min	2
SRR3567677_1.fastq	SRR3567677_2.fastq	sample11	Ac200-1	aceticAcidTreated	200min	1
SRR3567679_1.fastq	SRR3567679_2.fastq	sample12	Ac200-2	aceticAcidTreated	200min	2

We will use loop control to step through this file one row (one sample) at a time, passing the fields from each column into the same series of stereotyped commands.
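
Here is a minimal sketch of that idea, not the final pipeline. It writes two rows of the table above to a temporary file and loops over them with `while read`. The variable names (`fq1`, `fq2`, `sample`, `label`, `treatment`, `timepoint`, `rep`) are labels chosen for this illustration, and the `echo` stands in for whatever command the pipeline would really run.

```shell
# Sketch: loop over a tab-delimited metadata file, one sample per row.
# ASSUMPTION: the temp file is a two-row stand-in for the real metadata file.
metadata="$(mktemp)"
printf 'SRR3567551_1.fastq\tSRR3567551_2.fastq\tsample01\tCK45-1\tuntreated\t45min\t1\n' > "$metadata"
printf 'SRR3567554_1.fastq\tSRR3567554_2.fastq\tsample03\tAc45-1\taceticAcidTreated\t45min\t1\n' >> "$metadata"

# A literal tab character, used as the field separator for read
tab="$(printf '\t')"

n_samples=0
# ASSUMPTION: the column names below are illustrative, not defined by the source.
while IFS="$tab" read -r fq1 fq2 sample label treatment timepoint rep; do
    # In a real pipeline, this echo would be replaced by an actual command.
    echo "Would process ${sample} (${label}): ${fq1} + ${fq2}, ${treatment}, ${timepoint}, rep ${rep}"
    n_samples=$((n_samples + 1))
    last_sample="$sample"
done < "$metadata"

echo "Looped over ${n_samples} samples"
rm -f "$metadata"
```

Because every sample goes through the same loop body, adding a sample means adding a row to the metadata file, not editing the commands.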

Let's explore pipeline automation

Make a new project directory:

# Log into summit (replace <login-host> with the address of your Summit login node)
$ ssh -l <eID> <login-host>
# Switch to scompile
$ ssh scompile
# If you want to, make your alias to scheck here
$ alias scheck='squeue -u $USER'

Navigate to the space where you want to put your directory. I'm putting mine in /scratch/summit/<eID>

Make a new directory & navigate into it:

$ mkdir PROJ06_yeastDemo2
$ cd PROJ06_yeastDemo2

OK, we just started a new project, and we want to use the same input data that we used in PROJ04_yeastDemo. One option would be to copy the input files from that project over to this one. That would work, but it would waste disk space.

A better option is to point PROJ06_yeastDemo2/01_input at the PROJ04_yeastDemo/01_input directory. We can do this with a soft link, also known as a symbolic link or a shortcut.

Soft link usage

$ ln -s /path/to/original /path/to/link
# So if you are located within the directory where you want the link to exist, you can shorten it to...
$ ln -s /path/to/original .

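Let's try it. Navigate to your PROJ06_yeastDemo2 directory and create a soft link to serve as your input sub-directory.

# Navigate to your project directory (skip this if you are already inside it):
$ cd PROJ06_yeastDemo2
$ ls
# Create the soft link (this relative path assumes PROJ04_yeastDemo sits in the same parent directory):
$ ln -s ../PROJ04_yeastDemo/01_input .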
# check what you got using two different ls commands:
$ ls
$ ls -alh
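
If you want to confirm that a link points where you expect, the sketch below rehearses the same steps in a throwaway directory. The directory and file names (`PROJ04_demo`, `PROJ06_demo`, `example.fastq`) are stand-ins made up for this example, not the real project paths. It assumes `mktemp` and `readlink` are available, as they are on most Linux systems.

```shell
# Sketch: create a soft link in a throwaway directory and check where it points.
# ASSUMPTION: these directories are hypothetical stand-ins under a temp dir.
workdir="$(mktemp -d)"
mkdir -p "$workdir/PROJ04_demo/01_input" "$workdir/PROJ06_demo"
touch "$workdir/PROJ04_demo/01_input/example.fastq"

cd "$workdir/PROJ06_demo"
ln -s ../PROJ04_demo/01_input .

# [ -L name ] is true only if name is a symbolic link
if [ -L 01_input ]; then
    link_target="$(readlink 01_input)"
    echo "01_input is a soft link to: $link_target"
fi

# The original files are visible through the link, but nothing was copied
linked_files="$(ls 01_input/)"
echo "$linked_files"
```

Note that a relative target such as `../PROJ04_demo/01_input` is resolved from the directory that holds the link, so the link breaks if either project directory is moved on its own.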

Create the other sub-directories

$ mkdir 02_scripts
$ mkdir 03_output
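
As an optional variation, both sub-directories can be created in a single call. The sketch below runs in a throwaway directory (an assumption made for this example, so it does not touch your real project) and uses `mkdir -p`, which does not complain if a directory already exists.

```shell
# Sketch: create both sub-directories in one call.
# ASSUMPTION: run in a temp dir here rather than the real project directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

mkdir -p 02_scripts 03_output
# Running it a second time is harmless: -p ignores directories that exist
mkdir -p 02_scripts 03_output
mkdir_status=$?

ls
```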

Automating pipelines 2

wiki/automation.txt · Last modified: 2018/11/29 09:24 by erin