When completing an analysis or pausing a project, it is good practice to clean up your projects by:
Common pitfall: Data is deleted from scratch space after 90 days of having no new modifications.
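To see which files are getting close to that limit, one option is to list files whose modification time is older than some threshold. This is a minimal sketch, assuming GNU `find` is available. The function name `list_stale` and the 60-day default are illustrative choices, not part of the course scripts, and the actual purge policy may track a different timestamp than modification time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the course templates): list regular files
# under a directory that were last modified more than N days ago.
list_stale() {
    local dir="$1"
    local days="${2:-60}"   # 60 is an arbitrary early-warning threshold
    # -mtime +N matches files last modified more than N whole days ago
    find "$dir" -type f -mtime +"$days" -print
}

# Example call (replace the path with your own scratch location):
# list_stale /scratch/summit/<eID>@colostate.edu 60
```

Files that show up in this list are candidates for backing up or cleaning before they are purged.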
We will clean up the project we used as a demo in the last class session. This was the project
PROJ06_yeastDemo2, and we did that demonstration on November 29, 2018.
NOTEBOOK EXERCISE: Write in your notebook that your plan for today is to clean up the data you generated for PROJ06_yeastDemo2 on Nov. 29, 2018.
OK, now log into summit:
# Log into summit
$ ssh -l <eID>@colostate.edu login.rc.colorado.edu
# Switch to scompile
$ ssh scompile
# If you want to, make your alias to scheck here
$ alias scheck='squeue -u $USER'
Navigate to the space where you performed your yeast demo in the last class session.
For me, this space is:
$ cd /scratch/summit/<eID>@colostate.edu/DSCI512_RNAseq/PROJ06_yeastDemo2 # Change to your location
Let's explore this project. The output was saved in
03_output and the scripts were written in
If you copied all the shell scripts from our GitHub templates over to this directory, you should already have a cleanup script located there.
$ ls -1
execute_RNAseq_pipeline.sh
log_RNAseq_pipe_1469109.txt
RNAseq_analyzer_181117.sh
RNAseq_cleanup_181011.sh
Log onto Cyberduck and open the cleanup script for editing.
EXERCISE: Let's hack this code.
All you need to do is change the date section to specify the date listed on the output folder that you want to clean up:
Change this section to this:
#This is the output_directory:
#DATE=`date +%Y-%m-%d`
#OR
DATE=2018-11-29
outputdir="../03_output/"$DATE"_output/"
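The reason for this change: the commented-out line sets DATE to today's date, so the cleanup script would look for an output folder named after today. Hard-coding the date points it at the folder from the earlier run. The sketch below builds the path the same way and adds a check that the folder exists. The check is an addition for illustration and is not part of the course script.

```shell
#!/usr/bin/env bash
# Build the output directory path from a fixed date, as in the edited section.
DATE=2018-11-29                           # date of the run you want to clean up
outputdir="../03_output/${DATE}_output/"
echo "$outputdir"                         # prints ../03_output/2018-11-29_output/

# Illustrative guard (an assumption, not in the original cleanup script):
# warn if the folder is missing, which usually means DATE is wrong.
if [ ! -d "$outputdir" ]; then
    echo "Warning: $outputdir not found; check DATE" >&2
fi
```

If the warning appears, compare DATE against the folder names listed in 03_output.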
Save the modified cleanup script.
EXERCISE: Let's execute this code.
Using either nano or cyberduck, edit your original execution script called
#!/usr/bin/env bash

#SBATCH --job-name=test_RNAseq_pipeline
#SBATCH --nodes=1
#SBATCH --ntasks=1       # modify this number to reflect how many cores you want to use (up to 24)
#SBATCH --partition=shas
#SBATCH --qos=normal     # modify this to reflect which queue you want to use. Options are 'normal' and 'testing'
#SBATCH --time=1:00:00   # modify this to reflect how long to let the job go.
#SBATCH --output=log_RNAseq_pipe_%j.txt

## source /scratch/summit/<eID>@colostate.edu/activate.bashrc

## execute the RNA-seq_pipeline
#bash RNAseq_analyzer_181117.sh ../01_input/metadata_aceticAcid_subset.txt $SLURM_NTASKS
    # modify the SECOND argument to point to YOUR metadata.file
    # modify the THIRD argument to indicate the number of THREADS you
    # want to use. This number must match the number in #SBATCH --ntasks=#

## clean up by zipping .fastq files and deleting extra files
bash RNAseq_cleanup_181011.sh ../01_input/metadata_aceticAcid_subset.txt
    # modify the SECOND argument to point to YOUR metadata.file
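To give a feel for what "zipping .fastq files" can look like, here is a minimal sketch. It is not the contents of RNAseq_cleanup_181011.sh; the function name `zip_fastq` is hypothetical, and it assumes the .fastq files sit directly inside one directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compress every .fastq file directly inside a directory.
# This is NOT the course cleanup script, only an illustration of the idea.
zip_fastq() {
    local dir="$1"
    local f
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        gzip "$f"                 # replaces sample.fastq with sample.fastq.gz
    done
}

# Example call (hypothetical path):
# zip_fastq ../03_output/2018-11-29_output/
```

Compressing in place like this keeps the data recoverable with `gunzip` while freeing up space.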
Well, actually we won't back up now, but we'll talk about it.
To move your items off of summit, you can:
For more information about writing an rsync script to back up and/or sync your files:
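As a starting point, an rsync backup can be wrapped in a small function. This is a minimal sketch, assuming `rsync` is installed on the machine where you run it. The function name `backup_project` and the example paths are illustrative, not part of the course scripts.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: copy the contents of a project directory to a backup
# location. -a preserves timestamps and permissions; -v lists files as they copy.
backup_project() {
    local src="$1"    # project directory to back up
    local dest="$2"   # backup directory (local path or user@host:path)
    # The trailing slash on the source copies its contents into dest
    rsync -av "${src%/}/" "$dest"
}

# Example call (hypothetical destination):
# backup_project PROJ06_yeastDemo2 ~/backups/PROJ06_yeastDemo2
```

Running the same command again later only transfers files that changed, which is what makes rsync convenient for repeated syncing.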
EXERCISE: After the class, save your course content to date somewhere. Keep track of what you did in your Course notebook.
NOTEBOOK EXERCISE: Write in your notebook that you synced your course content somewhere. Write how you did it (cyberduck, drag and drop, or rsync).
COMMON PITFALL: Data in scratch space on summit is deleted after 90 days of disuse, with no warning!