Tidying Up Projects

When completing an analysis or pausing a project, it is good practice to clean up your projects by:

  • Deleting any temporary or unnecessary files.
  • Compressing any large files.
  • Saving all data to a local (or backup) space.
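The first two steps can be sketched in the shell as follows. This is a minimal sketch, not the course cleanup script: the throwaway directory, the file names, the `*.tmp` pattern, and the 1 MB size threshold are all assumptions chosen for illustration, so the example is safe to run anywhere.

```shell
# Build a throwaway project so nothing real is touched.
# (tidy_demo and the file names below are hypothetical.)
tidy_demo=$(mktemp -d)
mkdir -p "$tidy_demo/03_output"
echo "scratch" > "$tidy_demo/03_output/intermediate.tmp"
head -c 2000000 /dev/zero > "$tidy_demo/03_output/reads.fastq"

# 1. Delete temporary files (assumed here to end in .tmp).
find "$tidy_demo" -type f -name '*.tmp' -delete

# 2. Compress files larger than 1 MB that are not already gzipped.
find "$tidy_demo" -type f -size +1M ! -name '*.gz' -exec gzip {} +

# Show what is left.
ls -1 "$tidy_demo/03_output"
```

On a real project, run each `find` command without `-delete` or `-exec` first to preview the files it matches.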

:!: Common pitfall: Files in scratch space are deleted once they have gone 90 days without modification.
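To see which files are getting close to that limit, `find` can list files by modification age. The sketch below is an illustration only: it works in a throwaway directory, the 60-day threshold is an arbitrary early-warning value, and `touch -d` with a relative date assumes GNU coreutils (the usual case on Linux clusters).

```shell
# Create a throwaway directory with one fresh and one stale file.
# (purge_demo and the file names are hypothetical.)
purge_demo=$(mktemp -d)
touch "$purge_demo/fresh.txt"
touch -d '100 days ago' "$purge_demo/stale.txt"   # GNU touch syntax

# List files last modified more than 60 days ago.
# (find counts age in whole 24-hour periods.)
stale_files=$(find "$purge_demo" -type f -mtime +60)
echo "$stale_files"
```

Pointing the same `find` command at your own scratch directory shows what should be backed up first.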

Let's clean up

We will clean up the project we used as a demo in the last class session. This was the project PROJ06_yeastDemo2, and we ran that demonstration on November 29, 2018.

:!: NOTEBOOK EXERCISE: Write in your notebook that your plan for today is to clean up the data you generated for PROJ06_yeastDemo2 on Nov. 29, 2018.

OK, now log into summit:

# Log into summit
$ ssh -l <eID>
# switch to scompile
$ ssh scompile
# Optionally, define the scheck alias here
$ alias scheck='squeue -u $USER'

Navigate to the space where you performed your yeast demo last class session.

For me, this space is: /scratch/summit/<eID>

$ cd /scratch/summit/<eID>  # Change to your location

Let's explore this project. The output was saved in 03_output and the scripts were written in 02_scripts.

If you copied all the shell scripts from our GitHub templates over to this directory, you should already have a cleanup script here.

$ ls -1

Connect to summit with Cyberduck and open the cleanup script for editing.

:!: EXERCISE: Let's hack this code.

All you need to do is change the date section to specify the date listed on the output folder that you want to clean up:

Change this section to this:

#This is the output_directory:
#DATE=`date +%Y-%m-%d`
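As a sketch of what the edited section might look like: the original line is commented out and `DATE` is pinned to the date on the output folder. The value `2018-11-29` is an assumption based on the demo date and on the `%Y-%m-%d` format of the original line; use the date that actually appears on your folder name.

```shell
# Original line, which always resolves to today's date:
#DATE=`date +%Y-%m-%d`

# Pinned to the date of the run to clean up (assumed value):
DATE=2018-11-29
echo "Cleaning up output dated: $DATE"
```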

Save the modified cleanup script.

:!: EXERCISE: Let's execute this code.

Using either nano or Cyberduck, edit your original execution script and make these changes:

  • Change sbatch settings
  • Comment out the analyzer command
  • Edit the cleanup command

#!/usr/bin/env bash
#SBATCH --job-name=test_RNAseq_pipeline 
#SBATCH --nodes=1
#SBATCH --ntasks=1      # modify this number to reflect how many cores you want to use (up to 24)
#SBATCH --partition=shas
#SBATCH --qos=normal     # modify this to reflect which queue you want to use. Options are 'normal' and 'testing'
#SBATCH --time=1:00:00   # modify this to reflect how long to let the job go. 
#SBATCH --output=log_RNAseq_pipe_%j.txt
source /scratch/summit/<eID>
## execute the RNA-seq_pipeline
#bash ../01_input/metadata_aceticAcid_subset.txt $SLURM_NTASKS
     # modify the SECOND argument to point to YOUR metadata.file
     # modify the THIRD argument to indicate the number of THREADS you 
     # want to use. This number must match the number in #SBATCH --ntasks=#
## clean up by zipping .fastq files and deleting extra files
bash ../01_input/metadata_aceticAcid_subset.txt
     # modify the SECOND argument to point to YOUR metadata.file

Let's back up

Well, actually we won't back up now, but we'll talk about it.

To move your items off of summit, you can:

  1. Use Cyberduck or an FTP site to drag and drop the data onto your local computer.
  2. Use rsync and set up an automated script.
  3. Use backup software.
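Option 2 can be sketched as below. This is an illustration under stated assumptions, not a finished backup script: both directories are throwaway local folders (hypothetical names), so the command runs anywhere without touching real data, and the `rsync` call is skipped if `rsync` is not installed.

```shell
# Throwaway source and destination so the sketch is safe to run.
# (sync_src, sync_dest, and results.txt are hypothetical.)
sync_src=$(mktemp -d)
sync_dest=$(mktemp -d)
echo "counts" > "$sync_src/results.txt"

# -a preserves timestamps and permissions; -v lists transferred files.
# The trailing slash on the source copies its contents, not the directory itself.
if command -v rsync >/dev/null 2>&1; then
    rsync -av "$sync_src/" "$sync_dest/"
else
    echo "rsync is not installed on this machine" >&2
fi
```

To pull data from summit, the source would instead be a remote path of the form `<eID>@<host>:/scratch/summit/<eID>/`, with the host name supplied by your cluster documentation. Adding `-n` gives a dry run that lists what would be copied without transferring anything.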

For more information about writing an rsync script to back up and/or sync your files:

Backing up and Syncing with rsync

:!: EXERCISE: After the class, save your course content to date somewhere. Keep track of what you did in your Course notebook.

:!: NOTEBOOK EXERCISE: Write in your notebook that you synced your course content somewhere. Write how you did it (Cyberduck drag and drop, or rsync).

:!: COMMON PITFALL: Files in scratch space on summit are deleted after 90 days without modification, with no warning!

wiki/tidyingprojects.txt · Last modified: 2018/12/04 09:49 by erin