Software Installation on SUMMIT

We will do most of our work in this class on the SUMMIT supercomputer system. We are doing this for a few reasons…

  1. Everyone will be on the same platform (no Mac v. PC issues)
  2. We can do things really fast by using the power of the supercomputer system
  3. You will be trained and ready to go on SUMMIT for your large-scale projects!

SUMMIT is a joint venture between Colorado State University and CU Boulder and is sponsored by those institutions and by a grant from the National Science Foundation.

However, please keep in mind that you can also do most of these analyses on your own desktop or laptop. Mac users can install the same software using conda. For PC users, it may be a bit more challenging.

Today, we will do the following together:

  1. Log into SUMMIT
  2. Initiate a “conda virtual environment” where we can install RNA-seq software
  3. Install software within our conda environment

1. Logging into SUMMIT

:!: Exercise: Together, we will log into SUMMIT through the JupyterHub system:

  • Open JupyterHub in a new tab…
  • Log into the system. For your username, use your CSU eID e-mail address. For your password, use your CSU password, a comma, and the word push. For example, Tony Stark would sign in like so…

  • Click on Launch Server
  • Select the job profile “Summit interactive (12 hr)”, then click Spawn

:-) Yay! You should be online.

  • Go ahead and open a terminal:

  • Explore where you are:
$ whoami
$ pwd
$ hostname

2. Initiate a virtual conda environment

For this class, we will set up and use a conda virtual environment. What is this? conda is an open-source, cross-platform package manager that lets users install and keep track of software within a Linux environment. conda tracks the dependencies that each piece of software requires, as well as the different versions of that software.

Software installations can be organized into multiple virtual environments each with their own combinations of installed software of different versions. In this way, you can switch between different environments for different projects. This is one of the many ways the community is trying to make research more reproducible.
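
One way to make the reproducibility point concrete is to save a description of an environment to a file that can be shared alongside a project. The sketch below is only an illustration: the function name snapshot_env and the default file name environment.yml are assumptions, not part of this class's setup. It relies on the standard `conda env export` command.

```shell
# Save a description of a conda environment to a YAML file.
# snapshot_env and the default file name are hypothetical, chosen for this sketch.
snapshot_env() {
    env_name="${1:-dsci512}"            # environment to describe
    out_file="${2:-environment.yml}"    # where to write the description
    conda env export -n "$env_name" > "$out_file"
}

# Example (run only where conda is available):
#   snapshot_env dsci512 dsci512_environment.yml
```

Someone else could then rebuild a matching environment with `conda env create -f dsci512_environment.yml`.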

As the name implies, conda works intimately with python.

:!: Exercise: First, let's ensure that conda user settings are installed in our home directory. NOTE: You'll only need to do this once. If you took DSCI510, you may have already done this, but let's just double-check as a group to ensure we're all on the same page.

  • Switch from the login node to a compile node where we can compile software. Do this like so…
$ ssh scompile
$ hostname
  • You should see a hostname of shas136 or shas137. These are the names of the SUMMIT compile node computers.
  • Next, let's test whether you have previously set up your conda environment within your home directory. Again, if you took DSCI510, this step may already be completed.
  • To test whether your conda environment is specified, ensure you are in your home directory and read what is written in the document .condarc or whether that document even exists…
$ pwd           # you should be in /home/
$ ls -alh       # you may see a file called .condarc
$ more .condarc 
pkgs_dirs:
  - /projects/$USER/.conda_pkgs
envs_dirs:
  - /projects/$USER/software/anaconda/envs
# If you see those four lines of code, you're all set up! You may also see additional lines, that is fine.
  • If you saw four lines of code within the file .condarc that specify where your package directories should be stored and where your environmental directories should be stored, you're golden! Just wait.
  • If you didn't see those four lines of code OR if you don't have a file called .condarc, please do the next step:
$ nano .condarc
# then copy and paste the following in:
pkgs_dirs:
  - /projects/$USER/.conda_pkgs
envs_dirs:
  - /projects/$USER/software/anaconda/envs
# Exit out of nano using 
# CTRL + X
# Type Y
# Return
$ more .condarc     # do this to check your .condarc file

:-) Yay! If you now have a .condarc file and see the four lines of code within it specifying your package and environmental paths, you are good to go! You won't need to do this step again on SUMMIT.
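
If you would rather not paste into nano, the same file can be written from the command line. This is a minimal sketch, assuming the four lines are the two standard conda settings pkgs_dirs and envs_dirs, each followed by the path shown in this exercise. The function name write_condarc is made up for this example. It leaves an existing file alone.

```shell
# Write the conda settings to a file unless that file already exists.
# The target path is a parameter so you can try it on a scratch file first.
write_condarc() {
    target="${1:-$HOME/.condarc}"
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it untouched."
        return 0
    fi
    # The quoted 'EOF' keeps $USER literal in the file, as in the exercise above.
    cat > "$target" <<'EOF'
pkgs_dirs:
  - /projects/$USER/.conda_pkgs
envs_dirs:
  - /projects/$USER/software/anaconda/envs
EOF
    echo "Wrote $target"
}

# Example:
#   write_condarc            # writes ~/.condarc only if it is missing
#   more ~/.condarc          # check the result
```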

:!: Exercise: Next, we'll activate and explore conda

To activate conda:

$ source /curc/sw/anaconda3/latest
$ conda deactivate
$ conda init
(base) [ ~]$

We can list all the virtual conda environments we can currently load:

$ conda env list
base                   * /curc/sw/anaconda3/2019.03
globus                   /curc/sw/anaconda3/2019.03/envs/globus
idp                      /curc/sw/anaconda3/2019.03/envs/idp
jupyterhub               /curc/sw/anaconda3/2019.03/envs/jupyterhub
py3.8                    /projects/

The output shows us the default environments that the personnel at CU Boulder have kindly initiated for us to use. The one we are currently using is marked by an asterisk. Note also that (base) shows up before your prompt, which is another indication that conda is active and working. Further note that the py3.8 environment is a custom environment we started in the DSCI510 class.

:!: Exercise: Let's build our own custom environment

We want to build a custom virtual environment for this class. To do so…

$ hostname       # Ensure first that you're on an scompile node. It should say shas136 or shas137
$ conda create -n dsci512 python==3.8
$ conda env list

You should now see a new virtual environment has appeared called dsci512
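
The checks in this exercise can also be scripted. The sketch below is a hedged illustration, not part of the class setup: the function names are invented here, the node names are the two given in this tutorial, and the environment lookup simply reads the first column of `conda env list` output.

```shell
# Succeed only if the hostname is one of the compile nodes named in this tutorial.
# With no argument, the current hostname is used. If hostname prints a longer
# name on your system, the pattern would need adjusting.
is_compile_node() {
    node="${1:-$(hostname)}"
    case "$node" in
        shas136|shas137) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the named environment appears in `conda env list` style output
# supplied on standard input (the first column is the environment name).
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Create the environment only when it is missing. Defaults match this class.
create_env_if_missing() {
    env_name="${1:-dsci512}"
    py_version="${2:-3.8}"
    if conda env list | env_in_list "$env_name"; then
        echo "Environment $env_name already exists."
    else
        conda create -y -n "$env_name" "python==$py_version"
    fi
}

# Example:
#   is_compile_node && create_env_if_missing dsci512 3.8
```

The -y flag answers the confirmation prompt automatically, so leave it out if you want to review the package list first.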

To navigate into your new environment, do this…

$ conda activate dsci512
$ conda env list # This shows you which environments are available and selected
$ conda list  # This shows the software currently installed in your active environment

:-) Yay! You should have your environment dsci512 installed and activated.

:!: Exercise: Let's install software. For this class, we will need the software packages: fastp, bwa, hisat2, bedtools, and samtools

  • First, let's make sure we have access to the conda-forge channel, an online repository of software:
$ conda config --add channels conda-forge
# If you get a warning, that's ok
  • Next, go ahead and install the software we need:
$ conda install -c bioconda fastp bwa hisat2 bedtools samtools
  • You will be prompted whether you want to install the dependency packages. Type y
  • Go ahead and see whether the software you requested was installed successfully. If installed successfully, you should see the usage descriptions. If they weren't installed successfully, you will get an error message.
$ fastp
$ bwa
$ hisat2
$ bedtools
$ samtools
$ conda list
  • Likely, you were able to install everything with the exception of hisat2. hisat2 has a bug that is triggered by the '@' symbol in your user name. David wrote some code to work around this.
$ cp /projects/ .
$ bash fix_CSU.bash
$ hisat2
  • If you see the below error message, that's OK. As long as you see the other user info, it should work.
  -h/--help          print this usage message
(ERR): hisat2-align exited with value 1

:!: Yay! You have your conda environment successfully installed and activated
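
Instead of running each program by hand, you can ask the shell whether each one is on your PATH. This is a minimal sketch with a made-up function name, missing_tools. It only checks that the commands can be found, not that they run correctly, so it would not catch the hisat2 issue described above.

```shell
# Print the name of every listed tool that is not on the PATH.
# Returns success only when nothing is missing.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    missing="${missing# }"    # strip the leading space
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
    return 0
}

# Check the five packages installed in this exercise.
if not_found="$(missing_tools fastp bwa hisat2 bedtools samtools)"; then
    echo "All tools found."
else
    echo "Not found on PATH: $not_found"
fi
```

Run this with the dsci512 environment active. Otherwise, the tools will be reported as not found.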

3. conda cheat sheet

:?: OK, next time I start up, what do I need to do?

  • Next time you log into SUMMIT, first check whether conda has started on its own.
$ ssh scompile
 (base) [ ~]$ 
  • If you don't see that '(base)' tag, initiate conda…
$ source /curc/sw/anaconda3/latest
  • If you do see the '(base)' tag, go ahead and activate your preferred environment:
$ conda activate dsci512
$ bwa
  • Now you are ready to start working with any of your already-installed programs. You won't need to re-install bwa, fastp, etc. They should just work.
  • If you want to install something new at this point into dsci512, just run this one line of code:
$ conda install -c bioconda <software_name_here>
  • There are over 7,000 packages ready to install. To search the available packages, see bioconda.
  • If you run into problems using conda or installing software, can assist you.
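
The start-up steps above can be wrapped in a small helper. This is a sketch under stated assumptions: the function names are invented for this example, and the test for "has conda started?" relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is active, rather than on reading the '(base)' tag in the prompt.

```shell
# Succeed when a conda environment is already active in this shell.
# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
conda_is_active() {
    [ -n "${CONDA_DEFAULT_ENV:-}" ]
}

# Start conda if needed, then activate the class environment.
start_dsci512() {
    if ! conda_is_active; then
        # Path taken from this tutorial; it exists only on SUMMIT.
        source /curc/sw/anaconda3/latest
    fi
    conda activate "${1:-dsci512}"
}

# Example, after `ssh scompile`:
#   start_dsci512
#   bwa
```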

4. References for today

wiki/softwareinstall.txt · Last modified: 2021/06/01 15:06 (external edit)