Downloading, installing, and running Bioinformatics software on Summit

:!: Info only. NOT on the test.

Software on your home computer or laptop tends to come pre-compiled as binaries. Some bioinformatics programs have this option, but others must be installed at the source code level and compiled for your computer.

Ways open source applications are distributed, from easiest to hardest to install:

  • Precompiled binary - just download and run. (Not always feasible or available).
  • Installation manager - might have to compile, but often checks for dependencies and installs them.
  • Source - You have to compile yourself. You have to find dependencies and (probably) also compile them.

One example is faFrag, which we just downloaded and ran. We did have to do some extra work, though, to extract the necessary columns from the GFF before we could use it.
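As a rough sketch of that kind of extra work (not necessarily the exact commands we used), the awk function below pulls out the sequence name, start and end (GFF columns 1, 4 and 5) for one feature type (GFF column 3). The function name gff_coords and the file names in the usage line are made up for illustration. Keep in mind that GFF coordinates are 1-based and inclusive, so check which convention the downstream program expects.

```shell
#!/bin/sh
# gff_coords (hypothetical helper): print seqid, start and end
# (GFF columns 1, 4, 5) for every feature whose type (column 3) matches $1.
# $2 is the GFF file. Header/comment lines starting with '#' are skipped.
gff_coords() {
  awk -F '\t' -v type="$1" \
    '!/^#/ && $3 == type { print $1 "\t" $4 "\t" $5 }' "$2"
}

# Example usage (annotation.gff is a placeholder file name):
# gff_coords gene annotation.gff > gene_coords.txt
```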

Compiled software

Compiling a C++ suite of programs: bedtools

There is a utility in bedtools that works with GFF, but bedtools must be compiled. The bedtools installation page gives the following instructions:

$ wget
$ tar -zxvf bedtools-2.25.0.tar.gz
$ cd bedtools2
$ make

The page also mentions installing through Linux package managers, but we can't do that because we're not administrators, and the current policy on Summit is to do local installation.

If you try to compile this code right away (by typing make), there will be an error. This is caused by the module system on Summit and is not a showstopper.

Instead, run:

$ module load intel
$ make

This will succeed because the intel module provides the compiler that bedtools needs, and there don't appear to be other dependencies. The compiled programs are now in a subdirectory called bin.
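A quick way to confirm the build actually produced the programs you plan to use is to check that they exist and are executable. The sketch below is one way to do that; check_built is a made-up helper name, and bedtools and fastaFromBed are just the two programs mentioned on this page.

```shell
#!/bin/sh
# check_built (hypothetical helper): confirm each named program exists and is
# executable in directory $1. Prints "ok"/"missing" per program and returns
# non-zero if anything is missing.
check_built() {
  dir=$1
  shift
  status=0
  for prog in "$@"; do
    if [ -x "$dir/$prog" ]; then
      echo "ok: $dir/$prog"
    else
      echo "missing: $dir/$prog" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage, from inside bedtools2 after "module load intel" and "make":
# check_built bin bedtools fastaFromBed
```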

We could have written our pipeline around the following program:

$ bin/fastaFromBed 

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.27.1-1-gb87c465
Summary: Extract DNA sequences from a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

	-fi	Input FASTA file
	-fo	Output file (opt., default is STDOUT
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-name	Use the name field for the FASTA header
	-name+	Use the name field and coordinates for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequences
		from the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab 
		is used.

Note, though, that you must run module load intel in every session before you can run these programs, because the environment must match the one they were compiled under.

You can add the bedtools2/bin directory to your PATH variable, or copy the programs into your ~/bin.
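If you go the PATH route, something like the sketch below could live in your ~/.bash_profile. The helper name prepend_path is made up, and the bedtools2 location assumes you unpacked it directly in your home directory. Putting module load intel in the same file is one way to make sure the compiler environment is loaded in every session.

```shell
#!/bin/sh
# prepend_path (hypothetical helper): put directory $1 at the front of PATH,
# unless PATH already contains it, so sourcing ~/.bash_profile twice
# does not add duplicate entries.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;             # already on PATH: leave it alone
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Example lines for ~/.bash_profile (assumes bedtools2 was unpacked in $HOME):
# module load intel
# prepend_path "$HOME/bedtools2/bin"
```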

More complex compilation: ./configure and make

Some software suites are more complex and require that you run a configuration script before compiling. These packages tend to follow the GNU build system and typically ship with a lot of files. One of them, named INSTALL (all caps), tells you how to go about compilation and which common dependencies need to be met.

These packages also ship with a script called configure, which will:

  • run code to detect missing dependencies
  • set the installation destination

Since we can't install to system directories on Summit, we have to install to our home directory or to our project space. To do that, we give configure an extra argument:

$ ./configure --prefix=$HOME

This runs the checks on the system and sets things up so you can install locally. If it is successful, continue with:

$ make

If this is successful, the following will install the programs into your $HOME/bin (and may create other directories under $HOME, such as lib or share, if needed).

$ make install
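The three steps can be wrapped up so the build stops at the first failure instead of carrying on after, say, a failed configure. This is a minimal sketch under the assumption that the package follows the standard configure / make / make install pattern described above; build_local and the example directory are hypothetical names.

```shell
#!/bin/sh
# build_local (hypothetical helper): run the usual GNU build steps for the
# unpacked source in $1, installing under prefix $2 (default: $HOME).
# Runs in a subshell so your current directory is not changed, and stops at
# the first step that fails.
build_local() {
  src=$1
  prefix=${2:-$HOME}
  (
    cd "$src" || exit 1
    ./configure --prefix="$prefix" || exit 1
    make || exit 1
    make install
  )
}

# Example usage (the directory name is a placeholder):
# build_local ~/src/some-package-1.0
```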

Python modules

Some Python tools need modules that must be downloaded and installed. Examples are SciPy and NumPy, which contain code that you can use in your own scripts, or that may be required by a bioinformatics tool such as deeptools or MACS2.

As with compiled programs, Python package installation assumes system-level access by default; you must bypass this and install into your own directory.

On Summit, you must load python as a module. To see which versions are available, run:

$ module spider python
      Python Programming Language


  For detailed information about a specific "python" module (including how to load the modules) use the module's full name.
  For example:

     $ module spider python/3.5.1

Python versioning

The first number (2 versus 3) is called the MAJOR version. Major versions are not expected to be compatible with one another, so you have to know which one your script needs before loading the matching module.

If a script needs python 2.7

$ module load python/2.7.11

If it needs python 3

$ module load python/3.5.1
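If you forget which module you loaded, a script can check the MAJOR version for itself. The sketch below only parses a version string of the form printed by python --version (for example "Python 2.7.11"); the helper name python_major is made up, and the guard in the comments is one possible way to use it.

```shell
#!/bin/sh
# python_major (hypothetical helper): print the MAJOR version number from a
# version string such as "Python 2.7.11" (the format printed by python --version).
python_major() {
  v=${1#Python }       # drop the leading "Python "
  echo "${v%%.*}"      # keep everything before the first dot
}

# Example guard at the top of a script that needs python 3.
# Some python versions print --version to stderr, hence 2>&1.
# major=$(python_major "$(python --version 2>&1)")
# if [ "$major" != 3 ]; then
#   echo "needs python 3; try: module load python/3.5.1" >&2
#   exit 1
# fi
```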

Installing From source

First: You must use the module command to load the correct version.

You would download the source, untar it, and cd into the directory. Then the Python equivalent of ./configure, make, and make install is:

$ python setup.py install --prefix=$HOME

You may get an error message saying that the destination is not in your PYTHONPATH. Treat PYTHONPATH like your PATH variable: set it inside your ~/.bash_profile and add the directory it is expecting (the one named in the error message).
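As a sketch of what that ~/.bash_profile line could look like: the version below assumes the usual layout for a --prefix install, <prefix>/lib/pythonX.Y/site-packages. That layout is an assumption; some systems use lib64 instead, and the directory printed in your error message is the one to trust. The helper name add_pythonpath is made up.

```shell
#!/bin/sh
# add_pythonpath (hypothetical helper): prepend <prefix>/lib/python<X.Y>/site-packages
# to PYTHONPATH unless it is already there.
# $1 = install prefix (e.g. $HOME), $2 = python MAJOR.MINOR (e.g. 2.7).
# ASSUMPTION: this is the usual layout; use the directory from the error message if it differs.
add_pythonpath() {
  dir="$1/lib/python$2/site-packages"
  case ":$PYTHONPATH:" in
    *":$dir:"*) ;;                                        # already present
    *) PYTHONPATH="$dir${PYTHONPATH:+:$PYTHONPATH}" ;;    # no trailing ':' if PYTHONPATH was empty
  esac
  export PYTHONPATH
}

# Example line for ~/.bash_profile, matching module load python/2.7.11:
# add_pythonpath "$HOME" 2.7
```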

Installing From pip

As above: You must use the module command to load the correct version.

The program pip is a Python package installer/manager. You should be able to use pip without creating any directories or unpacking any source yourself; it does all of this for you, out of the way.

$ pip install package --prefix=$HOME
wiki/2018bioinformatics_software_summit.txt · Last modified: 2018/09/10 15:58 by david