USING SUMMIT

We will walk through the use of the Summit supercomputer in two steps:

  1. Logging Into Summit on the Login Node
  2. Submitting Jobs to a Compute Node

1. Logging Into Summit on the Login Node

To log into Summit, we will use the command line and the secure shell command, ssh.

ssh usage
ssh <addressOfRemoteServer>
ssh [-l <yourloginname>] <addressOfRemoteServer> #that's a lower case “L”

:!: Exercise: Log into summit:

ssh -l <eID>@colostate.edu login.rc.colorado.edu
# Provide eIDpassword,DUOkey
# OR 
# Provide eIDpassword,push

:!: Exercise: Explore summit:

$ whoami
$ hostname
$ pwd
$ ls
$ ls -alh
$ more README.mdwn

As a summit user, you have certain directories already set aside for your use:

summit directories

:!: Exercise: Let's explore your Summit directories. Replace <yourusernamehere> with your username. If you have a colostate.edu address, this will be “eID@colostate.edu”.

$ cd /scratch/summit/<yourusernamehere>
$ ls
$ more README.mdwn

:!: Exercise: We will be working today in the projects space, so navigate to your projects directory.

$ cd /projects/<yourusernamehere>
$ ls
$ more README.mdwn

2. Submitting Jobs to a Compute Node

On our local Linux machines, we ran a job by typing the command at the prompt and pressing return, and the job started executing immediately. This is not how things work on the cluster. Submitting jobs has four main steps:

Step 2A. make sure you are on a login node or compile node.
Step 2B. make sure you are in the right directory
Step 2C. write a little script
Step 2D. execute the script using the slurm sbatch command

Step 2A. make sure you are on a login node or compile node

We will submit jobs from a compile node, which seems to work best for me. Slurm is available on the scompile nodes by default, but not on the login node. If you want to submit jobs from the login node, you first need to load Slurm as a module (see summit wiki pages).

$ ssh scompile
$ hostname #on summit, compile nodes are shas0136 shas0137
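A quick way to confirm Step 2A before going further is to ask the shell whether the sbatch command can be found on the current machine. This is only a small sketch, not something from the Summit documentation, and the function name check_slurm is made up for illustration.

```shell
#!/bin/bash
# check_slurm is a hypothetical helper name, used here only for illustration.
# It reports whether the sbatch command is available on the current machine.
check_slurm() {
    if command -v sbatch >/dev/null 2>&1; then
        echo "sbatch found on $(hostname): you can submit jobs from here"
    else
        echo "sbatch not found on $(hostname): try 'ssh scompile' first"
    fi
}

check_slurm
```

If the second message appears, you are probably still on the login node (or on your own laptop).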

Step 2B. make sure you are in the right directory

We will do our work in your projects directory.

$ cd /projects/<youreidhere>@colostate.edu
$ pwd
$ ls

Step 2C. write a little script

On the cluster, we will put our jobs in a script. We haven't covered this yet, but scripts are going to be the heart of this course from here on out. We will be using the bash scripting language. In bash, any command you can type at the prompt can be saved in a text file and executed.

:!: Exercise: Examples of little scripts
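To make the idea concrete, here is one made-up little script (it is not one of the course examples). It simply strings together commands we typed by hand earlier; the file name whereAmI.sh is an assumption chosen for illustration.

```shell
#!/bin/bash
# whereAmI.sh (hypothetical file name): a little script that runs, in order,
# the same commands we typed one at a time at the prompt.

echo "User:      $(whoami)"
echo "Machine:   $(hostname)"
echo "Directory: $(pwd)"

# list the contents of the current directory
echo "Contents:"
ls
```

Saved as a text file, it can be run at the prompt with bash whereAmI.sh.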

Step 2D. execute the script using the slurm sbatch command

We will send our script to a job scheduling program called Slurm.

Slurm will use our requests for the number of nodes and processors we want to use and assign us to compute node(s) and processors (aka cores) where the job will run. Depending on the type of hardware we want, we may need to be patient and wait until the hardware is available for use. While we are waiting for our job to start, we will be put into a 'queue'. Summit uses a fair use queue system in which your place in the queue is a function of (1) when you submitted the job, (2) what resources you have requested, and (3) your use of the system.

Slurm is already loaded on the compile nodes. (If you are on the login node, you need to load it as the module slurm/summit.)


Using SLURM

The slurm software has a number of commands you can use:

$ sbatch <shell script>  #submit a job
$ squeue   #check all jobs in the queue (waiting and running)
$ squeue -u $USER #check just my jobs
$ squeue -j <enterJobNumber> #check just this job
$ scancel <enterJobNumber> #cancel just this job
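When a job is accepted, sbatch normally replies with a line of the form "Submitted batch job <number>", and that number is what the squeue and scancel commands above expect. The sketch below pulls the number out of such a reply so it can be reused. The function name job_id_from, the script name myJob.sh, and the job number 1234567 are all made up for illustration.

```shell
#!/bin/bash
# job_id_from (hypothetical helper): print the last word of sbatch's reply,
# which is the job number when the reply looks like "Submitted batch job 1234567".
job_id_from() {
    # ${1##* } strips everything up to and including the last space
    echo "${1##* }"
}

reply="Submitted batch job 1234567"   # made-up example reply
jobid="$(job_id_from "$reply")"
echo "Job ID: $jobid"

# On a node where Slurm is available, the same idea might look like this
# (myJob.sh is a hypothetical script name):
#   reply="$(sbatch myJob.sh)"
#   jobid="$(job_id_from "$reply")"
#   squeue -j "$jobid"
```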

:!: 1) Example: submitting a job using slurm. Let's make a job to run.

  • Copy the following text into a shell script and name it printHelloWorld.sh
#!/usr/bin/bash

# print out a friendly message
echo "Hello World!"

# rest for a little bit (15 seconds)
sleep 15

# print out the machine name
hostname
  • Execute the script using slurm like so:
$ sbatch printHelloWorld.sh
  • What just happened?
  • Can you find an output file? What is in the output file?

:!: 2) Example: submitting a slurm job with more options

We can add more options to our sbatch command:

List of slurm options

$ sbatch --nodes=1 --ntasks=1 --partition=shas --qos=normal --time=0:01:00 --output=helloWorld_output.txt printHelloWorld.sh

That gets unwieldy pretty fast. Instead of doing this, we can put the options inside the printHelloWorld.sh script as #SBATCH lines near the top, before any commands. In this case, the printHelloWorld.sh script would look like this:

#!/usr/bin/bash

#SBATCH --job-name=helloWorld
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=shas
#SBATCH --qos=normal
#SBATCH --time=0:01:00
#SBATCH --output=hello_world_output_%j.txt

# print out a friendly message
echo "Hello World!"

# rest for a little bit
sleep 30

# print out the machine name
hostname

What does all this stuff mean?

#SBATCH --job-name=helloWorld    #We want to name the job "helloWorld"
#SBATCH --nodes=1                #We request to use 1 node (computer)
#SBATCH --ntasks=1               #We request to run 1 task (1 core by default)
#SBATCH --partition=shas         #We want to use a Haswell compute node
#SBATCH --qos=normal             #We want to be in a normal queue
#SBATCH --time=0:01:00           #We expect this job should take at most a minute. 
#SBATCH --output=hello_world_output_%j.txt       #We want to name the output file "hello_world_output_%j.txt", where %j is replaced by the job's number.
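To make the %j placeholder concrete, the sketch below imitates the substitution with a plain text replacement. It only mimics the naming that Slurm does for us and says nothing about how Slurm does it internally; the function name expand_job_pattern and the job number 1234567 are made up for illustration.

```shell
#!/bin/bash
# expand_job_pattern (hypothetical helper): replace every %j in a file name
# pattern with the given job number, imitating the --output naming.
expand_job_pattern() {
    local pattern="$1"
    local jobid="$2"
    printf '%s\n' "$pattern" | sed "s/%j/${jobid}/g"
}

expand_job_pattern "hello_world_output_%j.txt" 1234567
# prints: hello_world_output_1234567.txt
```

Because every job gets a different number, each run writes to its own file instead of overwriting the previous one.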

The new code could be submitted as a job like this:

$ sbatch printHelloWorld.sh

Find the output now.
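If you have run the job several times, there will be several output files, one per job number. Here is a small sketch for picking out the newest one, assuming you are in the directory the job was submitted from and that the files follow the hello_world_output_<job number>.txt naming used above. The function name latest_output is made up for illustration.

```shell
#!/bin/bash
# latest_output (hypothetical helper): print the name of the most recently
# modified hello_world_output_*.txt file in the current directory, if any.
latest_output() {
    ls -t hello_world_output_*.txt 2>/dev/null | head -n 1
}

newest="$(latest_output)"
if [ -n "$newest" ]; then
    echo "Newest output file: $newest"
    cat "$newest"
else
    echo "No output file yet; the job may still be waiting or running."
fi
```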

For more information about sbatch commands, see job submission and slurm.

:!: Example: Here is an example of how to use the testing quality of service:

#SBATCH --job-name=helloWorld    #We want to name the job "helloWorld"
#SBATCH --nodes=1                #We request to use 1 node (computer)
#SBATCH --ntasks=1               #We request to run 1 task (1 core by default)
#SBATCH --partition=shas-testing #We want to use a Haswell compute node for testing
#SBATCH --qos=testing            #We want to be in a testing queue
#SBATCH --time=0:01:00           #We expect this job should take at most a minute. 
#SBATCH --output=hello_world_output_%j.txt       #We want to name the output file "hello_world_output_%j.txt", where %j is replaced by the job's number.

:!: Independent Exercise:

  • Write a shell script called startProject.sh that does the following:
    • makes a directory called 00_README
    • makes a directory called 01_INPUT
    • makes a directory called 02_SCRIPTS
    • makes a directory called 03_OUTPUT
    • makes a file in 00_README called readme_notes.txt
  • Write #SBATCH preambles for your startProject.sh shell script that do the following:
    • names the job project
    • requests 1 node
    • requests 1 ntask
    • requests a shas-testing partition
    • requests a testing quality of service (qos)
    • expects a time of 0:02:00
    • specifies an output file with the job ID in the title
  • Execute your job using the command sbatch startProject.sh

Parallel processing

wiki/supercomputing2.txt · Last modified: 2018/08/30 09:38 by erin