We will walk through the use of the Summit supercomputer in three steps:
To log into Summit, we will use the command line and the secure shell (ssh) command.
ssh [-l <yourloginname>] <addressOfRemoteServer> #that's a lower case “L”
Exercise: Log into summit:
ssh -l eID@colostate.edu login.rc.colorado.edu
# Provide eIDpassword,DUOkey
# OR
# Provide eIDpassword,push
Exercise: Explore summit:
$ whoami
$ hostname
$ pwd
$ ls
$ ls -alh
$ more README.mdwn
As a summit user, you have certain directories already set aside for your use:
Exercise: Let's explore your summit directories. Replace <yourusernamehere> with your username. If you have a colostate.edu address, this will be “eID@colostate.edu”.
$ cd /scratch/summit/<yourusernamehere>
$ ls
$ more README.mdwn
Exercise: We will be working today in the projects space. Navigate to your projects directory.
$ cd /projects/<yourusernamehere>
$ ls
$ more README.mdwn
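A quick way to confirm that both of the locations used in these exercises exist for your account is sketched below. It assumes that `$USER` holds your login name; `dir_status` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a directory exists.
dir_status() {
    if [ -d "$1" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

# The two locations named in the exercises; $USER is assumed to be your login name.
for d in "/projects/$USER" "/scratch/summit/$USER"; do
    echo "$d: $(dir_status "$d")"
done
```

If either line reports "missing", check that you replaced the username correctly before going further.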
On our local Linux machines, we ran a job by typing the command at the prompt and pressing return, and the job would start executing immediately. This is not how things work on the cluster. Submitting jobs has four main steps:
Step 2A. Make sure you are on a login node or compile node.
Step 2B. Make sure you are in the right directory.
Step 2C. Write a little script.
Step 2D. Submit the script using slurm.
We will submit jobs from the compile node. This seems to work best for me. Slurm is installed on the scompile node. It is not available by default on the login node. If you want to submit jobs from the login node, you need to load slurm using modules (see summit wiki pages).
$ ssh scompile
$ hostname #on summit, compile nodes are shas0136 shas0137
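If you are not sure which kind of node you landed on, a small check like the following may help. It is only a sketch: it relies on the two compile node names given above (shas0136 and shas0137), and `node_kind` is a hypothetical helper, not something provided by Summit or slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a hostname using the compile node names
# listed in this tutorial (shas0136 and shas0137).
node_kind() {
    case "$1" in
        shas0136*|shas0137*) echo "compile" ;;
        *)                   echo "other" ;;
    esac
}

# Fall back to uname -n if the hostname command is unavailable.
this_host=$(hostname 2>/dev/null || uname -n)
echo "This host ($this_host) looks like: $(node_kind "$this_host")"
```

The trailing `*` in each pattern lets the check match a fully qualified name as well as the short one.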
We will do our work in your projects directory.
$ cd /projects/<yourusernamehere>
$ pwd
$ ls
On the cluster, we will put our jobs in a script. We haven't covered this yet, but scripts are going to be the heart of this course from here on out. We will be using the bash scripting language. In bash, any command you can type into the prompt can be saved in a text file and executed.
Exercise: Examples of little scripts
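As a minimal sketch of the idea, the following saves three ordinary commands into a text file and then runs that file with bash. The file name `littleScript.sh` and the temporary directory are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)

# Save a few ordinary commands into a text file (hypothetical file name).
cat > "$workdir/littleScript.sh" <<'EOF'
#!/usr/bin/env bash
# the same commands you could type at the prompt
echo "Hello from a little script"
pwd
ls
EOF

# Run the saved commands with bash and keep what they printed.
script_output=$(bash "$workdir/littleScript.sh")
echo "$script_output"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

The commands inside the file behave exactly as they would if you typed them one after another at the prompt.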
We will send our script to a job scheduling program called Slurm.
Slurm will use our requests for the number of nodes and processors we want to use and assign us to compute node(s) and processors (aka cores) where the job will run. Depending on the type of hardware we want, we may need to be patient and wait until the hardware is available for use. While we are waiting for our job to start, we will be put into a 'queue'. Summit uses a fair use queue system in which your place in the queue is a function of (1) when you submitted the job, (2) what resources you have requested, and (3) your use of the system.
Slurm is already loaded on the compile nodes. (If you are on the login node, you need to load it as a module; see the summit wiki pages.)
The slurm software has a number of commands you can use:
$ sbatch <shell script> #submit a job
$ squeue #check all jobs that are queued or running
$ squeue -u $USER #check just my jobs
$ squeue -j <enterJobNumber> #check just this job
$ scancel <enterJobNumber> #cancel just this job
1) Example: submitting a job using slurm. Let's make a job to run.
#!/usr/bin/bash
# print out a friendly message
echo "Hello World!"
# rest for a little bit (15 seconds)
sleep 15
# print out the machine name
hostname
$ sbatch printHelloWorld.sh
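When the submission succeeds, sbatch normally replies with a line of the form "Submitted batch job <number>". The sketch below pulls that number out so it can be handed to squeue or scancel. `job_id_from_sbatch_output` is a hypothetical helper, and the submission step only runs when sbatch and printHelloWorld.sh are both present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the job number from sbatch's confirmation
# line, which normally reads "Submitted batch job <number>".
job_id_from_sbatch_output() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Only try to submit when sbatch exists and the script is present.
if command -v sbatch >/dev/null 2>&1 && [ -f printHelloWorld.sh ]; then
    submit_line=$(sbatch printHelloWorld.sh)
    job_id=$(job_id_from_sbatch_output "$submit_line")
    echo "Submitted job $job_id"
    squeue -j "$job_id"    # check just this job
else
    echo "sbatch or printHelloWorld.sh not found; nothing submitted"
fi
```

Keeping the job number in a variable saves you from copying it by hand each time you check on the job.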
2) Example: submitting a slurm job with more options
We can add more options to our sbatch command:
$ sbatch --nodes=1 --ntasks=1 --partition=shas --qos=normal --time=0:01:00 --output=helloWorld_output.txt printHelloWorld.sh
That's getting pretty crazy pretty fast. Instead of doing this, we can put the options inside the printHelloWorld.sh script. In this case, the printHelloWorld.sh script would look like this:
#!/usr/bin/bash
#SBATCH --job-name=helloWorld
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=shas
#SBATCH --qos=normal
#SBATCH --time=0:01:00
#SBATCH --output=hello_world_output_%j.txt
# print out a friendly message
echo "Hello World!"
# rest for a little bit
sleep 30
# print out the machine name
hostname
What does all this stuff mean?
#SBATCH --job-name=helloWorld #We want to name the job "helloWorld"
#SBATCH --nodes=1 #We request to use 1 node (computer)
#SBATCH --ntasks=1 #We request to use 1 core
#SBATCH --partition=shas #We want to use a Haswell compute node
#SBATCH --qos=normal #We want to be in a normal queue
#SBATCH --time=0:01:00 #We expect this job to take at most a minute.
#SBATCH --output=hello_world_output_%j.txt #We want to name the output file "hello_world_output_%j.txt", where %j is replaced by the job's number.
The new code could be submitted as a job like this:
$ sbatch printHelloWorld.sh
Find the output now.
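Because the script asks for `--output=hello_world_output_%j.txt`, the output file name contains the job number, and by default the file is written in the directory you submitted from. A sketch of looking for it follows; `output_file_for_job` is a hypothetical helper and 12345 is a made-up job number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the file name slurm would write for a job,
# given the pattern hello_world_output_%j.txt used in the script.
output_file_for_job() {
    echo "hello_world_output_${1}.txt"
}

# 12345 is a made-up job number; use the number sbatch printed for you.
outfile=$(output_file_for_job 12345)
if [ -f "$outfile" ]; then
    cat "$outfile"
else
    echo "$outfile not found yet; the job may still be queued or running"
    # List any output files that do exist in this directory.
    ls hello_world_output_*.txt 2>/dev/null || true
fi
```

If nothing is listed, `squeue -u $USER` will show whether the job is still waiting in the queue.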
Here is an example of how to use the testing quality of service (qos):
#SBATCH --job-name=helloWorld #We want to name the job "helloWorld"
#SBATCH --nodes=1 #We request to use 1 node (computer)
#SBATCH --ntasks=1 #We request to use 1 core
#SBATCH --partition=shas-testing #We want to use a Haswell compute node for testing
#SBATCH --qos=testing #We want to be in a testing queue
#SBATCH --time=0:01:00 #We expect this job to take at most a minute.
#SBATCH --output=hello_world_output_%j.txt #We want to name the output file "hello_world_output_%j.txt", where %j is replaced by the job's number.
startProject.sh that does the following:
startProject.sh shell script that does the following:
testing quality of service (qos)