Supercomputing, High Performance Computing (HPC), and Computing Clusters all refer to computation done on a set of interconnected processors that work together so that they can be viewed as a single computer. These High Performance Computing Clusters perform at a much higher level than a stand-alone computer. They are typically used for research, and they almost always run a Linux operating system. The top 500 compute clusters in the world are listed at www.top500.org.
Today we'll be working on the Summit Supercomputer, a cluster system shared between universities and public institutes on the Front Range.
What is Summit? Summit is a High Performance Computer system built as a joint venture between Colorado State University (CSU) and the University of Colorado Boulder (CU). Summit is housed at CU and operated by CU IT staff.
Who pays for Summit? The project is a $3.55 million venture funded by a $2.7 million award from the National Science Foundation, with the remainder supported by CSU, CU Boulder, and other regional universities and institutions.
How much can I use Summit? During the first year, you are given an Initial Allocation of up to 50,000 Service Units (SUs). One SU equals one hour of use on one node. After you have used 50,000 SUs or one year has passed, whichever comes first, you can apply for a Project Allocation. If your lab uses Summit heavily, it can buy into Summit for higher allocations, longer job run times, and priority in the queue. For more information, see the Summit CSU Documentation.
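To get a feel for what 50,000 SUs buys, here is a small budgeting sketch. It assumes the definition given above (1 SU = one node for one hour) and ignores anything else Summit's real charging rules might take into account, such as different node types. The function names are hypothetical helpers, not part of any Summit tool.

```python
# Budgeting sketch. ASSUMPTION: 1 SU = 1 node for 1 hour, as stated in the text.
# Summit's actual charging rules may differ; check the Summit documentation.

INITIAL_ALLOCATION_SU = 50_000  # Initial Allocation size from the text


def su_cost(nodes, hours):
    """SUs charged for one job that runs on `nodes` nodes for `hours` hours."""
    return nodes * hours


def jobs_that_fit(allocation_su, nodes, hours):
    """How many identical jobs of this size fit in an allocation (whole jobs only)."""
    return allocation_su // su_cost(nodes, hours)


if __name__ == "__main__":
    cost = su_cost(nodes=2, hours=24)
    print(f"One 2-node, 24-hour job costs {cost} SU")
    print(f"{jobs_that_fit(INITIAL_ALLOCATION_SU, 2, 24)} such jobs fit in the Initial Allocation")
```

Under this assumption, a job that uses twice as many nodes for half as long costs the same number of SUs.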
What are the alternatives to Summit? Many smaller servers are available in different departments and labs; those groups paid for them and maintain them. Alternatively, Amazon's cloud service (Amazon Web Services) sells computing power on demand.
What did you look for when you bought your last computer? CPU performance? Memory? Hard drive space? Supercomputers have all these same features, but at a much larger scale. A supercomputer's power comes from the fact that it consists of many computers linked together so that they can share jobs.
Here is a comparison of Erin's computer's specs versus the Summit High Performance Computer's specs:
| | Erin's computer | Summit |
|---|---|---|
| # “computers” | 1 | 488 nodes |
| # CPU cores | 4 cores | 12,632 cores |
| memory | 16 GB | 70.8 TB (70,800 GB) |
| storage | 512 GB | 1.2 petabytes (1,200,000 GB) |
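The gap in scale is easier to see as ratios. This short sketch uses only the numbers from the table above. It computes how many times larger each Summit spec is and the average cores and memory per node. Summit mixes several node types, so the per-node figures are only rough averages, not the spec of any real node. The dictionary and function names are illustrative.

```python
# Specs copied from the comparison table (memory and storage in GB).
erins_computer = {"nodes": 1, "cores": 4, "memory_gb": 16, "storage_gb": 512}
summit = {"nodes": 488, "cores": 12_632, "memory_gb": 70_800, "storage_gb": 1_200_000}


def scale_factors(small, large):
    """How many times larger each spec of `large` is than the same spec of `small`."""
    return {key: large[key] / small[key] for key in small}


def per_node_average(cluster, key):
    """Average of one spec per node. Summit mixes node types, so this is only a rough average."""
    return cluster[key] / cluster["nodes"]


if __name__ == "__main__":
    for key, factor in scale_factors(erins_computer, summit).items():
        print(f"{key}: {factor:,.0f}x")
    print(f"average cores per node: {per_node_average(summit, 'cores'):.1f}")
    print(f"average memory per node: {per_node_average(summit, 'memory_gb'):.1f} GB")
```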
A computing cluster is one large computer made up of many smaller computers. Each smaller computer unit within a cluster is called a node. The Summit system contains different types of nodes.
How do I send a job to the compute nodes? Sending a job to the compute nodes requires a Workload Manager (sometimes called a Job Scheduler). We will be learning to use the Slurm workload manager.
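With Slurm, you usually describe a job in a batch script. The `#SBATCH` lines at the top of the script request resources, and you submit the script with the `sbatch` command. The sketch below writes such a script from Python so the pieces are easy to see. The `#SBATCH` options shown (`--job-name`, `--nodes`, `--ntasks`, `--time`, `--output`, `--partition`) are standard Slurm options. The helper function, the job name, and the command are hypothetical. No partition is set by default: look up the correct partition name in the Summit documentation rather than guessing one.

```python
def build_sbatch_script(job_name, nodes, ntasks, time_limit, command, partition=None):
    """Return the text of a minimal Slurm batch script (hypothetical helper).

    time_limit uses Slurm's HH:MM:SS format. partition is left unset unless given;
    the right value for Summit comes from the Summit documentation.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the job ID
    ]
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")
    # All #SBATCH directives must come before the first command in the script.
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_sbatch_script(
        "hello", nodes=1, ntasks=1, time_limit="00:05:00",
        command="echo hello from $(hostname)",
    )
    with open("hello.sh", "w") as f:
        f.write(script)
    print(script)
```

On the cluster, you would then submit the generated script with `sbatch hello.sh`. Slurm queues the job and runs it on a compute node once the requested resources are free.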
How do all these nodes (computers) communicate with one another? Because jobs can be shared across different processors (cores) or across different computers (nodes), the connections between these pieces of hardware play a much bigger role than they do in a stand-alone computer. The Summit system uses Intel's Omni-Path interconnect. It's very fancy!
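Sharing one job across cores means splitting the work into pieces, running the pieces at the same time, and combining the results. The sketch below shows that pattern on the cores of a single machine using Python's `concurrent.futures.ProcessPoolExecutor`. It does not use the interconnect at all. Programs that spread work across several nodes typically exchange data over the network with a library such as MPI, which this example does not cover. The sum-of-squares task and the function names are just illustrations.

```python
from concurrent.futures import ProcessPoolExecutor


def partial_sum_of_squares(bounds):
    """Work done by one worker process: sum of i*i for i in [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) chunks."""
    step = max(1, -(-n // parts))  # ceiling division, at least 1
    return [(i, min(i + step, n)) for i in range(0, n, step)]


def parallel_sum_of_squares(n, workers=4):
    """Give each worker one chunk, then add up the partial results."""
    chunks = split_range(n, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":  # guard required so worker processes can start safely
    n = 1_000_000
    result = parallel_sum_of_squares(n)
    assert result == sum(i * i for i in range(n))
    print(result)
```

The `if __name__ == "__main__":` guard matters. On platforms that start worker processes by re-importing the script, leaving it out would make each worker try to start its own pool.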
What are the benefits of using a compute cluster versus using my local computer?
AAAHAHAAHAHHHHH!!!! This is crazy. How can I possibly sort through all of this?