Introduction to Job Scheduling
The scheduler handles how compute resources are shared between users.
A job is just a shell script.
Use the sbatch, squeue, and scancel commands to submit, monitor, and cancel jobs, respectively.
Request slightly more resources than you will need.
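One way to apply this to the time limit is to pad a measured runtime by a margin and format it as hours:minutes:seconds, a form that sbatch's --time option accepts. This is a sketch only. The function name padded_walltime and the 20% default margin are hypothetical choices for illustration, not a site policy, so check your system's guidance on how much headroom to request.

```python
def padded_walltime(measured_seconds: int, headroom_percent: int = 20) -> str:
    """Format measured_seconds plus a safety margin as an HH:MM:SS time limit.

    Hypothetical helper: the 20% default margin is an illustrative assumption.
    """
    if measured_seconds <= 0:
        raise ValueError("measured_seconds must be positive")
    if headroom_percent < 0:
        raise ValueError("headroom_percent must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    total = -(-measured_seconds * (100 + headroom_percent) // 100)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical example: a test run took 50 minutes (3000 s).
print(padded_walltime(3000))  # 3000 s plus 20% is 3600 s, printed as 01:00:00
```

The result could then be passed to sbatch as, for example, --time=01:00:00.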
Running Example Code on a Cluster
HPC systems typically use modules to make the dependencies between required software packages explicit.
On many HPC systems, use module avail to see which software modules are available to you.
Use module load to make the specific software modules needed to run or compile your code available for use.
Use a login node responsibly by not running anything that requires too many cores, too much memory, or too much time.
Be sure to specify the number of cores your program needs: enough, but not too many!
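Once cores are requested, a program can size its own parallelism to match the request. This minimal sketch assumes a Slurm job submitted with --cpus-per-task, in which case Slurm sets the SLURM_CPUS_PER_TASK environment variable; the helper name allocated_cores is hypothetical. It deliberately avoids relying on os.cpu_count(), which reports every logical CPU in the system rather than the cores allocated to the job.

```python
import os


def allocated_cores(default: int = 1) -> int:
    """Return the core count Slurm allocated to this task, or `default` outside Slurm.

    Hypothetical helper; assumes the job was submitted with --cpus-per-task.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    # Slurm only sets this variable when --cpus-per-task was requested.
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default


workers = allocated_cores()
print(f"Starting {workers} worker(s); the system reports {os.cpu_count()} CPUs in total.")
```

A worker pool (for example, a multiprocessing pool) could then be created with `workers` processes instead of one per CPU on the node.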
Understanding Code Scalability
To make efficient use of parallel computing resources, code needs to be scalable.
Before using new code on DiRAC, you have to measure its strong and weak scalability profiles.
Strong scaling is how the solution time varies with the number of processors for a fixed problem size.
Weak scaling is how the solution time varies with the number of processors for a fixed problem size per processor.
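A sketch of turning runtimes into these two profiles, assuming you have timed the same code on 1, 2, 4, ... processors. For strong scaling, speedup is T1/TN and parallel efficiency is T1/(N * TN). For weak scaling, efficiency is T1/TN, since the work per processor stays constant and the ideal runtime is therefore unchanged. The timing values are made-up placeholders, not measurements.

```python
def strong_scaling(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map core count N to (speedup T1/TN, efficiency T1/(N*TN)) for a fixed total problem size."""
    t1 = timings[1]
    return {n: (t1 / t, t1 / (n * t)) for n, t in sorted(timings.items())}


def weak_scaling(timings: dict[int, float]) -> dict[int, float]:
    """Map core count N to efficiency T1/TN when the problem size per core is fixed."""
    t1 = timings[1]
    return {n: t1 / t for n, t in sorted(timings.items())}


# Placeholder runtimes in seconds, keyed by core count (not real measurements).
strong_times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
weak_times = {1: 10.0, 2: 10.4, 4: 11.0, 8: 12.5}

for n, (speedup, eff) in strong_scaling(strong_times).items():
    print(f"strong N={n}: speedup {speedup:.2f}, efficiency {eff:.0%}")
for n, eff in weak_scaling(weak_times).items():
    print(f"weak   N={n}: efficiency {eff:.0%}")
```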
Strong and weak scaling measurements provide good indications of how jobs should be configured to use resources.
Always profile your code to determine bottlenecks before attempting any non-trivial optimisations.
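For Python code, for example, the standard library's cProfile and pstats modules give a quick first look at where time is spent; other languages have their own profilers. The workload below is a hypothetical stand-in with one deliberately slow function. In practice you would profile your own program's entry point.

```python
import cProfile
import io
import pstats


def slow_part(n: int) -> int:
    """Deliberately heavy loop: the bottleneck we expect the profiler to find."""
    return sum(i * i for i in range(n))


def quick_part(n: int) -> int:
    """Cheap step that should barely register in the profile."""
    return n * 2


def workload() -> int:
    """Hypothetical stand-in for a real program's entry point."""
    total = 0
    for _ in range(3):
        total += slow_part(100_000)
    return total + quick_part(10)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
print(report.getvalue())
```

The report should point at slow_part, which is where any optimisation effort would pay off; tuning quick_part would not.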
Scalability Profiling
We can use Amdahl’s Law to understand the expected speedup of a parallelised program when it runs on multiple cores.
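Amdahl’s Law gives the speedup on N cores as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the single-core runtime that can be parallelised. The sketch below evaluates it for a hypothetical p = 0.95.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Expected speedup on n cores when a fraction p of the runtime is parallelisable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Serial part (1 - p) is unchanged; parallel part p is divided across n cores.
    return 1.0 / ((1.0 - p) + p / n)


for cores in (1, 2, 4, 8, 16, 64, 256):
    print(f"{cores:>4} cores: speedup {amdahl_speedup(0.95, cores):6.2f}")
```

As N grows, S(N) approaches 1 / (1 - p), which is 20 for p = 0.95, so each doubling of cores buys less extra speedup than the last.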
It’s often difficult to estimate the proportion of serial code in our programs, but a reformulation of Amdahl’s Law can give us this from timed runs using different numbers of cores.
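Rearranging Amdahl’s Law, 1/S = (1 - p) + p/N, for p gives p = (1 - 1/S) / (1 - 1/N), where S = T1/TN is the measured speedup on N cores; the serial fraction is then 1 - p. A minimal sketch, with placeholder runtimes rather than real measurements:

```python
def parallel_fraction(t1: float, tn: float, n: int) -> float:
    """Estimate Amdahl's parallel fraction p from runtimes on 1 core (t1) and n cores (tn)."""
    if n < 2:
        raise ValueError("n must be at least 2 to separate serial and parallel work")
    speedup = t1 / tn
    # Amdahl: 1/S = (1 - p) + p/n  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Placeholder runtimes in seconds, keyed by core count (not real measurements).
runs = {1: 120.0, 2: 66.0, 4: 40.0, 8: 27.0}
for n, tn in runs.items():
    if n > 1:
        p = parallel_fraction(runs[1], tn, n)
        print(f"N={n}: p = {p:.3f}, serial fraction = {1 - p:.3f}")
```

Estimates from different core counts rarely agree exactly, because of timing noise and overheads that Amdahl’s Law does not model, so it is worth comparing several runs rather than relying on one pair.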
Run timings for serial code can vary due to a number of factors, such as overall system load and access to shared resources like bulk storage.
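Because of this variation, it helps to time each configuration several times and summarise the spread rather than trusting a single run. A minimal Python sketch using the standard library's timeit.repeat, with a hypothetical work function: the minimum is often closest to the undisturbed cost, while the median and standard deviation show how much the machine's state affected the runs.

```python
import statistics
import timeit


def work() -> int:
    """Hypothetical stand-in for the code being timed."""
    return sum(i * i for i in range(100_000))


# Five separate runs of a single call each; the repeat count is an illustrative choice.
samples = timeit.repeat(work, repeat=5, number=1)

print(f"min    {min(samples):.4f} s")
print(f"median {statistics.median(samples):.4f} s")
print(f"stdev  {statistics.stdev(samples):.4f} s")
```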
The Message Passing Interface (MPI) standard is a common way to parallelise code and is available on many platforms and HPC systems, including DiRAC.
When calculating a strong scaling profile, the additional benefit of adding cores decreases as the number of cores increases.
The limitation of strong scaling is the fixed problem size; increasing the problem size in proportion to the core count gives a weak scaling profile.