This lesson is in the early stages of development (Alpha version)

Principles of Code Scaling: Glossary

Key Points

Introduction to Job Scheduling
  • The scheduler handles how compute resources are shared between users.

  • A job is just a shell script.

  • Use the sbatch, squeue, and scancel commands to submit, monitor, and cancel jobs respectively (a sketch of a job script and these commands follows this list).

  • Request slightly more resources than you will need.
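
As a sketch, a minimal Slurm job script might look like the following; the job name, resource requests, and script contents are illustrative and will differ between systems:

```bash
#!/bin/bash
#SBATCH --job-name=my-analysis   # name shown in the queue
#SBATCH --ntasks=4               # number of tasks (cores) to request
#SBATCH --time=00:10:00          # walltime limit (hh:mm:ss)
#SBATCH --mem=1G                 # total memory to request

echo "Running on $(hostname)"
```

Assuming the script is saved as myjob.sh, it can be submitted, monitored, and cancelled like so (the job ID is illustrative):

```bash
sbatch myjob.sh    # submit the job; prints the assigned job ID
squeue -u $USER    # monitor your queued and running jobs
scancel 12345      # cancel a job using its ID
```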

Running Example Code on a Cluster
  • HPC systems typically use environment modules to make the dependencies between needed software packages explicit.

  • On many HPC systems, use module avail to see which software modules are available to you.

  • Use module load to make the specific software modules we need to compile or run our code available for use (a sketch follows this list).

  • Use a login node responsibly by not running anything that requires too many cores, too much memory, or too long to run.

  • Be sure to specify enough cores for your program's needs (but not too many!).
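
A sketch of the module commands in use (the module names and versions here are illustrative and vary between systems):

```bash
module avail            # list all software modules available on this system
module avail gcc        # narrow the listing to modules matching "gcc"
module load gcc/12.2.0  # make a specific compiler version available to use
module list             # confirm which modules are currently loaded
```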

Understanding Code Scalability
  • To make efficient use of parallel computing resources, code needs to be scalable.

  • Before using new code on DiRAC, its strong and weak scalability profiles have to be measured.

  • Strong scaling is how the solution time varies with the number of processors for a fixed problem size.

  • Weak scaling is how the solution time varies with the number of processors for a fixed problem size per processor.

  • Strong and weak scaling measurements provide good indications of how jobs should be configured to use resources (see the formulation sketched after this list).

  • Always profile your code to determine bottlenecks before attempting any non-trivial optimisations.
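
One way to formalise these measurements (a sketch; notation varies between texts) is in terms of the runtime T(N) of a program on N processors:

```latex
% Speedup on N processors, relative to the single-processor runtime T(1)
S(N) = \frac{T(1)}{T(N)}

% Strong scaling efficiency: the total problem size is fixed,
% so the ideal speedup is S(N) = N
E_{\mathrm{strong}}(N) = \frac{T(1)}{N \, T(N)}

% Weak scaling efficiency: the problem size per processor is fixed,
% so ideally the runtime stays constant and the efficiency is 1
E_{\mathrm{weak}}(N) = \frac{T(1)}{T(N)}
```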

Scalability Profiling
  • We can use Amdahl’s Law to understand the expected speedup of a parallelised program across multiple cores (see the formulas sketched after this list).

  • It’s often difficult to estimate the proportion of serial code in our programs, but a reformulation of Amdahl’s Law can give us this estimate from multiple runs with different numbers of cores.

  • Run timings for serial code can vary due to a number of factors, such as overall system load and access to shared resources like bulk storage.

  • The Message Passing Interface (MPI) standard is a common way to parallelise code and is available on many platforms and HPC systems, including DiRAC.

  • When calculating a strong scaling profile, the additional benefit of adding cores decreases as the number of cores increases (a timing sketch for collecting such a profile follows this list).

  • The limitation of strong scaling is the fixed problem size, and we can increase the problem size with the core count to obtain a weak scaling profile.
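
Amdahl’s Law, and a rearrangement of it for estimating the serial fraction, can be sketched as follows, where s is the serial fraction of the program, N the number of cores, and S(N) the speedup (the rearranged form is often called the Karp-Flatt metric):

```latex
% Amdahl's Law: expected speedup of a program with serial fraction s
% when run across N cores
S(N) = \frac{1}{s + \frac{1 - s}{N}}

% Rearranged to estimate s from a speedup S(N) measured on N cores;
% repeating this for several values of N shows how consistent the
% estimate is and how well the model fits the program's behaviour
s = \frac{\frac{1}{S(N)} - \frac{1}{N}}{1 - \frac{1}{N}}
```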
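
A minimal way to collect the timings for a strong scaling profile might look like the following, assuming an MPI program ./my_program and that mpirun is available (the program name and core counts are illustrative):

```bash
#!/bin/bash
# Time the same fixed-size problem on an increasing number of cores.
# The ratio T(1)/T(n) of the reported wall-clock times gives the
# measured speedup S(n) for each core count.
for n in 1 2 4 8 16; do
    echo "Cores: $n"
    time mpirun -n "$n" ./my_program
done
```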

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1
explanation 1
key word 2
explanation 2