Understanding Code Scalability

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What is code scalability?

  • Why is code scalability important?

  • How can I measure how long code takes to run?

Objectives
  • Describe why code scalability is important when using HPC resources.

  • Explain the difference between wall time and CPU time.

  • Describe the differences between strong and weak scaling.

  • Summarise the dangers of premature optimisation.

When we submit a job to a cluster that runs our code, we have the option of specifying the number of CPUs (and in some cases GPUs) that will be allocated to the job. We need to consider to what extent that code is scalable with regards to how it uses these resources, to avoid the risk of consuming more resources than can be effectively used. As part of the application process for having new code installed on DiRAC, its scalability characteristics need to be measured. This helps inform how best to assign CPU resources when configuring jobs to run with that code.

There are two primary measures of execution time we need to consider for any given code:

  • Wall time (or actual time) - the time taken from the start of execution to completion, as measured by a clock on the wall. For a job on an HPC cluster, this is the elapsed time between the job starting and finishing.

  • CPU time - the time the CPU(s) actually spend executing the code. Time spent waiting (for example, on input/output or for other processes) is not counted, and with multiple cores working in parallel the total CPU time can exceed the wall time.
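As a minimal sketch, Python's standard `time` module can illustrate the difference between the two measures: `time.perf_counter()` reports wall-clock time, while `time.process_time()` counts only CPU time for the current process, so a program that sleeps accrues wall time but almost no CPU time.

```python
import time

start_wall = time.perf_counter()   # wall-clock time
start_cpu = time.process_time()    # CPU time used by this process

time.sleep(0.5)                    # waiting: wall time passes, but the CPU is idle
total = sum(i * i for i in range(1_000_000))  # computing: both measures advance

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

print(f"wall time: {wall:.2f} s")
print(f"CPU time:  {cpu:.2f} s")   # noticeably smaller: sleeping uses no CPU
```

Running this, the CPU time comes out well below the wall time, since the half-second of sleeping contributes only to the latter.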

How can we Characterise a Code’s Scalability?

Before we consider running and using code on an HPC resource, we need to understand its scaling profile - so we can determine how the code will scale as we add more CPU cores to running it. That way, when we run code we can request a suitable amount of resources with minimal waste. There are two types of scaling profile we need to determine:

  • Strong scaling - how the solution time varies with the number of processors for a fixed total problem size.

  • Weak scaling - how the solution time varies with the number of processors for a fixed problem size per processor.

Once we understand these scaling profiles for our code, we’ll have an idea of the speedup achievable when using multiple cores. These measurements give us good indications for how our code should be specified on DiRAC, in terms of the overall job size and the amount of resources that should be requested.
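As an illustrative sketch (the timings below are made-up numbers, not real measurements), speedup and parallel efficiency can be computed directly from measured wall times: speedup on n cores is the serial time divided by the n-core time, and efficiency is that speedup divided by n.

```python
def speedup(t1, tn):
    """Speedup going from serial time t1 to parallel time tn."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency on n cores (1.0 would be ideal scaling)."""
    return speedup(t1, tn) / n

# Hypothetical strong-scaling wall times (seconds) for a fixed problem size
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 18.0}

for n, tn in timings.items():
    print(f"{n} cores: speedup {speedup(timings[1], tn):.2f}, "
          f"efficiency {efficiency(timings[1], tn, n):.0%}")
```

In this made-up example the efficiency falls as cores are added, which is typical: at some point requesting further cores wastes resources for little extra speedup.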

I’m a Developer, Should I Optimise my Code?

As a developer, if your code happens to take too long to run or scales badly it’s tempting to dive in and try to optimise it straight away. But before you do, consider the following three rules of optimisation:

  1. Don’t,
  2. Don’t… yet, and,
  3. If you must optimise your code, profile it first.

In non-trivial cases premature optimisation is regarded as bad practice, since optimisation may lead to additional code complexity, incorrect results and reduced readability, making the code harder to understand and maintain. It is often effort-intensive, and at a low level it is difficult to improve on or anticipate the optimisations that modern compilers and interpreters already implement. A general maxim is to focus on writing understandable code and getting things working first - the former helps with the latter. Then, once strong and weak scaling profiles have been measured, if optimisation is justified you can profile your code, and work out where the majority of time is being spent and how best to optimise it.

So what is profiling? Profiling your code is all about understanding its complexity and performance characteristics. The usual intent of profiling is to work out how best to optimise your code to improve its performance in some way, typically in terms of speedup or memory and disk usage. In particular, profiling helps identify where bottlenecks exist in your code, and helps you avoid judgements based on guesswork, which often lead to unnecessary optimisations.

Profilers

Each programming language will typically offer some open-source and/or free tools on the web, which you can use to profile your code. Note, though, that depending on the language of choice, the results can be harder or easier to interpret. For example, among open and free tools: Python has the built-in cProfile module, C and C++ code can be profiled with gprof, Valgrind’s Callgrind or Linux perf, and Java offers VisualVM.
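As one concrete sketch, Python's built-in `cProfile` and `pstats` modules can show where time is spent; the functions `slow_sum` and `fast_sum` below are contrived examples standing in for real code.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # deliberately naive loop, acting as the "bottleneck"
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum(n):
    # same result via a generator expression
    return sum(i * i for i in range(n))

def main():
    slow_sum(2_000_000)
    fast_sum(2_000_000)

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Report the functions with the largest cumulative time first
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

The report lists each function with its call count and the time spent in it, making it clear which part of the program dominates the runtime and is therefore worth optimising.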

Donald Knuth said “we should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” In short, optimise the obvious trivial things, but avoid non-trivial optimisations until you’ve understood what needs to change. Optimisation is often difficult and time consuming. Premature optimisation may be a waste of your time!

Key Points

  • To make efficient use of parallel computing resources, code needs to be scalable.

  • Before using new code on DiRAC, its strong and weak scaling profiles have to be measured.

  • Strong scaling is how the solution time varies with the number of processors for a fixed problem size.

  • Weak scaling is how the solution time varies with the number of processors for a fixed problem size for each processor.

  • Strong and weak scaling measurements provide good indications for how jobs should be configured to use resources.

  • Always profile your code to determine bottlenecks before attempting any non-trivial optimisations.