Introduction to HPC Job Scheduling

Self-learning course

Site Updated On: August 05, 2024
For More Info Email: richard.regan@durham.ac.uk

General Information

Requirements: Participants must have access to a computer with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility:

We are dedicated to providing a positive and accessible learning environment for all. Please get in touch if you require any accommodations or if there is anything we can do to make this lesson more accessible to you.

Contact: Please email richard.regan@durham.ac.uk for more information.


Surveys

Please be sure to complete this survey after the lesson.

When the survey asks for a date, please enter the date you started the materials.

Post-Lesson Survey


Lesson Outline

To conduct research on any DiRAC or other HPC facility, users must run their work as batch jobs so that the available resources are shared fairly. It is therefore essential that users understand how a batch system like Slurm works and how to use it effectively. This course gives users the skills to run codes on the DiRAC system: an understanding of the batch queuing system, the role of job scripts, and how to make sure that the runtime environment used by the job is correct.

Learning Outcomes

Following this course, the learner will be able to:


Schedule

1. Introduction to HPC Job Scheduling
   What is a job scheduler and why does a cluster need one?
   How do I find out what parameters to use for my Slurm job?
   How do I submit a Slurm job?
   What are DiRAC project allocations and how do they work?
2. Job Submission and Management
   How do I request specific resources to use for a job?
   What is the life cycle of a job?
   What can I do to specify how my job will run?
   How can I find out the status of my running or completed jobs?
3. Accessing System Resources using Modules
   How can I make use of HPC system resources such as compilers, libraries, and other tools?
   What are HPC modules, and how do I use them?
4. Using Different Job Types
   What types of job can I run on HPC systems?
   How do I run a job that uses a node exclusively?
   How can I submit an OpenMP job that makes use of multiple threads within a single CPU?
   How do I submit a job that uses Message Passing Interface (MPI) parallelisation?
   How can I submit the same job many times with different inputs?
   How can I interactively debug a running job?
5. Survey
Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.


Setup

To participate in this lesson, you will need access to software as described below. In addition, you will need an up-to-date web browser.

The instructions for all the software can be found on the setup page.