Why Use a Cluster?

Overview

Teaching: 25 min
Exercises: 5 min
Questions
  • Why would I be interested in High Performance Computing (HPC)?

  • What can I expect to learn from this course?

Objectives
  • Be able to describe what an HPC system is.

  • Identify how an HPC system could benefit you.

Why Use These Computers?

What do you need?

How does computing help you do your research? How could more computing help you do more or better research?

Frequently, research problems that use computing can outgrow the desktop or laptop computer where they started:

In all these cases, what is needed is access to more computers that can be used at the same time. Luckily, large scale computing systems — shared computing resources with lots of computers — are available at many universities, labs, or through national networks. These resources usually have more central processing units (CPUs), CPUs that operate at higher speeds, more memory, more storage, and faster connections with other computer systems. They are frequently called “clusters”, “supercomputers” or resources for “high performance computing” or HPC. In this lesson, we will usually use the terminology of HPC and HPC cluster.

Using a cluster often has the following advantages for researchers:

This is how a large-scale compute system like a cluster can help solve problems like those listed at the start of the lesson.

Thinking ahead

How do you think using a large-scale computing system will be different from using your laptop? Think about some differences you may already know about, and some differences/difficulties you imagine you may run into.

The Command Line

Using HPC systems often involves the use of a shell through a command line interface (CLI) and either specialized software or programming techniques. The shell is a program with the special role of having the job of running other programs rather than doing calculations or similar tasks itself. What the user types goes into the shell, which then figures out what commands to run and orders the computer to execute them. (Note that the shell is called “the shell” because it encloses the operating system in order to hide some of its complexity and make it simpler to interact with.) The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

Interacting with the shell is done via a CLI on most HPC systems. In the earliest days of computing, the only way to interact with early computers was to rewire them. From the 1950s to the 1980s most people used line printers. These devices only allowed input and output of the letters, numbers, and punctuation found on a standard keyboard, so programming languages and software interfaces had to be designed around that constraint and text-based interfaces were the way to do this. A typing-based interface is often called a command-line interface to distinguish it from a graphical user interface, or GUI, which most people now use. The heart of a CLI is a read-evaluate-print loop, or REPL: when the user types a command and then presses the Enter (or Return) key, the computer reads it, executes it, and prints its output. The user then types another command, and so on until the user logs off.

Learning to use Bash or any other shell sometimes feels more like programming than like using a mouse. Commands are terse (often only a couple of characters long), their names are frequently cryptic, and their output is lines of text rather than something visual like a graph. However, using a command line interface can be extremely powerful, and learning how to use one will allow you to reap the benefits described above.

The rest of this lesson

The only way to use HPC resources like DiRAC is by learning to use the command line. This introduction to HPC systems has two parts:

The skills we learn here have other uses beyond just HPC: Bash and UNIX skills are used everywhere, be it for web development, running software, or operating servers. It’s become so essential that Microsoft now ships it as part of Windows! Knowing how to use Bash and HPC systems will allow you to operate virtually any modern device. With all of this in mind, let’s connect to a cluster and get started!

Key Points

  • High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.

  • These HPC systems can be used to do work that would either be impossible or much slower or smaller systems.

  • The standard method of interacting with such systems is via a command line interface such as Bash.