Reference – DiRAC Essentials Course (Pre-alpha)

Key Points

Lesson Schedule
Why Use a Cluster?	High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world. These HPC systems can be used to do work that would be either impossible or much slower on smaller systems. The standard method of interacting with such systems is via a command line interface such as Bash.
Connecting to the remote HPC system	To connect to a remote HPC system using SSH and a password, run `ssh yourUsername@remote.computer.address`. To connect to a remote HPC system using SSH and an SSH key, run `ssh -i ~/.ssh/key_for_remote_computer yourUsername@remote.computer.address`. Protect your SSH keys by managing them carefully! 2-factor authentication is a way to help ensure a user’s identity by requiring two forms of identity evidence.
Moving around and looking at things	Your current directory is referred to as the working directory. To change directories, use `cd`. To list the contents of a directory, use `ls`. You can view help for a command with `man command` or `command --help`. Press Tab to autocomplete whatever you're currently typing.
Writing and reading files	There are many different text editors available on DiRAC. Use `nano` to create or edit text files from a terminal. Use `cat file1 [file2 ...]` to print the contents of one or more files to the terminal. Use `mv old dir` to move a file or directory `old` into another directory `dir`. Use `mv old new` to rename a file or directory `old` to a `new` name. Use `cp old new` to copy a file under a new name or location. Use `cp old dir` to copy a file `old` into a directory `dir`. Use `rm old` to delete (remove) a file. File extensions are entirely arbitrary on UNIX systems. Use `scp` to transfer files to and from a remote DiRAC resource. Use `tar` to archive and de-archive (extract) sets of numerous and/or large files.
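Wildcards and pipes	The `*` wildcard is a placeholder that matches any sequence of characters, including none, so `*.txt` matches every name ending in `.txt`. Redirect a command's output to a file with `>`. Chain commands with the pipe `|`, which passes the output of one command as the input to the next.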
Scripts, variables, and loops	A shell script is just a list of bash commands in a text file. To make a shell script file executable, run `chmod +x script.sh`.
Using Bash Scripts in Pipes	You can include your own Bash scripts in pipes. A common and useful pattern in the Bash shell is to run a program or script that potentially generates a lot of output, then use pipes to filter it down to what you're really after.
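As a minimal sketch of the wildcard and pipe points above, the Python script below stands in for a hand-written filter script. The file name `keep_matching.py` and the example data are hypothetical. It reads lines from standard input and keeps only those matching a shell-style pattern. It uses the standard-library `fnmatch` module, whose `*` matches any run of characters in the same way as the shell wildcard. `fnmatch` only approximates Bash globbing: it does not give leading dots or `/` the special treatment Bash applies to file names.

```python
import sys
from fnmatch import fnmatchcase


def keep_matching(lines, pattern):
    """Yield the lines that match a shell-style pattern such as '*.dat'."""
    for line in lines:
        # Compare the text without its trailing newline, but pass the line on unchanged
        if fnmatchcase(line.rstrip("\n"), pattern):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read whatever the previous command in the pipe writes to its standard output
    for line in keep_matching(sys.stdin, sys.argv[1]):
        sys.stdout.write(line)
```

Used in a pipe with a redirect, this might look like `ls | python3 keep_matching.py '*.dat' > dat_files.txt`. Quoting the pattern stops Bash from expanding the `*` itself before the script sees it.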
Reference
Lesson Schedule
Understanding Code Scalability	To make efficient use of parallel computing resources, code needs to be scalable. Before using new code on DiRAC, its strong and weak scaling profiles should be measured. Strong scaling is how the solution time varies with the number of processors for a fixed total problem size. Weak scaling is how the solution time varies with the number of processors for a fixed problem size per processor. Strong and weak scaling measurements give good indications of how jobs should be configured to use resources. Always profile your code to determine bottlenecks before attempting any non-trivial optimisations.
Scalability Profiling	We can use Amdahl's Law to understand the expected speedup of a parallelised program when it runs on multiple cores. It's often difficult to estimate the proportion of serial code in our programs, but a reformulation of Amdahl's Law can estimate it from several runs on different numbers of cores. Run timings for serial code can vary due to a number of factors, such as overall system load and access to shared resources such as bulk storage. The Message Passing Interface (MPI) standard is a common way to parallelise code and is available on many platforms and HPC systems, including DiRAC. When calculating a strong scaling profile, the additional benefit of adding cores decreases as the number of cores increases. The limitation of strong scaling is the fixed problem size; increasing the problem size with the core count instead gives a weak scaling profile.
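Here is a minimal Python sketch of these calculations, using hypothetical timings. It assumes strong-scaling efficiency is the speedup t1/tN divided by the core count N, and weak-scaling efficiency is t1/tN. To estimate the parallel fraction p, it rearranges Amdahl's Law, S = 1 / ((1 - p) + p/N), into p = (1 - 1/S) / (1 - 1/N). The course's own reformulation may be written differently.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Speedup t1/tn on n processors, divided by the ideal speedup n."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1, tn):
    """With a fixed problem size per processor, ideal run time stays constant."""
    return t1 / tn


def amdahl_speedup(p, n):
    """Amdahl's Law: expected speedup on n cores if a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Amdahl's Law rearranged for p, given a measured speedup on n > 1 cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Hypothetical timings in seconds for a fixed problem size
t1, t4 = 120.0, 40.0
speedup = t1 / t4                    # 3.0
p = parallel_fraction(speedup, 4)    # 8/9, i.e. about 0.889

print(f"strong-scaling efficiency on 4 cores: {strong_scaling_efficiency(t1, t4, 4):.2f}")
print(f"estimated parallel fraction: {p:.3f}")
print(f"predicted speedup on 16 cores: {amdahl_speedup(p, 16):.2f}")
```

With these numbers, the estimated serial fraction of about 11% caps the speedup at 1 / (1 - p) = 9, however many cores are added. That cap is why each extra core brings less benefit. `weak_scaling_efficiency` works the same way, using timings from runs where the problem grows with the core count.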
Reference
Lesson Schedule
Software Development Lifecycle	Software engineering takes a wider view of software development beyond programming (or coding). Software you produce has inherent value. Always assume your code will be read and used by others (including a future version of yourself). Additionally, aim to make your software reusable by others. Reproducibility is a cornerstone of science, so ensure your software-generated results are reproducible. Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next. Ensuring requirements are sufficiently captured is critical to the success of any project.
An Introduction to Python	We'll be using Python for the following parts of the material; this episode provides an introduction or refresher.
Functions and Classes	Functions allow us to decompose a problem into smaller tasks. Classes allow us to organise data that represents a distinct concept.
Programming Paradigms	A paradigm describes a way of structuring and reasoning about code. Different programming languages are suited to different paradigms. Different paradigms are suited to solving different classes of problems. Pure functions are functions with deterministic behaviour and no side effects. In object-oriented programming, classes allow us to organise data into distinct concepts.
Best Practices in Writing Code	Source code is designed for humans, not machines. Source code is read much more often than it is written. Always assume that someone else will read your code at a later date, including yourself. Good indentation greatly enhances code readability. Name variables, functions, and modules so that their names indicate their purpose. Good comments describe the reasons behind coding approaches as well as complex behaviour. Community coding conventions help you create more readable software projects that are easier to contribute to. Maintainable code is easier to understand, modify, extend, and fix. Assume any piece of code you write will be reused. Technical debt is incurred when quick solutions are prioritised over good solutions, and it is paid back through the extra cost of maintaining the code. Change the way you write code to make maintainability a key goal.
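The small Python sketch below makes several of these points concrete. The `Sensor` class, its methods, and the temperature data are all hypothetical. `mean` is a pure function, and `count_readings` is deliberately impure for contrast. The class groups the data describing one concept. The comment in `add_reading` explains why the check exists rather than restating what the code does.

```python
ABSOLUTE_ZERO_CELSIUS = -273.15


def mean(values):
    """Pure function: the result depends only on the input, and nothing else changes."""
    return sum(values) / len(values)


readings_seen = 0


def count_readings(values):
    """Impure function: it modifies global state, so identical calls can return different results."""
    global readings_seen
    readings_seen += len(values)
    return readings_seen


class Sensor:
    """Groups the data describing one (hypothetical) temperature sensor."""

    def __init__(self, name):
        self.name = name
        self.readings_celsius = []

    def add_reading(self, value):
        # Faulty hardware can report values below absolute zero; discarding them
        # here lets every later calculation assume physically valid data.
        if value >= ABSOLUTE_ZERO_CELSIUS:
            self.readings_celsius.append(value)

    def mean_reading(self):
        # Reuse the small pure function rather than repeating the calculation
        return mean(self.readings_celsius)


probe = Sensor("probe-1")
for value in [20.0, -300.0, 22.0]:
    probe.add_reading(value)

print(probe.name, probe.mean_reading())                # probe-1 21.0
print(count_readings([1, 2]), count_readings([1, 2]))  # 2 4
```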
Reference
Lesson Schedule
Test Strategy, Planning, and Running Tests	A test plan forms the foundation of any testing. We should write tests to verify that functions generate expected output given a set of specific inputs. The three main types of automated tests are unit tests, functional tests and regression tests. We can use a unit testing framework like `pytest` to structure and simplify the writing of tests. Testing program behaviour against both valid and invalid inputs is important and is known as data validation.
Development Tools	IDEs provide tools and features to help develop increasingly complex code. Debuggers allow you to set breakpoints, which pause running code so that its state can be inspected. A call stack is the chain of function calls that led to the current point of execution.
Reviewing Code	Code review is where at least one other person looks at parts of a codebase in order to improve its code readability, understandability, quality and maintainability. The first hour of code review matters the most.
Documenting Code	Documentation is a huge contributor to the ability to reuse any software. Even a short document that covers the basics of getting the software up and running goes a long way, and it can be amended and extended later. Documentation helps make your code reproducible. By default, software code released without a licence conveys no rights for reuse. Open source licences fall into two key categories: copyleft and permissive.
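Below is a sketch of unit tests written in the style `pytest` expects, covering both valid and invalid input, followed by a small look at the call stack. All function names here are hypothetical. The tests use plain `assert` statements and a `try`/`except`, so the file also runs without pytest installed. In a debugger, you can pause at a breakpoint (for example with Python's built-in `breakpoint()`) and use pdb's `where` command. It shows the same chain of calls that `traceback.extract_stack()` returns here.

```python
import traceback

ABSOLUTE_ZERO_CELSIUS = -273.15


def celsius_to_kelvin(celsius):
    """Convert a temperature, rejecting values below absolute zero."""
    if celsius < ABSOLUTE_ZERO_CELSIUS:
        raise ValueError(f"{celsius} C is below absolute zero")
    return celsius + 273.15


# pytest collects functions named test_* from files named test_*.py
def test_valid_input():
    assert celsius_to_kelvin(0.0) == 273.15


def test_invalid_input():
    # With pytest installed, `with pytest.raises(ValueError):` is the idiomatic form
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a temperature below absolute zero")


def show_call_stack():
    """Return the names of the functions on the call stack, outermost first."""
    stack = traceback.extract_stack()
    return [frame.name for frame in stack]


def caller():
    return show_call_stack()


if __name__ == "__main__":
    test_valid_input()
    test_invalid_input()
    print("tests passed")
    # The list ends with 'caller', 'show_call_stack': the chain of active calls
    print(caller())
```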
Reference
Lesson Schedule
What is Version Control	Version control is like an unlimited ‘undo’. Version control also allows many people to work in parallel.
Setting Up	Use `git config` with the `--global` option to configure a user name, email address, editor, and other preferences once per machine. Accessing GitHub over SSH requires an SSH key.
Using a Repository	`git clone` creates a local copy of a repository from a URL. Git stores all of its repository data in the `.git` directory.
Tracking Changes	`git status` shows the status of a repository. Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded). `git add` puts files in the staging area. `git commit` saves the staged content as a new commit in the local repository. Write commit messages that accurately describe your changes. `git log` lists the commits made to the local repository.
Exploring History	`git diff` displays differences between commits. `git checkout` recovers old versions of files.
Remote Repositories	Git can easily synchronise your local repository with a remote one. Accessing GitHub over SSH requires an SSH key.
Reference

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1: explanation 1
key word 2: explanation 2