Lesson Schedule
|
|
Why Use a Cluster?
|
High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
These HPC systems can be used to do work that would either be impossible or much slower on smaller systems.
The standard method of interacting with such systems is via a command line interface such as Bash.
|
Connecting to the remote HPC system
|
To connect to a remote HPC system using SSH and a password, run ssh yourUsername@remote.computer.address .
To connect to a remote HPC system using SSH and an SSH key, run ssh -i ~/.ssh/key_for_remote_computer yourUsername@remote.computer.address .
Protect your SSH keys by managing them carefully!
Two-factor authentication helps confirm a user’s identity by requiring two separate forms of identity evidence.
|
Moving around and looking at things
|
Your current directory is referred to as the working directory.
To change directories, use cd .
To view files, use ls .
You can view help for a command with man command or command --help .
Hit tab to autocomplete whatever you’re currently typing.
|
Writing and reading files
|
There are many different text editors available on DiRAC.
Use nano to create or edit text files from a terminal.
Use cat file1 [file2 ...] to print the contents of one or more files to the terminal.
Use mv old dir to move a file or directory old to another directory dir .
Use mv old new to rename a file or directory old to a new name.
Use cp old new to copy a file under a new name or location.
Use cp old dir to copy a file old into a directory dir .
Use rm old to delete (remove) a file.
File extensions are entirely arbitrary on UNIX systems.
Use scp to transfer files to and from a remote DiRAC resource.
Use tar to archive and unpack sets of numerous and/or large files.
|
Wildcards and pipes
|
The * wildcard is used as a placeholder to match any text that follows a pattern.
Redirect a command’s output to a file with > .
Commands can be chained with | , which passes the output of one command to the next as its input.
|
Scripts, variables, and loops
|
A shell script is just a list of bash commands in a text file.
To make a shell script file executable, run chmod +x script.sh .
|
Using Bash Scripts in Pipes
|
You can include your own Bash scripts in pipes.
A common and useful pattern in the Bash shell is to run a program or script that potentially generates a lot of output, then use pipes to filter out what you’re really after.
|
Reference
|
|
Lesson Schedule
|
|
Understanding Code Scalability
|
To make efficient use of parallel computing resources, code needs to be scalable.
Before using new code on DiRAC, its strong and weak scalability profiles have to be measured.
Strong scaling is how the solution time varies with the number of processors for a fixed problem size.
Weak scaling is how the solution time varies with the number of processors for a fixed problem size per processor.
Strong and weak scaling measurements provide good indications for how jobs should be configured to use resources.
Always profile your code to determine bottlenecks before attempting any non-trivial optimisations.
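As a rough illustration of the two definitions above, the sketch below turns run timings into scaling efficiencies. The timings are invented for illustration, and the function names are hypothetical. It uses the common conventions that strong scaling efficiency is T(1) / (N × T(N)) and weak scaling efficiency is T(1) / T(N), where T(N) is the run time on N cores.

```python
# Hypothetical wall-clock timings in seconds, keyed by core count.
# These numbers are invented for illustration only.
strong_timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}   # fixed total problem size
weak_timings = {1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}  # fixed problem size per core


def strong_scaling_efficiency(timings):
    """Return {cores: T(1) / (cores * T(cores))} for a fixed total problem size."""
    t1 = timings[1]
    return {n: t1 / (n * t) for n, t in timings.items()}


def weak_scaling_efficiency(timings):
    """Return {cores: T(1) / T(cores)} when the problem grows with the core count."""
    t1 = timings[1]
    return {n: t1 / t for n, t in timings.items()}


strong = strong_scaling_efficiency(strong_timings)
weak = weak_scaling_efficiency(weak_timings)
for n in sorted(strong_timings):
    print(f"{n:>2} cores: strong {strong[n]:.2f}, weak {weak[n]:.2f}")
```

An efficiency close to 1.0 suggests the extra cores are being used well. A value that falls away quickly as cores are added is a hint that a job should be configured with fewer cores.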
|
Scalability Profiling
|
We can use Amdahl’s Law to understand the expected speedup of a parallelised program when it is run on multiple cores.
It’s often difficult to estimate the proportion of serial code in our programs, but a reformulation of Amdahl’s Law can give us this based on multiple runs using different numbers of cores.
Run timings for serial code can vary due to a number of factors such as overall system load and accessing shared resources such as bulk storage.
The Message Passing Interface (MPI) standard is a common way to parallelise code and is available on many platforms and HPC systems, including DiRAC.
When calculating a strong scaling profile, the additional benefit of adding cores decreases as the number of cores increases.
The limitation of strong scaling is the fixed problem size, and we can increase the problem size with the core count to obtain a weak scaling profile.
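The Amdahl’s Law points above can be sketched in a few lines. This is a minimal example, not a full profiling workflow: the function names and the two timings are hypothetical. It assumes the usual form of the law, speedup = 1 / ((1 − P) + P / N), where P is the parallel fraction and N the number of cores, and rearranges it to estimate P from a measured speedup.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Expected speedup on `cores` cores when `parallel_fraction` of the
    single-core run time can be parallelised (Amdahl's Law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def estimate_parallel_fraction(t_one_core, t_n_cores, cores):
    """Rearrange Amdahl's Law to estimate the parallel fraction from two
    measured run times. Requires cores > 1."""
    speedup = t_one_core / t_n_cores
    return (1.0 / speedup - 1.0) / (1.0 / cores - 1.0)


# Hypothetical measurements: 100 s on 1 core, 19 s on 10 cores.
p = estimate_parallel_fraction(100.0, 19.0, cores=10)
print(f"estimated parallel fraction: {p:.2f}")
for n in (2, 8, 64, 512):
    print(f"{n:>3} cores -> speedup {amdahl_speedup(p, n):.2f}")
```

With these invented timings the estimated parallel fraction is about 0.9, so the speedup can never exceed 1 / (1 − 0.9) = 10 however many cores are added. Because run timings vary, averaging several runs per core count before applying the estimate is likely to give a steadier figure.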
|
Reference
|
|
Lesson Schedule
|
|
Software Development Lifecycle
|
Software engineering takes a wider view of software development beyond programming (or coding).
Software you produce has inherent value.
Always assume your code will be read and used by others (including a future version of yourself).
Additionally, aim to make your software reusable by others.
Reproducibility is a cornerstone of science, so ensure your software-generated results are reproducible.
Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next.
Ensuring requirements are sufficiently captured is critical to the success of any project.
|
An Introduction to Python
|
|
Functions and Classes
|
|
Programming Paradigms
|
A paradigm describes a way of structuring reasoning about code.
Different programming languages are suited to different paradigms.
Different paradigms are suited to solving different classes of problems.
Pure functions are functions with deterministic behaviour and no side effects.
Classes allow us to organise data into distinct concepts.
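To make the last two points concrete, here is a small sketch that contrasts an impure function with a pure one and uses a class to group related data. All names (`add_to_total`, `mean`, `Measurement`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

running_total = 0.0


def add_to_total(value):
    """Impure: modifies module-level state, so the result depends on call history."""
    global running_total
    running_total += value
    return running_total


def mean(values):
    """Pure: the result depends only on the argument, and nothing else is changed."""
    return sum(values) / len(values)


@dataclass(frozen=True)
class Measurement:
    """A class groups related data (and behaviour) into one named concept."""
    label: str
    readings: tuple

    def average(self):
        return mean(self.readings)


sample = Measurement(label="run-1", readings=(2.0, 4.0, 6.0))
print(sample.average())
```

Calling `mean` twice with the same readings always gives the same answer, whereas two identical calls to `add_to_total` return different values. That difference is what makes pure functions easier to test and reason about.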
|
Best Practices in Writing Code
|
Source code is designed for humans, not machines.
Source code is read much more often than it is written.
Always assume that someone else will read your code at a later date, including yourself.
Good indentation greatly enhances code readability.
Name things like variables, functions, and modules to indicate purpose.
Good comments describe the reasons behind coding approaches as well as complex behaviour.
Community coding conventions help you create more readable software projects that are easier to contribute to.
Maintainable code is easier to understand, modify, extend, and fix.
Assume any piece of code you write will be reused.
Technical debt is incurred when quick solutions are prioritised over good solutions, but is paid off in the cost of maintaining the code.
Change the way you write code to make maintainability a key goal.
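As a small illustration of the naming and commenting points above, the two functions below behave identically, but only the second indicates its purpose. The scenario (filtering sensor readings) and every name in it are hypothetical.

```python
def f(a, b):
    return [x for x in a if x > b]


def readings_above_threshold(readings, threshold):
    """Return the readings that are strictly greater than `threshold`."""
    # Strictly greater: a reading equal to the threshold counts as background
    # noise in this (hypothetical) analysis, so it is excluded.
    return [reading for reading in readings if reading > threshold]
```

The comment in the second version records why the comparison is strict, which is the kind of reasoning a future reader cannot recover from the code alone.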
|
Reference
|
|
Lesson Schedule
|
|
Test Strategy, Planning, and Running Tests
|
A test plan forms the foundation of any testing.
We should write tests to verify that functions generate expected output given a set of specific inputs.
The three main types of automated tests are unit tests, functional tests and regression tests.
We can use a unit testing framework like pytest to structure and simplify the writing of tests.
Testing program behaviour against both valid and invalid inputs is important and is known as data validation.
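A minimal sketch of what such tests might look like is shown below. The function `celsius_to_kelvin` is a hypothetical example, not part of the lesson material. The tests are plain functions whose names start with `test_`, which is the naming pattern pytest collects by default, and they use only bare `assert` statements so the sketch has no third-party imports.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    if celsius < -273.15:
        raise ValueError("temperature is below absolute zero")
    return celsius + 273.15


def test_valid_input():
    # Expected output for a specific, known input.
    assert celsius_to_kelvin(0.0) == 273.15


def test_boundary_input():
    # Absolute zero itself is the lowest valid value.
    assert celsius_to_kelvin(-273.15) == 0.0


def test_invalid_input_is_rejected():
    # With pytest installed, `pytest.raises(ValueError)` is the usual idiom;
    # plain try/except keeps this sketch free of third-party imports.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a temperature below absolute zero")
```

Saved in a file whose name starts with `test_`, these functions should be discovered and run by the `pytest` command. Covering a typical value, a boundary value, and an invalid value is one simple way to start a test plan.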
|
Development Tools
|
IDEs provide tools and features to help develop increasingly complex code.
Debuggers allow you to set breakpoints which pause running code so its state can be inspected.
A call stack is the chain of function calls that are active at a certain point in execution, from the most recent call back to the start of the program.
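The call stack that a debugger displays at a breakpoint can also be inspected from within Python itself. The sketch below uses the standard library `inspect` module, and the three function names are hypothetical.

```python
import inspect


def innermost():
    # inspect.stack() lists frames from the current call outwards.
    return [frame.function for frame in inspect.stack()]


def middle():
    return innermost()


def outermost():
    return middle()


call_chain = outermost()
print(" <- ".join(call_chain[:3]))
```

The first three entries are `innermost`, `middle`, and `outermost`, in that order, which is the same chain a debugger would show if execution were paused inside `innermost`. Any entries after those depend on how the script was started.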
|
Reviewing Code
|
Code review is where at least one other person looks at parts of a codebase in order to improve its readability, understandability, quality and maintainability.
The first hour of code review matters the most.
|
Documenting Code
|
A huge contributor to the ability to reuse any software is documentation.
Even a short document that covers the basics of getting the software up and running goes a long way, and it can be amended and added to later.
Documentation helps make your code reproducible.
By default, software code released without a licence conveys no rights for reuse.
Open source licences fall into two key categories: copyleft and permissive.
|
Reference
|
|
Lesson Schedule
|
|
What is Version Control
|
|
Setting Up
|
Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
GitHub needs an SSH key to allow access.
|
Using a Repository
|
|
Tracking Changes
|
git status shows the status of a repository.
Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
git add puts files in the staging area.
git commit saves the staged content as a new commit in the local repository.
Write commit messages that accurately describe your changes.
git log lists the commits made to the local repository.
|
Exploring History
|
|
Remote Repositories
|
|
Reference
|
|