Introduction to the Message Passing Interface: Reference

Key Points

Introduction to Parallelism
  • Processes do not share memory and can reside on the same or different computers.

  • Threads share memory and reside in a process on the same computer.

  • MPI is an example of multiprocess programming whereas OpenMP is an example of multithreaded programming.

  • Algorithms can have both parallelisable and non-parallelisable sections.

  • There are two major parallelisation paradigms: data parallelism and message passing.

  • MPI implements the Message Passing paradigm, and OpenMP implements data parallelism.

Introduction to the Message Passing Interface
  • The MPI standards define the syntax and semantics of a library of routines used for message passing.

  • By default, the order in which operations are run between parallel MPI processes is arbitrary.
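
As a concrete reminder of how an MPI program is structured, here is a minimal sketch in C using the standard routines MPI_Init(), MPI_Comm_rank(), MPI_Comm_size() and MPI_Finalize(); because each rank prints independently, the order of the output is arbitrary:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, num_ranks;

        MPI_Init(&argc, &argv);                    /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's rank   */
        MPI_Comm_size(MPI_COMM_WORLD, &num_ranks); /* total number of ranks */

        /* Each rank prints on its own, so the output order is arbitrary */
        printf("Hello from rank %d of %d\n", rank, num_ranks);

        MPI_Finalize();                            /* shut the runtime down */
        return 0;
    }

Such a program is typically compiled with a wrapper such as mpicc and launched with mpirun or mpiexec, e.g. mpirun -n 4 ./hello.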

Communicating Data in MPI
  • Data is sent between ranks using “messages”

  • Messages can either block the program or be sent/received asynchronously

  • You need to know the exact amount of data you are sending and receiving

Point-to-Point Communication
  • Use MPI_Send() and MPI_Recv() to send and receive data between ranks

  • Using MPI_Ssend() will always block the sending rank until the message is received

  • Using MPI_Send() may block the sending rank until the message is received, depending on whether the message is buffered and the buffer is available for reuse

  • Using MPI_Recv() will always block the receiving rank until the message is received
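
A minimal sketch of this point-to-point pattern, assuming the program is run with at least two ranks; rank 0 sends a single integer which rank 1 receives:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;
            /* Send one int to rank 1 with message tag 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            /* Blocks until the matching message from rank 0 has arrived */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }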

Collective Communication
  • Using point-to-point communication to send/receive data to/from all ranks is inefficient

  • It’s far more efficient to send/receive data to/from multiple ranks by using collective operations
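
For example, summing one value from every rank with MPI_Reduce() replaces what would otherwise be a loop of point-to-point sends to a single rank; a minimal sketch:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, num_ranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

        /* Every rank contributes one value; MPI_Reduce combines them on rank 0 */
        int contribution = rank;
        int total = 0;
        MPI_Reduce(&contribution, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("Sum of ranks 0..%d is %d\n", num_ranks - 1, total);
        }

        MPI_Finalize();
        return 0;
    }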

Non-blocking Communication
  • Non-blocking communication often leads to performance improvements compared to blocking communication

  • However, it is usually more difficult to use non-blocking communication

  • Most blocking communication operations have a non-blocking variant

  • We have to wait for a non-blocking communication to finish, using MPI_Wait() (or check for completion with MPI_Test()), before reusing its buffers, otherwise we will encounter undefined behaviour (see the sketch below)
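
A minimal sketch of the non-blocking pattern, assuming at least two ranks: MPI_Isend() and MPI_Irecv() return immediately, other work can overlap the communication, and MPI_Waitall() (a variant of MPI_Wait() for several requests) ensures the transfers have completed before the buffers are touched again:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int send_value = rank, recv_value = -1;
        MPI_Request requests[2];

        /* Ranks 0 and 1 exchange a value; both calls return immediately */
        if (rank == 0 || rank == 1) {
            int other = 1 - rank;
            MPI_Isend(&send_value, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &requests[0]);
            MPI_Irecv(&recv_value, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &requests[1]);

            /* ... unrelated work could overlap with the communication here ... */

            /* The buffers must not be reused or read until the wait completes */
            MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
            printf("Rank %d received %d\n", rank, recv_value);
        }

        MPI_Finalize();
        return 0;
    }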

Derived Data Types
  • Any data being transferred should be a single contiguous block of memory

  • By defining derived data types, we can more easily send data which is not contiguous
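
As an illustration, a column of a row-major 2D array is not contiguous in memory; MPI_Type_vector() can describe it as a derived type so it can be sent in a single call (a minimal sketch, assuming at least two ranks):

    #include <stdio.h>
    #include <mpi.h>

    #define ROWS 4
    #define COLS 4

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* A column of a row-major 2D array is ROWS blocks of 1 element,
           each separated by a stride of COLS elements */
        MPI_Datatype column_type;
        MPI_Type_vector(ROWS, 1, COLS, MPI_INT, &column_type);
        MPI_Type_commit(&column_type);

        int matrix[ROWS][COLS] = {0};

        if (rank == 0) {
            for (int i = 0; i < ROWS; i++)
                matrix[i][2] = i;                  /* fill column 2 */
            MPI_Send(&matrix[0][2], 1, column_type, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&matrix[0][2], 1, column_type, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received column: %d %d %d %d\n",
                   matrix[0][2], matrix[1][2], matrix[2][2], matrix[3][2]);
        }

        MPI_Type_free(&column_type);
        MPI_Finalize();
        return 0;
    }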

Porting Serial Code to MPI
  • Start from a working serial code

  • Write a parallel implementation for each function or parallel region

  • Connect the parallel regions with a minimal amount of communication

  • Continuously compare the developing parallel code with the working serial code

Optimising MPI Applications
  • We can use Amdahl’s Law to identify the theoretical limit on the speedup that parallelisation can achieve

  • Strong scaling is defined as how the solution time varies with the number of processors for a fixed total problem size

  • We can use Gustafson’s Law to calculate the relative speedup, which takes increasing problem sizes into account

  • Weak scaling is defined as how the solution time varies with the number of processors for a fixed problem size per processor

  • Use a profiler to understand a code’s performance issues before optimising it

  • Test the code after optimisation to ensure its functional behaviour is still correct
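
For reference, the two laws are commonly written as follows, where p is the parallelisable fraction of the work and N is the number of processors:

    Amdahl's Law (strong scaling, fixed total problem size):
        speedup(N) = 1 / ((1 - p) + p / N)

    Gustafson's Law (weak scaling, problem size grows with N):
        speedup(N) = (1 - p) + p * N

For example, with p = 0.9 and N = 8, Amdahl’s Law gives 1 / (0.1 + 0.9 / 8) ≈ 4.7, while Gustafson’s Law gives 0.1 + 0.9 × 8 = 7.3.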

Common Communication Patterns [Optional]
  • There are many ways to communicate data, and we need to think carefully about which to use

  • It’s better to use collective operations, rather than implementing similar behaviour yourself
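
For instance, the common scatter/gather pattern (divide an array among ranks, process each piece locally, then collect the results) is already provided by MPI_Scatter() and MPI_Gather(); a minimal sketch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define CHUNK 4   /* elements handled by each rank */

    int main(int argc, char **argv)
    {
        int rank, num_ranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

        int *full_data = NULL;

        if (rank == 0) {                /* only the root holds the full array */
            full_data = malloc(CHUNK * num_ranks * sizeof(int));
            for (int i = 0; i < CHUNK * num_ranks; i++)
                full_data[i] = i;
        }

        /* Scatter: each rank gets its own chunk of the array */
        int local[CHUNK];
        MPI_Scatter(full_data, CHUNK, MPI_INT, local, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        /* Each rank works on its chunk independently */
        for (int i = 0; i < CHUNK; i++)
            local[i] *= 2;

        /* Gather: the processed chunks are collected back on the root */
        MPI_Gather(local, CHUNK, MPI_INT, full_data, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("First element after gather: %d\n", full_data[0]);
            free(full_data);
        }

        MPI_Finalize();
        return 0;
    }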

Advanced Data Communication [Optional]
  • Structures can be communicated more easily by using MPI_Type_create_struct() to create a derived type describing the structure

  • The functions MPI_Pack() and MPI_Unpack() can be used to manually create a contiguous memory block of data, to communicate complex and/or heterogeneous data structures
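
A minimal sketch of the MPI_Type_create_struct() approach, describing a small hypothetical struct particle so that one instance can be sent in a single call (assuming at least two ranks):

    #include <stdio.h>
    #include <stddef.h>     /* offsetof */
    #include <mpi.h>

    struct particle {
        int    id;
        double position[3];
    };

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Describe the memory layout of struct particle as a derived type */
        int          lengths[2]       = {1, 3};
        MPI_Aint     displacements[2] = {offsetof(struct particle, id),
                                         offsetof(struct particle, position)};
        MPI_Datatype types[2]         = {MPI_INT, MPI_DOUBLE};
        MPI_Datatype particle_type;

        MPI_Type_create_struct(2, lengths, displacements, types, &particle_type);
        MPI_Type_commit(&particle_type);

        if (rank == 0) {
            struct particle p = {7, {1.0, 2.0, 3.0}};
            MPI_Send(&p, 1, particle_type, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            struct particle p;
            MPI_Recv(&p, 1, particle_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received particle %d at (%.1f, %.1f, %.1f)\n",
                   p.id, p.position[0], p.position[1], p.position[2]);
        }

        MPI_Type_free(&particle_type);
        MPI_Finalize();
        return 0;
    }

When sending arrays of such structures, the type’s extent may also need adjusting with MPI_Type_create_resized() to match sizeof(struct particle).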

Survey

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1
explanation 1
key word 2
explanation 2