Software Development Lifecycle
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is a software development process?
Why is a development process important?
What are the different development stages?
Objectives
Explain some of the common issues found in academic software development
Summarise the benefits of following a process of developing software
Define the fundamental stages in software development
Express how different process stages are connected
Summarise the differences between the waterfall and agile models of software development
In this section, we will take a look at coding - or writing software - as a process of development.
Even if you are now solely a software user and do not plan to develop any code, it’s still useful to know how software is typically developed and what practices are used. You may find a good reason to get into developing code (even small, simple programs can be immensely useful), end up supervising others who will need to develop software, or become involved in projects where software is being developed: software, and its development, is becoming an increasingly prevalent tool for research across many fields.
“If you fail to plan, you are planning to fail.” - Benjamin Franklin
Typical Software Development in Academia
Traditionally in academia, software - and the process of writing it - is often seen as a necessary but throwaway artefact in research. For example, there may be research questions (for a given research project), code is created to answer those questions, the code is run over some data and analysed, and finally a publication is written based on those results. These steps are often taken informally.
The terms programming (or even coding) and software engineering are often used interchangeably, but they are not the same thing. Programmers or coders tend to focus on one part of the software development process more than any other: implementation. In academic research, they are often writing software for themselves - they are their own stakeholders. Ideally, they are writing software from a design that fulfils a research goal, typically in order to publish research papers.
Someone who is engineering software, on the other hand, takes a wider view:
- The lifecycle of software: from understanding what is needed, to writing the software and using/releasing it, to what happens afterwards.
- Who will (or may) be involved: software is written for stakeholders. This may only be the researcher initially, but there is an understanding that others may become involved later (even if that isn’t evident yet). A good rule of thumb is to always assume that code will be read and used by others later on, which includes yourself!
- Software (or code) is an asset: software inherently contains value - for example, in terms of what it can do, the lessons learned throughout its development, and as an implementation of a research approach (i.e. a particular research algorithm, process, or technical approach).
- As an asset, it could be reused: again, it may not be evident initially that the software will have use beyond its initial purpose or project, but there is an assumption that the software - or even just a part of it - could be reused in the future.
The Levels of Software Reusability
We mentioned that having reusable software is a good idea, so let’s take a closer look at what we mean by that.
Firstly, whilst we want to ensure our software is reusable by others, as well as ourselves, we should be clear what we mean by ‘reusable’. There are a number of definitions out there, but a helpful one written by Benureau and Rougier in 2017 offers the following levels by which software can be characterised:
- Re-runnable: the code is simply executable and can be run again (but there are no guarantees beyond that)
- Repeatable: the software will produce the same result more than once
- Reproducible: published research results generated from the same version of the software can be generated again from the same input data
- Reusable: easy to use, understand, and modify
- Replicable: the software can act as an available reference for any ambiguity in the algorithmic descriptions made in the published article. That is, a new implementation can be created from the descriptions in the article that provide the same results as the original implementation, and that the original - or reference - implementation, can be used to clarify any ambiguity in those descriptions for the purposes of reimplementation
Later levels imply the earlier ones. So what should we aim for? As researchers who develop software - or developers who write research software - we should be aiming for at least the fourth one: reusability. Reproducibility is required if we are to successfully claim that what we are doing when we write software fits within acceptable scientific practice, but it is also crucial that we can write software that can be understood by others. Where ‘others’, of course, can include a future version of ourselves: coming back and understanding our own code even after only six months can be difficult!
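To make the "repeatable" level a little more concrete, here is a minimal sketch, assuming an analysis that depends on random numbers. The function name noisy_mean and the seed value are hypothetical and chosen only for illustration. Seeding the random number generator explicitly means that running the same code twice gives the same result:

```python
import random

def noisy_mean(num_samples, seed):
    """Return the mean of num_samples pseudo-random values (hypothetical analysis)."""
    rng = random.Random(seed)  # a private generator, seeded explicitly
    samples = [rng.gauss(0, 1) for _ in range(num_samples)]
    return sum(samples) / num_samples

first_run = noisy_mean(1000, seed=42)
second_run = noisy_mean(1000, seed=42)

# The same seed produces the same sequence of values, so the runs agree exactly
print(first_run == second_run)
```

Recording the seed alongside the input data and the software version is one simple step towards results that can be regenerated later.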
What Do You Think?
Reflecting on software you’ve used in the past, and their good and bad qualities, can help you when selecting suitable software for a task. Additionally, for developers, this kind of reflection helps prioritise the qualities you want to aim for in your own software.
Have you used any academically-produced (or other) software in your work, or perhaps developed some yourself?
- List three things that were good about it, and three shortcomings
- Which three aspects do you think should have been given greater attention during its development (or even afterwards)?
The Software Development Process
The typical stages of a software development process can be categorised as follows:
- Requirements gathering: the process of identifying and recording the exact requirements for a software project before it begins. This helps maintain a clear direction throughout development, and sets clear targets for what the software needs to do.
- Design: where the requirements are translated into an overall design for the software. It covers what will be the basic software ‘components’ and how they’ll fit together, as well as the tools and technologies that will be used, which will together address the requirements identified in the first stage.
- Implementation: the software is developed according to the design, implementing the solution that meets the requirements set out in the requirements gathering stage.
- Testing: the software is tested with the intent to discover and rectify any defects, and also to ensure that the software meets its defined requirements, i.e. does it actually do what it should do reliably?
- Deployment: where the software is deployed and used for its intended purpose.
- Maintenance: where updates are made to the software to ensure it remains fit for purpose, which typically involves fixing any further discovered issues and evolving it to meet new or changing requirements.
The process of following these stages, particularly when undertaken in this order, is referred to as the waterfall model of software development: each stage’s outputs flow into the next stage sequentially.
Whether projects or people that develop software are aware of them or not, these stages are followed implicitly or explicitly in every software project. What is required for a project (during requirements gathering) is always considered, for example, even if it isn’t explored sufficiently or well understood.
Following a process of development offers some major benefits:
- Stage gating: a quality gate at the end of each stage, where stakeholders review the stage’s outcomes to decide if that stage has completed successfully before proceeding to the next one (and even if the next stage is warranted at all - for example, it may be discovered during requirements or design that development of the software isn’t practical or even required)
- Predictability: each stage is given attention in a logical sequence; the next stage should not begin until prior stages have completed. Returning to a prior stage is possible and may be needed, but may prove expensive, particularly if an implementation has already been attempted. However, at least this is an explicit and planned action.
- Transparency: essentially, each stage generates output(s) into subsequent stages, which presents opportunities for them to be published as part of an open development process.
- It saves time: a well-known result from empirical software engineering studies is that it becomes exponentially more expensive to fix mistakes in future stages. For example, if a mistake takes 1 hour to fix in requirements, it may take 5 times that during design, and perhaps as much as 20 times that to fix if discovered during testing.
How Should it Have Been Improved?
For your software example used in the first exercise (or perhaps another piece of software entirely), for each problem you identified, within which stage do you think that aspect should have been addressed?
What about Agile Software Development?
You may have heard the term agile in relation to software development - perhaps in regards to an agile development process, or agile practices of development. But what is meant by agile, and how does it relate to the waterfall model?
With an agile approach, the software is written in a cyclical, iterative way with a focus on delivering a working product early, and incrementally adding to it over time. Whilst the above stages are largely still undertaken, they are done within a much shorter timeframe (a timebox) and also within a smaller scope of what will be developed. Work is undertaken in sprints, with software requirements prioritised for the sprint. Following a sprint (which typically lasts between 1 and 4 weeks), progress so far is demonstrated and assessed (as part of a sprint review), and the next timebox’s requirements are then decided, at which point the process begins again within another sprint. The sprints continue until the project ends.
This process of reviewing requirements, prioritisation, and working on them is naturally continuous - with the benefit that at key stages you are repeatedly re-evaluating what is important and needs to be worked on which helps to ensure real concrete progress against project goals and requirements which - particularly in academia - may change over time. For a good overview of agile development in more detail, see this resource.
The Importance of Getting Requirements Right
The importance of gaining a solid understanding for what is required for a software project (or any project) before you begin cannot be overstated. As mentioned, going back and changing an existing implementation is an expensive process.
Requirements can be categorised in many ways, but at a high level a useful way to split them is into Business Requirements, User Requirements, and Solution Requirements. Let’s take a look at these now. As an exemplar we’ll use some hypothetical statistical analysis software for clinical trials of anti-inflammatory drugs to illustrate the differences between them.
Business Requirements
Business requirements describe what is needed from the perspective of the organisation, and define the strategic path of the project, e.g. to increase profit margin or market share, or embark on a new research area or collaborative partnership. These are captured in something like a Business Requirements document.
For adapting our clinical trial software project, example business requirements could include:
- BR1: ensure statistical quality of clinical trial reporting meets the needs of external audits
- BR2: throughput of trial analyses is able to meet high demand during peak periods
User (or Stakeholder) Requirements
These define what particular stakeholder groups each expect from an eventual solution, essentially acting as a bridge between the higher-level business requirements and specific solution requirements. These are typically captured in a User Requirements Specification.
For our software, they could include things for trial managers such as (building on the business requirements):
- UR1 (from BR1): support for statistical measures in generated trial reports as required by revised auditing standards (standard deviation, …)
- UR2 (from BR1): support for producing textual representations of statistics in trial reports as required by revised auditing standards
- UR3 (from BR2): ability to have an individual trial report processed and generated in under 30 minutes (NB: perhaps we could assume this normally takes a couple of hours to access and process the data from various sources)
Solution Requirements
Solution (or product) requirements describe characteristics that a concrete solution or product must have to satisfy the stakeholder requirements. They fall into two key categories:
- Functional Requirements focus on functions and features of a solution. For our software, building on our user requirements, e.g.
- SR1 (from UR1): statistical measures include mean average, minimum, maximum, and standard deviation of inflammation readings for each patient for each day of a trial
- SR2 (from UR2): generate a textual representation of statistics that can be imported into auditing documents
- Non-functional Requirements focus on how the behaviour of a solution is expressed or constrained, e.g. performance, security, usability, or portability. These are also known as quality of service requirements. For our project, e.g.:
- SR3 (from UR3): use local HPC-ABC resource (as an infrastructural constraint) to generate trial report within 30 minutes
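As an illustration of how a functional requirement such as SR1 might translate into code, here is a minimal sketch. It assumes one possible reading of SR1 (summary statistics for each day, calculated across all patients), and the data values and the function name daily_statistics are hypothetical. It uses the statistics module from the Python standard library:

```python
import statistics

# Hypothetical inflammation readings: one inner list per patient, one value per day
readings = [
    [0, 1, 3, 2],
    [0, 2, 4, 4],
    [0, 3, 5, 3],
]

def daily_statistics(readings):
    """Summarise the readings for each day across all patients."""
    summaries = []
    for day_values in zip(*readings):  # regroup the values by day
        summaries.append({
            "mean": statistics.mean(day_values),
            "min": min(day_values),
            "max": max(day_values),
            "stdev": statistics.pstdev(day_values),  # population standard deviation
        })
    return summaries

summaries = daily_statistics(readings)
for day, summary in enumerate(summaries):
    print(day, summary)
```

Whether the population or the sample standard deviation is the right measure is exactly the kind of detail that a solution requirement should pin down.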
From Requirements to Implementation
In practice, these different types of requirements are sometimes confused and conflated when different classes of stakeholder are discussing them, which is understandable: each group of stakeholder has a different view of what is required from a project. The key is to understand the stakeholder’s perspective as to how their requirements should be classified and interpreted, and for that to be made explicit. A related misconception is that each of these types are simply requirements specified at different levels of detail. At each level, not only are the perspectives different, but so are the nature of the objectives and the language used to describe them.
Key Points
Software engineering takes a wider view of software development beyond programming (or coding).
Software you produce has inherent value.
Always assume your code will be read and used by others (including a future version of yourself).
Additionally, aim to make your software reusable by others.
Reproducibility is a cornerstone of science, so ensure your software-generated results are reproducible.
Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next.
Ensuring requirements are sufficiently captured is critical to the success of any project.
An Introduction to Python
Overview
Teaching: 15 min
Exercises: 15 min
Questions
What are the key parts of Python we’ll need to know for the rest of the material?
Objectives
An introduction to Python, or a reminder of key features if you’ve used it before.
Python is one of the most popular languages in research computing, so even if it’s not the main language you use, it’s worth knowing about. This makes it a good choice of language for teaching purposes, so we’re going to use Python for most of the examples for the remainder of this course.
Before we can move on though, we’ll cover some of the core features of Python here. If you’re already familiar with Python, consider this a refresher. If not, then this section should cover everything you need to know for the remainder.
Data Types
An obvious place to start when learning a new language is with how it stores and represents data. In most languages, there are a range of types of data which can be represented. In Python, the most fundamental types are:
- Integers - int
- Floating point numbers - float
- Booleans - bool
- Strings - str
We create a variable and assign a value to it using the = operator.
int_variable = 5
float_variable = 3.142
bool_variable = True
str_variable = "Hello world!"
print(int_variable)
print(float_variable)
print(bool_variable)
print(str_variable)
Unlike some other languages (e.g. C, C++, Fortran, Java), we don’t need to say what data type a variable is going to hold - a variable can actually hold different types over its lifetime, e.g.:
my_var = 5
my_var = "Hello world!"
print(my_var)
Hello world!
Also unlike these languages, there’s no way in Python to declare a variable without assigning to it.
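A short sketch of this behaviour, using the built-in type() function (not otherwise covered in this section) to show which type a variable currently holds:

```python
my_var = 5
print(type(my_var))   # <class 'int'>

# Assigning a string to the same name simply replaces the old value
my_var = "Hello world!"
print(type(my_var))   # <class 'str'>
```

The type belongs to the value rather than to the variable name, which is why no declaration is needed.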
Data Structures
But what about when we need to store more than one value? That’s where the next set of types comes in, the collection types, representing common data structures. These data structures are:
list
dict
set
Let’s start with lists.
A list is a sequence of values - it has an order and particular values can be accessed based on their position within the list.
Like many other languages, we start counting positions at 0, so the first item in a list is “item 0”, the second is “item 1”, and so on.
If we need to add a new value to the end of an existing list we can use the append function / method.
odd_numbers = [1, 3, 5, 7, 9]
odd_numbers.append(11)
print(odd_numbers)
print(odd_numbers[3])
[1, 3, 5, 7, 9, 11]
7
For much more information on lists, see https://docs.python.org/3/tutorial/datastructures.html#more-on-lists.
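As a small supplementary sketch of zero-based indexing, the example below also uses a negative index and the built-in len() function, neither of which is covered above:

```python
odd_numbers = [1, 3, 5, 7, 9]

print(odd_numbers[0])     # the first item is at position 0
print(odd_numbers[-1])    # negative positions count back from the end
print(len(odd_numbers))   # the number of items in the list

odd_numbers[0] = -1       # lists are mutable, so an item can be replaced
print(odd_numbers)
```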
Dictionaries are the second type of collection, typically used to associate a key and a value - like the index of a book:
index = {
"python": 1,
"c++": 5,
"fortran": 5,
}
index["java"] = 12
print(index["c++"])
5
In a dictionary, a key can only be present once, so we can’t, for example, store two different values under the same key:
index = {
"python": 1,
"python": 2,
"c++": 5,
"fortran": 5,
}
In this case, the value at the key "python" will have the last value we provided for it: 2.
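We can check this behaviour directly with a short sketch (the dictionary contents are illustrative only):

```python
index = {
    "python": 1,
    "python": 2,   # same key again: this value replaces the 1 above
    "c++": 5,
}

print(index["python"])   # 2
print(len(index))        # 2, because the repeated key is only stored once
```

Python does not raise an error for the repeated key, so this kind of mistake can pass unnoticed.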
The final collection type we’ll introduce here is a set. Sets are similar to dictionaries, but instead of storing key/value pairs, they only store keys. In this way, it’s simply an unordered collection of unique keys and as such good if you want to perform set operations, e.g. for doing set union and intersection operations:
set1 = {"python", "c++", "fortran"}
set2 = {"fortran", "c#", "cobol", "python", "c"}
print(set1.union(set2))
print(set1.intersection(set2))
{'python', 'fortran', 'cobol', 'c#', 'c++', 'c'}
{'python', 'fortran'}
As a shorthand, you can also use set1 | set2 and set1 & set2 for doing union and intersection respectively.
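A brief sketch confirming that the operator forms give the same results as the method calls. Because sets are unordered, the example sorts the result before printing so that the output is predictable:

```python
set1 = {"python", "c++", "fortran"}
set2 = {"fortran", "c#", "cobol", "python", "c"}

print((set1 | set2) == set1.union(set2))          # True
print((set1 & set2) == set1.intersection(set2))   # True
print(sorted(set1 & set2))                        # ['fortran', 'python']
```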
Looping and Branching
Now we know how to store and structure data, let’s move onto processing that data. Two important ideas in most programming languages you’ll encounter are branching and looping. In Python we use these two keywords:
if
for
number = 53
if number > 50:
    print("Number was greater than 50")
Number was greater than 50
In the example above, because number is greater than 50, the condition evaluates as True and the block of code within the if is executed.
Code blocks are introduced in Python with a colon (:), which we see in the example above, but we’ll also see in a couple of other contexts soon.
If the condition doesn’t evaluate as True (i.e. it evaluates as False), the block of code is skipped.
The expression number > 50 evaluates to True if the value of number is greater than 50, otherwise it evaluates to False.
We can also add more possible outcomes to the branching using elif (short for “else if”) and else.
With elif we provide another condition, and its block of code executes if all of the previous conditions evaluate as False but this one evaluates as True.
If none of the conditions evaluate as True, then the else block executes.
Both elif and else are optional - you can have zero or more elif blocks and zero or one else blocks.
number = 94
if number > 100:
    print("Greater than 100")
elif number == 100:
    print("Equal to 100")
else:
    print("Less than 100")
Less than 100
In this example we see the == operator, used to check for equality.
We use this instead of = because = is already used for assignment to a variable, so most languages use == to check for equality instead.
Loops are another important construct, which in Python use the keyword for:
fruit_bowl = ["apple", "banana", "cherry"]
for fruit in fruit_bowl:
    print(fruit)
apple
banana
cherry
In the example above, we loop over a collection.
Each time the loop executes, the variable fruit takes the value of the next item from the collection.
Another common way to use loops is with the range() function.
This function takes arguments for the bounds and produces a sequence of integers ranging from the lower bound to the upper bound.
for i in range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9
In the above example, we just provide an upper bound, so the lower bound takes the default value of 0. Notice that the values include the lower bound, but exclude the upper bound - this matches the zero-based indexing of lists.
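A short sketch of the other ways range() can be called, with an explicit lower bound and with an optional third argument for the step size. Wrapping each call in list() lets us print all of the values at once:

```python
print(list(range(10)))         # upper bound only: 0 to 9
print(list(range(2, 6)))       # lower and upper bound: [2, 3, 4, 5]
print(list(range(0, 10, 3)))   # lower bound, upper bound and step: [0, 3, 6, 9]
```

In every case the upper bound itself is excluded.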
So let’s take these and do something useful. One relatively simple numerical model is the approximation of pi, using a Monte Carlo method.
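If we generate a large number of points uniformly at random within a square, the proportion of these points which also fall within the circle inscribed in that square (a circle of the same diameter) approximates the ratio of the two areas. For a circle of radius 1 inside a square of side 2, that ratio is pi / 4, so multiplying the proportion by 4 gives us an approximation of pi: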
import random
total = 1000000
inside = 0
for i in range(total):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    r2 = x**2 + y**2
    if r2 <= 1:
        inside += 1
pi = 4 * inside / total
print(pi)
3.138856
Key Points
We’ll be using Python for the following parts of the material, here’s an introduction / refresher.
Functions and Classes
Overview
Teaching: 15 min
Exercises: 15 min
Questions
How can we structure our data and code to help us keep track of larger programs?
Objectives
Use functions to structure code which performs a particular task.
Use classes to encapsulate structured data.
Functions
From our previous section, we’ve actually now got everything we need to write any program, but if we stopped here we’d quickly find that larger programs become unmanageable. What we’re missing is a way of structuring our code, so that we can separate out specific parts with specific functionality. The first and most important tool for doing this is the function.
Much like in mathematics, functions represent an operation which can be applied to some data, to receive some other data as output.
def add_one(x):
    """Add one to a number."""
    return x + 1

print(add_one(3))
4
In the example above, we define a function add_one which adds one to a number.
To define a function, we need to start with the def keyword, then the function name, the function arguments in parentheses and the colon to start a new block.
Within the function’s code block we can do anything we could outside of a function, but in order to get any data back out of the function we need to return it.
The arguments of a function are the values it takes as input - when we call the function inside the print(), we provide the values of any required parameters, in this case just x.
The value returned by the function is then passed on to print(), just as it would be if we’d put the value there directly.
The last component of the function definition above is the docstring.
Docstrings aren’t a requirement, but can make it much easier to understand what your code is doing, especially if it’s complex or you haven’t looked at it for a while.
A docstring needs to be the first thing inside a function’s code block and should be enclosed within triple double quotes (i.e. """).
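One practical benefit of a docstring is that Python keeps it attached to the function. As a brief sketch, the text can be read back through the function's __doc__ attribute (the built-in help() function displays the same text interactively):

```python
def add_one(x):
    """Add one to a number."""
    return x + 1

# The docstring is stored on the function object itself
print(add_one.__doc__)
```

A comment placed in the same position would not be available in this way.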
For a more practical example, let’s convert our existing code for calculating pi into a function:
import random
def approximate_pi(num_points):
    """Monte-carlo approximation of pi by counting points within a circle."""
    inside = 0
    for i in range(num_points):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        r2 = x**2 + y**2
        if r2 <= 1:
            inside += 1
    return 4 * inside / num_points

pi = approximate_pi(1000000)
print(pi)
3.142464
Sum of Squares
We need a function which accepts an integer and returns the sum of the squares of integers up to and including this number. i.e. 1 -> 1, 2 -> 5, 3 -> 14
Which one of the functions below would accomplish this?
def sum_of_squares_a(limit):
    for i in range(limit):
        total = i * i
    return total

def sum_of_squares_b(limit):
    total = 0
    for i in range(limit):
        total += i * i
    return total

def sum_of_squares_c(limit):
    total = 0
    for i in range(limit + 1):
        total += i * i
    return total

def sum_of_squares_d(limit):
    total = 0
    for i in range(limit - 1):
        total += i * i
    return total
Solution
The correct answer is the function sum_of_squares_c.
- Function A just returns the square of limit - 1 as it overwrites the value of total each time round the loop.
- Function B stops the loop one iteration too early.
- Function D stops the loop two iterations too early.
def sum_of_squares_c(limit):
    total = 0
    for i in range(limit + 1):
        total += i * i
    return total
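One way to gain confidence in an answer like this is to check it against known values. The sketch below uses the assert statement (not covered in this section), which raises an error if its condition is False. It checks the three examples from the exercise text and then compares against the standard closed-form formula n(n + 1)(2n + 1) / 6 for the sum of the first n squares:

```python
def sum_of_squares_c(limit):
    total = 0
    for i in range(limit + 1):
        total += i * i
    return total

# The examples from the exercise text: 1 -> 1, 2 -> 5, 3 -> 14
for limit, expected in [(1, 1), (2, 5), (3, 14)]:
    assert sum_of_squares_c(limit) == expected

# Closed-form formula for 1^2 + 2^2 + ... + n^2
for n in range(20):
    assert sum_of_squares_c(n) == n * (n + 1) * (2 * n + 1) // 6

print("All checks passed")
```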
Function Composition
One of the main benefits of breaking our code up into functions is that it allows us to use composition. Often, we find that a task is composed of several smaller sub-tasks - an example of this can be seen when we convert temperatures between Fahrenheit, Celsius and Kelvin. Writing two functions to perform temperature conversion from Fahrenheit, we might have:
def fahr_to_celsius(fahr):
    # apply standard Fahrenheit to Celsius formula
    celsius = ((fahr - 32) * (5/9))
    return celsius

def fahr_to_kelvin(fahr):
    # apply standard Fahrenheit to Kelvin formula
    kelvin = ((fahr - 32) * (5/9)) + 273.15
    return kelvin
But on closer inspection, we find that the second of these functions can be broken down into two sub-tasks: firstly, convert Fahrenheit to Celsius, then convert Celsius to Kelvin. Since we already have a function which converts from Fahrenheit to Celsius, we can make use of this:
def fahr_to_celsius(fahr):
    # apply standard Fahrenheit to Celsius formula
    celsius = ((fahr - 32) * (5/9))
    return celsius

def fahr_to_kelvin(fahr):
    # apply standard Fahrenheit to Kelvin formula
    kelvin = fahr_to_celsius(fahr) + 273.15
    return kelvin
This breaking down of a problem into smaller components is a core concept in the practice of software engineering and many other techniques are based around this idea.
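The decomposition can be taken one step further. In the sketch below, the Celsius to Kelvin step is given its own function, celsius_to_kelvin (a hypothetical helper that does not appear in the examples above), so that fahr_to_kelvin becomes purely a composition of two smaller functions:

```python
def fahr_to_celsius(fahr):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (fahr - 32) * (5 / 9)

def celsius_to_kelvin(celsius):
    """Convert a temperature from Celsius to Kelvin (hypothetical helper)."""
    return celsius + 273.15

def fahr_to_kelvin(fahr):
    """Convert Fahrenheit to Kelvin by composing the two functions above."""
    return celsius_to_kelvin(fahr_to_celsius(fahr))

print(fahr_to_kelvin(32))   # freezing point of water: 273.15
```

Each conversion formula now appears in exactly one place, so a correction to either one is automatically picked up wherever it is used.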
Classes
Classes are another tool available to us in Python which allow us to structure our code and our data at the same time. A class is effectively a template for a structured piece of data and the behaviour which is associated with it.
As an example here, let’s imagine some software we might need to analyse the results of a clinical trial. For each patient, we might need to keep track of:
- Their name
- Their dosage
- Some general health measurements
- Measurements of the trial outcome indicator
Using the data structures we’ve seen so far, we might implement this using a dictionary for each patient - so all of our patients would be represented in a list of dictionaries:
alice = {
"name": "Alice",
"dosage_mg": 40,
"weight_kg": 65,
"measurements": [
10, 10, 6, 4, 2, 1, 0
],
}
bob = {
"name": "Bob",
"dosage_mg": 0,
"weight_kg": 75,
"measurements": [
10, 8, 8, 9, 8, 9, 9
],
}
However, having to replicate the structure like this each time is error prone and overly verbose. By using a class, we have a better way to structure this:
class Patient:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg
        self.measurements = []

    def add_measurement(self, value):
        self.measurements.append(value)

alice = Patient("Alice", 40, 65)
alice.add_measurement(10)
alice.add_measurement(8)
print(alice.name)
print(alice.dosage_mg)

bob = Patient("Bob", 0, 75)
bob.add_measurement(10)
bob.add_measurement(8)
print(bob.name)
print(bob.dosage_mg)
Alice
40
Bob
0
In this example, the self.name, self.dosage_mg, self.weight_kg and self.measurements attributes are the structured data we want our class to contain.
The function add_measurement is a behaviour that we have chosen to define for our class - something that the data can do, or something that can be done to the data.
We can then create an instance of the class by using similar syntax to calling a function.
When we create instances for Alice and Bob, we provide the values to the parameters of the __init__ method.
When a function belongs to a class like this, we often refer to it as a method.
Normal methods will have self as their first parameter, but notice that we don’t ever provide a value for this when we call the __init__ method (implicitly) or the add_measurement method.
This is because it gets filled in for us, to refer to the instance of the class that we’re operating on.
In the case of the line alice.add_measurement(10), the value of the self parameter will be the class instance alice.
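To see what "filled in for us" means, the sketch below uses a cut-down version of the Patient class and calls add_measurement in two ways: through the instance, and through the class with the instance passed explicitly as self. The second form is rarely written in practice and is shown here only to illustrate the mechanism:

```python
class Patient:
    def __init__(self, name):
        self.name = name
        self.measurements = []

    def add_measurement(self, value):
        self.measurements.append(value)

alice = Patient("Alice")
alice.add_measurement(10)           # self is filled in with alice for us
Patient.add_measurement(alice, 8)   # the same call, with self passed explicitly

print(alice.measurements)           # [10, 8]
```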
Adding a Method
Something we might need to calculate during our clinical trial is the dosage per body mass, often reported in units of milligrams per kilogram (mg/kg). Which of the examples below would allow us to do this?
class Patient_A:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg():
        return self.dosage_mg / self.weight_kg

class Patient_B:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg(self):
        return self.dosage_mg / self.weight_kg

class Patient_C:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg(self):
        return dosage_mg / weight_kg

class Patient_D:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg(dosage_mg, weight_kg):
        return dosage_mg / weight_kg
Solution
The correct solution is Patient_B.
- Class A doesn’t provide the self argument to the new method - this will cause an error when we try to call the function.
- Class C forgets to use self. to access the two data attributes on the instance of the class.
- Class D does both of the above - it looks like it expects to receive the two data attributes when the function is called, but this defeats the point of putting the method within the class.
class Patient_B:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg(self):
        return self.dosage_mg / self.weight_kg
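As a usage sketch, we can call the method on an instance of Patient_B, and also try the same call on Patient_A to see the error described in the solution. The example uses try / except (not covered in this material) so that the error is reported rather than stopping the program, and the patient values are illustrative only:

```python
class Patient_A:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg():  # missing the self parameter
        return self.dosage_mg / self.weight_kg

class Patient_B:
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg

    def dosage_per_kg(self):
        return self.dosage_mg / self.weight_kg

alice = Patient_B("Alice", 40, 65)
print(alice.dosage_per_kg())   # 40 mg / 65 kg

try:
    Patient_A("Alice", 40, 65).dosage_per_kg()
except TypeError as error:
    # Python still passes the instance as the first argument, which the method cannot accept
    print("Patient_A failed:", error)
```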
Key Points
Functions allow us to decompose a problem down into smaller tasks.
Classes allow us to organise data which represents a distinct concept.
Programming Paradigms
Overview
Teaching: 15 min
Exercises: 20 min
Questions
How does the structure of a problem affect the structure of our code?
Objectives
Briefly describe the major paradigms we can use to classify programming languages.
Decompose the flow of data within a program into a sequence of data transformations
Use classes to encapsulate data within a more complex program
Introduction
In the episode on software lifecycles, we spoke briefly about the design of software and how the design is impacted by the problem being solved and the environment in which the software is expected to run.
One of the major topics in the design of software is Programming Paradigms.
In science and philosophy, a paradigm … is a distinct set of concepts or thought patterns …
– Wikipedia - Paradigm
Each paradigm represents a slightly different way of thinking about and structuring our code and each has certain strengths and weaknesses when used to solve particular classes of problem. Once your software begins to get more complex it’s common to use aspects of different paradigms to handle different subtasks. Because of this, it’s useful to know about the major paradigms, so you can recognise which you’re using and when it might be appropriate to switch to another.
There’s a long history behind this, but to skip straight to now, there are currently three dominant programming paradigms:
- Procedural - where code is logically grouped into procedures that perform tasks
- Functional - a more declarative way of structuring and composing code purely around functions, avoiding concepts of shared state and mutable data, and treating functions themselves as data
- Object Oriented - which organises code around the structure of data, with data and functions that operate on that data defined within an object structure that groups these together
Let us take a look into each of these in turn, and how each can be useful.
Procedural Programming
Procedural programming is conceptually the simplest paradigm and the one most beginners start with, so it is probably the style you’re most familiar with up to this point. In it, we group code into procedures, each performing a single task. In most modern languages we call these functions rather than procedures, so if you’re grouping your code into functions, this might be the paradigm you’re using.
By grouping code like this, we make it easier to reason about the overall structure, since we should be able to tell roughly what a function does just by looking at its name. These functions are also much easier to reuse than code outside of functions, since we can call them from any part of our program.
As the structure of code here is simpler than in the following paradigms, this is an appropriate choice for smaller scripts and software that we’re writing just for a single use. Aside from smaller scripts, Procedural Programming is also commonly seen in code focused on high performance, with relatively simple data structures, such as in High Performance Computing (HPC). These programs tend to be written in C (which doesn’t support Object Oriented Programming) or Fortran (which only gained object oriented features with the Fortran 2003 standard). HPC code is also often written in C++, but C++ code would more commonly follow an Object Oriented style, though it may have procedural sections.
Note that you may sometimes hear people refer to this paradigm as “Functional Programming” to contrast it with Object Oriented Programming, because it uses functions rather than objects, but this is incorrect. Functional Programming places much stronger constraints on the behaviour of a function.
Functional Programming
Functional Programming is built around a stricter definition of the term “function” borrowed from mathematics. A function in this context can be thought of as a mapping that transforms its input data into output data. Anything a function does other than produce an output is known as a side effect and should be avoided wherever possible.
Being strict about this definition allows us to break down the distinction between code and data, for example by writing a function which accepts and transforms other functions - in Functional Programming code is data.
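As a minimal sketch of the idea that code is data, the example below passes two functions into a third and gets a new function back. The names compose, add_one and double are illustrative only and do not come from any library.

```python
def compose(f, g):
    """Return a new function that applies g first, then f."""
    def composed(x):
        return f(g(x))
    return composed


def add_one(x):
    return x + 1


def double(x):
    return 2 * x


# Functions are passed around and returned like any other value
double_then_add_one = compose(add_one, double)
print(double_then_add_one(5))  # 2 * 5 + 1 = 11
```

Because add_one and double are pure, the composed function is pure too, which is what makes this kind of building-block approach safe.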
The most common application of Functional Programming in research is in data processing, especially when using Big Data. A popular definition of Big Data is data which is too large to fit in the memory of a single computer, with a single dataset sometimes being multiple terabytes or larger. With datasets like this, we can’t move the data around easily, so we often want to send our code to where the data is instead. By writing our code in a functional style, we also gain the ability to run many operations in parallel as it’s guaranteed that each operation won’t interact with any of the others - this is essential if we want to process this much data in a reasonable amount of time.
Pure Functions and Side Effects
We define a pure function as one which satisfies two criteria:
- The data returned must be the same each time the same arguments are provided
- Calling the function has no side effects
Side effects cover any action that a function performs which affects anything other than the value it returns. Examples include: printing text, modifying the value of an argument, or changing the value of a global variable.
Pure Functions
Which of these functions are pure? If you’re not sure, explain your reasoning to someone else and see whether they agree.
def add_one(x):
    return x + 1

def say_hello(name):
    print('Hello', name)

def append_item_1(a_list, item):
    a_list.append(item)
    return a_list

def append_item_2(a_list, item):
    result = a_list + [item]
    return result
Solution
- add_one is pure - it has no effects other than to return a value and this value will always be the same when given the same inputs
- say_hello is not pure - printing text counts as a side effect, even though it is the clear purpose of the function
- append_item_1 is not pure - the argument a_list gets modified as a side effect - try this yourself to prove it
- append_item_2 is pure - the result is a new list, so this time a_list doesn’t get modified - again, try this yourself
MapReduce in Python - Comprehensions
Often, when working with data you’ll find that you need to apply a transformation to each datapoint, and/or filter the data, before performing some aggregation across the whole dataset. This process is often referred to as MapReduce, particularly when working within the context of Big Data using tools such as Spark or Hadoop. The MapReduce style of data processing relies heavily on the composability and parallelisability that we get when using functional programming. This name comes from applying or mapping an operation to each value, then performing a reduction operation which collects the data together to produce a single result.
In Python, we do have the built-in functions map and filter, but we’ll skip over those and go straight to the recommended approach.
If you’re particularly interested in this form of data processing, it might be worth looking up the documentation for these functions, but in general we use comprehensions instead.
integers = range(5)
double_ints = [2 * i for i in integers]
print(double_ints)
[0, 2, 4, 6, 8]
The above example uses a list comprehension to double each number in a sequence. Notice the similarity between the syntax for a list comprehension and a for loop - in effect, this is a for loop compressed into a single line.
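To make that comparison concrete, here is a sketch of the explicit for loop that produces the same list as the comprehension above.

```python
integers = range(5)

# Equivalent of: double_ints = [2 * i for i in integers]
double_ints = []
for i in integers:
    double_ints.append(2 * i)

print(double_ints)
```

The loop version needs an empty list to be created and then modified step by step, whereas the comprehension describes the result in a single expression.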
We can also use list comprehensions to filter data, by adding the filter condition to the end.
double_even_ints = [2 * i for i in integers if i % 2 == 0]
print(double_even_ints)
[0, 4, 8]
Similarly, we have set and dictionary comprehensions, which look similar to list comprehensions, but use the set literal or dictionary literal syntax.
double_int_set = {2 * i for i in integers}
print(double_int_set)
{0, 2, 4, 6, 8}
double_int_dict = {i: 2 * i for i in integers}
print(double_int_dict)
{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
These ‘comprehensions’ cover the map and filter components of MapReduce, but not the reduce component.
For that we either need to rely on a built-in reduction operator, or use the reduce function (found in the functools module in Python 3) with a custom reduction operator.
In many cases, what we want to do is to sum the values in a collection - for this we have the built-in sum function:
l = [1, 2, 3]
print(sum(l))
6
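For cases where no built-in reduction fits, the sketch below uses functools.reduce with a custom reduction operator. The multiply function and the values list are made up for illustration.

```python
from functools import reduce


def multiply(accumulated, value):
    """Custom reduction operator: combine the running result with the next value."""
    return accumulated * value


values = [1, 2, 3, 4]

# reduce applies multiply cumulatively: ((1 * 2) * 3) * 4
product = reduce(multiply, values)
print(product)

# An initial value can be given as a third argument; it is also what
# reduce returns when the collection is empty
empty_product = reduce(multiply, [], 1)
print(empty_product)
```

The reduction operator takes the result so far and the next value, and returns the new result. Keeping it pure means the order in which partial results are combined does not introduce surprises.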
Sum of Squares
Using the MapReduce model and a list comprehension, we want to write a function that calculates the sum of the squares of the values in a list. Our function should behave as below:
def sum_of_squares(l):
    # Our code here

print(sum_of_squares([0]))
print(sum_of_squares([1]))
print(sum_of_squares([1, 2, 3]))
print(sum_of_squares([-1]))
print(sum_of_squares([-1, -2, -3]))
0
1
14
1
14
Which of these functions has the correct behaviour?
def sum_of_squares_a(l):
    squares = [x * x for x in l]
    return sum(squares)

def sum_of_squares_b(l):
    squares = {x: x * x for x in l}
    return sum(squares)

def sum_of_squares_c(l):
    sum = 0
    return [sum = sum + x * x for x in l]

def sum_of_squares_d(l):
    squares = [x * x for x in range(l)]
    return sum(squares)
Solution
The correct answer is sum_of_squares_a.
- Function B uses a dictionary comprehension - when we attempt to sum this we get a sum of the dictionary keys.
- Function C uses invalid syntax - we can’t use = assignment inside a comprehension.
- Function D behaves similarly to the sum of squares code we wrote in the previous section - it iterates over range(l) rather than over the values in the list. As written, passing a list to range raises a TypeError; with range(len(l)) it would square the indices of the list instead of its values.
def sum_of_squares_a(l):
    squares = [x * x for x in l]
    return sum(squares)
Object Oriented Programming
In Object Oriented Programming, we first think about the structure of the data and the things that we’re modelling. For example, if we’re writing a simulation for our chemistry research, we’re probably going to need to represent atoms and molecules. Each of these has a set of properties which we need to know about in order for our code to perform the tasks we want - in this case, for example, we often need to know the mass and electric charge of each atom. So with Object Oriented Programming, we’ll have some object structure which represents an atom and all of its properties, another structure to represent a molecule, and a relationship between the two (a molecule contains atoms). This structure also provides a way for us to associate code with an object, representing any behaviours it may have.
The main tools of Object Oriented Programming are classes and the relationships between them.
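The chemistry example above might be sketched as follows. The class names, attribute names and units are assumptions made for illustration, and the atomic masses are approximate.

```python
class Atom:
    """An atom with the properties our hypothetical simulation needs."""
    def __init__(self, symbol, mass, charge):
        self.symbol = symbol
        self.mass = mass      # assumed to be in unified atomic mass units
        self.charge = charge  # assumed to be in units of elementary charge


class Molecule:
    """A molecule, which contains atoms."""
    def __init__(self, atoms):
        self.atoms = list(atoms)

    def total_mass(self):
        # Behaviour associated with the object: sum over the atoms it contains
        return sum(atom.mass for atom in self.atoms)


# Approximate atomic masses, for illustration only
water = Molecule([Atom('H', 1.008, 0), Atom('H', 1.008, 0), Atom('O', 15.999, 0)])
print(round(water.total_mass(), 3))
```

The data (mass, charge) and the behaviour that depends on it (total_mass) live together, so code that uses a Molecule does not need to know how its atoms are stored.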
Relationships Between Classes
Classes give us a tool for grouping data and behaviour related to a single conceptual object. The next step we need to take is to describe the relationships between the concepts in our code.
There are two fundamental types of relationship between objects which we need to be able to describe:
- Ownership - x has a y - this is composition
- Identity - x is a y - this is inheritance
Composition
You should hopefully remember the term composition from the section on functions, where we used composition of functions to reduce code duplication. That time, we used a function which converted temperatures in Celsius to Kelvin as a component of another function which converted temperatures in Fahrenheit to Kelvin.
In the same way, in object oriented programming, we can make things components of other things.
We often use composition where we can say ‘x has a y’ - to use a clinical trial as an example again, we might want to say that a patient has observations.
class Observation:
    def __init__(self, day, value):
        self.day = day
        self.value = value

    def __str__(self):
        return str(self.value)

class Patient:
    """A patient in a clinical trial."""
    def __init__(self, name, dosage_mg, weight_kg):
        self.name = name
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg
        self.observations = []

    def add_observation(self, value, day):
        new_observation = Observation(day, value)
        self.observations.append(new_observation)
        return new_observation

    def __str__(self):
        return self.name

alice = Patient('Alice', 40, 65)
obs = alice.add_observation(3, 1)
print(obs)
3
Now we’re using a composition of two custom classes to describe the relationship between two types of entity in the system that we’re modelling.
Composition: What About Extending it Further?
Of course, reality is often more complex than a single relationship between two types of ‘thing’, and we can also add other classes and relationships to our model as required. Let’s consider adding a doctor that has patients (just after the Patient class):
class Doctor:
    """A doctor conducting a clinical trial."""
    def __init__(self, name):
        self.name = name
        self.patients = []

    def add_patient(self, new_patient):
        # A crude check by name if this patient is already looked after
        # by this doctor before adding them
        for patient in self.patients:
            if patient.name == new_patient.name:
                return
        self.patients.append(new_patient)
So, similarly to our Patient class, we specify that a doctor has patients, and have the ability to add them. But note that we’ve also added a check to ensure that we only add the patient if the patient is not already in the doctor’s list of patients. If they are already in the list, we use return to exit early so we don’t add them again.
This illustrates the power of the object oriented paradigm: we can extend our model of reality by adding new classes that directly represent real-world entities - and the relationships and behaviours that govern them - as needed.
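The early-return check can be tried out with a short sketch like the one below. It uses a cut-down Patient that only stores a name, and the doctor's name is made up for illustration.

```python
class Patient:
    """Cut-down Patient, keeping only the name for this sketch."""
    def __init__(self, name):
        self.name = name


class Doctor:
    """A doctor conducting a clinical trial."""
    def __init__(self, name):
        self.name = name
        self.patients = []

    def add_patient(self, new_patient):
        # Exit early if a patient with this name is already in the list
        for patient in self.patients:
            if patient.name == new_patient.name:
                return
        self.patients.append(new_patient)


dr_jones = Doctor('Dr Jones')  # hypothetical name
dr_jones.add_patient(Patient('Alice'))
dr_jones.add_patient(Patient('Bob'))
dr_jones.add_patient(Patient('Alice'))  # same name, so not added again

patient_names = [patient.name for patient in dr_jones.patients]
print(patient_names)
```

Matching on name alone is crude, as the comment in the original code admits: two different patients who share a name would be treated as one.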
Inheritance
The other type of relationship used in object oriented programming is inheritance.
Inheritance is about data and behaviour shared by classes, because they have some shared identity - ‘x is a y’.
If class Y inherits from (is a) class X, we say that X is the superclass or parent class of Y, or Y is a subclass of X.
If we want to extend the previous example to also manage people who aren’t patients we can add another class Person.
But Person will share some data and behaviour with Patient - in this case both have a name and show that name when you print them.
Since we expect all patients to be people (hopefully!), it makes sense to implement the behaviour in Person and then reuse it in Patient.
To write our class in Python, we use the class keyword, the name of the class, and then a block of the functions that belong to it.
If the class inherits from another class, we include the parent class name in brackets.
class Observation:
    def __init__(self, day, value):
        self.day = day
        self.value = value

    def __str__(self):
        return str(self.value)

class Person:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

class Patient(Person):
    """A patient in an inflammation study."""
    def __init__(self, name, dosage_mg, weight_kg):
        # Reuse the initialiser from Person to set the name
        super().__init__(name)
        self.dosage_mg = dosage_mg
        self.weight_kg = weight_kg
        self.observations = []

    def add_observation(self, value, day):
        new_observation = Observation(day, value)
        self.observations.append(new_observation)
        return new_observation

    # __str__ is inherited from Person, so it is not redefined here

alice = Patient('Alice', 40, 65)
print(alice)
obs = alice.add_observation(3, 1)
print(obs)
bob = Person('Bob')
print(bob)
obs = bob.add_observation(4, 1)
print(obs)
Alice
3
Bob
AttributeError: 'Person' object has no attribute 'add_observation'
As expected, an error is raised because we cannot add an observation to bob, who is a Person but not a Patient.
We see in the example above that to say that a class inherits from another, we put the parent class (or superclass) in brackets after the name of the subclass.
Inheritance: What About Extending it Further?
Our example shows a single level of inheritance, but again we could take this further as needed. We could, for example, have a special type of Patient that largely behaves the same but needs to be modelled differently. In this case, we could add a class that is a subclass of Patient. Alternatively, we could have a different type of Person captured in our model - a good example would be our Doctor, which we could define thus:
class Doctor(Person):
    # Rest of class same as above
    ...
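Written out in full, one possible version of this class is sketched below. Handing the name to Person through super().__init__ is an assumption about how the two classes would be wired together, and the doctor's name is made up for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class Doctor(Person):
    """A doctor conducting a clinical trial."""
    def __init__(self, name):
        super().__init__(name)  # let Person store the name
        self.patients = []

    def add_patient(self, new_patient):
        for patient in self.patients:
            if patient.name == new_patient.name:
                return
        self.patients.append(new_patient)


dr_jones = Doctor('Dr Jones')  # hypothetical name
print(dr_jones)                      # uses __str__ inherited from Person
print(isinstance(dr_jones, Person))  # a Doctor is a Person
```

Doctor gains the name handling and printing behaviour from Person without repeating it, while still adding its own data (patients) and behaviour (add_patient).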
Key Points
A Paradigm describes a way of structuring reasoning about code.
Different programming languages are suited to different paradigms.
Different paradigms are suited to solving different classes of problems.
Pure functions are functions with deterministic behaviour and no side effects.
Classes allow us to organise data into distinct concepts.
Best Practices in Writing Code
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How do I write code to ensure that others, including myself, will be able to continue to understand it?
What can I do to ensure my code can be used, modified and extended in the future?
Objectives
Explain the benefits of writing readable code.
Describe the importance of clear code indentation, formatting, and naming.
List examples of things that should and should not be code commented.
Explain why conforming to defined coding conventions is beneficial.
Explain benefits of writing maintainable code.
Describe what is meant by technical debt.
Understand approaches for writing maintainable code.
Describe questions that can be used as a checklist for whether code is maintainable.
Writing Readable Source Code
Source code is designed for humans. It may end up being processed by a machine, but it evolves in our hands and we need to understand what the code does and where changes need to be made. We may understand our code now, but what about six months or a year from now? Readable code helps us to reacquaint ourselves with what we wrote and why we wrote it.
Our code may embody some unique aspect of our research. Readable code can help our fellow researchers to understand what we’ve done and so to assess whether this aspect of our research is correct. Or, to put it another way, would we rather have a colleague spot a problem now, or six months later, when we’ve published a paper based on flawed results produced using our software?
There’s also our image to consider. If our code is badly laid out, messy and cryptic, others will assume that it is also buggy and sloppily written. They may assume that we undertake our research in a similarly slack manner.
If we’re working in a team to develop some code then readable source code can ensure that everyone can understand the code written by everyone else. This can help improve a team’s bus factor, which is defined as the number of developers who need to be put out of action before no one understands the code.
Writing readable code costs only a little more time than writing unreadable code, but the payback is immense. Reading and understanding source code is slow, laborious and can lead to misinterpretation. It is always a good idea to keep others in mind when writing code, so a good rule of thumb is to assume that someone will always read your code at a later date, and this includes a future version of yourself!
Code Formatting
The formatting or appearance of code determines how quickly and easily the reader can understand what it does. A compiler will see no difference between this…
// Example 1: unformatted code.
public class Functions
{
public static int fibonacci(int n)
{
if (n < 2)
{
return 1;
}
return fibonacci(n-2) + fibonacci(n-1);
}
public static void main(String[] arguments)
{
for(int i=0;i<10;i++)
{
print("Input value:"+i+" Output value:"+power(fibonacci(i), 2)+1);
}
}
}
…and this…
// Example 2: formatted code.
public class Functions
{
    public static int fibonacci(int n)
    {
        if (n < 2)
        {
            return 1;
        }
        return fibonacci(n-2) + fibonacci(n-1);
    }

    public static void main(String[] arguments)
    {
        for (int i = 0; i < 10; i++)
        {
            print("Input value:" + i +
                  " Output value:" +
                  power(fibonacci(i), 2) + 1);
        }
    }
}
…but the second example will be more easily understood by the reader.
Indentation makes a clear connection between blocks of code and the classes, functions or loops to which they belong. If a statement is longer than a single line on screen, indentation helps the reader understand where the statement begins and ends. White-space makes the code appear less cluttered and allows the grouping together of logically-related elements like constants or local variable declarations.
In many languages, indentation is purely cosmetic (e.g. Java or C/C++) and the number of spaces used to indent code is left to the developer to decide. However, in certain languages (e.g. Python or Occam) indentation is more restrictive because it has semantic significance: it defines a loop body or a function body.
Many programming environments, also known as Integrated Development Environments or IDEs (e.g. PyCharm, Eclipse, JBuilder, NetBeans and Microsoft Visual Studio), provide support for code formatting, and many text editors can be extended with support for language-specific indentation (e.g. Microsoft Visual Studio Code).
Good formatting can impact upon design. A function with seven arguments might not be very readable on-screen, for example. To make it more readable, you could create a new data structure or class to hold some of the arguments. We could also break up a function that cannot be viewed on one screen into a number of smaller functions that can, if the function can be logically decomposed in this way. However, note that in some circumstances, such as within a functional programming paradigm, having many arguments may be unavoidable!
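As a sketch of grouping arguments into data structures, the example below uses Python's dataclasses module. The simulation function, its parameters and the Box and Conditions classes are all hypothetical, and the function body is a placeholder.

```python
from dataclasses import dataclass


# Before: seven separate arguments are hard to read and easy to mix up
def run_simulation_v1(n_steps, time_step, temperature, pressure,
                      box_x, box_y, box_z):
    return n_steps * time_step  # placeholder: total simulated time


@dataclass
class Box:
    x: float
    y: float
    z: float


@dataclass
class Conditions:
    temperature: float
    pressure: float


# After: related arguments travel together in small data structures
def run_simulation_v2(n_steps, time_step, conditions, box):
    return n_steps * time_step  # placeholder: total simulated time


total_time = run_simulation_v2(
    n_steps=1000,
    time_step=0.5,
    conditions=Conditions(temperature=300.0, pressure=1.0),
    box=Box(x=10.0, y=10.0, z=10.0),
)
print(total_time)
```

The second signature fits comfortably on one line, and the grouped values can be passed on to other functions as a unit.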
Naming Things
The careful selection of names is very important to understanding. Cryptic names of components, modules, classes, functions, arguments, exceptions and variables can lead to confusion about the role that these components play. Good naming is fundamental to good design, because source code represents the most detailed version of our design. Compare and contrast the ease with which the following statements can be understood:
out(p(f(v), 2) + 1)
print(power(fibonacci(argument), 2) + 1)
There are common naming recommendations. Modules, components and classes are typically nouns (e.g. Molecule, BlackHole, DNASequence). Functions and methods are typically verbs (e.g. spliceGeneSequence, calculateOrbit). Boolean functions and methods are typically expressed as questions about properties (e.g. isStable, running, containsAtom).
Naming also relates to the use of capitalisation and delimiters, which can help a reader to quickly determine if something is a function, variable or class. For example, common guidelines for C and Java include:
- Constants should be capitalised: PI, MAXIMUM_VALUE.
- Class names should start with an initial capital with the first letter of subsequent words capitalised (this is called upper Camel Case, or Pascal Case): Molecule, BlackHole, DNASequence.
- Functions should start with a lower-case letter with the first letter of subsequent words capitalised: spliceGeneSequence, calculateOrbit.
Similar conventions exist for other languages.
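For Python, the PEP8 style guide recommends upper case with underscores for constants, capitalised words for class names, and lower case with underscores for functions, methods and variables. A short sketch, with made-up names:

```python
# Constants: upper case with underscores
MAXIMUM_LENGTH = 100


# Classes: capitalised words with no separators
class GeneSequence:
    def __init__(self, bases):
        self.bases = bases

    # Functions and methods: lower case with underscores
    def count_base(self, base):
        return self.bases.count(base)

    # Boolean methods still read as questions about properties
    def is_empty(self):
        return len(self.bases) == 0


sequence = GeneSequence('GATTACA')
print(sequence.count_base('A'), sequence.is_empty())
```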
Code Comments
Source code tells the reader what the code does, whilst code comments allow us to provide the reader with additional information about it. The reader should be able to understand a single function or method from its code and its comments, and should not have to look elsewhere in the code for clarification. It can be easy to get lost in code, and others will not have the same knowledge of our project or code as we do.
The kind of things that need to be commented are:
- Why certain design or implementation decisions were adopted, especially in cases where the decision may seem counter-intuitive.
- The names of any algorithms or design patterns that have been implemented.
- The expected format of input files or database schemas.
There are some restrictions. Comments that simply restate basic code behaviour line-by-line are redundant - it’s better to focus comments on why the code is as it is, or to explain particularly complex behaviour. Of course, comments must be accurate, because an incorrect comment causes more confusion than no comment at all, so remember to update comments when you update your code!
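The difference between restating the code and explaining why can be sketched as follows. The function and the project decision described in its comment are hypothetical.

```python
def mean_dosage(dosages_mg):
    # Less helpful: "divide the sum by the length" merely restates the code.
    # More helpful: explain a decision the code cannot express by itself.
    #
    # An empty list returns 0.0 rather than raising an error because, in
    # this hypothetical project, patients with no recorded dosages are
    # valid and should not stop a batch analysis.
    if not dosages_mg:
        return 0.0
    return sum(dosages_mg) / len(dosages_mg)


print(mean_dosage([40, 60]))
print(mean_dosage([]))
```

A reader could work out what the function does from the code alone, but not why an empty list is treated this way, which is the part worth writing down.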
Many languages allow you to use special types of comment to describe the functions and modules in your code, which is often a helpful discipline for increasing readability. For example, in Python these are known as docstrings: if the first thing in a function is a string that is not assigned to a variable, that string is attached to the function as its documentation. Consider the following code implementing a function for calculating the nth Fibonacci number:
def fibonacci(n):
    """Calculate the nth Fibonacci number.

    A recursive implementation of Fibonacci array elements.

    :param n: integer
    :raises ValueError: raised if n is less than zero
    :returns: Fibonacci number
    """
    if n < 0:
        raise ValueError('Fibonacci is not defined for N < 0')
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)
Note here we are explicitly documenting our input variables, what is returned by the function, and also when the ValueError exception is raised. Along with a helpful description of what the function does, this information can act as a contract for readers to understand what to expect in terms of behaviour when using the function, as well as how to use it. Docstrings can also be used at the start of a Python module (a file containing a number of Python functions) or at the start of a Python class (containing a number of methods) to list their contents as a reference.
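Because a docstring is attached to the object it documents, it can be read back at run time. The sketch below uses a small made-up function to show this.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature in Celsius to Kelvin.

    :param celsius: temperature in degrees Celsius
    :returns: temperature in Kelvin
    """
    return celsius + 273.15


# The docstring is stored on the function object as its __doc__ attribute
docstring = celsius_to_kelvin.__doc__
print(docstring)
```

In an interactive session, calling the built-in help function on celsius_to_kelvin displays the same text, so users can consult the documentation without opening the source file.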
Coding Conventions
As each language has its own syntax, semantics and sets of built-in commands, what constitutes readable code differs across programming languages. What is readable is also affected by the opinions and preferences of the individual reader. Nevertheless, a number of language-specific coding conventions have evolved, reflecting both general and language-specific good practice.
It’s recommended that projects adopt a set of coding conventions or style guide. Not only does this promote readable code, it helps ensure that the code looks consistent, even if the software consists of hundreds of source code files and is worked on by many developers. Projects as varied as Mozilla, Linux, Apache, GNU, and Eclipse all have their own project-specific conventions that their developers are expected to conform to. The Python language, for example, has the PEP8 style guide.
Style consistency
One of the key insights from Guido van Rossum, who invented the Python language, is that code is read much more often than it is written. Style guidelines are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. Consistency with the style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important. However, know when to be inconsistent – sometimes style guide recommendations are just not applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!
Project-specific conventions can also embody requirements specific to our project. They promote consistency of naming across packages, components, classes, or functions: ‘All test classes must have the suffix Test, e.g. FourierUtilitiesTest’. They ensure that others know who owns the copyright on our source code: ‘All source code files must have a comment with the statement Copyright © My Organisation, 2010’. They ensure that others know about restrictions on our source code: ‘All source code files should have a comment with the text “Licensed under the Apache License, Version 2.0”.’
Code analysis tools allow our coding conventions to be defined as rules. Our source code can then be analysed against these rules to automatically check for conformance. These tools can publish reports that highlight what rules are violated and where in the code the violations occur. Popular code analysis tools are CheckStyle for Java, StyleCop for C#, Pylint or Flake8 for Python, and codetools for R. For other languages see Wikipedia’s List of tools for static code analysis. In addition, many IDEs such as PyCharm and VSCode are able to highlight common code convention and formatting issues as you type.
Writing Maintainable Software
Software always needs new features or bug fixes. Maintainable software is easy to extend and fix, which encourages the software’s uptake and use. Maintainable software allows you to quickly and easily:
- Fix a bug, without introducing a new bug as you do so
- Add new features, without introducing bugs as you do so
- Improve usability
- Increase performance
- Make a fix that prevents a bug from occurring in future
- Make changes to support new environments, operating systems or tools
- Bring new developers on board your project
More formally, the IEEE Standard Glossary of Software Engineering Terminology defines maintainability as:
“The ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment.”
The maintainability of software depends on quite a few factors. However, in general it must be easy to understand the software (how it works, what it does, and why it does it a particular way), easy to find what needs to change to achieve a given aim, easy to make those changes, and easy to check that the changes have not introduced any bugs. Writing readable code, as covered previously in this section, goes a long way to making code maintainable.
Long- or Short-Lived Code?
You or others on your project may be developing open-source software with the intent that it will live on after your project completes. It could be important to you that your software is adopted and used by other projects as this may help you get future funding. Your software can be more attractive to potential users if they are confident that they can fix bugs that arise or add new features they need, and if they can be assured that the evolution of the software is not dependent upon the lifetime of your project.
On the other hand, you might want to knock together some code to prove a concept or to perform a quick calculation and then just discard it. But can you be sure you’ll never want to use it again? Maybe a few months from now you’ll realise you need it after all, or you’ll have a colleague say “I wish I had a…” and realise you’ve already made one!
Short-Lived Code, or Not?
Have you ever used any code or software that was intended to be “short-lived” - whether written by yourself or not? Perhaps it’s a short script or other small piece of code. Would it have benefited from greater effort to improve it for longer-term use, and if so, when should that have been considered?
What to do with Short-Lived Code?
Even short-lived code can contain useful lessons learned, perhaps about how to use a particular technology, how it was written, or importantly, how it solved a particular problem. So when short-lived code has outlived its original purpose, what should you do with it?
One way is to essentially archive the software, putting it into a state where it can be readily picked up again later. This usually means tidying up the code, adding comments, and in particular adding a short document (sometimes called a README) that summarises the code and how to set up and use it. Having the code stored in a code repository like GitHub is great for this, and very strongly recommended - for all code you write.
A small investment in the maintainability of your code makes it easier to pick it up after a break, and can provide you with an insurance policy should your disposable software turn out to be more useful than you originally thought.
The Cost of Neglecting Maintainability
When resources are tight, it’s easy to focus on the bare minimum needed to get the software to do what it’s meant to do and leave less pressing tasks, such as documentation, testing, and refactoring, until the end of the project. The plan often is to complete these tasks when time permits, and time rarely permits!
You can save time, in the short term, by not commenting code, not refactoring to make it more readable, not addressing compiler warnings, leaving aside tests, skipping documentation and not recording why something was implemented in a specific way. These actions all incur technical debt:
“Technical debt is the cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.”
And - just like financial debt - it’s a debt that gathers interest over time. Technical debt is paid off in the cost of maintenance. Software that is written without maintainability in mind requires a lot more effort to maintain than it did to develop. For this reason, many applications are replaced simply because the overhead to modify them becomes prohibitive.
Help is at hand! Developing maintainable software helps reduce technical debt. By thinking ahead and investing now you reduce the impact of changes in the future.
How to Develop Maintainable Software
Developing maintainable software is like picnicking: once you’re finished, leave your spot as you would like to find it yourself, or leave it in a better state than you found it. There are a number of principles, approaches and techniques that can help you develop maintainable software, and many of these are generally applicable to writing good software:
- Start as you mean to go on: write maintainable code from the outset, and make maintainability a key goal
- Keep it functional: write code in short, iterative cycles that aim to keep code in a working state
- Refactor your code: once your code gets messy and hard to understand, rewrite it to function the same but be easier to read
- Get it reviewed: get others to look at your code to check it is understandable - particularly sections that are critically important
- Document your code: so you and others can understand it now and later
- Use version control: version control helps keep code and documentation up to date and synchronised, and allows you to roll back any parts of your code to previous versions if you run into trouble
- Select sustainable technologies: choose libraries and other dependencies that have a good track record of delivering quality releases and a sustainable, active development community, so they are less likely to become outdated or even non-functional while you develop or use your software.
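As a small illustration of the "Refactor your code" principle, here is a minimal sketch in Python. The function names and the scenario (averaging readings that may contain missing values) are hypothetical and chosen only for this example. The second function is intended to behave the same as the first while being easier to read.

```python
# Before: works, but the intent is hard to see.
# (Hypothetical example; names are deliberately unhelpful.)
def f(d):
    t = 0
    n = 0
    for x in d:
        if x is not None:
            t = t + x
            n = n + 1
    return t / n


# After: same behaviour, with names and a docstring that state the purpose.
def mean_of_valid_readings(readings):
    """Return the mean of the readings, ignoring missing (None) values."""
    valid = [value for value in readings if value is not None]
    return sum(valid) / len(valid)


# Both versions give the same answer for the same input.
readings = [1.0, None, 3.0]
assert f(readings) == mean_of_valid_readings(readings)
```

Nothing about what the code does has changed; only how easily a reader can tell what it is for.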
Which Qualities do you Value?
Consider the above list from your own perspective, either as a user of software or as a developer (or both). List the items in order of their importance to you.
If you’ve listed them from both perspectives, how is the ordering different? If you find there are any at or near the top of both lists, that may help you prioritise what to aim for when developing code.
A Maintainability Checklist
Here’s another developer-level perspective on maintainability, this time as a set of questions to help you judge the maintainability of the software you write:
- Can I find the code that is related to a specific problem or change?
- Can I understand the code? Can I explain the rationale behind it to someone else?
- Is it easy to change the code? Is it easy for me to determine what I need to change as a consequence? Are the number and magnitude of such knock-on changes small?
- Can I quickly verify a change (preferably in isolation)?
- Can I make a change with only a low risk of breaking existing features?
- If I do break something, is it quick and easy to detect and diagnose the problem?
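One way to be able to answer "yes" to the questions about verifying a change quickly and detecting breakages is to keep small automated tests alongside the code. The following is a minimal sketch in Python, assuming a hypothetical temperature conversion function; it is an illustration rather than a recommended test layout.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def test_celsius_to_kelvin():
    # Known reference points make it easy to spot a change that breaks
    # the conversion. A small tolerance allows for floating-point rounding.
    assert abs(celsius_to_kelvin(0) - 273.15) < 1e-9
    assert abs(celsius_to_kelvin(-273.15)) < 1e-9
    assert abs(celsius_to_kelvin(100) - 373.15) < 1e-9


# Running the test takes a fraction of a second and exercises the
# function in isolation from the rest of the program.
test_celsius_to_kelvin()
```

If a later change breaks the conversion, the failing assertion points directly at the function concerned, which helps with both detection and diagnosis.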
Developers: How Maintainable is Your Code?
From your own perspective, answer the questions above for a piece of software, code, or script you’ve written in the past. Next, ask the questions again but, this time, adopt the perspective of someone else in your team who is completely new to your software. How did you do? What changes would help you improve it?
Key Points
Source code is designed for humans, not machines.
Source code is read much more often than it is written.
Always assume that someone else will read your code at a later date, including yourself.
Good indentation greatly enhances code readability.
Name things like variables, functions, and modules to indicate purpose.
Good comments describe the reasons behind coding approaches as well as complex behaviour.
Community coding conventions help you create more readable software projects that are easier to contribute to.
Maintainable code is easier to understand, modify, extend, and fix.
Assume any piece of code you write will be reused.
Technical debt is incurred when quick solutions are prioritised over good solutions, but is paid off in the cost of maintaining the code.
Change the way you write code to make maintainability a key goal.