Write maintainable research code with these 3 weird tricks!

Writing code is not easy.

If you are a researcher and have to write code, especially if you are not from a computer science background, then I salute you.

I have spent the last 4 years (6 years if you include my time at the Open University) studying for an MEng in Software Engineering. I can tell you with certainty that it is not an easy task. I’d like to tell you about something that is even harder than writing code: reading and maintaining poorly written code!

When I started here, with the HPC team, I attended a presentation on “How to program in Python”. A guest lecturer and researcher was aiming the presentation at other researchers working in a similar field.

Having only done a small amount of Python before (for robotics) I was keen to learn more and attended. However, it didn’t actually teach anything about coding in Python. The presentation seemed to skirt around Python and talk about how to write research code. What the presentation missed entirely, however, was good practice and maintenance.

Poorly written code

“But my code isn’t poorly written, it works 100% of the time”

Poorly written code does not mean code that doesn’t work or is buggy. Poorly written code is code that’s hard to maintain.

It’s estimated that 80% of maintenance is not correcting mistakes but improving performance or adding features.

Creating well-written code means the code is more readable, more testable, and easier to maintain.

So imagine a scenario where a program that you have written needs updating. Maybe you simply want to change something, like an output, but you haven’t even looked at the code for months or even years? What if you need to delegate this maintenance task to someone else?

This means that the next person has to re-read and test all of that code again, in addition to any modifications that they make to it.

There are many reasons why developers end up writing unreadable code. People are understandably impressed by powerful things. So when we’re writing code it’s easy to think that the all-encompassing “one liner” is an awesome thing to use.

We also have hang-ups from algebra and mathematics where formulas are as concise and condensed as possible. This maths memory causes us to think that single-letter variable names (I’m looking at you, x) have an appropriate place within our code.

Computers also used to be limited in hardware resources. Bill Gates is often quoted as saying that “640KB ought to be enough for anybody”. Thankfully we don’t have that problem on Viper’s fat nodes with 1TB RAM.

So, without further digression (thank you, Rob Miles), I present you with my top 3 tips for writing maintainable software!

1. Comments

Comments are great. They should tell you exactly what is going on in the code. I know it can feel like a chore to write comments and it may feel like wasted time. But to the next person they’re invaluable. Imagine how easy an archaeologist’s job would be if every bone or fossil they dug up had a label attached describing what it was and how it should fit together!

A good rule is that comments should document why your code does something and not how it does it.

It is also a good idea to comment what a piece of code is supposed to be doing, so you can compare what the code does to what it’s meant to do.
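As a rough illustration of both rules, here is a minimal Python sketch. The sensor, its error value, and the function name are all made up for this example. The docstring states what the function is supposed to do, and the comment explains why a step exists rather than how it works.

```python
# Hypothetical value that an imaginary sensor reports when a reading fails.
SENSOR_ERROR_VALUE = -999.0


def mean_temperature(readings):
    """Return the mean of the valid temperature readings in degrees Celsius."""
    # Why: failed readings come through as SENSOR_ERROR_VALUE, and leaving
    # them in would drag the average down to a meaningless number.
    valid_readings = [reading for reading in readings
                      if reading != SENSOR_ERROR_VALUE]
    return sum(valid_readings) / len(valid_readings)
```

A comment such as “loop over the readings and skip some” would only repeat what the code already says.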

That being said, however, there is something even better than writing comments. Writing code that doesn’t need comments!

2. Code that doesn’t need comments

No matter what language you write your code in, the code will tell us explicitly, in no uncertain terms, exactly what it does and how it does it. There are no ambiguities in code. However, the code’s purpose or function can be obfuscated.

Each year there is an Obfuscated code competition where the winning code is the one that is the most indecipherable.

To show how difficult reading poorly written code is, here is a video (which displays all 20 lines of the source code) of an ASCII-based fluid dynamics program.

You can clearly see that the code is very difficult to follow. But yours doesn’t have to be. If you find yourself writing a lot of “one liners”, having to scroll through a single piece of code, or giving all of your variables the same scope, then these suggestions for writing comment-free code are well worth a read.

Choose descriptive but sensible names for methods and variables

Choosing variable and method names sounds fairly easy but a fair amount of time is often spent trying to come up with the most appropriate name.

Poorly named variables are especially common in nested loops. When the variables of an inner and outer loop are named “i” and “j”, the flow can be hard to read and follow. To improve readability, and give the variables some meaning, name them something like “column” and “row”.
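Here is a small Python sketch of that idea. The function and the grid layout (a list of rows) are assumptions made purely for illustration.

```python
def sum_grid(grid):
    """Add up every cell in a grid stored as a list of rows."""
    total = 0
    for row in range(len(grid)):
        for column in range(len(grid[row])):
            # grid[row][column] reads far more clearly than grid[i][j].
            total += grid[row][column]
    return total
```

In everyday Python you would probably loop over the rows directly instead of using indices, but the indices are kept here so the names can do the explaining.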

A general naming convention for methods is that they are named after verbs if they change something and nouns if they return a certain value. Method names should also describe what they do by using a prefix such as “get”, “set”, or “is”.
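To make the prefixes concrete, here is a hedged sketch using an invented Thermostat class. None of these names come from a real library; they simply follow the “get”, “set”, and “is” convention described above.

```python
class Thermostat:
    """A made-up class used only to demonstrate method naming."""

    def __init__(self, target_celsius):
        self._target_celsius = target_celsius

    def get_target(self):
        # "get" prefix: returns a value and changes nothing.
        return self._target_celsius

    def set_target(self, target_celsius):
        # "set" prefix: changes the state of the object.
        self._target_celsius = target_celsius

    def is_heating_needed(self, room_celsius):
        # "is" prefix: answers a yes/no question.
        return room_celsius < self._target_celsius
```

Python programmers often use properties instead of explicit getters and setters, so treat this as an illustration of the naming idea rather than a style recommendation.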

Fragment into small methods

The code you write should be easy to read and follow. One of the best ways of doing this is to break the code up into a series of aptly named methods.

Sorting your code into a series of method calls helps to test individual parts (each method), outlines the flow of the program, and describes what the program does (BOOM).

The main culprit of a large spaghetti method is of course the main method. This method often becomes the playground for writing ad-hoc code so is a prime candidate for refactoring. It’s totally fine to snap lines of code together, to see how they should fit, but once you know that they work they should be encapsulated.

Once the main method has been refactored, it should consist largely of method calls. Each method should fit entirely onto the screen and have only a single job within the program. For example, you should ideally fragment a method that parses strings, accesses a database, does calculations, and writes to a log, into four separate methods.

A side benefit of encapsulating code into methods is that it ensures you’re properly scoping your variables, which should keep the number of global variables to a minimum.

So, try to plan your main method out and, instead of having a main method that is hundreds of lines long, try to break it up into something like the following 5 methods:

config = GetTheConfig()
system = InitialiseTheSystem(config)
result = system.CalculateTheThing()
system.LogResults(result)
system.Shutdown()
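The outline above is pseudocode, so here is one way the same shape might look in runnable Python. The task (averaging a comma-separated list of numbers) and every function name are invented for this sketch. The point is only that main reads as a list of steps and each step can be tested on its own.

```python
def parse_measurements(text):
    """Turn a comma-separated string into a list of floats."""
    return [float(item) for item in text.split(",")]


def calculate_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def format_report(mean):
    """Build the line of text that will be shown to the user."""
    return f"mean={mean:.2f}"


def main(raw_text):
    # main only calls other functions; each one has a single job.
    measurements = parse_measurements(raw_text)
    mean = calculate_mean(measurements)
    report = format_report(mean)
    print(report)
    return report
```

Calling main("1, 2, 3") runs the three steps in order, and each helper can be checked separately without running the whole program.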

If you’re worried about performance, don’t. The average person doesn’t have to worry too much about losing optimisations, because a modern compiler will happily rip the guts out of your easy-to-follow, manageable code and turn it into effective and efficient machine code.

Use Classes to represent structures in your code

Most languages can cope with some Object Orientated Programming, so if your program uses many related variables you should consider encapsulating them together into a data class such as a Data Transfer Object.
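In Python, one way to do this is with the standard library’s dataclasses module. The sketch below assumes a made-up simulation with four related settings; the class and field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class SimulationConfig:
    """Groups settings that would otherwise be four loose variables."""
    grid_width: int
    grid_height: int
    time_step_seconds: float
    output_path: str


def describe(config):
    """Summarise a configuration in one line of text."""
    cells = config.grid_width * config.grid_height
    return (f"{cells} cells, dt={config.time_step_seconds}s "
            f"-> {config.output_path}")
```

Passing one SimulationConfig around is easier to follow than passing four separate arguments to every function that needs them.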

Magic numbers

A magic number is defined as “Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants”

Take a look at the loop below:

for (int i = 0; i < 16; i++)
{ /* stuff to do */ }

We know that the loop will execute 16 times but we don’t know why. It could be that the developer chose ‘16’ as an arbitrary value. It could be that this ‘16’ relates to another area of the code which is also ‘16’. We won’t know without some further investigation.

Now imagine that we used a named constant like MAX_ATTEMPTS, MAX_CONNECTIONS, or CHUNK_SIZE instead of the bare ‘16’. You should now have a better idea about what is happening within the loop without having to write a comment.
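As a sketch of the difference, here is a Python loop that uses a named constant. The retry scenario and the try_connect callback are assumptions for this example; the original loop does not say what its ‘16’ was for.

```python
# The limit now has a name, and there is one place to change it.
MAX_ATTEMPTS = 16


def connect_with_retries(try_connect):
    """Call try_connect until it succeeds or the attempts run out.

    Returns the attempt number that succeeded, or None if none did.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if try_connect():
            return attempt
    return None
```

Anyone reading the loop can now see that it is a retry limit, and changing the limit later means editing a single line.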

Side effects

In programming, a side effect is when a method changes something outside its own scope, such as a global variable. Often the side effect becomes the real reason for calling the method.

int n = 0;
int next_n() { return n++; }

The code above is a good example of a side effect. In this code the next_n() method returns the result of n++ which, of course, increments the integer in the process.

Some side effects are perfectly tolerable within a program, but they should be used with caution and you should be sure that there isn’t a better way of doing it.
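For comparison, here is a rough Python version of the same counter next to an alternative without the side effect. The name next_value is invented for this sketch.

```python
n = 0


def next_n():
    """Return the current n, then bump the global n (a side effect)."""
    global n
    current = n
    n += 1
    return current


def next_value(current):
    """Return the value after current without touching anything else."""
    return current + 1
```

Calling next_n() twice gives two different answers because it depends on hidden state, whereas next_value(5) always returns 6, which makes it easier to test and reason about.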

3. Tidying up and general good practice

A clean code policy

Once you’ve started to refactor your code into smarter methods it’s time to clean up.

This means removing all of those commented out lines of code, useless comments, and anything else that is reducing the clarity. You should even remove excess whitespace.

Don’t hard code execution parameters

If you have to edit your code, or a configuration file, each time you want to run your program, you need to think about passing these changes as command line arguments.

I feel it’s worth noting at this point that usernames and passwords should never be hard coded. If your program uses credentials, maybe to connect to a database, then these should definitely be supplied at run time, for example as a command line argument or, better still, through an environment variable or a prompt, because command line arguments can end up in your shell history.
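A minimal sketch of run-time parameters in Python, using the standard argparse module. The option names and the SIM_DB_PASSWORD variable are hypothetical. Reading the password from an environment variable is my assumption here; it is one common way of keeping a secret out of both the source code and the command line.

```python
import argparse
import os


def parse_arguments(argv=None):
    """Read the run parameters from the command line."""
    parser = argparse.ArgumentParser(
        description="Run a hypothetical simulation.")
    parser.add_argument("--input", required=True,
                        help="path to the input file")
    parser.add_argument("--iterations", type=int, default=100,
                        help="number of iterations to run")
    return parser.parse_args(argv)


def read_database_password(environ=os.environ):
    """Fetch the password from the environment; None if it is not set."""
    return environ.get("SIM_DB_PASSWORD")
```

With this in place, changing the input file or the number of iterations means changing the command you type (for example, --input data.csv --iterations 500) rather than editing the program.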

A spoonful of (syntactic) sugar

You’ll have realised by now that there were actually more than three tips. But hopefully you’ll now be putting more sugar in your code than in your 2AM coffee.

Do let me know what you think in the comments below!
