Working With Legacy Code
It's a fact of life that every software developer spends a large chunk of their career working within the confines of a so-called "Legacy" codebase. A lot of the time, the codebase isn't as bad as the Legacy label suggests. Developers often throw the term around any time we encounter something that's not "fun" or "modern".
But, even on the legacy-est of legacy codebases, there are things you can do to feel good about your work.
Conflicting Paradigms
A long time ago, I received some advice that functions and methods should strive to have only a single return point. I applied that to a few of my hobby projects, and perhaps I took the wrong lesson from the advice. As a result, I ended up with plenty of functions that contained a tangled mess of procedural code.
Since then I've come to prefer the "early exit" or "return early, return often" technique, where methods should terminate as soon as they can determine a valid course of action.
Recently, I conducted a Twitter poll based on these conflicting approaches. The poll simply asked users to choose which one they prefer.
Responders overwhelmingly prefer the "early exit" approach.
Hmm, here's a programming question that might be worth a poll. In your methods, do you prefer:
— Kevin Boyd (@Beryllium9) December 2, 2016
So, how does this map back to being fun? Because programming is fun, of course! As I've been maintaining my ancient codebases, it's often been necessary to delve into these gnarly single-return-point methods and make changes.
I've found that it's entirely possible to take a method with five levels of indenting and multiple execution branches (eww! the worst kind of legacy code!) and quickly morph it into a much more readable flow with, usually, one indent level.
In some cases, it's even been possible to reduce an entire legacy method to a single return statement.
It's worth pointing out that this technique works best if you keep the return statements at the beginning (early exits at the top) and toward the end. If you find yourself creating long methods with returns all over the place, you should probably look into breaking things up into smaller, clearly-named methods.
To The Example, And Beyond
The method shown here is loosely based on one I recently worked with. We can see
that there are a few levels of indenting, and an if/else-if/else
conditional
that results in three separate execution paths.
<?php
function showPosts()
{
$query = 'SELECT * FROM posts
JOIN board ON board.id = posts.board_id
WHERE board.active = 1';
$result = sql_query($query);
$output = array();
if ( !$result )
{
log_error('fatal', 'Whoa.');
}
else if ( sql_num_rows($result) == 0 )
{
$output['error'] = 'An error occurred.';
}
else
{
while ( $item = sql_fetch_assoc($result) )
{
$output[$item['id']] = $item;
}
}
return $output;
}
One thing that's not clear from this is that log_error
is actually an exit
point from the method - elsewhere in the codebase, log_error
will actually
render output and then call a hard exit()
to halt execution.
Another issue is that the else-if
creates a weird "error" data structure that
is very different from the array of posts that gets returned when everything is
working properly.
And while this example doesn't indent too deeply, one of the folks I talked with said that the single-return-point approach encourages a "pyramid" effect when there is too much indenting.
I think we can do better.
Let's start by simplifying the if/else-if/else
conditional. By making sure the
if
and else-if
logic branches are guaranteed to exit the method early, there
is no need to have an else
block. I've also changed the error message to say
"No results found", which reflects the logic more accurately.
if (!$result) {
return log_error('fatal', 'Whoa.');
} else if (sql_num_rows($result) == 0) {
return array('error' => 'No results found.');
}
while ($item = sql_fetch_assoc($result)) {
$output[$item['id']] = $item;
}
return $output;
Now we have to ask ourselves - why are we returning a special array for a "no results" error? Couldn't we just return an empty array?
That leads us to an important question for any change to legacy code: How deep do you want to go?
Tip: Making these kinds of changes on legacy codebases can be risky. It's a good idea to set up automated tests, either using backend tooling like phpunit (which isn't limited to pure unit tests) or front-end testing tools like Selenium (which actually runs your application and interprets the output using a real web browser). If you create a few simple "happy path" tests, you can be much more confident that your changes haven't broken any core functionality.
Chasing Rabbits
Changing the output of this method will almost certainly break other parts of the application, because there is undoubtedly code that depends on specific behaviours like this mysterious "error" data structure.
On the other hand, simply rewriting isolated chunks of logic without improving the interactions between chunks, won't do much to make the project fun or modern.
It's worth asking the rest of the team, or the project manager, what the official stance is on this kind of improvement. Will they let you dig deep to improve the overall system?
Hopefully they give you the green light to improve the flow of the code. You don't have to commit to some crazy months-long rewrite. Incremental improvements, when approached with care and planning, can bring your codebase to a place that makes it easier to implement modern best practices - even in the hairiest of codebases.
Let's assume that we're cleared to alter the logic slightly. We can return an empty array. The calling code will be altered to expect empty arrays, instead of special "error" data structures.
The modified call to log_error
does indicate more clearly that we're exiting
early, but it's a little bit odd to use a return when we know the fatal error
leads to an exit()
.
For the sake of readability, we may simply want to leave a comment explaining
that execution will halt. If we want to go one step further, a wrapper function
for log_error
could more clearly indicate that the app will exit.
Here's what our final approach might look like:
<?php
function showPosts()
{
$query = 'SELECT * FROM posts
JOIN board ON board.id = posts.board_id
WHERE board.active = 1';
$result = sql_query($query);
$output = array();
// exit with a fatal error if sql_query() returns false
if (!$result) {
log_fatal_error_and_exit('Whoa.');
}
while ($item = sql_fetch_assoc($result)) {
$output[$item['id']] = $item;
}
return $output;
}
function log_fatal_error_and_exit($message)
{
log_error('fatal', $message);
exit();
}
By modern standards, it's true that this method is still doing too much. It's
talking to the database, exiting with a fatal error (displayed to the user,
which involves multiple layers of the application), and also processing database
results. However, we've still managed to simplify it. We've cut the big
if/else-if/else conditional
to a single if
statement, we've eliminated an
entire custom data structure, and we've removed a mess of nesting.
This is good. This is progress.
So, some questions to ask when working with legacy codebases:
- Can I change this method while preserving the original behaviour?
- Am I allowed to change the original behaviour to be more consistent?
- Can unnecessary clutter and complexity be removed from this method?
- Does this method even need to exist?
That last question might sound a bit odd, but there are often methods in legacy codebases that simply don't need to exist. They might be solving the wrong problem, or solving it in the wrong place.
I've even seen methods that were completely forgotten - not used anywhere. Removing these freebies is not without risk (it helps to understand what they were originally written for), but it sure feels nice.
The more clutter and complexity you can trim out of a legacy codebase, the healthier it will be - and the happier you will be.
Published: December 5, 2016
Tags: coding, dev, development, legacy, maintenance, php