whateverthing.com

Working With Legacy Code

It's a fact of life that every software developer spends a large chunk of their career working within the confines of a so-called "Legacy" codebase. A lot of the time, the codebase isn't as bad as the Legacy label suggests. Developers often throw the term around any time we encounter something that's not "fun" or "modern".

But, even on the legacy-est of legacy codebases, there are things you can do to feel good about your work.

Conflicting Paradigms

A long time ago, I received some advice that functions and methods should strive to have only a single return point. I applied that to a few of my hobby projects, and perhaps I took the wrong lesson from the advice. As a result, I ended up with plenty of functions that contained a tangled mess of procedural code.

Since then I've come to prefer the "early exit" or "return early, return often" technique, where methods should terminate as soon as they can determine a valid course of action.

Recently, I conducted a Twitter poll based on these conflicting approaches. The poll simply asked users to choose which one they prefer.

Responders overwhelmingly prefer the "early exit" approach.

So, how does this map back to being fun? Because programming is fun, of course! As I've been maintaining my ancient codebases, it's often been necessary to delve into these gnarly single-return-point methods and make changes.

I've found that it's entirely possible to take a method with five levels of indenting and multiple execution branches (eww! the worst kind of legacy code!) and quickly morph it into a much more readable flow with, usually, one indent level.

In some cases, it's even been possible to reduce an entire legacy method to a single return statement.

It's worth pointing out that this technique works best if you keep the return statements at the beginning (early exits at the top) and toward the end. If you find yourself creating long methods with returns all over the place, you should probably look into breaking things up into smaller, clearly-named methods.

To The Example, And Beyond

The method shown here is loosely based on one I recently worked with. We can see that there are a few levels of indenting, and an if/else-if/else conditional that results in three separate execution paths.

<?php

function showPosts()
{
    $query  = 'SELECT * FROM posts
        JOIN board ON board.id = posts.board_id
        WHERE board.active = 1';
    $result = sql_query($query);
    $output = array();

    if ( !$result )
    {
        log_error('fatal', 'Whoa.');
    }
    else if ( sql_num_rows($result) == 0 )
    {
        $output['error'] = 'An error occurred.';
    }
    else
    {
        while ( $item = sql_fetch_assoc($result) )
        {
            $output[$item['id']] = $item;
        }
    }

    return $output;
}

One thing that's not clear from this is that log_error is actually an exit point from the method - elsewhere in the codebase, log_error will actually render output and then call a hard exit() to halt execution.

Another issue is that the else-if creates a weird "error" data structure that is very different from the array of posts that gets returned when everything is working properly.

And while this example doesn't indent too deeply, one of the folks I talked with said that the single-return-point approach encourages a "pyramid" effect when there is too much indenting.

I think we can do better.

Let's start by simplifying the if/else-if/else conditional. By making sure the if and else-if logic branches are guaranteed to exit the method early, there is no need to have an else block. I've also changed the error message to say "No results found", which reflects the logic more accurately.

if (!$result) {
    return log_error('fatal', 'Whoa.');
} else if (sql_num_rows($result) == 0) {
    return array('error' => 'No results found.');
}

while ($item = sql_fetch_assoc($result)) {
    $output[$item['id']] = $item;
}

return $output;

Now we have to ask ourselves - why are we returning a special array for a "no results" error? Couldn't we just return an empty array?

That leads us to an important question for any change to legacy code: How deep do you want to go?

Tip: Making these kinds of changes on legacy codebases can be risky. It's a good idea to set up automated tests, either using backend tooling like phpunit (which isn't limited to pure unit tests) or front-end testing tools like Selenium (which actually runs your application and interprets the output using a real web browser). If you create a few simple "happy path" tests, you can be much more confident that your changes haven't broken any core functionality.

Chasing Rabbits

Changing the output of this method will almost certainly break other parts of the application, because there is undoubtedly code that depends on specific behaviours like this mysterious "error" data structure.

On the other hand, simply rewriting isolated chunks of logic without improving the interactions between chunks, won't do much to make the project fun or modern.

It's worth asking the rest of the team, or the project manager, what the official stance is on this kind of improvement. Will they let you dig deep to improve the overall system?

Hopefully they give you the green light to improve the flow of the code. You don't have to commit to some crazy months-long rewrite. Incremental improvements, when approached with care and planning, can bring your codebase to a place that makes it easier to implement modern best practices - even in the hairiest of codebases.

Let's assume that we're cleared to alter the logic slightly. We can return an empty array. The calling code will be altered to expect empty arrays, instead of special "error" data structures.

The modified call to log_error does indicate more clearly that we're exiting early, but it's a little bit odd to use a return when we know the fatal error leads to an exit().

For the sake of readability, we may simply want to leave a comment explaining that execution will halt. If we want to go one step further, a wrapper function for log_error could more clearly indicate that the app will exit.

Here's what our final approach might look like:

<?php

function showPosts()
{
    $query  = 'SELECT * FROM posts
        JOIN board ON board.id = posts.board_id
        WHERE board.active = 1';
    $result = sql_query($query);
    $output = array();

    // exit with a fatal error if sql_query() returns false
    if (!$result) {
        log_fatal_error_and_exit('Whoa.');
    }

    while ($item = sql_fetch_assoc($result)) {
        $output[$item['id']] = $item;
    }

    return $output;
}

function log_fatal_error_and_exit($message)
{
    log_error('fatal', $message);
    exit();
}

By modern standards, it's true that this method is still doing too much. It's talking to the database, exiting with a fatal error (displayed to the user, which involves multiple layers of the application), and also processing database results. However, we've still managed to simplify it. We've cut the big if/else-if/else conditional to a single if statement, we've eliminated an entire custom data structure, and we've removed a mess of nesting.

This is good. This is progress.

So, some questions to ask when working with legacy codebases:

  1. Can I change this method while preserving the original behaviour?
  2. Am I allowed to change the original behaviour to be more consistent?
  3. Can unnecessary clutter and complexity be removed from this method?
  4. Does this method even need to exist?

That last question might sound a bit odd, but there are often methods in legacy codebases that simply don't need to exist. They might be solving the wrong problem, or solving it in the wrong place.

I've even seen methods that were completely forgotten - not used anywhere. Removing these freebies is not without risk (it helps to understand what they were originally written for), but it sure feels nice.

The more clutter and complexity you can trim out of a legacy codebase, the healthier it will be - and the happier you will be.

Published: December 5, 2016

Categories: coding

Tags: coding, dev, development, legacy, maintenance, php