A Biased Guide to Unit Tests

This article was written to explain my thoughts on how Unit Testing should be done. Your mileage may vary.

Introduction

Testing is important. It's really fucking important. Yes, it can be annoying, but so is checking that you have your parachute on before jumping out of a plane. Despite this, the world of thought surrounding unit testing still seems to be in its infancy compared to thought about general programming. Many very competent developers will rush through the process of unit testing, making significant errors in the process, and many other competent developers will look at the unit tests and not see anything wrong with them.

Part of this is due to the fact that it isn't necessarily very obvious what makes a good unit test. We know what makes good functional code - it should be efficient, extensible, loosely coupled, and, above all, correct (that is, it should do what we expect it to). Most of us are pretty good at spotting issues with functional code. For example, is my implementation of the Fibonacci sequence correct?

/**
* Computes the ith number in the fibonacci sequence.
*/
public static int fib(int i)
{
    if (i <= 0)
    {
        return 0;
    }
    int previous = 0;
    int current = 1;
    for (int a = 1; a < i; ++a)
    {
        current = current + previous;
        previous = current;
    }
    return current;
}

There are several things wrong here. It isn't correct (in fact, this method prints 2^(i-2) for i >= 2), it isn't clear what's going on (why are we using a for loop?) and it isn't the most efficient way of calculating the answer ( it can be done in O(1) time using the golden ratio ).

Now, the Fibonacci sequence is a fairly complex piece of code, so let's consider something much simpler. We have three classes: Person, Ticket, and BoxOffice. When BoxOffice.getTicket is called with a Person, a ticket is created with a unique id and is assigned to the person. Easy to write, easy to test, right?

class TicketTest {
    public static int PERSON_ID = 5;
    @Nested
    class GivenAPerson {


        @Nested
        class WhenGettingATicket
        {
            Person mockPerson;
            @BeforeEach
            void setup()
            {
                mockPerson = mock(Person.class);
                when(mockPerson.getId()).thenReturn(PERSON_ID);
            }

            @Test
            void thenPersonIsCorrect()
            {
                Ticket actualTicket = BoxOffice.getTicket(mockPerson);
                Ticket expectedTicket = getExpectedTicket(mockPerson);

                assertEquals(actualTicket.person, expectedTicket.person);
            }

            @Test
            void thenIdIsCorrect() {
                Ticket actualTicket = BoxOffice.getTicket(mockPerson);
                Ticket expectedTicket = getExpectedTicket(mockPerson);

                assertEquals(actualTicket.id, expectedTicket.id);
            }
        }
    }

    Ticket getExpectedTicket(Person person) {
        return new Ticket(person, 3);
    }
}

Do you see the problem? Our second test case, thenIdIsCorrect, is checking that the id is what we expect it to be. We check this by getting an expected Ticket through the getExpectedTicket method. But, in this method, we just set the id to be 3! This number pops into existence without any explanation.

Our first test case is… okay. However, we shouldn't take for granted that there is a magical method that is somehow able to correctly determine the “expected” behavior. I will explain this fallacy later, so keep that in your back pocket.

Also notice that we set up the mock person in the “WhenGettingATicket” class, instead of the “GivenAPerson” class, where it probably belongs. Also, the same code is duplicated between the two test cases, although it could be better moved to a setup method.

In this article, I will propose my definition of what makes a good unit test.

Value Hierarchies

When we write functional code, there is a hierarchy of value that defines quality.

Correct: It should do what we expect.
Efficient: It should not waste time or space.
Resilient: Changes in how this code is used, or how other code is defined, shouldn't impact our code.
Extensible: We should be able to make changes to the code without a massive amount of work.
Readable: Humans should be able to understand what is happening.

When we write testing code, however, we have different requirements.

Honest: The code should test the SUT.
Comprehensive: The code should test all edge cases.
Explicit: The code should “explain” the actual functionality of the SUT.

This difference in value means that applying best practices from one paradigm to the other causes poor outcomes. For example, when writing functional code, it is considered an antipattern to copy and paste code. Instead, a good program has similar tasks being handled in the same way. Developers will sometimes fall into the trap of thinking that this is for their benefit, because it is more convenient. While this is true, the more important reason that copying and pasting is considered an antipattern is because it is likely to introduce bugs, therefore causing problems with our most important functional requirement: Correctness. When a change needs to be made, that change should be made in exactly one location, not many different locations. Inevitably, one of those locations will be missed.

However, I would argue that, for testing, copying and pasting is not an antipattern. This is because correctness is not a concern when writing tests. While reducing repeated code can be good for comprehension in some circumstances, it can also damage comprehension. Worse still, if done incorrectly, it can also damage the honesty of our tests.

Writing Honest Tests

The Topology of Testing

Tests are, ideally, all about data. We have some SUT, an input, and a desired output (or outcome). We feed the input into the SUT and check if the result matches the desired output. Below, I have mapped out what this looks like in a pseudo-flow chart.

For example, if we were testing our “fib(i)” function from above, we could write that test like this

//Define input
int input = 0;
 
//Feed input into SUT
int actualOutput = Fibonacci.fib(input);
 
//Assert that the actualOutput matches what we expect
assertEquals(0, actualOutput);

Of course, it's not always that easy. Usually, we will need to set up complex objects in some way. For that, we introduce the Packager. The purpose of the packager is simply to build a more complex object (or series of objects) to give to the SUT.

For example, let's say we decide to package the input to the fib(i) function into a Criteria object (🤷‍♂️). No sweat, we would test it like this:

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
 
//Feed packaged input into SUT
int actualOutput = Fibonacci.fib(fibonacciCriteria);
 
//Assert that the actualOutput matches what we expect
assertEquals(0, actualOutput);

Finally, we might also need to extract data from the output of the SUT. The Extractor's only job is extract data from a complex value into a simple value that is easy to test on.

For example, let's say that we want fib to now return a Response object. Again, no big deal.

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
 
//Feed packaged input into SUT
FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria);
 
//Extract output from packaged output.
int actualOutput = packagedOutput.getResult();
 
//Assert that the actualOutput matches what we expect
assertEquals(0, actualOutput);

This is what the ideal test looks like, outside of semantics (and trust me, I have plenty to say about semantics!). It's exactly what is needed to solve the problem, and nothing more. Enough to make a grown man cry.

Mistakes, quantified

Let's start with obvious mistakes, like this one.

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
 
//Feed packaged input into SUT
FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria);
 
//Extract output from packaged output.
int actualOutput = packagedOutput.getResult();
 
//Assert that the output matches the input
assertEquals(0, input);

Let's visualize this in our flowchart abstraction.

This seems silly in the way I've done it above, but I have actually seen this done. This generally occurs in what I call “pass-through” tests, which are tests in which the input and output are expected to match. It also generally occurs with some level of indirection, for example, a method might hide the details of creating data and asserting on the data.

A slightly more common mistake is this one:

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
Fibonacci fibonacci = mock(Fibonacci.class);
FibonacciResponse fibonacciResponse = mock(FibonacciResponse.class);
when(fibonacciResponse.getResult()).thenReturn(0);
when(fibonacci.fib(any())).thenReturn(fibonacciResponse);
 
//Feed packaged input into SUT
FibonacciResponse packagedOutput = fibonacci.fib(fibonacciCriteria);
 
//Extract output from packaged output.
int actualOutput = packagedOutput.getResult();
 
//Assert that the output matches the input
assertEquals(0, actualOutput);

That looks like this:

From this, let's extract the first, and most important rule of testing: All assertions should be connected to the SUT.

The know-it-all function fallacy

Wouldn't it be easier to write tests if we didn't need to define both the inputs and the outputs? What if we had a magical function that could, given some input, give us the correct output? Then, we could write tests like this:

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
 
//Feed packaged input into SUT
FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria);
 
//Extract output from packaged output.
int actualOutput = packagedOutput.getResult();
 
//Get expected output
FibonacciResponse expectedResponse = getExpectedFibonacciResponse(fibonacciCriteria);
int expectedOutput = expectedResponse.getResult();
 
//Assert that the actualOutput matches what we expect
assertEquals(expectedOutput, actualOutput);

It looks like this:

Hey, that's great! Now, the only problem is implementing getExpectedFibonacciResponse, and –

private FibonacciResponse getExpectedFibonacciResponse(FibonacciCriteria criteria)
{
    return Fibonacci.fib(criteria);
}

Which makes our test look like:

Oh. Oh no. Well, never mind that. Let's implement this function the right way!

private FibonacciResponse getExpectedFibonacciResponse(FibonacciCriteria criteria)
{
    int i = criteria.getNumber();
    int result;
    if (i <= 0) {
        result = 0;
    }
    else {
        int previous = 0;
        int current = 1;
        for (int a = 1; a < i; ++a)
        {
            int temp = current;
            current = current + previous;
            previous = temp;
        }
        result = current;
    }
    return FibonacciResponse.builder().result(result).build();
}

Ah, all done. Now, let me just check my implementation of the original function, and -

public static FibonacciResponse fib(FibonacciCriteria fibonacciCriteria)
{
    int i = fibonacciCriteria.getNumber();
    int result;
    if (i <= 0) {
        result = 0;
    }
    else {
        int previous = 0;
        int current = 1;
        for (int a = 1; a < i; ++a)
        {
            int temp = current;
            current = current + previous;
            previous = temp;
        }
        result = current;
    }
    return FibonacciResponse.builder().result(result).build();
}

Shit.

Let's say we have a function A, and another function that computes the correct output of A, called B. If B is less correct than A, why are we using it? If it is more correct than A, why don't we replace A with B? If it is equally correct as A, what's the point in using it in assertions? In reality, there's only one magical function that knows the correct output to any input: You deciding what it should be.

This leads me to my next rules of testing: The only logical component in a test (excluding the packager and extractor) should be the SUT. and The SUT should only be connected to one side of the assertion.

Skipping the Extractor

Another idea that we might have is to skip the extractor by converting the expected output into an object that can be directly compared with the output of the SUT. That would look something like this:

//Define input
int input = 0;
 
//Package input
FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class);
when(fibonacciCriteria.getNumber()).thenReturn(input);
 
//Feed packaged input into SUT
FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria);
 
//Create a packaged expected output.
FibonacciResponse packagedExpectedOutput = FibonacciResponse
        .builder()
        .result(0)
        .build();
 
//Assert that the actualOutput matches what we expect
assertEquals(packagedExpectedOutput, packagedOutput);

And we would represent it like this:

People generally consider this when they have outputs with a lot of values in them that need to be tested. This isn't as big of a deal as the other mistakes I have outlined above. It doesn't cause your tests to be useless. However, by using this system, you will inevitably make things harder on yourself down the road, even if the tests pass now.

The biggest problem is that we are implicitly trusting that the .equals() method will give us a meaningful result for our object. You probably know that in Java, when the .equals() method is not defined, calling .equals() will compare the object's addresses in memory. So, if I have two identical objects, but they were instantiated separately, if I compare them, they won't be equal. It's a common rookie mistake.

That's not a big deal with a simple object comprised of a few primitive fields. Just add a .equals() method (Using lombok makes this dead easy). But what if the object is comprised of other objects, themselves without a .equals() method? Again, no problem, just add a .equals() to those objects. But what if those objects are in different repositories that you own? Now it's becoming a pain, but still manageable. Just open a PR for those repositories, wait until you get the whole team to approve them, merge the code, and pull down the most recent artifact. But what if they are in an open source repository that you don't own? Well, you could figure out the process for submitting a PR, submit a PR, have a monthlong discussion about why it's necessary, get the PR merged, wait for the next release, and pull down the latest artifact. But what if it is in a closed source repository? In that case, you contact the maintainers, ask for the feature, wait for a year, check if it is added in the next release, see that it isn't, ask again, and get it in another year.

And all that's assuming that you know where the problem is! If you have a data structure that relies on hash codes to function, like a Map or Set, and somewhere along the line some jackass didn't implement .hashcode() correctly, it's anyone's guess where the problem is!

Now, you might object that if .equals() isn't implemented correctly, that could be a problem with the SUT. I agree. But it also might not be a problem with the SUT. Maybe someone has implemented a superclass differently than expected. Maybe we expect .equals() to work in a different way. But all that's besides the point, because .equals() should be tested on its own if it's so important! ¹⁾

Also, testing this way obscures specific problems with the code when they arise. If only one field is off, then the entire object is considered to be incorrect. There's no metadata about what went wrong, you only know that something went wrong.

Along those same lines, when something changes with the expected outputs, it's a slippery slope for an engineer to revert back to using the “know-it-all” function to avoid the “pain” of deciding what those outputs should be. If you already have an expected output in the same form as the actual output of the SUT, why not factor it out to a magical function that determines the correct value on the fly? In my experience, this is how the “know-it-all” function appears in codebases.

Finally, I would argue that this approach doesn't really save you much work. You still need to do the work of defining all of the outputs. Sure, it saves you the hassle of writing extractors and assertions for each field, but that isn't hard. You can generally just copy and paste the same block of code for each field.

So, save yourself the headache, and don't skip the extractor. The rule here is: One side of the assertion should be connected directly to the expected output.

Summary

The rules we have discussed in this section are:

One side of the assertion should be connected to the SUT.
The other side of the assertion should be connected directly to the output.
The only logic in the test should be in the SUT.

Writing Comprehensive Tests

I'm not going to write about this, as our understanding of testing edge cases is pretty good.

Writing Explicit Tests

When we write tests, we are writing the most complete and accurate documentation of the functionality of our SUT. If I write a javadoc comment explaining what the SUT does, the reader must take my word for it that this is what it does. For all the reader knows, I might be lying (or just incorrect). However, when we write tests, our documentation is self-proving. The reader can run the tests and be assured that this is how the SUT will act.

Despite this, since we write tests using code and code is not necessarily easy to read, we must make a special effort to ensure that our testing code is readable. Luckily, if you are following my advice from above, the logical complexity of the testing code should be low.

Using Given-When-Then Correctly

The Given-When-Then paradigm is, in most cases, the best way to organize tests. My only ask is that when you write GWT tests, you do it the right way. Otherwise, you'll just be cluttering up your codebase with unnecessary nesting.

The biggest strength of GWT is that every part of the testing topology is represented by a concept in GWT:

Given corresponds to defining and packaging input.
When corresponds to calling the SUT and extracting.
Then corresponds to extracting²⁾ and asserting an output.

A good strategy for accomplishing this is to define your input and output variables in your root class, and then use a @BeforeEach to set a variable to what you want it to be for the test. Here's what that would look like for testing the 0 case for Fibonacci.

@Nested
class FibonacciTestGWT {
    FibonacciCriteria criteria;
    FibonacciResponse response;
 
    @Nested
    class GivenAnIOf0 {
        @BeforeEach
        void init() {
            criteria = mock(FibonacciCriteria.class);
            when(criteria.getNumber()).thenReturn(0);
        }
 
        @Nested
        class WhenCallingFibonacci {
            @BeforeEach
            void init() {
                response = Fibonacci.fib(criteria);
            }
 
            @Test
            void thenResultIs0() {
                int result = response.getResult();
                assertEquals(0, result);
            }
        }
    }
}

Effective abstraction

Here's a few ideas for cleaning up your tests with abstraction. Remember, in all of this, the name of the game is readable, not convenient! All decisions regarding cleaning must be made with an eye for the reader.

Do as little packaging in your setup methods as possible. Instead, move the heavy lifting of packaging to dedicated private methods, like this:

void init() {
    criteria = createMockCriteria(0);
}
 
private FibonacciCriteria createMockCriteria(int number)
{
    FibonacciCriteria criteria = mock(FibonacciCriteria.class);
    when(criteria.getNumber()).thenReturn(number);
    return criteria;
}

If you need more than one parameter in your setup method, consider using a DTO with a builder to increase comprehension.

void init() {
    criteria = createMockCriteria(FibonacciCriteriaMockDTO
        .builder()
        .number(0)
        .build());
}
 
private FibonacciCriteria createMockCriteria(FibonacciCriteriaMockDTO fibonacciCriteriaMockDTO)
{
    FibonacciCriteria criteria = mock(FibonacciCriteria.class);
    when(criteria.getNumber()).thenReturn(fibonacciCriteriaMockDTO.number);
    return criteria;
}
 
@Builder
class FibonacciCriteriaMockDTO {
    int number;
}

Don't abstract away the actual calling of the SUT. After all, that's the most important part.
Writing helper assertion methods can be helpful, but they can also confuse readers if not done right. Follow these tips to write a good helper assert.
- Your assertion should take two arguments at most (excluding housekeeping arguments like “delta” for comparing floats)
- The first argument should be the packaged output, and the second argument should be the primitive to test.
- The name of the helper should begin with “assert”, followed by the concept being tested. Don't use the word “correct”, it's redundant.
- All asserts in the helper assertion method should be conceptually related. If possible, use just one.

@Test
void thenResultIs0() {
    assertResult(response, 0);
}
 
private void assertResult(FibonacciResponse response, int expectedResult)
{
    int result = response.getResult();
    assertEquals(expectedResult, result);
}

¹⁾

There are libraries that can do this for you, if it's a concern. Despite this, outside of when I was in training, I have never seen anyone write unit tests of a .equals() method

²⁾

This can also be done in the When step, if it is more convenient

OwenWiki

Table of Contents