This is an old revision of the document!
Table of Contents
A Biased Guide to Unit Tests
Introduction
Testing is important. It's really fucking important. Yes, it can be annoying, but so is checking that you have your parachute on before jumping out of a plane. Despite this, the world of thought surrounding unit testing still seems to be in its infancy compared to thought about general programming. Many very competent developers will rush through the process of unit testing, making significant errors in the process, and many other competent developers will look at the unit tests and not see anything wrong with them.
Part of this is due to the fact that it isn't necessarily very obvious what makes a good unit test. We know what makes good functional code - it should be efficient, extensible, loosely coupled, and, above all, correct (that is, it should do what we expect it to). Most of us are pretty good at spotting issues with functional code. For example, is my implementation of the Fibonacci sequence correct?
/**
* Computes the ith number in the fibonacci sequence.
*/
public static int fib(int i)
{
    if (i <= 0)
    {
        return 0;
    }
    int previous = 0;
    int current = 1;
    for (int a = 1; a < i; ++a)
    {
        current = current + previous;
        previous = current;
    }
    return current;
}
There are several things wrong here. It isn't correct (in fact, this method prints 2^(i-2) for i >= 2), it isn't clear what's going on (why are we using a for loop?) and it isn't the most efficient way of calculating the answer ( it can be done in O(1) time using the golden ratio ).
Now, the Fibonacci sequence is a fairly complex piece of code, so let's consider something much simpler. We have three classes: Person, Ticket, and BoxOffice. When BoxOffice.getTicket is called with a Person, a ticket is created with a unique id and is assigned to the person. Easy to write, easy to test, right?
class TicketTest {
    public static int PERSON_ID = 5;
    @Nested
    class GivenAPerson {
        @Nested
        class WhenGettingATicket
        {
            Person mockPerson;
            @BeforeEach
            void setup()
            {
                mockPerson = mock(Person.class);
                when(mockPerson.getId()).thenReturn(PERSON_ID);
            }
            @Test
            void thenPersonIsCorrect()
            {
                Ticket actualTicket = BoxOffice.getTicket(mockPerson);
                Ticket expectedTicket = getExpectedTicket(mockPerson);
                assertEquals(actualTicket.person, expectedTicket.person);
            }
            @Test
            void thenIdIsCorrect() {
                Ticket actualTicket = BoxOffice.getTicket(mockPerson);
                Ticket expectedTicket = getExpectedTicket(mockPerson);
                assertEquals(actualTicket.id, expectedTicket.id);
            }
        }
    }
    Ticket getExpectedTicket(Person person) {
        return new Ticket(person, 3);
    }
}
Do you see the problem? Our second test case, thenIdIsCorrect, is checking that the id is what we expect it to be. We check this by getting an expected Ticket through the getExpectedTicket method. But, in this method, we just set the id to be 3! This number pops into existence without any explanation.
Our first test case is… okay. However, we shouldn't take for granted that there is a magical method that is somehow able to correctly determine the “expected” behavior. I will explain this fallacy later, so keep that in your back pocket.
Also notice that we set up the mock person in the “WhenGettingATicket” class, instead of the “GivenAPerson” class, where it probably belongs. Also, the same code is duplicated between the two test cases, although it could be better moved to a setup method.
In this article, I will propose my definition of what makes a good unit test.
Value Hierarchies
When we write functional code, there is a hierarchy of value that defines quality.
- Correct: It should do what we expect.
- Efficient: It should not waste time or space.
- Resilient: Changes in how this code is used, or how other code is defined, shouldn't impact our code.
- Extensible: We should be able to make changes to the code without a massive amount of work.
- Readable: Humans should be able to understand what is happening.
When we write testing code, however, we have different requirements.
- Honest: The code should test the SUT.
- Comprehensive: The code should test all edge cases.
- Brittle: Small changes in the SUT's code should break the tests. (Deterministically)
- Explicit: The code should “explain” the actual functionality of the SUT.
This difference in value means that applying best practices from one paradigm to the other causes poor outcomes. For example, when writing functional code, it is considered an antipattern to copy and paste code. Instead, a good program has similar tasks being handled in the same way. Developers will sometimes fall into the trap of thinking that this is for their benefit, because it is more convenient. While this is true, the more important reason that copying and pasting is considered an antipattern is because it is likely to introduce bugs, therefore causing problems with our most important functional requirement: Correctness. When a change needs to be made, that change should be made in exactly one location, not many different locations. Inevitably, one of those locations will be missed.
However, I would argue that, for testing, copying and pasting is not an antipattern. This is because we want our tests to be brittle. If abstracting repeated code causes our tests to become more resilient, abstraction itself can become an antipattern.
Writing Honest and Brittle Tests
The Topology of Testing
Tests are, ideally, all about data. We have some SUT, an input, and a desired output (or outcome). We feed the input into the SUT and check if the result matches the desired output. Below, I have mapped out what this looks like in a pseudo-flow chart.
 
For example, if we were testing our “fib(i)” function from above, we could write that test like this
//Define input int input = 0; //Feed input into SUT int actualOutput = Fibonacci.fib(input); //Assert that the actualOutput matches what we expect assertEquals(0, actualOutput);
Of course, it's not always that easy. Usually, we will need to set up complex objects in some way. For that, we introduce the Packager. The purpose of the packager is simply to build a more complex object (or series of objects) to give to the SUT.
 For example, let's say we decide to package the input to the fib(i) function into a Criteria object (🤷♂️). No sweat, we would test it like this:
For example, let's say we decide to package the input to the fib(i) function into a Criteria object (🤷♂️). No sweat, we would test it like this:
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); //Feed packaged input into SUT int actualOutput = Fibonacci.fib(fibonacciCriteria); //Assert that the actualOutput matches what we expect assertEquals(0, actualOutput);
Finally, we might also need to extract data from the output of the SUT. The Extractor's only job is extract data from a complex value into a simple value that is easy to test on.
For example, let's say that we want fib to now return a Response object. Again, no big deal.
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); //Feed packaged input into SUT FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria); //Extract output from packaged output. int actualOutput = packagedOutput.getResult(); //Assert that the actualOutput matches what we expect assertEquals(0, actualOutput);
This is what the ideal test looks like, outside of semantics (and trust me, I have plenty to say about semantics!). It's exactly what is needed to solve the problem, and nothing more. Enough to make a grown man cry.
Mistakes, quantified
Let's start with obvious mistakes, like this one.
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); //Feed packaged input into SUT FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria); //Extract output from packaged output. int actualOutput = packagedOutput.getResult(); //Assert that the output matches the input assertEquals(0, input);
Let's visualize this in our flowchart abstraction.
 
This seems silly in the way I've done it above, but I have actually seen this done. This generally occurs in what I call “pass-through” tests, which are tests in which the input and output are expected to match. It also generally occurs with some level of indirection, for example, a method might hide the details of creating data and asserting on the data.
A slightly more common mistake is this one:
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); Fibonacci fibonacci = mock(Fibonacci.class); FibonacciResponse fibonacciResponse = mock(FibonacciResponse.class); when(fibonacciResponse.getResult()).thenReturn(0); when(fibonacci.fib(any())).thenReturn(fibonacciResponse); //Feed packaged input into SUT FibonacciResponse packagedOutput = fibonacci.fib(fibonacciCriteria); //Extract output from packaged output. int actualOutput = packagedOutput.getResult(); //Assert that the output matches the input assertEquals(0, actualOutput);
From this, let's extract the first, and most important rule of testing: All assertions should be connected to the SUT.
The know-it-all function fallacy
Wouldn't it be easier to write tests if we didn't need to define both the inputs and the outputs? What if we had a magical function that could, given some input, give us the correct output? Then, we could write tests like this:
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); //Feed packaged input into SUT FibonacciResponse packagedOutput = Fibonacci.fib(fibonacciCriteria); //Extract output from packaged output. int actualOutput = packagedOutput.getResult(); //Get expected output FibonacciResponse expectedResponse = getExpectedFibonacciResponse(fibonacciCriteria); int expectedOutput = expectedResponse.getResult(); //Assert that the actualOutput matches what we expect assertEquals(expectedOutput, actualOutput);
It looks like this:
Hey, that's great! Now, the only problem is implementing getExpectedFibonacciResponse, and –
private FibonacciResponse getExpectedFibonacciResponse(FibonacciCriteria criteria) { return Fibonacci.fib(criteria); }
Which makes our test look like:
Oh. Oh no. Well, never mind that. Let's implement this function the right way!
private FibonacciResponse getExpectedFibonacciResponse(FibonacciCriteria criteria) { int i = criteria.getNumber(); int result; if (i <= 0) { result = 0; } else { int previous = 0; int current = 1; for (int a = 1; a < i; ++a) { int temp = current; current = current + previous; previous = temp; } result = current; } return FibonacciResponse.builder().result(result).build(); }
Ah, all done. Now, let me just check my implementation of the original function, and -
public static FibonacciResponse fib(FibonacciCriteria fibonacciCriteria) { int i = fibonacciCriteria.getNumber(); int result; if (i <= 0) { result = 0; } else { int previous = 0; int current = 1; for (int a = 1; a < i; ++a) { int temp = current; current = current + previous; previous = temp; } result = current; } return FibonacciResponse.builder().result(result).build(); }
Shit.
Let's say we have a function A, and another function that computes the correct output of A, called B. If B is less correct than A, why are we using it? If it is more correct than A, why don't we replace A with B? If it is equally correct as A, what's the point in using it in assertions? In reality, there's only one magical function that knows the correct output to any input: You deciding what it should be.
This leads me to my next rules of testing: The only logical component in a test (excluding the packager and extractor) should be the SUT. and The SUT should only be connected to one side of the assertion.
Skipping the Extractor
Another idea that we might have is to skip the extractor by converting the expected output into an object that can be directly compared with the output of the SUT. That would look something like this:
//Define input int input = 0; //Package input FibonacciCriteria fibonacciCriteria = mock(FibonacciCriteria.class); when(fibonacciCriteria.getNumber()).thenReturn(input); //Feed packaged input into SUT FibonacciResponse packagedOutput = Fibonacci.fib3(fibonacciCriteria); //Create a packaged expected output. FibonacciResponse packagedExpectedOutput = FibonacciResponse .builder() .result(0) .build(); //Assert that the actualOutput matches what we expect assertEquals(packagedExpectedOutput, packagedOutput);
And we would represent it like this:
People generally consider this when they have outputs with a lot of values in them that need to be tested. This isn't as big of a deal as the other mistakes I have outlined above. It doesn't cause your tests to be useless. However, by using this system, you will inevitably make things harder on yourself down the road, even if the tests pass now.
The biggest problem is that we are implicitly trusting that the .equals() method will give us a meaningful result for our object. You probably know that in Java, when the .equals() method is not defined, calling .equals() will compare the object's addresses in memory. So, if I have two identical objects, but they were instantiated separately, if I compare them, they won't be equal. It's a common rookie mistake.
That's not a big deal with a simple object comprised of a few primitive fields. Just add a .equals() method (Using lombok makes this dead easy). But what if the object is comprised of other objects, themselves without a .equals() method? Again, no problem, just add a .equals() to those objects. But what if those objects are in different repositories that you own? Now it's becoming a pain, but still manageable. Just open a PR for those repositories, wait until you get the whole team to approve them, merge the code, and pull down the most recent artifact. But what if they are in an open source repository that you don't own? Well, you could figure out the process for submitting a PR, submit a PR, have a monthlong discussion about why it's necessary, get the PR merged, wait for the next release, and pull down the latest artifact. But what if it is in a closed source repository? In that case, you contact the maintainers, ask for the feature, wait for a year, check if it is added in the next release, see that it isn't, ask again, and get it in another year.
And all that's assuming that you know where the problem is! If you have a data structure that relies on hash codes to function, like a Map or Set, and somewhere along the line some jackass didn't implement .hashcode() correctly, it's anyone's guess where the problem is!
Now, you might object that if .equals() isn't implemented correctly, that could be a problem with the SUT. I agree. But it also might not be a problem with the SUT. Maybe someone has implemented a superclass differently than expected. Maybe we expect .equals() to work in a different way. But all that's besides the point, because .equals() should be tested on its own if it's so important! 1)
Also, testing this way obscures specific problems with the code when they arise. If only one field is off, then the entire object is considered to be incorrect. There's no metadata about what went wrong, you only know that something went wrong.
Along those same lines, when something changes with the expected outputs, it's a slippery slope for an engineer to revert back to using the “know-it-all” function to avoid the “pain” of deciding what those outputs should be. If you already have an expected output in the same form as the actual output of the SUT, why not factor it out to a magical function that determines the correct value on the fly? In my experience, this is how the “know-it-all” function appears in codebases.
Finally, I would argue that this approach doesn't really save you much work. You still need to do the work of defining all of the outputs. Sure, it saves you the hassle of writing extractors and assertions for each field, but that isn't hard. You can generally just copy and paste the same block of code for each field.
So, save yourself the headache, and don't skip the extractor. The rule here is: One side of the assertion should be connected directly to the expected output.
Writing Explicit Tests
When we write tests, we are writing the most complete and accurate documentation of the functionality of our SUT. If I write a javadoc comment explaining what the SUT does, the reader must take my word for it that this is what it does. For all the reader knows, I might be lying (or just incorrect). However, when we write tests, our documentation is self-proving. The reader can run the tests and be assured that this is how the SUT will act.
Despite this, since we write tests using code and code is not necessarily easy to read, we must make a special effort to ensure that our testing code is readable. Luckily, if you are following my advice from above, the logical complexity of the testing code should be low.
Given-When-Then
The Given-When-Then paradigm is, in most cases, the best way to organize tests. It is my personal favorite way to write unit tests. My only ask is that when you write GWT tests, you do it the right way. Otherwise, you'll just be cluttering up your codebase with unnecessary nesting.
The biggest strength of GWT is that every part of the testing topology is represented by a concept in GWT:
- Given corresponds to defining and packaging input.
- When corresponds to calling the SUT and extracting.
- Then corresponds to extracting2) and asserting an output.
A good strategy for accomplishing this is to define your input and output variables in your root class, and then use a @BeforeEach to set a variable to what you want it to be for the test. Here's what that would look like for testing the 0 case for Fibonacci.
@Nested class FibonacciTestGWT { FibonacciCriteria criteria; FibonacciResponse response; @Nested class GivenAnIOf0 { @BeforeEach void init() { criteria = mock(FibonacciCriteria.class); when(criteria.getNumber()).thenReturn(0); } @Nested class WhenCallingFibonacci { @BeforeEach void init() { response = Fibonacci.fib(criteria); } @Test void thenResultIs0() { int result = response.getResult(); assertEquals(0, result); } } } }






