Writing tests is an art form that developers often neglect. The neglect stems from failing to consider how a test will be used: tests are often treated as simply a hurdle that needs to be overcome, a box-ticking exercise. This is the same common mistake that leads to the false assurance that 100% test coverage means your code does what it should (simply put: it doesn’t). Test coverage only checks that the tests execute each line in your code (or each statement/branch if you look at other metrics). It does not check whether your code contains bugs.
Consider this simple function:
function doubleInput(input) {
  return input * 2 || 1;
}
If we write a test that checks this function doubles the input, giving it a value of 1, we can test that it gets 2 back, and this passes. All coverage metrics report 100%. But there’s clearly a bug here, because if we pass 0, we get 1 back, since 0 coerces to false. What’s more, what happens when we pass a string? Or a boolean? Or an object?! Clearly the metrics haven’t captured these bugs, so we can’t rely on metrics alone for effective test coverage.
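To make that concrete, here is a minimal sketch of the situation (the test names are mine): the first test passes and earns 100% coverage, while the second exposes the bug.

test('it doubles the input', () => {
  // This passes, and because it executes every line of doubleInput,
  // all coverage metrics report 100%.
  expect(doubleInput(1)).toBe(2);
});

test('it doubles zero', () => {
  // This is the case coverage never asked for: it fails, because
  // 0 * 2 is 0, which is falsy, so the || 1 fallback returns 1.
  expect(doubleInput(0)).toBe(0);
});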
Similarly, how we write tests is critical to how useful they are, and we must carefully consider what a test’s responsibilities are. The obvious initial answer is that they need to fail when there is a problem, and pass when there is not. But this isn’t the whole story.
Imagine you’re debugging a failing test. In this scenario, what you need from the test is not simply whether it passed or failed; you need more information on why it failed. What value did it expect, and what did it receive? Where in the code did this happen? Is it the test that’s wrong, or the code?
This is debugging, and it’s a damn sight easier if you have tests that make the answers to these questions clear. Considering tests as a box-ticking exercise often leads us to create unclear or black-box test cases. These cases may pass, and they may correctly test the functionality, but they are impossible to reason about; pulling enough information out of them is a pain at best, and is the cause of many developers’ loathing of debugging.
The most common flaws I have seen in my career are Test Combinations, Prescriptive Requirements and Iteration. For the purposes of examples, I’m going to be using the popular JavaScript testing framework Jest, but this applies to other test tools in other languages as well.
Combination tests are very common if you are rushing through test-writing. They are bred from not thinking about the different parts of code that can go wrong, and are born when you first start writing out all your test cases. A test combination is a test that verifies more than one thing at the same time. Engineers often muddy the water by claiming that these unit tests are in fact integration tests, and so inherently test multiple things. There is a difference between considering a higher abstraction level for your test, and combining multiple requirements into a single test.
They can also be born when tests are split along multiple axes at once, for example, testing the functionality of a module and testing the results from different types of input.
Consider a function that is supposed to return an object with min and max keys that correspond to the minimum and maximum values given as arguments.
function getRange(...inputs) {
  // The implementation isn’t important
  // …
  return { min, max };
}
We need to test various things, among them that we get numbers back, and that the range is correct. Often test cases need to assume certain things in order to pass, and it can be tempting to test these all at once in one test case, because doubling up would be repeating the same test, right? A test case checking the min and max values could be:
test(
  'it returns the min and max range',
  () => {
    const result = getRange(6, 4, 2);
    expect(typeof result).toBe('object');
    expect(result).toHaveProperty('min');
    expect(result).toHaveProperty('max');
    expect(result.min).toEqual(2);
    expect(result.max).toEqual(6);
  }
);
Here we’ve tested loads of the functionality in one test. Great, right? Wrong. What we’ve actually done here is combine tests together to make one less-obvious test. What is this test actually testing? It does indeed test that the min and max ranges are the correct values, but it also tests that we have an object returned, and that we have min and max properties. It’s tempting to write tests like this because otherwise, this test will error out with something like “cannot get property ‘min’ of undefined”. But in the case of tests, the error based on assumptions is useful in itself. It tells us that this test failed because something else went wrong. Another test checking that the result was in fact an object will give you more detailed information, so what you will end up with is a lot of tests complaining that they couldn’t complete, and then one test saying what actually went wrong, making the breadcrumb trail easier to follow. What’s more, tests passing when something they rely on is broken is a sign of dormant bugs, and something that should be looked into, so tests behaving like this are another useful tool for finding issues before they’re ever released.
I have seen QA Engineers combine tests in their test plans to the point that it’s highly unclear what they would even say if the test failed. But these Engineers also have inherent feedback tools: their test cases are poor because they know they will make up for it with human intelligence, their ability to explain issues. That ability can, however, be lost when test cases are passed from one Engineer to another, and so is no excuse for a poor plan. I’m not a QA Engineer, however, so I’m considering this point a digression.
The solution to this one is pretty simple. You split the tests. Your first few tests should almost always be utterly inane. Is the result the right type? Does it have the right keys? Are the values of each key the right type?
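For the getRange example above, a split suite might look something like this sketch:

test('it returns an object', () => {
  expect(typeof getRange(6, 4, 2)).toBe('object');
});

test('it has a min property', () => {
  expect(getRange(6, 4, 2)).toHaveProperty('min');
});

test('it has a max property', () => {
  expect(getRange(6, 4, 2)).toHaveProperty('max');
});

test('min is the smallest argument', () => {
  expect(getRange(6, 4, 2).min).toEqual(2);
});

test('max is the largest argument', () => {
  expect(getRange(6, 4, 2).max).toEqual(6);
});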
It’s repetitive, it’s boring, but it will make your life easier in the end. Usually, the longest-lived test cases are the most obvious ones. You’re writing tests that you assume will always be the case, and these are the failures that are the most annoying to debug.
Overly-prescriptive Requirements are similar to Combination Tests, as they come from the same assumption that testing multiple things at once is more efficient and therefore better. This narrowing of results often comes from over-zealous testing of modules with complex interfaces, more specifically complex outputs. Imagine a module we’re going to call the generator. This module takes some state and uses it to construct a set of commands to pass to a UI to draw things. The state can be relatively simple for this. Imagine an app that is basically a set of non-interactive static web pages. The state is just the current page route, of which we have only a handful. But the content of the page is very complex, especially if you consider what it takes to draw even the simplest of UI components. Every text character is a complex path to draw. A button is multiple calls for borders, fills, gradients, etc.
With such a daunting interface to test, it can be tempting to write a case for each route, and just save the draw calls from an output you know is correct to compare against later. While relatively quick to implement, this sort of testing is a huge burden on future development. Literally EVERY change to how a page appears visually will require the developer to update the test with a new master draw-call object. This object probably has thousands of entries, and is therefore completely incomprehensible. Every time a developer updates it, what they’re actually doing is just telling the test to be quiet. All this test does is alert a developer that the changes they made caused some alteration to the visual appearance of the page. Any of the where, how, why information is buried in a pile of obscurity.
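Sketched in Jest terms (generatePage and masterDrawCalls are illustrative names, not a real API), the anti-pattern looks like this:

test('draws the home page', () => {
  // masterDrawCalls is a saved “known good” output with thousands of
  // entries. Any visual change, however intentional, fails this test,
  // and the only fix on offer is to regenerate the whole object.
  expect(generatePage('/home')).toEqual(masterDrawCalls);
});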
Even if we had been a bit more diligent, testing only the result of each individual draw call separately, a draw call is a complex construct in itself, with detailed paths, colours, and stroke styles, so doing a full object comparison in a single test is problematic. It opens us up to overly-demanding tests.
Let’s take the improved example of drawing a line:
test('draws a line', () => {
  expect(getLine(state)).toEqual({
    // complex draw call
    // …
  });
});
Imagine that some drawing engine correction occurred, and suddenly instead of getting 39.9 for a line start point we now get 40. This was actually a bug fix, and we could never actually draw 39.9 pixels because it was always rounded. But now we get a failure, because the way we constructed the condition is restrictive. Imagine that we want to add opacity to the way we draw a line. Suddenly all our test cases for lines fail. Not because anything broke, but because the tests demanded such a strict set of requirements that they couldn’t handle an extra - maybe completely unused - property.
In these cases, really all the drawing object testing should happen in unit tests inside the module. If we wanted to test the interface, we would only test for cases like “does it only return things that can be drawn”, or “does it call the correct submodules”. Writing test requirements that are faithful to the actual restrictions is time consuming. It will make your tests longer to write, but it will make working with them in the future faster and easier.
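As a sketch of what faithful requirements can look like in Jest (getLine and state are the hypothetical names from the example above, and the type property is an assumption of mine), partial and approximate matchers let a test state only what it actually requires:

test('draws a line from the expected start point', () => {
  const call = getLine(state);

  // Assert only the properties this test is about; a newly added
  // (and possibly unused) opacity property won't fail it.
  expect(call).toMatchObject({ type: 'line' });

  // Tolerate engine-level rounding, such as 39.9 becoming 40.
  expect(call.start.x).toBeCloseTo(40, 0);
});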
Iteration is perhaps a slightly contentious one, as it directly contradicts the engineering DRY principle. (Disclaimer: there are some tools that handle iterative tests in a better way, but these are few and far between.) In many instances, you will probably need to create lots of tests that do more or less the same thing. In normal code, you would often create a for loop to iterate through some list and perform the task over and over. Why not create some sort of test case array that you can feed in and call test.each on? But the same rules don’t apply in many test frameworks. Imagine you have one failing case in your array. Test tools allow you to isolate and run just one test at a time, a useful feature. But what we have just created is not many tests; it is, in fact, only one. Now, to check if our test case passed or if the result even changed, we need to run the other 1000 or so cases in that test as well, and trawl through the 999 “test passed” lines in the terminal to find the one we want.
The solution to this is to make sure you separate out your tests. Avoiding any .each function and writing the tests out manually is usually the best way to go. To make the repetition less severe, you can abstract any setup out into helper functions, though because of the scale at which some of these tests are implemented, it is not always practical to do so. But if you feel the need to write tests iteratively, that is most likely a red flag that there are issues with your design: either you aren’t confident that your tests cover the requirements, so you repeat them, or you have done something worse, like coupling your state to the iterations.
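A rough sketch of that shape, using the generator example from earlier (stateFor and generate are illustrative names, not a real API):

// Hypothetical setup helper: builds the generator state for a given
// route, so each case below stays short but individually runnable.
function stateFor(route) {
  return { route };
}

test('generates draw calls for the home route', () => {
  expect(generate(stateFor('/home')).length).toBeGreaterThan(0);
});

test('generates draw calls for the about route', () => {
  expect(generate(stateFor('/about')).length).toBeGreaterThan(0);
});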
There are of course pressures that lead to tests being written like this. Often, the most critical and first tests to be written are end-to-end ones, as they cover the actual user journeys through the application. But good test suites, when they fail, will give multiple failures. Parent test cases will fail because one of their children failed. This breadcrumb trail will lead you down into the code until you are left with a very small area from which you know the problem came. And from 20 lines of code, finding that issue is hopefully trivial.