Previously On Locally Sourced: The Entropy Essays are a series of essays about how programming practices inspired by Extreme Programming such as testing, pair programming, and object-oriented design play out on modern web projects. The first one was about test speed. And eventually we’ll get to why they are called Entropy Essays.
I want you to stop for a second and think: “why are you writing this test?”
Not “why do you write tests in general?”, but “why are you writing this next test, right now?”
There is a somewhat famous marketing anecdote where, in order to understand customer behavior, McDonald’s reframed the question “how can we get you to buy more milkshakes?” as “what are you hiring this milkshake to do?”
So, you there, about to write a test. What are you hiring this test to do? If the test were a person, what would its job description be?
You may have some off-the-cuff answers. It’s validating business logic, or making change easier, or something.
Actually, you are hiring the test for one reason:
You are hiring the test to fail.
That’s the test’s job description: to detect when a specified condition about the code that should be true is not true and warn you in that case – and only in that case.
A test that can never fail has no value (“can never fail” is different from “doesn’t fail because the underlying code never changes”).
A test should have a condition under which it is the only test in your suite that fails. A test that only fails in conjunction with other tests that fail at the same time has no value.
A test that doesn’t accurately describe a specific condition about the code that is its failure mode has, at best, minimal value.
One consequence of thinking about testing this way is that you start each test by thinking about and documenting what conditions would make the test fail, not what conditions would make the test succeed. The test description might be “it should fail if null is not handled properly” rather than “it should pass with a null argument”.
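As a sketch of what failure-focused naming looks like in practice, here’s a minimal Minitest example. The `display_name` method and its fallback behavior are hypothetical, invented just to illustrate the naming style:

```ruby
require "minitest/autorun"

# Hypothetical method under test: formats a display name,
# falling back to "Anonymous" when given nil.
def display_name(name)
  name.nil? ? "Anonymous" : name.strip
end

class DisplayNameTest < Minitest::Test
  # Named for its failure condition, not its success condition:
  # this test fails if nil input is not handled.
  def test_fails_if_nil_is_not_handled
    assert_equal "Anonymous", display_name(nil)
  end
end
```

The method name reads as a pointer to the failure mode, so when this test shows up red in a CI run, the diagnosis is already half done.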
That’s a simplistic example, but in general, focusing on test failure rather than test success should lead you to write fewer duplicate or overlapping tests, and should make the code easier to diagnose on test failure, since each test should be written with a clear failure mode in mind. And the documentation of each test should give you a hint as to what the failure condition is.
Test-Driven Development fits into this plan perfectly. A test written in TDD is normally very explicit about failure modes, because a TDD test should fail when written. In a strict TDD process, it’s usually really clear what the next failure mode is because that failure mode is something that you want the code to do that the code is not currently doing. In contrast, when writing tests after the fact, you tend to write tests that are expected to pass when written, making it somewhat harder to factor the functionality into individual tests.
In a TDD process a test has a secondary job, which is to explore and document the design of the code to be written. This means that some TDD tests are more valuable as part of that secondary job and are no longer valuable once they pass, usually because there are future tests that cover the failure mode. (For me, if I’m writing very strict TDD, it’s common that my first few tests do things like establish the existence of a class; if those fail, a lot of other tests are going to fail.) Once the code gets to green, part of the refactoring step is determining whether all the tests still have value and removing the ones that don’t. Deleting useless tests is a great thing to do, but naturally, the further you get from the context in which a test was written, the more reluctant people are to delete it, because it’s no longer clear whether the test is still necessary.
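To make the scaffolding-test idea concrete, here’s a hedged sketch. The `Cart` class and its tests are invented for illustration; the point is that once the second test exists, the first can no longer fail on its own:

```ruby
require "minitest/autorun"

# Hypothetical class that grew out of a strict TDD session.
class Cart
  def total
    0
  end
end

class CartTest < Minitest::Test
  # An early TDD scaffolding test: it only establishes that the
  # class exists. Its failure mode is now fully covered by the
  # test below, so it is a candidate for deletion at refactor time.
  def test_fails_if_cart_class_does_not_exist
    assert Cart.new
  end

  # A later test that subsumes the one above: it cannot pass
  # unless Cart exists and responds to #total.
  def test_fails_if_empty_cart_total_is_not_zero
    assert_equal 0, Cart.new.total
  end
end
```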
How much are you paying the test?
Of course, you are not just hiring the test to do a job, you are also paying the test. Or, at least, you are paying for the test. The currency is time and effort. You pay for the test in two phases, once when you write it, and then continually over time as the test continues to be run by other developers. (To really push the analogy, you’ve got development time == signing bonus, and run time == salary.)
That said, developer time when writing the test is much more expensive than server run time. But server run time is forever, and with enough server run time, that time does start to cost developer time.
This suggests that the least valuable tests in your test suite are likely long integration tests that mostly duplicate a lot of shorter-running unit tests, and therefore are unlikely to fail on their own. Tests that are literally, as they might say in the UK, made redundant. The place for integration tests is to test the seams that aren’t covered by unit tests.
With a little back-of-the-envelope estimation, I get a few implications:
- There is probably much more variance in test run time than in test develop time. Individual Ruby tests in a Rails app can go from microseconds to minutes, which is something like seven orders of magnitude. Test develop time likely ranges from minutes to hours, which is probably two or three orders of magnitude.
- Is it worth it to spend time speeding up tests? Let’s say it takes two hours to cut a test’s run time from, say, 10 seconds to 2 seconds. Just making up numbers here. If developer time and server time were equivalent, spending two hours (7,200 seconds) to save 8 seconds per run pays for itself after 900 runs. How many times does your CI build run a day? I have no idea. 100? If so, you make the time back in less than two weeks. Seems like it’d be worth it. From 2 seconds to 0.2 seconds? Also seems like it’d be worth it over time.
- There is a long-term cost in developer time, in that tests need to be understood and potentially fixed by future developers. Being clear about what the expected failure case should be is a good way to simplify that future developer’s job, just a little bit.
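The break-even arithmetic from the speedup estimate can be written out as a tiny script. All the numbers are the made-up ones from above, not measurements:

```ruby
# Back-of-the-envelope break-even math for a test speedup.
optimization_cost = 2 * 60 * 60   # two hours of developer time, in seconds
seconds_saved_per_run = 10 - 2    # test drops from 10 seconds to 2 seconds

runs_to_break_even = optimization_cost / seconds_saved_per_run
puts runs_to_break_even           # 900 runs

ci_runs_per_day = 100             # a guess, as in the text
puts runs_to_break_even / ci_runs_per_day.to_f  # 9.0 days to break even
```

Plug in your own CI frequency and the economics shift accordingly; at 1,000 runs a day, the speedup pays for itself in under a day.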
When you write tests, think about the job you are hiring the test to do, and write a test that will do the job clearly and quickly.