Wednesday, December 19, 2018

Which Is Better: Big Tests or Little Tests?

[Image: Two people arguing. One shouts "I only use NUTS!" and the other "I only use BOLTS!" Both are foaming at the mouth.]

As an organization gets healthier, it starts to focus on building a meaningful test suite for its product or products. Should you write big tests that cover large swaths of your system, or should you write little tests that give you precise, reliable feedback on individual parts?

The largest test I can imagine is one that validates an entire value stream for a completely assembled system. This kind of test seems valuable because a single artifact covers so much. However, when any part changes, every big test that depends on that part must change with it. So it's hard to use big tests to exhaustively cover all the parts.
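Here is a minimal sketch of what such a big test might look like, driving a checkout flow through the GUI with Selenium. The staging URL, element IDs, and expected confirmation text are all hypothetical; a real suite would substitute its own.

```python
# A "big test": one assertion that depends on the whole value stream.
# URL, element IDs, and expected text below are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_customer_can_complete_a_purchase():
    driver = webdriver.Chrome()
    try:
        driver.get("https://staging.example.com/catalog")
        driver.find_element(By.ID, "add-to-cart-sku-123").click()
        driver.find_element(By.ID, "checkout").click()
        driver.find_element(By.ID, "card-number").send_keys("4111111111111111")
        driver.find_element(By.ID, "place-order").click()
        # Catalog, cart, payment, and confirmation all had to line up
        # for this single assertion to pass.
        confirmation = driver.find_element(By.ID, "confirmation-message").text
        assert "Thank you for your order" in confirmation
    finally:
        driver.quit()
```

Notice the cost structure: rename one element ID or reword one message anywhere in that flow, and this test (and every other big test touching the flow) changes too.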

The smallest test I can imagine verifies that a single behavior in a single class works as the development team expected. This kind of test seems valuable because it provides precise, fast, granular feedback. However, you run the risk of missing the big picture if you test only the details. So you don't get any feedback on how the parts assemble into the whole.
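A minimal sketch of the smallest useful test, one behavior of one class. ShoppingCart is a hypothetical class, defined inline so the example runs on its own:

```python
# The smallest test: one behavior, one class.
class ShoppingCart:
    def __init__(self):
        self._items = []

    def add(self, price):
        self._items.append(price)

    def total(self):
        return sum(self._items)


def test_total_sums_the_prices_of_added_items():
    cart = ShoppingCart()
    cart.add(3.50)
    cart.add(1.25)
    # Precise, fast, granular: this fails only if this one behavior breaks.
    assert cart.total() == 4.75
```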

It's important to remember that these tests are options available to you, not alternatives from which you must make a selection. They work together better than either works alone.

Large tests, such as tests for real user experiences against a GUI, test that all the parts of a system line up and accumulate into something meaningful. Small tests tell you exactly how each part of a system should function and whether or not it does so.

The most effective strategy is a series of batteries of tests. Your small "unit" tests tell you that all the little parts work. That battery runs first, so you don't waste time on tests that exercise behaviors, classes, or modules in combination when one of them doesn't work in isolation. Having this in place means you don't need to re-test individual behaviors in integration: you only need to test the integration itself. Each successive "rank" of tests depends on the preceding rank passing before it runs, and each adds a little more confidence in how the system works before the next rank runs.
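One way to wire up that gating is a small runner that executes each rank in order and stops at the first failure. This sketch assumes tests are tagged with hypothetical pytest markers named "unit", "integration", and "acceptance":

```python
# Run ranked batteries in order; a later rank runs only if every
# preceding rank passed.
import subprocess
import sys

RANKS = ["unit", "integration", "acceptance"]

for rank in RANKS:
    result = subprocess.run([sys.executable, "-m", "pytest", "-m", rank])
    if result.returncode != 0:
        # A failure here means later ranks would only produce noise, so stop.
        print(f"Rank '{rank}' failed; skipping later ranks.")
        sys.exit(result.returncode)

print("All ranks passed.")
```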

Ultimately, you get to the user-facing or acceptance test. By this point, it should not need to prove that the system is built out of working parts, nor that those parts align correctly. Instead, it should prove that all the pieces, in alignment, produce the actual experience or effect that was intended.
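A minimal sketch of that final rank, assuming a hypothetical orders endpoint and response shape. What matters is what it asserts: not that parts work or connect, but that the promised effect happened.

```python
# An acceptance test focused on value rather than wiring.
# The endpoint and response fields below are hypothetical.
import requests


def test_placed_order_results_in_a_scheduled_delivery():
    response = requests.post(
        "https://staging.example.com/api/orders",
        json={"sku": "sku-123", "quantity": 1},
        timeout=10,
    )
    assert response.status_code == 201
    order = response.json()
    # The intended effect: the customer gets a delivery date,
    # not just a row in a database somewhere.
    assert order["delivery_date"] is not None
```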
[Image: A series of filters through which every change must pass. The first, "Parts?", sends "broken" to a trash can and "working" to the next filter. The second, "Alignment?", sends "disjoint" to the trash and "coherent" onward. The final filter, "Value?", sends "missing" to the trash and "expected" onward to unknown destinations.]