Saturday, December 14, 2019

Making my tests multi-bound

To multi-bind a test is to make it so that a single test enforces the same semantic requirement against different portions of a system.

This has been going on for a long time, but not everyone has noticed it. Teams that use the Gherkin syntax to define a behavior are likely to exploit those definitions in two ways:
  1. Bind it automatically with SpecFlow or Cucumber.
  2. Treat it as a manual test script.
This is an early form of multi-binding.

I'm talking about something more than that, and it's the foundation of stability in all my pipelines. So we need to discuss it first.

In my environment, any given test can be executed all of the following ways:
  1. As a unit test.
  2. As an integration test with no networking (everything runs in a single process).
  3. As an integration test for the backend with mocked data.
  4. As a gate between the blue and green environments of the backend.
  5. As an integration test for the Android client, with a local server and a mocked database.
  6. As an integration test for the WebGL client, with a local server and a mocked database.
  7. As a gate between the blue and green environments of one of my WebGL deployments.
There's nothing to stop me from adding more ways to run tests if I need them. This seems pretty good for what amounts to a one-man show.

Not every test applies in every context, but many tests apply in most of the contexts.

Consider this test:
Scenario: Invalid moves are rejected by server
  Given client A authenticated with C
  And client B authenticated with C
  And game X was started on client A
  And invalid move M was spoofed by client A
  When game X is continued on client B
  Then client B does not show the effect of move M

It's pretty straightforward.

A traditional design with SpecFlow might look something like this.

A typical SpecFlow design
SpecFlow consumes your feature file and parses your "given"s, "when"s, and "then"s. It finds the appropriate bindings and invokes them with the parsed parameters. In a traditional application of a tool like this, those bindings would couple directly to whatever they test. In fact, I've seen recommendations around Cucumber that people make their test data classes and their production classes the same thing.
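
For illustration, here's a rough sketch of that traditional, tightly-coupled shape. None of these class names come from Dwindle; GameServer and GameClient are stand-ins for whatever production types the bindings would grab directly:

using System.Collections.Generic;
using TechTalk.SpecFlow;

[Binding]
public class InvalidMoveSteps
{
    // Hypothetical production types, constructed and called directly by the bindings.
    private readonly GameServer _server = new GameServer();
    private readonly Dictionary<string, GameClient> _clients = new Dictionary<string, GameClient>();

    [Given("client (.*) authenticated with C")]
    public void GivenClientAuthenticated(string clientName)
    {
        var client = new GameClient(clientName);
        client.Authenticate(_server);
        _clients[clientName] = client;
    }

    [When("game (.*) is continued on client (.*)")]
    public void WhenGameIsContinued(string gameName, string clientName)
    {
        _clients[clientName].ContinueGame(gameName);
    }
}

That works fine right up until you want the same steps to drive something other than a server object living inside the test process.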

I test this at both the unit level and the API component level.

Why would I do this? Simple: I want the fast feedback of a unit test and the complete coverage of an API-level integration test. I don't want any of the sacrifices that come with only having one or the other and I don't want to pay for the duplication embodied by writing separate unit and integration tests.

Honestly, I periodically think about enrolling this test as a client test, too. The only thing that holds me back is that I want to keep the execution time of my client test libraries down. As I type that, I realize that this might be a case for periodic runs.

...but I digress.*

If I were going to try to do multi-binding with the above design, I would just replicate my bindings and run my tests over and over again with different binding libraries. While that is an option, I don't consider it a viable one. It forces me to replicate a lot of binding code just so I can point it at different places.

The bindings, after all, represent your testing-domain-specific language. Copy-paste-modify is at least as bad for them as it would be for production code.

With as many variations as I needed to support with Dwindle, the duplication would have been doubly bad because I not only needed different entry points, but I also needed different mixtures of the same entry points.

Fundamentally, multi-binding is about managing variation. This should come as no surprise because this is the fundamental problem in all software development.

The main tool software developers have for managing variation is design patterns thinking. Patterns thinking tells us, among other things, to consider what drives variation in the problem we are addressing, as opposed to in the code we have today.

The thing that varies between unit tests and API integration tests is the context in which the test is run. Patterns thinking says to encapsulate that variation. That is, make sure the tests don't know which test context they are in.

So step 1 is to define an abstraction behind which variation can be encapsulated:

A design that supports multi-binding without redundant bindings
In this design, the bindings still represent the domain-specific language around testing. However, they have a test context** object injected and have no way of knowing which kind of test context they received. The test context implementations provide different entry points to the bindings, which the bindings can only treat abstractly.
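
A minimal sketch of the shape of that abstraction, with hypothetical member and handle names (the real interface is whatever your bindings actually need):

using TechTalk.SpecFlow;

// The bindings' entire view of the world: some way to reach a server
// and some way to create clients. Which world is behind it, they never know.
public interface ITestContext
{
    IServerHandle StartServer();              // IServerHandle/IClientHandle are hypothetical handle types
    IClientHandle CreateClient(string name);
}

[Binding]
public class InvalidMoveSteps
{
    private readonly ITestContext _context;

    // SpecFlow's context injection hands the steps whatever ITestContext
    // implementation was registered for this run.
    public InvalidMoveSteps(ITestContext context)
    {
        _context = context;
    }

    [Given("client (.*) authenticated with C")]
    public void GivenClientAuthenticated(string clientName)
    {
        _context.CreateClient(clientName).Authenticate();
    }
}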

As a result, I can plug in as many different ways to test as I like without needing to modify my bindings.
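
The choice of implementation lives entirely outside the bindings. With SpecFlow, one way to make that choice is to register the context in a hook via the built-in BoDi container. This is only a sketch, assuming a hypothetical TEST_MODE environment variable; in practice the switch could just as easily be separate test projects or run settings:

using System;
using BoDi;
using TechTalk.SpecFlow;

[Binding]
public class TestContextRegistration
{
    private readonly IObjectContainer _container;

    public TestContextRegistration(IObjectContainer container)
    {
        _container = container;
    }

    [BeforeScenario]
    public void RegisterTestContext()
    {
        // Pick an implementation based on how this run was launched.
        ITestContext context;
        if (Environment.GetEnvironmentVariable("TEST_MODE") == "api")
            context = new ApiTestContext();
        else
            context = new UnitTestContext();

        _container.RegisterInstanceAs<ITestContext>(context);
    }
}

UnitTestContext and ApiTestContext are the two context implementations sketched a little further down.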

There are a lot of technical steps that have to be done inside the box for the API test context but that's not the topic of this post. Also, there are tons of posts out there about how to test a local deployment of a RESTful API.

To summarize, the unit test context does this:
  • Instantiate the code for the server with the data layer mocked in memory.
  • Create mocked network access objects that directly invoke the server code.
  • Define a way to create clients as in-memory objects that use the game logic but mock the UI.
While the API test context does this:
  • Create a folder structure for mocked data.
  • Fire up a local Azure Functions instance with the API, configured to talk to the on-disk database.
  • Define a way to create clients as in-memory objects that use the game logic but mock the UI.
One could argue that those contexts do the wrong thing, but all that would really mean is that they are named incorrectly. After all, if there is a "right" thing to be doing that I'm not doing, I can just add another context. As you can see from the earlier list of the ways I can execute a test, I already have.
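
In code, those two contexts might look roughly like this. Again, every type name other than the SpecFlow and .NET ones is a hypothetical stand-in, and the real API context has much more going on inside it:

using System;
using System.Diagnostics;
using System.IO;

// Unit test context: everything in one process, nothing on the wire.
public class UnitTestContext : ITestContext
{
    public IServerHandle StartServer() =>
        new InProcessServer(new InMemoryDataLayer());       // server code with the data layer mocked in memory

    public IClientHandle CreateClient(string name) =>
        new InMemoryClient(name, new MockedUi());           // real game logic, mocked UI
}

// API test context: a real local endpoint talking to mocked, on-disk data.
public class ApiTestContext : ITestContext
{
    public IServerHandle StartServer()
    {
        // Folder structure for the mocked data.
        var dataFolder = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dataFolder);

        // Fire up a local Azure Functions host ("func start" from the
        // Azure Functions Core Tools). Pointing it at the on-disk data
        // is configuration plumbing that's elided here.
        var host = Process.Start("func", "start");
        return new LocalFunctionsServer(host, dataFolder);  // hypothetical wrapper around the process
    }

    public IClientHandle CreateClient(string name) =>
        new InMemoryClient(name, new MockedUi());           // same client wiring as the unit context
}

Either way, the bindings only ever see StartServer and CreateClient.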

This is just the beginning of the conversation. There are technical details in making each test mode work. There are many more design decisions that make something like this flexible and maintainable for a sophisticated system. There's the question of how to fit these things into your pipeline, and there's the question of how to conveniently run the various test modes locally.

You get the idea.

However, this entry is already getting really long. So I'm going to need to talk about the client tests and the deployment gating contexts in another entry.

* Someday, I will digress on a dark and stormy night.

** This may be an unfortunate choice of names, as MSTest already has a TestContext class. Ignore that. You all know I'm bad about that, so carry on living your life.