Wednesday, April 8, 2020

Some updates to my NuGet packages

A while ago, I released a few SpecFlow plugins on NuGet to make it easier to reuse SpecFlow feature files.

I decided that one of the features of one of the plugins really ought to be its own plugin. So I extracted it to this package: HexagonSoftware.SpecFlowPlugins.FilterCategoriesDuringGeneration.

The new plugin is a lot more powerful than the old tag-stripping feature and works in a much more meaningful way.

The old feature literally stripped tags out of a scenario or feature. That means assumptions made about tags present in generated tests could be violated. If you wrote a binding that was coupled to the presence of a tag and then stripped that tag out (because you use it as a control tag), the binding would cease to be found.

The new plugin doesn't have that problem. It works by decoupling test-category generation from the tags in a scenario or feature file.

It has two opposing features: category-suppression and category-injection.

Suppression (right now) works by applying a set of regular expressions to each tag. If any of these regular expressions matches, category-generation is suppressed for that tag.

Alternatively, injection adds one or more tags to every test generated for a particular project.

The exact details can be viewed in the readme.txt that pops up when you install the package, but I'll give a brief example here. As with all my SpecFlow plugins, it is controlled by a .json file.

Here is an example from one of my test assemblies:

{
  "injected-categories": [ "Unit Tests" ],
  "suppress-categories": {
    "regex-patterns": [ "^BindAs:", "^Dialect:", "^Unbind:" ]
  }
}

The regex patterns for suppress-categories each select a different prefix. If that's not obvious to you, here's my favorite source for exploring regular expression syntax. Anyway, this allows me to simply and concisely select whole families of tags for which category-generation should be suppressed.

The injected categories are injected directly as categories into generated tests verbatim.
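To make that concrete, here's a minimal sketch of how I think about the rewrite. It's illustrative only - not the plugin's actual code - and all the names are made up.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class CategoryRewriteSketch
{
    // Drop any tag-derived category that matches one of the suppression
    // patterns, then append the injected categories verbatim.
    public static IEnumerable<string> Rewrite(
        IEnumerable<string> tags,
        IEnumerable<string> suppressionPatterns,
        IEnumerable<string> injectedCategories)
    {
        var surviving = tags.Where(
            tag => !suppressionPatterns.Any(pattern => Regex.IsMatch(tag, pattern)));
        return surviving.Concat(injectedCategories);
    }
}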

What this does, when paired with the other plugins that allow for (tag-based) selective sharing of scenarios among multiple test assemblies, is allow me to "rewrite" the categories for a generated assembly.

Even though I use tags to select certain dialects into particular scenarios and to control which scenarios manifest in which gate, this is what my unit test list looks like when it's grouped by category:

A clean categories list despite my use of dozens of tags for various control purposes

As you can see above (and, in more detail, below), un-rewritten tags still show up as categories. So a single test might end up in two categories but only when that's what I really want.

WillTakeDeathStepQuickly and WillTakeKillShotQuickly both show up in Unit Tests and Slow

As with all things on NuGet.org, this plugin is free. So, if you think it can help you, you don't have much to lose by downloading it and giving it a try.

Friday, April 3, 2020

Here's a little time-saver for managing NuGet packages in your solution

This entry assumes you use Microsoft Visual Studio. If you don't use that, please bookmark this entry so you can revisit it when you've come to your senses.

I like to tend to my NuGet dependencies a lot and my preferred view for doing so is the "Manage NuGet Packages for Solution" window. This can be accessed from the context menu (below), the tools menu (further below), or the search bar (third below).

The first option is pretty quick except that you have to scroll up in your Solution Explorer. So it's kind of disruptive. You also have to fish through a meaty context menu, but the item is near the top and in a predictable spot. So, with the power of habit, this one can be made reasonably cheap.

The tools menu is about as useful as any menu bar in a large application like Visual Studio (Visual Studio desperately needs a ribbon, like Office has). I never use it.

I've been trying to train myself to type Ctrl+Q and then find the option but, for some reason, it just won't take. Even with my trick for retraining muscle memory, I just couldn't make myself do it the new way.

The reason I couldn't make myself adopt the new way is that the new way and the old way both suck. The right way to do this is to have a button I can push without any preceding navigation steps. Like a custom toolbar.

I'm assuming you know how to make a custom toolbar in Visual Studio. Actually, I'm just assuming you are smart enough to figure it out because it's super easy.

What wasn't easy was finding the damn command because it has a nonsense name in the toolbar editor dialog. That's not really the fault of the dialog - a lot of commands show up correctly. It's something about the way the NuGet commands are configured.

Here is the name: cmdidAddPackagesForSolution




It shows up correctly in the toolbar, just not in that editor dialog.

If you ever need to find a command that is improperly named, the trick is to pretend to edit the menu, first.

Go into the Customize dialog for the toolbar you are trying to edit and then switch to editing the menu group where the command is already located.




When you've done that, you'll be able to see what the command's name is.





Thursday, April 2, 2020

Recent improvements to my NuGet packages

I've improved my NuGet packages for sharing feature files and constraining test-generation.

As of 2020.4.2.5, the following changes have been made:

  • Feature files are imported to a hidden folder under your base intermediate output path (usually obj\). This helps make it clearer that the imported feature files are build artifacts, not maintainable source code.
  • Control tags for other test libraries can be stripped out of the test-generation process. This will help prevent miscategorization.
  • Shared scenario templates/outlines which were being broken by the incredibly-offensive importance of this class now work properly.

Fun.

Sunday, March 29, 2020

The road to pipeline-as-code on Azure DevOps

I've been setting up and maintaining continuous integration builds for a long time. Back when I started, pipeline-as-code was the only option.

Then newer, "better" options that were mostly configured via a GUI were offered.

Now the industry has come full circle and realized that pipelines are automation and automation is code. Ergo, pipelines should be code.

Dwindle was born before that realization had swept its way through the majority of the industry. Most importantly, before the option of a pipeline as code was even a little mature in the Azure DevOps offering.

Now, however, it seems the idea is pretty widely accepted. Again, more importantly, it's accepted - even evangelized - by Azure DevOps. The trip from "that's a good idea" to near feature-parity with the GUI-based pipelines has been dizzyingly fast.

So I had an opportunity to convert my UI-configured pipeline into a code- (YAML-)configured pipeline.

That effort was not without its obstacles. The "get the YAML for this job" button doesn't really work perfectly. Service connections are persnickety. Although: they were kind of difficult to deal with in the UI-configured world, so maybe that's not fair to attribute to switching to YAML.

Most significantly, though, the unification of builds and releases into single pipelines represents a nontrivial (but good) paradigm shift in how Azure DevOps expects developers to shepherd their code out to production.

Previously, I had to introduce some obnoxious kludges into my system that I have replaced with simple, first-class features of the modern pipeline-definition system.

For instance, I used to enter "red alert" work items into a work-tracking system whenever validating a blue environment failed. These red alert work items would prevent any promotions of any environment until they were cleared, which happened automatically at the end of the successful validation of a replacement promotion-candidate. This meant, among other things, that my pipeline was coupled to my work tracking system.

As a result, validations happened in a haphazard way. One promotion could, theoretically, go through even though validation along another stream of deployment/promotion would fail later.

Likewise, the way I had to marshal build artifacts was a pain. I used to have to download them in one build only to re-upload them so that the next build could access them from its triggering pipeline. That's a lot of wasted time.

Stages, dependencies, and pipeline artifacts changed all that. Pipeline artifacts allow me to upload an artifact one time and download it wherever I need it. Stages and dependencies allow me to ensure all of the following:

  • All tests - unit, server, and client - happen before any deployments to any blue environments.
  • All deployments to blue happen before any environment-validation steps.
  • All validation happens before any promotion.

The Environments feature makes it easy to track and see at a glance which deployments and promotions have occurred. It also gives you a place to introduce manual checks. For instance, because I'm resource-constrained, I only want deployments to blue environments to happen when I approve. Likewise, I only want to promote to any green environments after I've given the okay.

The transition has been largely positive. As a single-person operation, it's made my life easier since I completed it.

As I said, though, it came with challenges.

I will be exploring those challenges and their corresponding rewards in near-future blog entries.

Here are two new NuGet packages for SpecFlow you might enjoy

Hey all. I made my first two packages on nuget.org, today. I must say, it's a lot easier than it was the last time I looked into it. No .nuspec file is required. Uploading at the end of an Azure DevOps pipeline is a snap. The NuGet part of the problem is officially painless.

If you want to cut to the chase, the packages are here:

  1. HexagonSoftware.SpecFlowPlugins.ImportSharedFeaturesDuringGeneration
  2. HexagonSoftware.SpecFlowPlugins.GenerationFilter

I think it makes sense to explain what they do, what they are for, and how to use them, though.

Import Shared Features During Generation


The former is not really a plugin for SpecFlow so much as it is an extension of the .csproj MsBuild ecosystem. It allows you to designate a reference to a set of feature files. Each such reference points to an external folder (presumably full of feature files) and maps it to a folder inside the project.

This is accomplished by editing the .csproj file and adding a fairly-standard-looking reference to it. Here's an example:

<SpecFlowFeaturesReference
  Include="..\HexagonSoftware.SpecFlowPlugins.ImportSharedFeaturesDuringGeneration.Import"
  ImportedPath="SharedFeatures" />

That will cause all the feature files under ..\HexagonSoftware.SpecFlowPlugins.ImportSharedFeaturesDuringGeneration.Import (relative to the project root, of course) to be copied to SharedFeatures (again, project-relative) prior to test-generation.

The internal folder (in this case, SharedFeatures) is completely controlled by the reference. Everything in it is destroyed and rebuilt every build. For my own sanity, I add those folders to my .tfignore (or .gitignore, if I'm feeling masochistic).

Unfortunately, at this time, the best way I was able to get it to work was by making a folder under the project root. In the future, I'd like to skip that step and have the files actually be a part of the project while generation occurs. This has a little to do with how difficult it was to access the internals of the SpecFlow generation task from a plugin and a lot to do with how difficult actually getting a reference to the task assembly is.

I'll probably crack it eventually.

I'm sure there are many cases I haven't considered but, of course, the ones I have considered are tested.

Generation Filter


This plugin allows you to control which parts of a feature file are actually generated into tests. You do this by specifying sets of tags that include or exclude scenarios or features from generation.

The tag selection is controlled by a JSON file at the root of your project. The JSON file must be named "specflow-test-filter.json".

Its format looks something like this:

{
  "included-tags": [ "@In" ],
  "excluded-tags": [ "@Stripped" ]
}

As it should be, exclude tags always trump include tags.
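In other words, the per-scenario decision works something like this minimal sketch (not the plugin's real code; the names are hypothetical):

using System.Collections.Generic;
using System.Linq;

static class GenerationFilterSketch
{
    // A scenario is generated only if none of its tags is excluded and,
    // when an include list exists, at least one of its tags is included.
    public static bool ShouldGenerate(
        IReadOnlyCollection<string> scenarioTags,
        ISet<string> includedTags,
        ISet<string> excludedTags)
    {
        if (scenarioTags.Any(excludedTags.Contains))
            return false;
        return includedTags.Count == 0 || scenarioTags.Any(includedTags.Contains);
    }
}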

Why Both?


These two plugins work together nicely. The first one allows me to reuse feature files. The second allows me to generate a subset of the scenarios within a project. As a result, I can create the SpecFlow equivalent of a "materialized view" within my test suite. Each test assembly can be a subset of all the tests I use for Dwindle.

Before, I relied on environment variables to select tests and set the test mode. Now, the place that a feature file is instantiated sets all the context required.

The environment-variable approach worked perfectly in my automated gates. Maybe it was even convenient. At the very least, it teetered on the edge of convenience, but it was a real pain in the ass for my local development environment.

For one thing, I had to fiddle with environment variables if I wanted to switch between unit, client, or API tests. I was able to kludge my way past that, though: I have three different test runners installed in my Visual Studio instance and each one is configured differently.

Another problem - probably a more important one - is that it made it hard for me to push a button and run all the tests. As I migrate over to using these plugins, I'll be able to run all my tests in a single batch.

Before, the only tests that were convenient to run locally were the unit tests. Those were so fast that I could run them any time I wanted. Everything else was a batch of tests ranging from 30 seconds (just long enough to be infuriating) to ten minutes.

When I'm done migrating to this structure, I'll have a choice. I can right-click and run whatever test I want in whatever context I desire. I can select a ten-minute batch and go get some coffee. I can set up a run that takes an hour and go for a swim or walk.

I'll probably circle back on this with an experience report when the migration is done. My automated gates will all have to change (a little). I'm guessing, in the course of the migration, I'm going to need to add a few more features and refine the workflows a little, too.

Maybe it won't help you the way it (already) helps me but I figured I should build this as NuGet packages for the next person with the same problem.

Thursday, March 26, 2020

What if we rethought what a process framework should be in the first place?

Okay. Now I want to make a crazy suggestion.

If you've been tracking what I've been saying this week, then you've probably read as far as this entry, in which I suggest that the basis for bad technical decision-making is probably a poor understanding of the relative costs of various decisions. I also made the argument that large, complex corporate budgets and success-metrics tend to compound the problem and make it even harder for some decision-makers to make good long-term decisions.

Now I want to explore process frameworks and how they play into this.

I'm not a deep expert on any of these process frameworks but I know plenty about several of them. In fact, I'm probably as much of an expert as a lot of the people who present themselves as experts. I just don't claim to be.

I'm going to list a few frameworks but I think my argument is true for all of the frameworks of which I am aware:

  • Scrum
  • SAFe
  • Kanban

That's enough to make the point. I've heard evangelists for each of these process frameworks argue - explicitly or at least from context - that businesses need to change how they measure.

My first experience with this was when scrum trainers told my team to use Story Points for estimation rather than hours. The idea being that we could empirically determine what a story point means for capacity planning and get better at estimating.

Great.

It didn't work, of course, but great.

It still represents an early attempt to shake up how software projects are measured.

Then came Business Value Points. That oughta do the trick, right? Of course, it didn't, but at least people were trying.

Story Points and Business Value Points both generally failed to have a positive impact (within the scope of my observation) because they were attempts to make the traditional spreadsheets work. They were stand-ins for other numbers that were already being used, and empirically backing into a meaningful value was supposed to fix...something...but it couldn't fix the corrupt idea at the heart of it all: the model of software development that is, to this day, essentially descended from manufacturing.

To address those ideas, new processes and mentalities were brought to bear. Plan this way, keep these things in a backlog, manage that kind of work in process, favor this kind of effort over that one, et cetera.

Then there were people arguing for no estimates. There were (are?) proponents within seemingly every framework. There seemed to be a lot of them in Kanban. So much so that I began to associate the two in my mind.

I'm not sure if that was ever an official part of Kanban for software development. Nor do I care. What I do know is that it was another attempt to address the measurement problem in software development and it was brought to bear as a supporting change for some other goal (an "agile" process, usually).

Scrum and SAFe both have explicit mechanisms for their "inspect and adapt" principle - which many believe to be the core of their efficacy. I view this as an attempt to bring qualitative measurement techniques into software development.

The picture I'm painting is that luminaries within the software development industry have been attempting to improve how and what organizations measure for...well...decades, now.

Yet it always seemed like it came as a sideshow. It always seemed like there was a shrink-wrapped process that (some people claimed) was going to solve everything and, also, maybe, kinda, you might need to tweak how you measure a little bit.

What if we flipped it on its head? What if we defined a new paradigm for managing software development organizations with a radically different message?

The message, right now, is "You know what's going on, we'll help you do a better job."

I think the message should be "You're doing a good job, let us help you know what's really happening."

I think what we need is not process frameworks with a side of metrics. We need metrics frameworks and you can buy a process framework Ă  la carte if you decide you need one.

We could start with "Keep managing how you manage but change how you measure" and see where that gets us.

I think it might get a lot more traction because it's more respectful of the skills that most software development leaders do have. Most of them do know how to manage large groups of people and work with large complicated budgets.

Even if it didn't get widely adopted, though, I think the organizations where a metrics framework landed would benefit a lot more than the ones that are trying to adopt or grow a new software development lifecycle. They'd get more out of the effort because we'd be addressing the real kernel of the problem: Not what we do but what we see.

We need to start working with leadership teams, and maybe even boards, to help them define metrics that set software "projects" (products) up for success. We know these things are going to last a lot longer than a quarter, so quarter-to-quarter measurements aren't going to do the trick. Let's get together as software professionals, define a standard for how success and progress can really be measured in software development, and then start pushing that.

The industry needs to change how it measures and plans and there's no way it will be able to do so without our help.

Wednesday, March 25, 2020

A deeper exploration of why people might make bad technical decisions

Yesterday, I wrote about budgetary forces impacting technical decision-making.

Everyone reading that sentence is probably thinking "Well... duh!"

However, I don't mean it in the way a lot of people do.

My argument was that big, padded budgets create a sense of being able to afford bad decisions, allowing technical debt to pile up. I also posited that small, tight budgets might help people see that they need to make good decisions if they want to stay afloat.

Of course, at the end of the blog entry, I noted that this was just one piece of the puzzle. It's not like every big budget results in bad practices. Likewise, not all of the small budgets in the world result in better decision-making.

It should also be noted that this is all anecdotal and based on my own experiences. Really, that should have been noted in yesterday's entry.

Oh well.

As I considered my position, I realized there was a major crack in the argument: Large organizations can't afford to make bad decisions. The large, complicated budgets only make them think they can. As a result, they often defer sustainability-related investments until the technical debt is so bad that they are in crisis mode. Yet, the margin is almost always called, the crisis almost always comes, and the result is almost always bad.

So it is definitely not about being able to afford an unsustainable decision. Actually, just that phrase is inherently self-contradictory. It reduces to either "afford an unaffordable decision" or "sustain unsustainable decisions", both of which are nonsense statements.

Budget-size is probably not what's at work, here. It's just a clue pointing to the real driver: understanding.

Consider this: A small organization on a shoestring budget that makes good decisions about sustainable practices only does so because its membership understands they can't really afford to make bad decisions. If proper software design wasn't perceived as something that was going to pay off, an organization wouldn't do it.

For every one of those small teams with healthy technical practices, there are probably dozens we never hear of because they are crushed under the weight of their own poor choices.

Why did they make those poor choices? Do people intentionally undermine their ability to succeed knowing full well that's what they're doing? Not usually. Again, those small organizations that winked out of existence too fast for anyone to notice were undone by their lack of understanding.

Now let's look at the big organizations. They go months or years before the crisis hits them, and then sunk costs often make them drag on for a lot longer after that. Are there really people sitting around in the leadership of those companies, greedily rubbing their hands together and muttering "This project is going to be toast in three to five years! Heh, heh, heh"?

Well. Maybe that happens every once in a while.

Most of the time, though, it's probably just that the decision-makers simply don't understand the long-term ramifications of the decisions they are making.* They don't understand that what they are doing is going to create a debt that cannot possibly be paid when no more forbearances can be obtained.

Furthermore, you will occasionally find very large firms that really do all the things they are supposed to do - keep the test suites comprehensive and meaningful, regularly reconcile design with requirements, properly manage handoffs, continuously improve their processes, et cetera. From what I can tell, it seems like this is often a result of a handful of higher-up leaders who have a deep understanding of what works in software development.

So all four of the main cases seem, to me, to be dependent on understanding that there is a problem. Large budgets just muddy the waters for decision-makers who don't already have a deep enough understanding of how software development works.

To fix the problem, leaders need to know that pressuring developers to defer writing tests or refactoring to a proper design (among other things) will be the seeds of their undoing.

Why don't they know that, already?

I think that question brings us to a real candidate for a root cause.

So many organizations - especially large organizations - claim to be "data-driven". They need numbers to make their decisions. Not only do they need numbers, but they need numbers fast. It seems like a lot of leaders want to see the results of making a change in weeks or months.

Therein lies the problem.

For large organizations, the consequences of poor technical practices take months or years to produce intolerable conditions. Why should damage that took years to manifest be reversible in a matter of weeks or months? It's not possible.

So long as the way we measure progress, success, failure, and improvement in software development is tied to such incredibly short windows in time, those metrics will always undermine our long-term success. Those metrics will always create the impression that the less-expensive decision is actually the more expensive one and that you can squeeze a little more work out of a software development team by demolishing the very foundations of its developers' productivity.

Not all data are created equal. Data-driven is a fine way to live so long as the data doing the driving are meaningful.

We need better metrics in software development or, at the very least, we need to abolish the use of the counterproductive ones.

Tuesday, March 24, 2020

Why is it hard to make sustainable practices stick?

I've been thinking about this problem for a while.

It's hard to make sustainable software development practices stick. The larger an organization is, the harder it seems to be.

Why?

There are many possible answers and probably all of them have at least a little validity.

  • It's way easier to learn a computer language than it is to learn how to properly attend to design.
  • Refactoring (true refactoring) is hard to learn, too.
  • TDD requires significant effort to figure out how to apply meaningfully.
  • BDD requires organizational changes to make its biggest impacts.
  • Larger organizations have politics.
  • Larger organizations have communications and handoffs.
  • Larger organizations often have deadlines that aren't linked to capacity or value.

All of those are at least somewhat true, some of the time, but none of them smells very much like a root cause. Maybe there isn't a single cause and that's why it's hard to pin one down. In fact, that's probably true, but I still feel like the above list (and its ilk) is a list of symptoms of some other problem, not a list of first-class problems.

Yesterday's blog entry helped me make a little progress in the form of a new hypothesis we can at least invalidate.

As I said, I automate everything because I know I can't afford to do otherwise. In fact, that's the reason why I apply modern software design principles. It's why I refactor. It's why I have a test-driven mindset and why I write tests first whenever I can. I'm on a shoestring budget and I know I can't spare anything on wasted effort, so I work as efficiently as I can.

What if the reason why larger organizations ostensibly tend to struggle with code quality, design patterns, refactoring, TDD, agile workflow, and lean product definition is as simple as the inverse statement? I know I don't have the budget to work inefficiently, so I work efficiently. Larger organizations have the budget to work inefficiently, so they don't work efficiently?

It sounds crazy. At least, it would have to my younger self.

"People are working in an unsustainable way just because they think they can afford it? What?!?" That's what young Max would have said. Today Max really only has a resigned shrug to offer in dissent.

So, because Old Max put up such feeble resistance, let's explore the idea a little more.

A small organization is usually comprised of* a tight-knit group of individuals. While they may not all be experts in every area of work, the impact of any person's decisions can be plainly seen in day to day activities. This means that the costs of bad decisions are not abstracted debts to be paid someday. They are real problems being experienced at the moment.

Pair that with the tight budget that smaller companies usually have, and you get a recipe for action: the plainness of the problem helps people know what should be done and the necessities of a small budget provide the impetus to do it.

Contrast that with a large organization.

In a large organization, consequences are often far removed from actions. If you create a problem, it may be weeks, months, or even years before you have to pay the cost. That's if it doesn't become someone else's problem altogether, first. Fixing a systemic problem such as, say, not being test-driven can be imagined as this high-minded "nice to have" that probably won't really work because you'd need everyone to be on board, which will never happen because everyone else feels the same way.

At the same time, pockets are often quite deep in large organizations. While a show may be made of pinching pennies in the form of (for instance) discount soap in the bathrooms, they tend to spend a lot on wasted effort. They are able to do this because they have some array of already-successful and (typically) highly profitable products they can exploit to fund new efforts. Furthermore, in addition to being very large, corporate budgets seem like they are usually very complex. Large costs can sometimes seem smaller than they really are because they are impacting many different line items.

Pair those two together and you get fatalistic ennui that makes everything seem academic, coupled with a budgeting apparatus that consistently says "we've got bigger fish to fry".

I'm pretty sure this is one piece of the puzzle but I think there's more to it. For instance, there are many small organizations with shoestring budgets that still make bad decisions about sustainability. There are also counterexamples in the form of large companies that tend to make decisions that are good, or at least better than those of their competitors.

However, this writing is now quite long. So I'm going to end it, here, and discuss another factor in tomorrow's entry.

*: That is one of the correct usages of "to comprise". Look it up.

Monday, March 23, 2020

My pipeline brings all the builds to the prod

The title is a reference to a song I only know exists because of this clip from Family Guy.

Dwindle is a one-person operation, right now. Eventually, I might pull my wife into it but she doesn't have any time to dedicate to anything new, right now.

We have a two-year-old, a four-year-old, various other business interests, a house that seems to require constant repairs, and, until the recent panic, a reasonably busy consulting practice.

So time is at a premium.

For now, it's just me and I don't get a full forty hours a week to work on Dwindle. Before I got sick, I was lucky to have twelve. Obviously, last week, my time available to work was right around zero hours.

Still, in those twelve hours per week, I managed to build and maintain an automated pipeline that carries Dwindle all the way from check-in to automated deployment and, ultimately, promotion to production environments.

The pipeline covers everything...

  • Building and testing binaries for the core logic of the game.
  • Building and testing the backend API.
  • Building the clients.
  • Acceptance/integration testing the clients.
  • Deploying to a "blue" or "staging" environment.
  • Validating the blue/staging deployments.
  • Promoting from blue to "green" or "production" environments.
  • Cleaning up old deployments where applicable.

It manages parallel deployments in different environments:

  • Azure Functions & Storage for the backend.
  • Google Play for the Android app.
  • Kongregate for the browser version.

It keeps everything in sync, ensuring each of the following:

  • No blue deployments occur until all tests for every component have passed
  • No deployment validations occur until all blue deployments are completed
  • No release to production begins until all promotion candidates have been validated.

This is no mean feat for a Unity application. It's more work than, say, a web application or even a normal Windows or mobile app. Unity makes every traditional software development task harder - probably because they are solving a different problem than the traditional app-development problem.

Even an automated build of a Unity game is a lot harder than automated builds of Xamarin or native-coupled apps would be. Acceptance testing your game is hard, too. Everything that isn't making a pretty shape on the screen is much more difficult than it would be with a more traditional tech stack.

I did it anyway. I did it when it was hard and seemed like it would only get harder. I did it when it looked like I had an endless churn of tasks and felt like solving the next problem would just beget two more.

Even when a little voice in the back of my head began to whisper "Hey... maybe it's not so bad to do that part manually," I did it. I pushed past the little voice because I knew it was lying to me.

If I can do it, alone and beset on all sides by two very-high-energy youngsters, you can do it while you're sitting at your desk in your large, reasonably-well-funded corporate office (or home office).

...but that's not very important, is it?

We shouldn't do things just because we can, right?

I need a legitimate reason and I have one. It's not just that I can do it. It's that I absolutely need a completely automated pipeline.

I couldn't possibly afford to build Dwindle into something successful if I was spending all my time manually testing and manually promoting builds. I'm not saying I will make Dwindle a financial success, but my chances would be nil if I was just wasting all my time on those things. Most of my time would go to validating a handful of changes. I wouldn't have any time left over to hypothesize, innovate, or develop new features.

The marginal cost of investing in proper automation is negative. While this may be impossible when talking about manufacturing, it's one of the most basic principles of software development: Investing in things related to quality lowers costs.

So I built a fully automated pipeline with a mix of integration and unit tests for a very simple reason: You spend less to have a fully automated pipeline than you do without one.

...and if I can spend less to have one alone, you certainly can do it with a team.

Friday, March 20, 2020

Parsing SVG documents into useful layout specifications

In a previous post, I discussed using SVG documents to specify layouts in a Unity app. More recently, I started delving into that subject and laid out a list of things to cover:

  1. How to make the SVG easy to interpret visually.
  2. How to convert the SVG into a set of rules.
  3. How to select the right rules given a query string.
  4. Generating an SVG that explains what went wrong when a test fails.

Among the many things I deferred was how to actually parse the SVG into a set of layout constraints.

Both #2 and #3 are pretty simple and they are strongly related, so I'll handle them in this text.

Thursday, March 19, 2020

Unit testing versus integration testing: I'm starting to reconsider my position

An impossible structure in an assembly diagram as might come with cheap furniture.


For a long time, I have rejected integration tests as overly costly and of very little benefit. Friends of mine have a similar argument about unit testing. They say that it does very little and slows you down.

I'm still not sold on the "unit testing slows us down" position. The negative side effects people mention (like impeding refactoring) line up more with testing implementation details than with unit testing, itself.

However, in the modern era, I'm starting to come around on integration testing.

First, I'll lay out my reasoning for why unit testing is better:
  1. Properly specifying how each individual behavior works allows you to build a foundation of working behaviors.
  2. Defining behaviors relative to other behaviors (rather than as aggregations of other behaviors) stems proliferation of redundancy in your test code.
  3. How two or more behaviors are combined in a certain case is, itself, another behavior; see #1 and #2.
I still believe all of that. If I had to choose between only unit tests or only integration tests, I would choose only unit tests.

In addition to providing better feedback on whether or not your code (the code you control) works, they also help shape your code so that defects are harder to write in the first place. By contrast, when something you wrote breaks, an integration test is unlikely to fail, as it's very hard to create exhaustive coverage with integration tests. Furthermore, even if an integration test does fail, it's not very helpful, diagnostically speaking.

Yet, I don't have to choose. I can have both. As we've seen in previous posts and will continue to see, I can have one scenario be bound as both.

So what is it that integration tests tell us above and beyond unit tests? My unit testing discipline makes sure that I almost never break something of mine without getting quick feedback to that effect. What do integration tests add?

The spark of realization was a recent discovery as to why a feature I wrote wasn't working but the evidence has been mounting for a while, now. It took a lot of data to help me see the answer even though to you it may prove shockingly simple and maybe even obvious.

Over the last year or so, a theme has been emerging...

  • A 3rd party layout tool has surprising behavior and makes my layouts go all wacky. So I have to redo all my layouts to avoid triggering its bugs.
  • "Ahead of time" code fails to get generated and makes it so I can't save certain settings. I have to write code that exercises certain classes and members explicitly from a very specific perspective in order to get Unity actually compile the classes I'm using.
  • A Google plugin breaks deep linking - both a 3rd-party utility and Unity's built-in solution. I have to rewrite one of their Android activities and supplant the one they ship to make deep linking work.
  • A "backend as a service" claims that its throttling is at a certain level but it turns out that sometimes it's a little lower. I have to change how frequently I ping to something lower than what they advise in their documentation.
  • A testing library is highly coupled to a particular version of .NET and seems to break every time I update.
  • Et cetera. The list goes on...and on...and on.

Unit tests are good at catching my errors but what about all the other errors?

When you unit test and do it correctly, you isolate behaviors from one another so that each can be tested independently of the others. This only works because you are doing it on both sides of a boundary and thus can guarantee that the promises made by a contract will be kept by its implementation.

That breaks down when you aren't in control of both sides of the contract. It seems like we live in unstable times. You simply can't count on 3rd party solutions to keep their promises, it seems.

This realization led me to a deeper one. It's about ownership. If you own a product, your customer only cares that it works. They don't care about why it doesn't work.

Telling them "a 3rd-party library doesn't function as promised and, as a result, deep links won't work for Android users. I'm working on it," sounds worse than just saying "Deep links don't work for Android users, I'm working on it." What they (or, at least, I) hear is "Deep links don't work for Android users. Wah, wah, wah! It's not my fault! I don't care about your inconvenience. I only care about my inconvenience. Feel sorry for me."

Even though you don't own 3rd-party code, to your customers, you may as well. You own the solution/product/game in which it is used and you own any failures it generates.

That extends well beyond whether or not the 3rd-party component/service works. It includes whether or not a component's behavior is represented correctly. It includes whether or not you used it appropriately. It even includes the behavior of a component or service changing in a surprising way.

So I finally realize the point of integration testing. It forces you to deal with the instability and fragility of the modern software development ecosystem. It's not about testing your code - that's what unit tests are for - it's about verifying your assumptions pertaining to other people's code and getting an early warning when those assumptions are violated.
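To illustrate the split with entirely made-up names (this isn't Dwindle's code), the same contract can anchor both kinds of tests:

using System;

// My logic is unit tested against a fake implementation of this interface.
// An integration test runs the same check against the real adapter and the
// real backend, so I get an early warning if the third party breaks the
// promise my unit tests assume.
public interface IScoreStore
{
  void Save(string player, int score);
  int GetBest(string player);
}

public static class ScoreStoreContractCheck
{
  public static void SavedScoreIsRetrievable(IScoreStore store)
  {
    store.Save("test-player", 42);
    if (store.GetBest("test-player") != 42)
      throw new InvalidOperationException(
        "The store no longer honors the assumption my unit tests rely on.");
  }
}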

Integration testing - whether it's how multiple microservices integrate or how all your components are assembled into an app - is essential. Just make sure you are using it to ask the right questions so you can actually get helpful answers:

"Is this thing I don't control properly functioning as a part of a solution I offer?"

Wednesday, March 18, 2020

The pains of being flexible

I ran a build with some pretty simple refactors as the only changes. I expected it to pass. The only reason the build even ran was that I checked in my changes.

Yet the build didn't pass. It failed with a bizarre error. SpecFlow was failing to generate code-behind files.

Some research made it clear that this was really a function of SpecFlow not working very well with .NET 3.1.200. The surprising thing about that was that I didn't remember switching to 3.1.200.

The fault actually was mine. At least, it was in a circuitous way. It was an artifact of a decision I made while defining my pipeline:

  - task: UseDotNet@2
    displayName: 'Use .NET 3.1'
    inputs:
      version: 3.1.x
      installationPath: '$(Agent.ToolsDirectory)\dotnet\3.1'

I intentionally chose to allow updates automatically with that little ".x" and it burned me.

Sure enough, a tiny change "fixed" my broken pipeline:

  - task: UseDotNet@2
    displayName: 'Use .NET 3.1'
    inputs:
      version: 3.1.102
      installationPath: '$(Agent.ToolsDirectory)\dotnet\3.1'

I'll definitely leave it that way until I have a reason to change it.

What I don't know, at this time, is if I'll go back to allowing the maintenance version to float, when I finally do make a change.

Is it better to have the stability of a known version or to get the fixes in the latest version without having to do anything?

Right now, my inclination is toward faster updates, still. For an indie game-development shop with a disciplined developer who jumps on problems right away, it's probably better to get the updates in exchange for the occasional quickly-fixed disruption.

That said, if I get burned like this a few more times, I might change my mind.

Tuesday, March 17, 2020

Making an SVG shape specification easy to interpret visually.

In my most recent post, I deferred describing how I parsed an SVG document to another post.

There are multiple subtopics:

  1. How to make the SVG easy to interpret visually.
  2. How to convert the SVG into a set of rules.
  3. How to select the right rules given a query string.
  4. Generating an SVG that explains what went wrong when a test fails.

I will attempt to address them all in separate posts. Like the rest of the world (at the time of this writing), I'm recovering from illness. So I'll do an easy one, now, and the rest will have to wait.

First up, how to make the SVG easy for a person to understand.

This part is all about SVG, itself. I started out with a single document that looked pretty raw - just some white rectangles on a black background. Over time I accumulated more documents and evolved them into something more palatable.

Those changes mostly involved the use of stylesheets and definitions, but I also discovered, after much experimentation, that polyline was the most effective tool for creating the shapes I wanted. I'll explain why in a bit.

First, let's look at a single polyline element:

<polyline id=".inner" points="80,192 80,1000 1860,1000 1860,192 80,192 80,193" />

That's a rectangle with its first two vertices repeated. For the test, I only need the first three points - the rest of the parallelogram is inferred.

However, to create the visual effect of a bounding box with little arrowheads pointing inward at the corners, I needed the extra points. At least, I couldn't figure out how to do it without the extra points.

I could only get the orientation of markers on inner vertices to be correct. Everything else pretty much just looked like a random direction had been chosen. As a result, I needed 4 inner vertices, which means I needed six of them, total (start, inner x 4, end).

The other structures I needed were some defined shapes to use as vertex-markers.

<defs>
    <marker id="marker-outer" viewBox="0 0 10 10" refX="5" refY="10"
        markerWidth="5" markerHeight="5"
        orient="auto">
        <path d="M 5 10 L 2 0 L 8 0 z" class="label" />
    </marker>
    <marker id="marker-inner" viewBox="0 0 10 10" refX="5" refY="0"
        markerWidth="5" markerHeight="5"
        orient="auto">
        <path d="M 5 0 L 2 10 L 8 10 z" class="label" />
    </marker>
</defs>

Once I have a rule-definition (my polyline, in this case) and the definition of the marker, I can use a stylesheet to marry the two and create the visual effect.

<style>
    *[id='inner'],
    *[id$='.inner'],
    *[id='outer'],
    *[id$='.outer']
    {
        stroke-width: 5;
        stroke: white;
        stroke-dasharray: 30 5;
        fill:none;
    }

    *[id='inner'],
    *[id$='.inner']
    {
        marker-mid: url(#marker-inner);
    }

    *[id='outer'],
    *[id$='.outer']
    {
        marker-mid: url(#marker-outer);
    }

    /* SNIPPED: Stuff used at runtime for other purposes */
</style>

Finally, to create a frame of reference, I converted a background image to base 64 and embedded it in the SVG document as the first element.
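Preparing that embedded background is a small job. Here's a rough sketch (the helper and file name are just examples, not my actual tooling):

using System;
using System.IO;

static class EmbeddedBackgroundSketch
{
  // Read the background image, base-64 encode it, and produce an <image>
  // element suitable for pasting in as the SVG document's first child.
  public static string MakeImageElement(string path)
  {
    var encoded = Convert.ToBase64String(File.ReadAllBytes(path));
    return $"<image x=\"0\" y=\"0\" href=\"data:image/png;base64,{encoded}\" />";
  }
}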

All of those steps create an effect like this:


Thankfully, most of those steps don't need to be repeated.

Sadly, it seems that SVG pushes you in the direction of redundancy. You can externalize your stylesheet but not every renderer will respect it. I couldn't find a way to reliably reuse the markers, either. The background image could be externalized but then I'd be relying on hosting for the specification to render properly.

There's a bunch of copy and paste but it's not on the critical path for the test. It just affects how the test looks to developers. So I tolerate it.

I could write a little generator that runs just before build time but then I wouldn't be able to easily preview my work.

C'est la vie.

At least, this way, I can quickly interpret my specification just by opening it in a browser or editor.

Wednesday, March 11, 2020

Pinning down layout for a Unity game

As I work to find an audience for Dwindle - the hardest problem I've ever worked on, by the way - I'm experimenting with different layouts. Specifically, I want the game, when run on a PC (including in the browser), to look more like one would expect.

This means that I need to support both landscape and portrait layouts.

Tool Trouble


I've found an asset to help with this but it doesn't work perfectly. In particular, sometimes its behavior is a little surprising to developers, and it was not set up very well for Unity's nested-prefab feature. I use nested prefabs very heavily.

The result is that the tool works but, if I'm not very careful (which is most of the time), it can destabilize quickly.

I need tests that let me know what I've done something that triggers the layout asset I'm using and causes it to go berserk.

What to Test?


I have two lines of defense here: switching from invoking buttons programmatically to simulating clicks and directly testing the layout.

The former ensures that the buttons I'm invoking are actually clickable in the viewable area of the game - an assumption I could make up until recently.

The latter is a gigantic pain in the ass but was ultimately worth the effort.

I'm going to start laying out some of what I did on that front, here.

Defining a Meaningful Layout Requirement


It starts with a good specification and that means I need to know what a good specification for layout actually is.

Not being a UI/UX expert, I settled on an SVG graphic that can be used to codify the specification and can be inspected visually.

I first had to decide what my requirements for the requirements were. The game is playable on a variety of Android devices as well as in a web browser. Additionally, it can be configured differently within those different form factors (for instance, there are no embedded ads when played on Kongregate).

So the tests I write need to either be (a) very flexible around changes in device form-factor/configuration for a channel or (b) finely tuned to a specific execution environment.

Neither of those options is particularly appetizing - UI testing never is. However, given the choice between too much flexibility or too much coupling, I'll take the former.

Right now, I define layouts in terms of two kinds of constraints. Not every constraint is defined for every requirement I specify. The two kinds of constraints are:
  • Inner boxes - boxes that must be contained by the UI structure being examined.
  • Outer boxes - boxes that must contain the UI structure being examined.
Note that this still gives me the ability to have a rigid definition if I want: just make the inner and outer boxes the same box.

Remember, I'm not trying to set up a test-driven layout, here. For a small project with one developer who is also the product owner and product manager, that would probably be overkill.

What I'm trying to set up is a system of alarms that lets me know when my layout asset has gone haywire and borked my game. That's a much less demanding task and the above constraints suit it nicely.
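For concreteness, here's a minimal sketch of those two constraint kinds. It's illustrative only - the Box type and names are made up, not the shapes my test code actually uses - and it assumes y grows downward, so Top <= Bottom.

using System;

public struct Box
{
  public float Left, Top, Right, Bottom;
}

public sealed class LayoutRequirement
{
  // An inner box must be contained by the actual shape;
  // an outer box must contain the actual shape.
  public Box? Inner;
  public Box? Outer;

  public void Validate(Box actual)
  {
    if (Inner is Box inner && !Contains(actual, inner))
      throw new InvalidOperationException("Shape does not contain its inner box.");
    if (Outer is Box outer && !Contains(outer, actual))
      throw new InvalidOperationException("Shape is not contained by its outer box.");
  }

  static bool Contains(Box container, Box contained) =>
    container.Left <= contained.Left &&
    container.Top <= contained.Top &&
    container.Right >= contained.Right &&
    container.Bottom >= contained.Bottom;
}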

Codifying Layout Requirements


To represent these constraints, I chose to define a narrow convention within the SVG format:
  • Anything can be in the document.
  • Only rect, polygon, and polyline elements are actually parsed and used as specification directives. Everything else is disregarded.
  • Dot-delimited id attributes represent a logical path within a specification file to any given constraint.
  • The final step in a constraint's path represents the kind of rule applied to the shape in question.
  • Additionally: For polylines and polygons, the first three coordinates of the path are treated as points on a parallelogram that defines the corresponding constraint. (I never use anything other than a rectangle, but that's a kind of parallelogram, right?)
For instance:
  • A rect element with an id of "cheese.tacos.halfoff.inner" represents a box that must be inside whatever UI element is being tested.
  • A circle element with an id of "important.control.outer" will be ignored because it is a circle.
  • A polyline element with an id of "handle.bars.notsupported" will not be treated as a rule because the system cannot map "notsupported" to a constraint.
This gives me a fair amount of freedom to create a specification that visually tells a person what the rules are for a particular shape while allowing a test to use the exact same data to make the exact same determination.
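Here's a minimal sketch of how that convention can be parsed with System.Xml.Linq - not my actual parser, and the RuleKind values are just the two constraint kinds described above:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public enum RuleKind { Inner, Outer }

public static class SpecificationParsingSketch
{
  // Keep only rect/polygon/polyline elements whose dot-delimited id ends
  // in a recognizable rule kind; everything else is ignored.
  public static IEnumerable<(string Path, RuleKind Kind, XElement Shape)> FindRules(XDocument specification)
  {
    var shapeNames = new[] { "rect", "polygon", "polyline" };
    foreach (var element in specification.Descendants()
      .Where(e => shapeNames.Contains(e.Name.LocalName)))
    {
      var id = (string)element.Attribute("id");
      if (id == null)
        continue;

      var lastDot = id.LastIndexOf('.');
      var lastSegment = lastDot < 0 ? id : id.Substring(lastDot + 1);
      if (!Enum.TryParse<RuleKind>(lastSegment, true, out var kind))
        continue; // e.g. "notsupported" is not a rule

      yield return (id, kind, element);
    }
  }
}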

Blogger makes it pretty hard to host SVGs - as do most companies, it seems. However, below is an embedded Gist for a specification. You can grab the code and look at it yourself, if you like.


Also, here's a rendering of the image, in case you don't want to go to the trouble of grabbing the code, saving it, and opening it up in a separate browser window:

I had to do a screen grab to do this...why does the world hate SVG?
A jpg-ified version of the SVG
If you look at the gist, you can see what I did to make the little wedges without having to handcraft each one.

This leaves much to be desired but it also allows me to solve a lot of my problems.

Parsing it and computing the expected positions for an actual screen's size is pretty straightforward arithmetic. This entry is long, so I'll save that for another time so I can focus on a more interesting problem.

Defining a Test


Of course, this is just raw data. It's not a test. I need to add some context for it to be a truly useful specification. Here's a pared-down feature file.

Feature: Layout - Portrait

Background:
    Given device orientation is portrait

Scenario: Position of application frame artifacts
    Given main menu is active
    # SNIPPED: Out of scope specifications
    Then shapes conform to specifications (in file "Frame Specification - Buttons - Portrait.svg"):
        | Layout Artifact                  | Rule                |
        | application return to main menu  | home                |
        | main menu new single player game | start single player |
        | main menu new multiplayer game   | start multiplayer   |
        | main menu resume game            | start resume        |

What I've described, so far, allows me to set up the expected side of the tests. How do I bind it? How do I get my actuals?

It turns out the quest to do this gave me the tools to solve other problems (like using real taps instead of simulated ones).

Like everything that isn't painting a pretty picture for a first-person shooter, Unity doesn't make it easy to do this. It's not that any of the steps are difficult to do or understand. It's difficult to find what the steps are and string them together.

I take that to mean I'm doing something for which they have not planned, which is probably because not very many people are trying to do it.

For one thing, there are two complementary ways of determining an object's on-screen shape. One way works for UI elements. The other works for game objects that are not rendered as part of what Unity calls a UI. That's another problem I'll set aside for the moment, though. Let's just stick with UI.

I already have a way of getting information about the game out of the game. That's probably another blog entry for later. Anyway, it allows me to instrument the game with testable properties and then query for those properties from my test.

Getting the Shape of Onscreen Objects


First, I need something to pass back and forth:


[DataContract]
public struct ObjectPositionData
{
  [DataMember]
  public Rectangle TransformShape;
  [DataMember]
  public Rectangle WorldShape;
  [DataMember]
  public Rectangle LocalShape;
  [DataMember]
  public Rectangle ScreenShape;
  [DataMember]
  public Rectangle ViewportShape;
}

[DataContract]
public struct Rectangle
{
  [DataMember]
  public Vector Origin;

  [DataMember]
  public Vector Right;

  [DataMember]
  public Vector Up;
  /* SNIPPED: Mathematical functions */
}

[DataContract]
public struct Vector
{
  [DataMember]
  public float X;
  [DataMember]
  public float Y;
  [DataMember]
  public float Z;
  /* SNIPPED: Mathematical functions */
}

Then I needed a way of populating it. For UI elements, that means turning a RectTransform into its onscreen coordinates. Most importantly, I need the "viewport" coordinates, which map to the square (0,0)-(1,1). I could also use the screen coordinates, but both the screen coordinates and the viewport coordinates need a little massaging to be useful for the test infrastructure, and the transformation for the viewport coordinates is trivial: y_new = 1 - y_original.

Extracting these coordinates is, of course, an obnoxious process. Everything has to be done through the camera and everything has to be treated like it's 3D - even when it's not. So, it starts by finding the "world corners" of a rectangle.

var WorldCorners = new Vector3[4];
Transform.GetWorldCorners(WorldCorners);

This seems like a weird way of doing it, to me, but I'm sure they have their reasons. It's probably something to do with performance. All the access to these kinds of properties must be from the same thread, anyway, so it's probably safe to do something like keep that array around and reuse it from one frame to the next.

Anyway. The world corners aren't good enough because those are in a 3D space that is in no way associated with the actual (or virtual) device being used in the test.

So we have to convert those corners to viewport corners and that has to be done using a camera object.

new ObjectPositionData
{
  /* SNIPPED: Other shapes */
  ViewportShape = MakeRectangle(WorldCorners.Select(V => ToTrueViewportPoint(Camera, V)))
}

static Vector3 ToTrueViewportPoint(Camera Camera, Vector3 V)
{
  var Result = Camera.WorldToViewportPoint(V);
  Result.y = 1 - Result.y;
  return Result;
}

Things like how I marshal and unmarshal the ObjectPositionData struct between the game and my tests are going to be discussed in a different article.

Binding the Test


Now I can pull that out in my test bindings. So let's start looking at those.

[DataContract]
public sealed class ShapeFileRow
{
  [DataMember] public string LayoutArtifact;
  [DataMember] public string SpecificationsFile;
  [DataMember] public string Rule = "";
}

// This one is called for the feature file shown above
[Then(@"shapes conform to specifications \(in file ""([^""]*)""\):")]
public void ThenShapesConformToSpecificationsInFile(string SourceAssetName, Table T) //5
{
  var Rows = T.CreateSet<ShapeFileRow>();
  foreach (var Row in Rows)
    ThenShapeConformsToLayoutSpecificationFrom(Row.LayoutArtifact, Row.Rule, Row.SpecificationsFile ?? SourceAssetName);
}

[Then(@"shape (.*) conforms to layout specification (.*) from ""([^""]*)""")]
public void ThenShapeConformsToLayoutSpecificationFrom(string ShapeKey, string SpecificationKey, string SourceAssetName)
{
  SpecificationKey = Regex.Replace(SpecificationKey, @"\s+", "."); // 1
  var ShapeSpecificationsContainer = TestAssets.Load(SourceAssetName, ShapeSpecifications.Reconstitute(Artifacts)); // 2
  var Requirement = ShapeSpecificationsContainer.GetViewportRequirement(SpecificationKey); // 3
  var Actual = Client.GetViewportShape(ShapeKey); // 4

  Requirement.Validate(Actual); // 6
}

There are several things of note (corresponding with numbers in comments, above).
  1. Because I'm not sure if SVG IDs allow spaces, I build my SVG ID scheme around dot-delimitation. However, that's ugly in a feature file. So I swap whitespace with dots et voilĂ .
  2. I grab test assets like the SVG spec out of assembly resources using a helper. That helper is omitted from this entry for length.
  3. Getting the shape specification object out of the document is, as already mentioned, omitted for length.
  4. Getting the viewport shape involves using the test-command marshaling system to exercise the code mentioned earlier.
  5. The table-based assertion is just syntactic sugar over the top of an individual assertion I use elsewhere.
  6. The various validate methods enforce the constraints mentioned earlier: inner boxes must be contained by actual boxes and outer boxes must contain actual boxes.

That's It, for Today


This feels like a really long entry, at this point. I've tried to be a lot more detailed, as some have requested. However, there's just no way to include all the details in one entry - any more than there would be in a single section of a single chapter of a book.

So I'm going to have to drill into more details in subsequent writings.