Wednesday, March 25, 2020

A deeper exploration of why people might make bad technical decisions

Yesterday, I wrote about budgetary forces impacting technical decision-making.

Everyone just reading that sentence probably is thinking "Well... duh!"

However, I don't mean it in the way a lot of people do.

My argument was that big, padded budgets create a sense of being able to afford bad decisions, allowing technical debt to pile up. I also posited that small, tight budgets might help people see that they need to make good decisions if they want to stay afloat.

Of course, at the end of the blog entry, I noted that this was just one piece of the puzzle. It's not like every big budget results in bad practices. Likewise, not all of the small budgets in the world result in better decision-making.

It should also be noted that this is all anecdotal and based on my own experiences. Really, that should have been noted in yesterday's entry.

Oh well.

As I considered my position, I realized there was a major crack in the argument: Large organizations can't afford to make bad decisions. The large, complicated budgets only make them think they can. As a result, they often defer sustainability-related investments until the technical debt is so bad that they are in crisis mode. Yet, the margin is almost always called, the crisis almost always comes, and the result is almost always bad.

So it is definitely not about being able to afford an unsustainable decision. Actually, just that phrase is inherently self-contradictory. It reduces to either "afford an unaffordable decision" or "sustain unsustainable decisions", both of which are nonsense statements.

Budget-size is probably not what's at work, here. It's just a clue pointing to the real driver: understanding.

Consider this: A small organization on a shoestring budget that makes good decisions about sustainable practices only does so because its membership understands they can't really afford to make bad decisions. If proper software design wasn't perceived as something that was going to pay off, an organization wouldn't do it.

For every one of those small teams with healthy technical practices, there are probably dozens we never hear of because they are crushed under the weight of their own poor choices.

Why did they make those poor choices? Do people intentionally undermine their ability to succeed knowing full well that's what they're doing? Not usually. Again, those small organizations that winked out of existence too fast for anyone to notice were undone by their lack of understanding.

Now let's look at the big organizations. They go months or years before the crisis hits them and then sunk costs often make them drag on for a lot longer after that. Are there really people sitting around in the leadership of those companies, greedily rubbing their hands together and muttering "This project is going to be toast in three to five years! Heh, heh, heh."

Well. Maybe that happens every once in a while.

Most of the time, though, it's probably just that the decision-makers simply didn't understand the long-term ramifications of the decisions they are making.* They don't understand that what they are doing is going to create a debt that cannot possibly be paid when no more forbearances can be obtained.

Furthermore, you will occasionally find very large firms that really do all the things they are supposed to do - keep the test suites comprehensive and meaningful, regularly reconcile design with requirements, properly manage handoffs, continuously improve their processes, et cetera. From what I can tell, it seems like this is often a result of a handful of higher-up leaders who have a deep understanding of what works in software development.

So all four of the main cases seem, to me, to be dependent on understanding that there is a problem. Large budgets just muddy the waters for decision-makers who don't already have a deep enough understanding of how software development works.

To fix the problem, leaders need to know that pressuring developers to defer writing tests or refactoring to a proper design (among other things) will be the seeds of their undoing.

Why don't they know that, already?

I think that question brings us to a real candidate for a root cause.

So many organizations - especially large organizations - claim to be "data-driven". They need numbers to make their decisions. Not only do they need numbers, but they need numbers fast. It seems like a lot of leaders want to see the results of making a change in weeks or months.

Therein lies the problem.

For large organizations, the consequences of poor technical practices take months or years to produce intolerable conditions. Why should damage that took years to manifest be reversible in a matter of weeks or months? It's not possible.

So long as the way we measure progress, success, failure, and improvement in software development is tied to such incredibly short windows in time, those metrics will always undermine our long-term success. Those metrics will always create the impression that the less-expensive decision is actually the more expensive one and that you can squeeze a little more work out of a software development team by demolishing the very foundations of its developers' productivity.

Not all data are created equal. Data-driven is a fine way to live so long as the data doing the driving are meaningful.

We need better metrics in software development or, at the very least, we need to abolish the use of the counterproductive ones.