Thursday, March 26, 2020

What if we rethought what a process framework should be in the first place?

Okay. Now I want to make a crazy suggestion.

If you've been tracking what I've been saying this week, then you've probably read as far as this entry, in which I suggested that the basis for bad technical decision-making is probably a poor understanding of the relative costs of various decisions. I also made the argument that large, complex corporate budgets and success metrics tend to confound the problem and make it even harder for some decision-makers to make good long-term decisions.

Now I want to explore process frameworks and how they play into this.

I'm not a deep expert on any of these process frameworks but I know plenty about several of them. In fact, I'm probably as much of an expert as a lot of the people who present themselves as experts. I just don't claim to be.

I'm going to list a few frameworks but I think my argument is true for all of the frameworks of which I am aware:

  • Scrum
  • SAFe
  • Kanban

That's enough to make the point. I've heard evangelists for each of these process frameworks make the argument - at least implicitly - that businesses need to change how they measure.

My first experience with this was when Scrum trainers told my team to use Story Points for estimation rather than hours. The idea was that we could empirically determine what a story point means for capacity planning and get better at estimating.

Great.

It didn't work, of course, but great.

It still represents an early attempt to shake up how software projects are measured.
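The arithmetic behind the idea was simple enough. Here's a hypothetical sketch (the numbers and the forecasting helper are my own invention, not anything the trainers prescribed): average the points a team actually completes per sprint, then use that observed velocity to forecast capacity.

```python
import math

# Hypothetical illustration of empirical story-point capacity planning.
# Points the team actually completed in its last four sprints:
completed = [21, 18, 25, 20]

# "Velocity" is the observed average throughput per sprint.
velocity = sum(completed) / len(completed)  # 21.0 points/sprint

def sprints_needed(backlog_points, velocity):
    # Forecast: how many sprints a chunk of backlog should take.
    return math.ceil(backlog_points / velocity)

print(sprints_needed(60, velocity))  # → 3
```

In theory, re-measuring velocity every sprint makes the forecast self-correcting. In practice, as I said, it didn't work.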

Then came Business Value Points. That oughta do the trick, right? Of course, it didn't, but at least people were trying.

Story Points and Business Value Points both generally failed to have a positive impact (within the scope of my observation) because they were attempts to make the traditional spreadsheets work. They were stand-ins for other numbers that were already being used, and empirically backing into a meaningful value was supposed to fix...something...but it couldn't fix the corrupt idea at the heart of it all: the model of software development that is, to this day, essentially descended from manufacturing.

To address those ideas, new processes and mentalities were brought to bear. Plan this way, keep these things in a backlog, manage that kind of work in process, favor this kind of effort over that one, et cetera.

Then there were people arguing for no estimates. There were (are?) proponents within seemingly every framework. There seemed to be a lot of them in Kanban. So much so that I began to associate the two in my mind.

I'm not sure if that was ever an official part of Kanban for software development. Nor do I care. What I do know is that it was another attempt to address the measurement problem in software development and it was brought to bear as a supporting change for some other goal (an "agile" process, usually).

Scrum and SAFe both have explicit mechanisms for their "inspect and adapt" principle - which many believe to be the core of their efficacy. I view this as an attempt to bring qualitative measurement techniques into software development.

The picture I'm painting is that luminaries within the software development industry have been attempting to improve how and what organizations measure for...well...decades, now.

Yet it always seemed like measurement came as a sideshow. There was a shrink-wrapped process that (some people claimed) was going to solve everything and, also, maybe, kinda, you might need to tweak how you measure a little bit.

What if we flipped it on its head? What if we defined a new paradigm for managing software development organizations with a radically different message?

The message, right now, is "You know what's going on, we'll help you do a better job."

I think the message should be "You're doing a good job, let us help you know what's really happening."

I think what we need is not process frameworks with a side of metrics. We need metrics frameworks and you can buy a process framework à la carte if you decide you need one.

We could start with "Keep managing how you manage but change how you measure" and see where that gets us.

I think it might get a lot more traction because it's more respectful of the skills that most software development leaders do have. Most of them do know how to manage large groups of people and work with large complicated budgets.

Even if it didn't get widely adopted, though, I think the organizations where a metrics framework landed would benefit a lot more than the ones that are trying to adopt or grow a new software development lifecycle. They'd get more out of the effort because we'd be addressing the real kernel of the problem: Not what we do but what we see.

We need to start working with leadership teams, and maybe even boards, to help them define metrics that set software "projects" (products) up for success. We know these things are going to last a lot longer than a quarter, so quarter-to-quarter measurements aren't going to do the trick. Let's get together as software professionals, define a standard for how success and progress can really be measured in software development, and then start pushing that.

The industry needs to change how it measures and plans and there's no way it will be able to do so without our help.

Wednesday, March 25, 2020

A deeper exploration of why people might make bad technical decisions

Yesterday, I wrote about budgetary forces impacting technical decision-making.

Everyone reading that sentence is probably thinking "Well... duh!"

However, I don't mean it in the way a lot of people do.

My argument was that big, padded budgets create a sense of being able to afford bad decisions, allowing technical debt to pile up. I also posited that small, tight budgets might help people see that they need to make good decisions if they want to stay afloat.

Of course, at the end of the blog entry, I noted that this was just one piece of the puzzle. It's not like every big budget results in bad practices. Likewise, not all of the small budgets in the world result in better decision-making.

It should also be noted that this is all anecdotal and based on my own experiences. Really, that should have been noted in yesterday's entry.

Oh well.

As I considered my position, I realized there was a major crack in the argument: Large organizations can't afford to make bad decisions. The large, complicated budgets only make them think they can. As a result, they often defer sustainability-related investments until the technical debt is so bad that they are in crisis mode. Yet, the margin is almost always called, the crisis almost always comes, and the result is almost always bad.

So it is definitely not about being able to afford an unsustainable decision. Actually, just that phrase is inherently self-contradictory. It reduces to either "afford an unaffordable decision" or "sustain unsustainable decisions", both of which are nonsense statements.

Budget-size is probably not what's at work, here. It's just a clue pointing to the real driver: understanding.

Consider this: A small organization on a shoestring budget that makes good decisions about sustainable practices only does so because its membership understands they can't really afford to make bad decisions. If proper software design wasn't perceived as something that was going to pay off, an organization wouldn't do it.

For every one of those small teams with healthy technical practices, there are probably dozens we never hear of because they are crushed under the weight of their own poor choices.

Why did they make those poor choices? Do people intentionally undermine their ability to succeed knowing full well that's what they're doing? Not usually. Again, those small organizations that winked out of existence too fast for anyone to notice were undone by their lack of understanding.

Now let's look at the big organizations. They go months or years before the crisis hits them, and then sunk costs often make them drag on for a lot longer after that. Are there really people sitting around in the leadership of those companies, greedily rubbing their hands together and muttering, "This project is going to be toast in three to five years! Heh, heh, heh"?

Well. Maybe that happens every once in a while.

Most of the time, though, it's probably just that the decision-makers simply don't understand the long-term ramifications of the decisions they are making. They don't understand that what they are doing is going to create a debt that cannot possibly be paid when no more forbearances can be obtained.

Furthermore, you will occasionally find very large firms that really do all the things they are supposed to do - keep the test suites comprehensive and meaningful, regularly reconcile design with requirements, properly manage handoffs, continuously improve their processes, et cetera. From what I can tell, it seems like this is often a result of a handful of higher-up leaders who have a deep understanding of what works in software development.

So all four of the main cases seem, to me, to be dependent on understanding that there is a problem. Large budgets just muddy the waters for decision-makers who don't already have a deep enough understanding of how software development works.

To fix the problem, leaders need to know that pressuring developers to defer writing tests or refactoring to a proper design (among other things) will be the seeds of their undoing.

Why don't they know that, already?

I think that question brings us to a real candidate for a root cause.

So many organizations - especially large organizations - claim to be "data-driven". They need numbers to make their decisions. Not only do they need numbers, but they need numbers fast. It seems like a lot of leaders want to see the results of making a change in weeks or months.

Therein lies the problem.

For large organizations, the consequences of poor technical practices take months or years to produce intolerable conditions. Why should damage that took years to manifest be reversible in a matter of weeks or months? It's not possible.

So long as the way we measure progress, success, failure, and improvement in software development is tied to such incredibly short windows in time, those metrics will always undermine our long-term success. Those metrics will always create the impression that the less-expensive decision is actually the more expensive one and that you can squeeze a little more work out of a software development team by demolishing the very foundations of its developers' productivity.
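To make that concrete with a toy model (every number here is invented for illustration): suppose cutting corners saves effort up front but adds a small, growing drag each week as the debt compounds. Over an eight-week window the shortcut looks cheaper; over a year it's far more expensive.

```python
def cumulative_cost(base_per_week, weekly_drag, weeks):
    # Total effort spent over a window, where technical-debt drag
    # adds a little more cost each successive week.
    return sum(base_per_week + weekly_drag * week for week in range(weeks))

# Invented numbers: a disciplined team pays a steady 10 units/week;
# a corner-cutting team pays 7 units/week plus a growing drag.
disciplined_8wk = cumulative_cost(10, 0.0, 8)   # 80.0
shortcut_8wk    = cumulative_cost(7, 0.3, 8)    # ~64.4 — looks cheaper!

disciplined_1yr = cumulative_cost(10, 0.0, 52)  # 520.0
shortcut_1yr    = cumulative_cost(7, 0.3, 52)   # ~761.8 — much worse

print(shortcut_8wk < disciplined_8wk, shortcut_1yr > disciplined_1yr)
```

A metric read at week eight rewards exactly the decision that loses at week fifty-two.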

Not all data are created equal. Data-driven is a fine way to live so long as the data doing the driving are meaningful.

We need better metrics in software development or, at the very least, we need to abolish the use of the counterproductive ones.

Tuesday, March 24, 2020

Why is it hard to make sustainable practices stick?

I've been thinking about this problem for a while.

It's hard to make sustainable software development practices stick. The larger an organization is, the harder it seems to be.

Why?

There are many possible answers and probably all of them have at least a little validity.

  • It's way easier to learn a computer language than it is to learn how to properly attend to design.
  • Refactoring (true refactoring) is hard to learn, too.
  • TDD requires significant effort to figure out how to apply meaningfully.
  • BDD requires organizational changes to make its biggest impacts.
  • Larger organizations have politics.
  • Larger organizations have communications and handoffs.
  • Larger organizations often have deadlines that aren't linked with capacity or value.

All of those are at least somewhat true, some of the time, but none of them smells very much like a root cause. Maybe there isn't a single cause and that's why it's hard to pin one down. In fact, that's probably true, but I still feel like the above list (and its ilk) is a list of symptoms of some other problem, not a list of first-class problems.

Yesterday's blog entry helped me make a little progress in the form of a new hypothesis we can at least invalidate.

As I said, I automate everything because I know I can't afford to do otherwise. In fact, that's the reason why I apply modern software design principles. It's why I refactor. It's why I have a test-driven mindset and why I write tests first whenever I can. I'm on a shoestring budget and I know I can't spare anything on wasted effort, so I work as efficiently as I can.

What if the reason why larger organizations ostensibly tend to struggle with code quality, design patterns, refactoring, TDD, agile workflow, and lean product definition is as simple as the inverse of that statement? I know I don't have the budget to work inefficiently, so I work efficiently. Larger organizations have the budget to work inefficiently, so they don't work efficiently?

It sounds crazy. At least, it would have to my younger self.

"People are working in an unsustainable way just because they think they can afford it? What?!?" That's what young Max would have said. Today's Max really only has a resigned shrug to offer in dissent.

So, because Old Max put up such feeble resistance, let's explore the idea a little more.

A small organization is usually comprised of* a tight-knit group of individuals. While they may not all be experts in every area of work, the impact of any person's decisions can be plainly seen in day to day activities. This means that the costs of bad decisions are not abstracted debts to be paid someday. They are real problems being experienced at the moment.

Pair that with the tight budget that smaller companies usually have, and you get a recipe for action: the plainness of the problem helps people know what should be done and the necessities of a small budget provide the impetus to do it.

Contrast that with a large organization.

In a large organization, consequences are often far removed from actions. If you create a problem, it may be weeks, months, or even years before you have to pay the cost. That's if it doesn't become someone else's problem altogether, first. Fixing a systemic problem such as, say, not being test-driven can be imagined as this high-minded "nice to have" that probably won't really work because you'd need everyone to be on board, which will never happen because everyone else feels the same way.

At the same time, pockets are often quite deep in large organizations. While a show may be made of pinching pennies in the form of (for instance) discount soap in the bathrooms, they tend to spend a lot on wasted effort. They are able to do this because they have some array of already-successful and (typically) highly profitable products they can exploit to fund new efforts. Furthermore, in addition to being very large, corporate budgets seem like they are usually very complex. Large costs can sometimes seem smaller than they really are because they are impacting many different line items.

Pair those two together and you get fatalistic ennui making everything seem academic with a budgeting apparatus that consistently says "we've got bigger fish to fry".

I'm pretty sure this is one piece of the puzzle but I think there's more to it. For instance, there are many small organizations with shoestring budgets that still make bad decisions about sustainability. There are also counterexamples in the form of large companies that tend to make decisions that are good, or at least better than those of their competitors.

However, this writing is now quite long. So I'm going to end it, here, and discuss another factor in tomorrow's entry.

*: That is one of the correct usages of "to comprise". Look it up.

Monday, March 23, 2020

My pipeline brings all the builds to the prod

The title is a reference to a song I only know exists because of this clip from Family Guy.

Dwindle is a one-person operation, right now. Eventually, I might pull my wife into it but she doesn't have any time to dedicate to anything new, right now.

We have a two-year-old, a four-year-old, various other business interests, a house that seems to require constant repairs, and, until the recent panic, a reasonably busy consulting practice.

So time is at a premium.

For now, it's just me and I don't get a full forty hours a week to work on Dwindle. Before I got sick, I was lucky to have twelve. Obviously, last week, my time available to work was right around zero hours.

Still, in those twelve hours per week, I managed to build and maintain an automated pipeline that carries Dwindle all the way from check-in to automated deployment and, ultimately, promotion to production environments.

The pipeline covers everything...

  • Building and testing binaries for the core logic of the game.
  • Building and testing the backend API.
  • Building the clients.
  • Acceptance/integration testing the clients.
  • Deploying to a "blue" or "staging" environment.
  • Validating the blue/staging deployments.
  • Promoting from blue to "green" or "production" environments.
  • Cleaning up old deployments where applicable.

It manages parallel deployments in different environments:

  • Azure Functions & Storage for the backend.
  • Google Play for the Android app.
  • Kongregate for the browser version.

It keeps everything in sync, ensuring each of the following:

  • No blue deployments occur until all tests for every component have passed
  • No deployment validations occur until all blue deployments are completed
  • No release to production begins until all promotion candidates have been validated.
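Those three rules amount to barriers between stages. A minimal sketch of the gating logic might look like this (the component names and return values are hypothetical stand-ins, not Dwindle's actual scripts):

```python
# Hypothetical sketch of the pipeline's synchronization rules.
COMPONENTS = ["core", "api", "android", "browser"]

def run_pipeline(tests_passed, validate_blue):
    """tests_passed: component -> bool; validate_blue: component -> bool."""
    # Rule 1: no blue deployments until every component's tests pass.
    if not all(tests_passed[c] for c in COMPONENTS):
        return "stopped: tests failed"

    # Deploy every component to blue/staging...
    blue = list(COMPONENTS)

    # Rule 2: ...and validate only after ALL blue deployments are done.
    validated = {c: validate_blue(c) for c in blue}

    # Rule 3: promote to green only if every candidate validated.
    if not all(validated.values()):
        return "stopped: blue validation failed"
    return "promoted to green"

print(run_pipeline({c: True for c in COMPONENTS}, lambda c: True))
# → promoted to green
```

The real thing is spread across build scripts and cloud services, of course, but the barriers are the whole point: no stage starts until every prerequisite across every component has finished.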

This is no mean feat for a Unity application. It's more work than, say, a web application or even a normal Windows or mobile app. Unity makes every traditional software development task harder - probably because they are solving a different problem than the traditional app-development problem.

Even an automated build of a Unity game is a lot harder than automated builds of Xamarin or native-coupled apps would be. Acceptance testing your game is hard, too. Everything that isn't making a pretty shape on the screen is much more difficult than it would be with a more traditional tech stack.

I did it anyway. I did it when it was hard and seemed like it would only get harder. I did it when it looked like I had an endless churn of tasks and felt like solving the next problem would just beget two more.

Even when a little voice in the back of my head began to whisper "Hey... maybe it's not so bad to do that part manually," I did it. I pushed past the little voice because I knew it was lying to me.

If I can do it, alone and beset on all sides by two very-high-energy youngsters, you can do it while you're sitting at your desk in your large, reasonably-well-funded corporate office (or home office).

...but that's not very important, is it?

We shouldn't do things just because we can, right?

I need a legitimate reason and I have one. It's not just that I can do it. It's that I absolutely need a completely automated pipeline.

I couldn't possibly afford to build Dwindle into something successful if I was spending all my time manually testing and manually promoting builds. I'm not saying I will make Dwindle a financial success, but my chances would be nil if I was just wasting all my time on those things. Most of my time would go to validating a handful of changes. I wouldn't have any time left over to hypothesize, innovate, or develop new features.

The marginal cost of investing in proper automation is negative. While this may be impossible when talking about manufacturing, it's one of the most basic principles of software development: Investing in things related to quality lowers costs.

So I built a fully automated pipeline with a mix of integration and unit tests for a very simple reason: You spend less to have a fully automated pipeline than you do without one.

...and if I can spend less to have one alone, you certainly can do it with a team.
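To put rough numbers on that claim (all of them invented for illustration): if automating the release path costs some one-time hours, and manual testing and promotion costs a couple of hours per release, the automation pays for itself within a handful of releases.

```python
# Back-of-envelope comparison; every number is invented for illustration.
setup_hours = 20        # one-time cost to build the automation
manual_hours = 2.5      # manual test-and-promote cost per release
automated_hours = 0.1   # babysitting cost per automated release

def breakeven_releases(setup, manual, automated):
    # Releases needed before automation is cheaper overall.
    saved_per_release = manual - automated
    releases = 0
    while releases * saved_per_release < setup:
        releases += 1
    return releases

print(breakeven_releases(setup_hours, manual_hours, automated_hours))  # → 9
```

After the break-even point, every subsequent release is nearly free, which is where the negative marginal cost comes from.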