Logic Fault

Have you considered running your software on a computer?


On Reversibility

Reversibility is the most important quality of good software decisionmaking.

This is not a new idea.

Reversibility is more important than knowing “will this work at all?”. If you make a wrong decision but can reverse it, you’re fine. If you make decisions you can’t reverse and your strategy is “hope it’s the right call”, you’re building systems on the power of prayer.

What we talk about when we talk about reversibility #

Reversibility is not the same as the ability to iterate/improve on incomplete solutions. Iteration is well and good, but if you equate the ability to improve a system with the ability to go back to the way things were before you had that system, your outcomes will land somewhere between “endless struggles while continually wondering if this could have been done a different way” and “existentially annihilating hamster-wheel of layering forward-(not-quite-)fixes onto a cautionary monument to the sunk cost fallacy.”

Reversibility means undoing the thing you changed, un-building the system, going back to how things were before you deployed the bad solution. Reversibility does not mean trying another solution to the problem while the wrong solution is deployed; it means putting the car in reverse.

Reversibility does not mean committing less-than-fully to an approach. It means executing on that approach in such a way that it is possible (maybe not easy, but possible) to go back to the state of the system/business/product/whatever before that approach was taken.
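One way to execute fully on an approach while keeping the exit open is to record, for every forward step, the action that restores the prior state. A minimal sketch in Python; the `ReversibleChange` class and the toy "deployment" are hypothetical illustrations, not anything from a real tool:

```python
# Sketch: pair each forward step with the action that undoes it, so the
# whole change can be backed out in reverse order. Hypothetical example.

class ReversibleChange:
    """Applies steps while remembering how to restore the prior state."""

    def __init__(self):
        self._undo_stack = []

    def apply(self, description, do, undo):
        """Run `do`; remember `undo` so this step can be backed out."""
        do()
        self._undo_stack.append((description, undo))

    def reverse(self):
        """Undo every applied step, most recent first."""
        while self._undo_stack:
            _description, undo = self._undo_stack.pop()
            undo()


# Usage: a toy "deployment" that flips a setting in a config dict.
config = {"tests": "manual"}
change = ReversibleChange()
change.apply(
    "enable CI testing",
    do=lambda: config.update(tests="automated-ci"),
    undo=lambda: config.update(tests="manual"),
)
assert config["tests"] == "automated-ci"

change.reverse()  # put the car in reverse
assert config["tests"] == "manual"
```

The point of the sketch is the discipline, not the class: the undo action is written at the same time as the forward action, while the prior state is still fresh, rather than reconstructed later under pressure.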

Reversing should not be understood as failure–or, if it is, that word should be destigmatized from all “this should never have been done/how dare you” baggage–but rather as a property systems should have. In the same way that a healthy organization wouldn’t deploy a database on the assumption that it will never be down, and in the same way that a healthy organization would practice blameless troubleshooting of downtime when it does occur, reversing a decision should be understood as an inevitable system property to be minimized but not eliminated–think “hardware failure”, not abject dereliction of change control.

Once you have backed out of the swamp, you can decide what to do instead. The right answer to “what to do instead” might be to do nothing and deal with leaving the problem unsolved.

Reversing, trying something else, and deliberately leaving the problem unsolved are all valid outcomes of a healthy architectural decisionmaking environment.

First-order effects of prioritizing reversibility #

Second-order effects of prioritizing reversibility #

Case study: Testing #

I’ve been part of a few groups working on complex software systems which didn’t have any automated tests. Manual tests existed, as did many hand-runnable programmatic unit/integration/system tests, but which tests were run, when, and how was left up to the people releasing changes. This is a surprisingly common situation, and not just in small projects.

Each time, the following scenario played out:

  1. Problem identified: no automated tests. Flagged as a problem either because of released defect rates or because the gap was considered a bad practice outright.
  2. Solution chosen: use automated CI testing for changes being released; people should write tests along with features.
  3. Solution implemented: CI runs automated tests and people write them.
  4. New problem identified: product owners are confused as to why releases take a long time. Engineers answer “we had to spend a long time writing/fixing tests, even though the thing being released was ready before then.”
  5. New problems identified: flaky tests, poor quality tests, slow tests. The usual growing pains of automated-testing adoption.
  6. Solution chosen: switch the automated testing system used (new framework, new CI system, autogenerated test code, etc.)
  7. Solution implemented: new automated testing system used.
  8. New problem: much rejoicing in engineering because the new system was better than the old one, which swiftly ended because, improvement or not, plenty of slowness, flakiness, and quality issues remained in the new testing system. Product still unhappy.
  9. Solution chosen: change testing system again/use TDD/add coverage metrics as a KPI/etc.
  10. End state: things are not great.

So why did this end in a not-great state? There are plenty of well-known lessons to take away: track defect/rollback rates; communicate expectations to product/stakeholders before implementing new engineering processes; make data-driven decisions based on CFR/false-failure-rate/whatever; presence-of-tests != culture-of-testing, and so on.

Those are all good and important things.

Decision reversibility is more important than all of them.

It’s more important because plenty of people will not or cannot do all of the “correct” solutions. (Raise your hand if you think that shipping slower is worth it for quality through automated testing. Cool. Now keep your hand up if you think that all the different parts of your business will agree on how much slower is an acceptable tradeoff for a testing culture. Yeah, that’s what I thought.) If you can only operationalize one good-decisionmaking practice, pick reversibility. (Being able to operationalize at most one good decisionmaking practice is a common situation, and organizations are rarely self-aware enough to notice that they’re in it.)

In those testing-related anecdotes, the long-term fix was surprisingly similar:

  1. Reverse: go back to the bad old days of not running automated tests. Yes, really. With all the tradeoffs that entails: higher predictability in release cadence, higher ownership of feature delivery by engineers, lower reliability, more fear associated with releases, etc.
  2. Honestly assess what went wrong with the attempted solutions. Honesty is impossible to achieve if the attempted solution is still in place. Reversing first makes discussions of what went wrong/what to do instead less fraught, urgent, and fragile.
  3. Decide what alternative solutions–if any–should be tried. This is also extremely hard to do when partial/defective solutions (and people invested in forward-fixing them) are already in place.

In most of those cases, better automated testing systems were built after reversing. They worked out. In a couple of cases, teams chose to remain without automated testing. That worked out, too. Yes, really.

Lessons #

Reversibility means reversibility: not agility or the ability to iterate, not forward fixing, but going back to the way things were before.

“Reverse and try again” is not always the right call, but having the ability to do so is. Put another way: being able to go back to a previous state is hugely beneficial to your organization/software, even if you rarely actually do it, for the same reason that the test that always passes is valuable, too.

Reversibility is a value, not a process. As such, it can be adopted informally (by creating a culture that values people who value reversibility) or formally (ADRs, rollback plans, stability levels, back-compat guarantees, etc.).
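As one illustration of the formal route, an ADR can bake reversibility in by requiring a reversal section up front. This is a hypothetical template with headings of my own invention, not a standard:

```markdown
# ADR-NNN: <decision title>

## Status
Proposed | Accepted | Reversed

## Context
What problem are we solving, and what does the current state look like?

## Decision
What we are going to do.

## Reversal plan
Concretely, how do we get back to the prior state if this goes badly?
What data, config, or process would the reversal need to restore?

## Consequences
Tradeoffs we accept, including the cost of reversing.
```

Writing the reversal plan while the decision is still on paper is the cheap moment to do it; after the change ships, the same thinking happens under sunk cost and time pressure.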

Reversibility applies at all scales.

Nobody makes good decisions while committed to a bad-but-already-shipped solution. No, you’re not special–nobody. Between split focus, sunk-costing, ego, fear, and incentives, this is always a bad move. Yes, plenty of successful groups do this and still succeed. Lots of people smoke and live a long time, too.

Prioritize reversibility even if you have limited resources/maturity/ability to commit to changes. Especially then.

Going back to the old state is never, ever as bad as you think. It’ll be fine. I prommy. Things worked in the old state, however badly, after all.