I’ve sat in a lot of infrastructure architecture discussions over the years, and there’s a pattern I’ve noticed that doesn’t get talked about enough: the conversations that look like they’re about technology are often actually about something else entirely. Who controls what. Who trusts whom. How much autonomy a team deserves. What the organization’s real risk tolerance is, as opposed to the stated one.
Take the decision between a shared infrastructure platform and team-owned infrastructure. On the surface, it’s an engineering question — operational efficiency, cost pooling, standardization versus flexibility. And those things are real. But underneath that conversation is almost always a question about power. A shared platform means a central team owns the stack and other teams are tenants. Team-owned infrastructure means each team makes its own choices and bears the consequences. Those aren’t just different operational models; they’re different statements about how much you trust individual teams to make good decisions, and how much you’re willing to pay for that trust in the form of inconsistency.
I’ve seen organizations choose shared platforms not primarily because they were more efficient (they often weren’t, especially early on) but because leadership didn’t fully trust the constituent teams to make sound decisions at speed. That’s a legitimate concern in some cases. But if you don’t name it, you end up having a technical debate about Kubernetes versus managed container services when the real conversation is about organizational confidence. And the technical debate never resolves the underlying issue — it just papers over it.
The same dynamic shows up in the cloud-versus-on-premise call. The organizations that made that call badly almost always framed it as a pure cost comparison (lift-and-shift economics, capital versus operating expense) without accounting for what cloud is actually selling. Managed services abstract operational work. Elastic compute changes the economics of capacity planning. The pricing models for things like reserved capacity are designed to reward commitment and penalize uncertainty. None of that maps cleanly onto an on-premise mental model, and teams that bring their on-premise operating assumptions into cloud environments end up with the costs of cloud without the benefits.
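To make that reward-and-penalize mechanic concrete, here's a minimal sketch of the break-even arithmetic. The hourly rates are invented for illustration, not taken from any real provider's price list:

```python
# Back-of-the-envelope break-even for reserved vs. on-demand pricing.
# All rates are hypothetical; substitute your provider's actual numbers.

ON_DEMAND_HOURLY = 0.40   # $/hour, pay-as-you-go
RESERVED_HOURLY = 0.25    # $/hour effective rate, one-year commitment
HOURS_PER_YEAR = 8760

# A reservation bills for every hour whether you use it or not,
# so its cost is fixed; on-demand cost scales with utilization.
reserved_annual = RESERVED_HOURLY * HOURS_PER_YEAR

# Break-even: the utilization at which on-demand spend equals the reservation.
break_even = RESERVED_HOURLY / ON_DEMAND_HOURLY

print(f"Reserved costs ${reserved_annual:,.0f}/year regardless of usage.")
print(f"On-demand is cheaper below {break_even:.0%} utilization; "
      f"reserved is cheaper above it.")
```

The specific numbers don't matter; the shape does. The discount is what the provider pays you for absorbing their demand uncertainty, and it only pays off if your utilization forecast is right.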
But there’s a cultural piece too. Cloud adoption often accelerates teams that were ready to move fast and constrains teams that weren’t — not because of the technology, but because cloud requires a kind of ownership that’s different from owning physical infrastructure. You can spin up a thousand compute instances in five minutes. That’s extraordinary. It also means that if you haven’t built cost awareness into your engineering culture, your bill will tell you a story you weren’t expecting at the end of the month.
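The arithmetic behind that end-of-month story is deliberately unremarkable, which is exactly why it gets skipped. A minimal sketch, with an invented per-instance rate:

```python
# What a thousand instances cost if nobody turns them off.
# The hourly rate is invented; plug in your own.

INSTANCES = 1000
HOURLY_RATE = 0.10       # $/hour per instance, hypothetical
HOURS_IN_MONTH = 730     # average hours in a month

# Provisioning took five minutes; the meter runs for the whole month.
monthly_bill = INSTANCES * HOURLY_RATE * HOURS_IN_MONTH
print(f"${monthly_bill:,.0f} for the month")   # $73,000
```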
The managed-versus-self-hosted question is where I see the most magical thinking. Engineering teams consistently underestimate the ongoing cost of operating something they’ve built or hosted themselves, and they consistently overestimate how differentiated their requirements actually are. A few years ago, one of the engineering teams in my org pushed hard to self-host a component because the managed version “didn’t quite fit.” When we dug into what “didn’t quite fit” actually meant, it came down to about two configuration options the managed service didn’t expose at the time. The team went ahead with self-hosting, and for the next several quarters they carried operational burden that had nothing to do with delivering value to the product’s users. The managed service added those configuration options six months later. The technical argument had been real. The business argument wasn’t, and nobody wanted to say so in the room.
Self-hosting is sometimes the right call — but usually not for the reasons teams think. The “it doesn’t quite fit” argument almost always holds up against the managed service today and collapses against the managed service in six months. The point isn’t that one model wins — it’s that these decisions should be made with honest accounting of what they encode, not just what they cost to provision.
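As a sketch of what that honest accounting can look like, here's the comparison in miniature. Every figure is invented; the point is which terms show up in the equation at all:

```python
# Hypothetical total-cost comparison: managed service vs. self-hosting.
# Every number here is made up; the point is which terms appear at all.

QUARTERS = 8

managed_fee_per_quarter = 30_000      # the line item everyone sees

selfhost_infra_per_quarter = 12_000   # the line item teams compare against
engineer_cost_per_quarter = 60_000    # fully loaded cost of one engineer
ops_fraction = 0.5                    # share of an engineer spent on upgrades,
                                      # patching, on-call, capacity planning

managed_total = managed_fee_per_quarter * QUARTERS
selfhost_total = (selfhost_infra_per_quarter
                  + engineer_cost_per_quarter * ops_fraction) * QUARTERS

print(f"Managed over {QUARTERS} quarters:     ${managed_total:,}")
print(f"Self-hosted over {QUARTERS} quarters: ${selfhost_total:,}")
# The provisioning comparison (30k vs. 12k per quarter) favors self-hosting;
# the comparison including operational time (30k vs. 42k) reverses it.
```

With realistic numbers the two comparisons often point in opposite directions, which is exactly the gap the team in my org fell into.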
When I’m in a conversation about infrastructure now, I try to ask: what assumptions about our organization does this choice bake in? If we move to a shared platform, what do we do about the teams that need to move faster than the platform can serve them? If we give every team full infrastructure autonomy, what breaks at the seams between teams, and who owns that? These aren’t reasons to avoid the decision. They’re the questions that make the decision an actual decision rather than an artifact of whichever technical argument was most persuasive in the room.
Infrastructure is where organizational beliefs get made concrete. It’s worth being deliberate about which beliefs you’re encoding.