The Org-Wide Cost of Getting Metrics Wrong

When you’re running a large engineering organization — multiple teams, multiple managers reporting to you — every metric you put on a weekly review becomes a directive. Not because you framed it that way, but because your managers are smart people who understand what you’re paying attention to. The moment something lands on your leadership dashboard, three or four teams start orienting around it simultaneously.

I learned this slowly, and then all at once.

A few years ago, I decided my org had a deployment frequency problem. The data supported it — teams were shipping to production every few weeks when the industry benchmark for high-performing teams was multiple times a week. So I made it visible. I added deployment frequency to our engineering health dashboard. I talked about it in quarterly reviews. I asked my managers to track it.

Within two quarters, the numbers looked great. Every team in the org had improved. I was pleased with myself for about a month, until I started pulling on a thread I noticed in skip-level conversations. Engineers were describing a pattern I didn’t recognize: they were spending meaningful time figuring out how to structure releases — not for technical reasons, but to move the metric. Incomplete features going out behind flags. Work artificially split into smaller units that created coordination overhead downstream. Healthy engineering judgment about when something was ready to ship quietly replaced by a different question: will this help the number?

The metric had improved. I had no idea whether the engineering had.

What made this worse at my level was that the dysfunction wasn’t contained to one team. Every manager in my org had gotten the same signal at the same time, and every team had responded to it. I’d effectively synchronized a bad incentive across multiple engineering teams. A single manager making this mistake affects their team. A director making it affects the entire org, and the recovery is slower, because undoing a norm you’ve established at that level takes longer than establishing it did.

This is what I’ve come to understand about metrics that I didn’t fully reckon with earlier in my career: at scale, the choice of what to measure is a form of organizational design. It’s not neutral. The metrics you put in front of your managers shape what their teams talk about, what gets prioritized in planning, what engineers think you actually care about. You can have a very good strategy document and a thoughtful set of values and still undercut all of it with a dashboard that signals something different.

The change I made wasn’t to stop measuring things. It was to separate the metrics I use to diagnose from the metrics I share organizationally. A diagnostic is something I look at with my leadership team when I’m trying to understand whether something is healthy. A shared metric — something that goes on a dashboard my managers track or comes up in a business review — is a statement about what the org should care about. Those are two different categories, and treating them the same way is how you get multiple engineering teams quietly working on the wrong problem.

What I pay attention to now, and share carefully: customer impact metrics, because those are hardest to fake. Outcomes over outputs: whether what we built actually did what we thought it would. Engineering health signals that I interpret in context with the people who understand what’s behind them, rather than metrics that get published to a dashboard and invite optimization.

The harder part of accepting this is what you lose. You don’t get clean dashboards. You don’t get numbers that go consistently up and to the right in ways that are easy to present in a business review. What you get instead is a leadership team that’s had the harder conversation about what’s actually working, and an engineering org that hasn’t been trained to optimize for the wrong thing.

That trade is worth it. I just wish I’d understood it before I scaled the bad metric across the whole org.
