Thesis
Most CTO metric systems do not fail because the numbers are wrong. They fail because the organization quietly turns measurement into theatre. Dashboards make engineering look legible, but if the measures are too far away from actual customer value, technical learning, and operational risk, the team learns to improve the dashboard instead of the system.
The dangerous version is not a CTO who measures nothing. It is a CTO who measures just enough to create confidence while missing whether the product, architecture, and team are improving. Velocity goes up. Story counts look stable. Pull requests move. Deployments happen. Meanwhile defects repeat, product bets do not get clearer, engineers avoid the riskiest work, and leadership starts mistaking motion for progress.
The argument
Software delivery metrics are useful when they preserve the conversation they were designed to start. They become harmful when they replace that conversation.
A healthy metric answers one narrow question: where should we look next? An unhealthy metric answers a political question: who looks productive, who looks slow, and which department can claim improvement this quarter?
That distinction matters because engineering work is not factory output. The highest-leverage work often reduces visible activity: deleting a risky dependency, simplifying a release path, clarifying product ambiguity before code starts, removing an unnecessary integration, or saying no to a feature that would create permanent operational drag. A metric system obsessed with visible activity will punish exactly the work that improves the system.
Why CTO dashboards drift into theatre
1. The metric becomes the target
Goodhart's law is the core failure mode: when a measure becomes a target, it stops being a good measure. In engineering, this shows up when teams learn which number leadership wants to see and optimize for that number locally.
If cycle time is rewarded, teams split work into smaller tickets without reducing real delivery risk. If story points are rewarded, estimation becomes negotiation. If pull-request volume is rewarded, engineers create review noise. If deployment frequency is rewarded without quality context, shipping thin changes becomes safer than solving deeper product or architecture problems.
The metric still improves. The system does not.
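The ticket-splitting case can be made concrete with a little arithmetic. The sketch below uses invented numbers (one feature that needs 10 working days, split into equal sequential tickets) purely as an illustration; the names `FEATURE_DAYS`, `ticket_cycle_times`, and `summarize` are hypothetical and not taken from any tool.

```python
from statistics import mean

# Assumption for illustration: one feature needs 10 working days end to end,
# no matter how many tickets it is divided into.
FEATURE_DAYS = 10


def ticket_cycle_times(feature_days: float, ticket_count: int) -> list[float]:
    """Split the same feature into equal tickets worked one after another."""
    return [feature_days / ticket_count] * ticket_count


def summarize(ticket_count: int) -> dict:
    times = ticket_cycle_times(FEATURE_DAYS, ticket_count)
    return {
        "tickets": ticket_count,
        # What a ticket-level dashboard would report.
        "avg_ticket_cycle_time": mean(times),
        # What the customer actually waits for.
        "feature_lead_time": sum(times),
    }


one_ticket = summarize(1)
five_tickets = summarize(5)

print(one_ticket)
print(five_tickets)
```

Under these assumptions the average ticket cycle time falls from 10 days to 2, while the time until the feature is usable stays at 10 days. The dashboard number moved; the delivery did not.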
2. Activity metrics are easier to collect than outcome metrics
The easiest engineering numbers come from tools: commits, pull requests, tickets, deployments, code review counts, incident counts, build times, and lead time. These are not useless. But they are mostly signals about the delivery system, not proof that the business or product is improving.
The harder questions are usually the CTO questions:
- Did this work reduce customer pain?
- Did it reduce operational risk?
- Did it make future changes cheaper?
- Did it clarify an important product uncertainty?
- Did it improve the team's ability to make good decisions without escalation?
Those questions require judgment. Tool dashboards can support that judgment, but they cannot replace it.
3. The team learns what not to say
A metric system also creates a speech environment. If the dashboard is used for accountability theatre, engineers learn to keep bad news out of the system until it is unavoidable. Risk gets renamed as "edge cases." Unclear requirements become "almost done." Architecture concerns become "refactoring later." Incidents become one-off mistakes instead of system feedback.
The CTO then sees green dashboards and red reality.
The problem is not that engineers are dishonest. The problem is that leadership has created a measurement environment where surfacing uncertainty makes people look unproductive.
4. Local optimization hides system failure
Engineering metrics often improve inside one team while the end-to-end system gets worse. A backend team can improve throughput while product discovery remains weak. A platform team can improve deployment frequency while customer-facing teams spend more time coordinating. A feature team can finish tickets faster while QA, support, and operations absorb the cost.
The dashboard says the team improved. The organization experiences more drag.
This is why CTO metrics need a system boundary. If the boundary is too small, the metric creates local winners and global waste.
What the better research points toward
The strongest modern engineering measurement guidance has moved away from one-dimensional productivity scoring.
DORA metrics are valuable because they combine speed and stability: deployment frequency, lead time for changes, change failure rate, and failed deployment recovery time. The point is not to worship the four numbers. The point is to prevent the common lie that speed and quality must be traded off blindly. If throughput rises while failure rate and recovery time get worse, the system is not simply "more productive."
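As a rough illustration of reading the four numbers together, the sketch below computes simplified versions of them from a list of deployment records. The `Deployment` record, its field names, and the sample data are assumptions made up for this example; DORA's own definitions are more nuanced, and real data would come from a delivery pipeline rather than a hand-written list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, not a real tool's schema.
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it need remediation?
    recovered_at: Optional[datetime] = None  # when service was restored


def four_keys(deployments: list[Deployment], window_days: int) -> dict:
    """Simplified speed and stability signals over one reporting window."""
    if not deployments:
        raise ValueError("no deployments in the window")
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


# Invented sample data: four deployments in a 28-day window, one of which failed.
base = datetime(2024, 1, 1)
day = timedelta(days=1)
sample = [
    Deployment(base, base + 1 * day),
    Deployment(base + 1 * day, base + 3 * day),
    Deployment(base + 3 * day, base + 6 * day, failed=True,
               recovered_at=base + 6 * day + timedelta(hours=2)),
    Deployment(base + 6 * day, base + 10 * day),
]

result = four_keys(sample, window_days=28)
print(result)
```

Returning all four values in one structure is deliberate: it makes it awkward to report the throughput figures without the failure rate and recovery time sitting next to them.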
The SPACE framework makes a similar correction by arguing that developer productivity needs multiple dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. Its practical warning is that no single metric can represent developer productivity. A team can look active while being burned out, blocked, or producing low-value output.
Both approaches push CTOs toward a portfolio of signals and away from a single productivity score. That matters because the moment leadership compresses engineering health into one number, teams start managing the number.
The CTO-level failure pattern
The usual failure pattern looks like this:
- Leadership asks engineering to prove productivity.
- The CTO chooses metrics that are easy to collect and easy to explain.
- Managers start reviewing the numbers in status meetings.
- Teams learn which numbers create praise or pressure.
- Local behavior shifts toward improving visible measures.
- The deeper system problems stay untouched because they are harder to measure.
- Leadership concludes the team is improving because the dashboard improved.
This is how a company can have better metrics and worse engineering judgment at the same time.
What CTOs should measure instead
The answer is not to abandon measurement. The answer is to measure in a way that keeps judgment alive.
1. Use metrics as prompts, not verdicts
A metric should trigger a question before it triggers a conclusion.
Bad: "Cycle time is up, the team is slow."
Better: "Cycle time is up. Is the work larger, more ambiguous, more cross-team, blocked by review, or carrying hidden quality work?"
Bad: "Deployment frequency is down, productivity is down."
Better: "Deployment frequency is down. Did we reduce unnecessary releases, batch risky work, hit environment friction, or lose confidence in tests?"
Metrics should narrow attention, not end the conversation.
2. Pair speed with quality and recovery
Every speed metric needs a counterweight. If you track lead time, also track escaped defects, change failure rate, rework, or incident recovery time. If you track throughput, track whether the work moved a product or risk metric. If you track deployment frequency, track confidence in rollback, observability, and support impact.
A team that ships faster while creating more rework is not faster. It has moved the cost downstream.
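One way to keep a speed metric tied to its counterweight is to refuse to interpret either one alone. The sketch below is a hypothetical helper, `read_together`, that takes the period-over-period change in lead time and in rework and returns a question to investigate, not a verdict. The wording of the questions and the choice of rework as the counterweight are assumptions for illustration.

```python
def read_together(lead_time_change: float, rework_change: float) -> str:
    """Turn two paired metric movements into a question to investigate.

    Both arguments are fractional changes against the previous period,
    for example -0.2 means 20% lower and 0.3 means 30% higher.
    """
    faster = lead_time_change < 0
    more_rework = rework_change > 0

    if faster and more_rework:
        return "Lead time fell but rework rose: is the cost moving downstream?"
    if faster:
        return "Lead time fell without extra rework: what changed that is worth keeping?"
    if more_rework:
        return "Lead time did not improve and rework rose: where is work getting stuck?"
    return "Lead time did not improve and rework did not rise: is the work larger or more ambiguous?"


# Example: lead time down 20%, rework up 30%.
question = read_together(-0.2, 0.3)
print(question)
```

The function never returns "the team is faster" or "the team is slower". Every branch ends in a question, which keeps the pairing in the role of a prompt.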
3. Measure flow at the system boundary
The useful boundary is not always the engineering team. Sometimes it is idea-to-learning, customer-problem-to-resolution, incident-to-learning, or decision-to-production.
A CTO should care less about whether engineers are busy and more about whether the organization can turn important decisions into reliable changes without accumulating hidden risk.
4. Protect invisible high-leverage work
The metric system must leave room for work that reduces future effort but does not look productive today:
- deleting code
- removing features
- simplifying architecture
- improving tests where they reduce real fear
- writing operational runbooks
- clarifying ownership
- reducing handoffs
- improving observability
- resolving chronic incidents at the root
If the dashboard cannot see this work, leadership has to name it explicitly so teams are not punished for doing it.
5. Review metrics with stories
The best CTO review is not a dashboard review. It is a review of the dashboard alongside concrete examples.
For every metric movement, ask for one concrete story:
- What changed in the system?
- What did we learn?
- What work became easier?
- What risk went down?
- What customer or team pain changed?
- What behavior might this metric be accidentally encouraging?
The story keeps the metric connected to reality.
The practical diagnostic
If your CTO metrics are making the team look productive while nothing improves, you will see these signs:
- Delivery numbers improve, but product outcomes do not.
- Engineers are busier, but decisions are not clearer.
- Ticket throughput rises, but rework stays high.
- Teams optimize estimates instead of uncertainty.
- Incidents repeat with different names.
- Pull requests move faster, but architectural risk accumulates.
- People avoid work that would hurt short-term metrics but improve the system.
- Status meetings focus on explaining numbers instead of changing conditions.
The fix is to stop treating productivity as a scoreboard and start treating it as a system health investigation.
The Serious CTO take
A CTO's job is not to make engineering look productive. It is to build an organization where the right work becomes easier to do and the wrong work becomes harder to hide.
Metrics can help. But if your metrics mostly prove that people are moving, they are not CTO metrics. They are motion detectors.
The question is not: "Are engineers producing enough?"
The question is: "Is our engineering system getting better at turning important judgment into safe, valuable change?"
If the dashboard cannot answer that, it should not be trusted as the story of engineering productivity.
Works Cited / Further Reading
- Nicole Forsgren, Jez Humble, and Gene Kim, Accelerate: The Science of Lean Software and DevOps.
- DORA, "DORA's software delivery metrics: the four keys." https://dora.dev/guides/dora-metrics-four-keys/
- Nicole Forsgren et al., "The SPACE of Developer Productivity: There's more to it than you think." https://research.google/pubs/the-space-of-developer-productivity-theres-more-to-it-than-you-think/
- Nicole Forsgren et al., "The SPACE of Developer Productivity," ACM Queue. https://queue.acm.org/detail.cfm?id=3454124
- Charles Goodhart, "Problems of Monetary Management: The U.K. Experience"; commonly summarized as Goodhart's law.
- Atlassian, "How to measure developer productivity." https://www.atlassian.com/blog/productivity/measure-developer-productivity
Generation note
This research draft was generated by the scheduled Serious CTO research processor after live web extraction was unavailable in the cron environment. It uses established, named software-delivery measurement sources and visible source URLs so the publishing step can preserve citations.