Essay · 8 min read

The Numbers After Identity Engineering

Teams that declared what their software is supposed to be, then measured against it, keep reporting the same pattern: onboarding time falls by half or more, production outages drop to near zero, deploy frequency doubles. These are the closest real-world numbers we have for identity engineering, and they tell a consistent story.

Nobody publishes case studies titled "we declared our software's identity and here is what happened." The phrase does not appear in any vendor's marketing. But plenty of teams have done the structural equivalent: they wrote down what their software is, what it does, what it promises, and what it refuses to do. Then they built tooling to enforce those declarations. Then they measured the before and after.

The numbers are remarkably consistent across unrelated teams, industries, and tech stacks. Not because the teams coordinated, but because they all made the same structural move. They shifted from implicit understanding to explicit declaration. They moved identity upstream.

Here is what they found.

The onboarding effect

The most immediate payoff shows up in how fast new engineers become productive. This makes sense: if the system declares what it is, a new hire does not have to reverse-engineer that from the codebase.

Spotify built Backstage as an internal developer portal, a single place where every service declares its ownership, documentation, APIs, and operational status. Before Backstage, onboarding a new engineer at Spotify took roughly 60 days. After, it dropped to 20. That is not a small improvement. That is the difference between a new hire contributing in their third week versus their third month. Across the engineering organization, Spotify measured 2.3x more GitHub activity, double the deploy frequency, and 17% faster lead time. Code stayed in production 3x longer before being replaced. The Backstage team estimated the productivity gains were equivalent to freeing up 3 full-time engineers in every team of 10.

Toyota's software division saw something similar. Their platform team built Chofer to tackle the problem of environment setup for new projects. Before Chofer, getting a development environment configured could take weeks of provisioning, credential setup, and tribal knowledge transfer. Toyota reported over $10 million in cost reduction in 2022 from this work, with individual teams saving roughly 6 weeks and $250,000 per project.

Both cases share the same underlying mechanism. The system's identity, what it is, what it depends on, what it expects from its environment, went from being scattered across wikis, Slack threads, and people's heads to being declared in a structured, queryable format. Onboarding sped up not because the code got simpler, but because the declarations got explicit.

The stability effect

The second pattern is production stability. When services declare their contracts and boundaries explicitly, integration failures drop.

Boost Insurance adopted Pact for contract testing across their microservice architecture. Before Pact, they averaged one production outage per month from integration failures, plus roughly a dozen integration issues per quarter. After adopting contract testing (which is, structurally, a practice of declaring what each service promises to every other service), production outages dropped to virtually zero. Integration issues fell to a maximum of two per quarter. They measured an 80% increase in service stability overall.

A contract testing model analyzed by Resumly projects similar economics: defect leakage dropping from 9 per quarter to 2, mean time to detect falling from 36 hours to 8, and release cycles shortening from 12 days to 9, yielding an estimated $420,000 in annual savings. These are modeled figures, not a published case study, but they are consistent with the Boost Insurance results and with the 2023 State of Testing report's finding that contract testing adoption correlates with a 30% reduction in production incidents.

These are not testing improvements in the usual sense. The teams did not write more unit tests or improve their CI coverage. They declared what their services promised to each other, then automated the verification of those promises. The stability improvement came from making identity explicit at the boundary, not from catching more bugs in the implementation.
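The move is small enough to sketch. In a consumer-driven contract, the consumer publishes a declaration of what it relies on, and a check in CI verifies the provider's actual response against it. Here is a minimal illustration in Python; the contract shape, endpoint, and field names are invented for this example, not Pact's actual format:

```python
# A consumer-side contract: the status and fields this consumer relies on
# from a provider's (hypothetical) /quote endpoint.
CONTRACT = {
    "status": 200,
    "required_fields": {"quote_id": str, "premium_cents": int, "currency": str},
}

def verify_response(status: int, body: dict) -> list[str]:
    """Return a list of contract violations (empty means the promise holds)."""
    violations = []
    if status != CONTRACT["status"]:
        violations.append(f"expected status {CONTRACT['status']}, got {status}")
    for field, expected_type in CONTRACT["required_fields"].items():
        if field not in body:
            violations.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            violations.append(f"{field} should be {expected_type.__name__}")
    return violations

# Run against a stubbed provider response in CI:
response_body = {"quote_id": "q-123", "premium_cents": 4200, "currency": "USD"}
assert verify_response(200, response_body) == []
```

The point is not the checker itself but where the knowledge lives: the consumer's expectations are written down once, in one structured place, instead of being implied by whatever the integration happens to tolerate today.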

The DORA multiplier

The DORA research program has spent years measuring what separates high-performing engineering teams from everyone else. One of their less-discussed findings is how documentation quality amplifies every other practice.

The numbers are striking. Teams that adopted trunk-based development with high-quality documentation saw a 1,525% improvement in delivery performance. Teams that adopted trunk-based development without good documentation saw a 36% improvement. Same practice, wildly different outcomes, separated entirely by whether the team had declared what the system is and how it works.

The pattern holds across practices. Continuous integration with good documentation: 750% lift. Without: 34%. Continuous delivery with good documentation: 656%. Without: 63%.

Documentation here is doing something specific. It is not just "being helpful." It is acting as a declaration layer that lets every other practice work at full leverage. Trunk-based development without documentation means developers are merging to main without a shared understanding of what the system should be. With documentation, every merge is checked (at least mentally) against a declared identity. The practice is the same. The declaration layer determines whether it actually works.

The agent efficiency effect

This is where the story gets interesting for teams using AI agents in their development workflow.

An ETH Zurich study examined the effect of context files on LLM coding performance. The findings were counterintuitive. LLM-generated context files actually reduced task success by roughly 3% and increased token costs by over 20%. Human-written context files provided only a marginal 4% improvement. The researchers concluded that signal-to-noise ratio matters more than comprehensiveness. Dumping everything an agent might need into a context file does not help. It hurts.

A contrasting study looked specifically at AGENTS.md files, structured identity declarations for agent workflows. Repositories with well-written AGENTS.md files showed 28.64% lower median runtime and 16.58% reduced token consumption compared to repositories without them.

The difference between these two findings is not a contradiction. It is the same lesson from two angles. Generic context (here is everything about the codebase) adds noise. Structured identity (here is what this system is, what it does, and what the agent should know about working with it) reduces waste. The agent does less wandering, asks fewer dead-end questions, and arrives at correct solutions faster.

This maps directly to the DX Developer Experience Index finding that each 1-point improvement in developer experience saves 13 minutes per developer per week. For a 100-person engineering team, a 5-point improvement translates to roughly 5,000 hours annually, about $500,000 in recovered engineering time. When agents are part of the team, those minutes saved compound: the agent's time is cheaper, but its errors are just as expensive as a human's.
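The arithmetic behind that estimate is simple enough to show directly. A quick sketch, assuming roughly 48 working weeks per year:

```python
minutes_per_point_per_dev_per_week = 13  # DX Developer Experience Index figure
points_improved = 5
developers = 100
working_weeks = 48                       # assumption: ~48 working weeks/year

hours_per_year = (minutes_per_point_per_dev_per_week
                  * points_improved * developers * working_weeks) / 60
print(hours_per_year)  # 5200.0, in line with the "roughly 5,000 hours" figure
```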

What the numbers have in common

These are different companies, different tools, different problems. Spotify built a developer portal. Toyota built a platform engineering team. Boost Insurance adopted contract testing. The DORA researchers measured documentation quality. The ETH Zurich team studied context files.

None of them used the phrase "identity engineering." But every one of them made the same structural move: they took something implicit (what the software is, what it promises, what it expects) and made it explicit and verifiable. Then they measured what happened.

The pattern across all of them:

Metric                     Typical before    Typical after    Source
Onboarding time            60 days           20 days          Spotify/Backstage
Environment setup          Weeks             Hours            Toyota/Chofer
Production outages         1/month           ~0               Boost Insurance/Pact
Integration defects        12/quarter        2/quarter        Boost Insurance/Pact
Mean time to detect        Days              Hours            Contract testing adoption
Deploy frequency           Baseline          2x               Spotify/Backstage
Agent token consumption    Baseline          -16.58%          AGENTS.md study
Agent runtime              Baseline          -28.64%          AGENTS.md study

The improvements cluster around two effects. First, less time spent figuring out what the system is (onboarding, environment setup, agent context). Second, fewer failures at boundaries where two parts of the system have different ideas about what they promised each other (integration outages, contract violations, spec gaps).

Both effects trace to the same root cause: identity was implicit, and making it explicit removed an entire class of failure.

InnerSource adoption data reinforces this from a different angle. Teams with explicit, discoverable component identity consistently report higher rates of cross-team code reuse. When you can find out what something is without reading all of its source code, you are more likely to use it instead of rebuilding it.

Monday morning

If you want to run the same experiment, here is how to set a baseline.

Pick three numbers you can measure today. Onboarding time (how many days until a new engineer's first meaningful PR). Mean time to detect integration issues (how long between an integration defect being introduced and someone noticing). Deploy frequency (how many times per week your team ships to production).

Write them down. These are your "before."

Then pick one service, the one that breaks most often or the one your agents interact with most, and write its identity declaration. Not a novel. A document that answers: what does this service do, what does it not do, what does it promise to its consumers, and what does it expect from its dependencies. Make every claim specific enough that you could check it. "Responds within 200ms at P99" is checkable. "Is fast" is not.
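To make "specific enough that you could check it" concrete, here is a toy declaration with one measurable promise and a check against observed data. The declaration format, service name, and field names are invented for illustration:

```python
# A toy identity declaration for a hypothetical quote service.
DECLARATION = {
    "name": "quote-service",
    "does": "prices insurance quotes for the checkout flow",
    "does_not": "persist customer PII",
    "promises": {"p99_latency_ms": 200},  # checkable: P99 under 200 ms
}

def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of observed latencies."""
    ordered = sorted(latencies_ms)
    rank = max(0, round(0.99 * len(ordered)) - 1)
    return ordered[rank]

def check_latency_promise(observed_ms: list[float]) -> bool:
    """True if the observed sample satisfies the declared P99 bound."""
    return p99(observed_ms) <= DECLARATION["promises"]["p99_latency_ms"]

# 99 fast requests and one slow outlier: the outlier sits above P99,
# so the declared promise still holds for this sample.
sample = [50.0] * 99 + [450.0]
print(check_latency_promise(sample))
```

"Is fast" cannot be turned into a function like this; "responds within 200ms at P99" can, which is exactly why it belongs in the declaration.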

Give that declaration to your team. Give it to your agents. Wait 90 days and measure those three numbers again.

You will not see Spotify's numbers on day one. They had years of iteration and a dedicated platform team. But the direction will be consistent, because the mechanism is consistent. Declaring what software is, explicitly and verifiably, removes an entire category of waste that shows up as slow onboarding, production surprises, and agent thrashing.

The teams that measured it found the same thing. The numbers are not ambiguous.


These are the results of declaring identity upstream. We are building tooling that makes those declarations structured, testable, and enforceable across your entire stack.


The ribo.dev Team

Building the identity layer for software.
