Essay · 9 min read

The Bottleneck Flipped

In 2025, the hard problem was finding enough context for agents. In 2026, the hard problem is finding less of it. When identity exists upstream, the new skill is slicing: carving the smallest coherent cross-section of DNA that contains all the concerns for a given task.

For most of 2025, the hard part of working with AI agents was getting them enough context. You would open a coding agent, describe the feature you wanted, and then spend the next twenty minutes feeding it files. Here is the schema. Here is the API contract. Here is the design doc from last quarter. Here is the Slack thread where we decided to change the auth flow. The agent could write code. The bottleneck was you, assembling the context it needed to write the right code.

That bottleneck is flipping. Not because agents got smarter, but because identity is moving upstream. When a system has a DNA layer that declares what it is, what it promises, and where its boundaries are, the problem stops being "can I find enough context?" and starts being "can I find the right slice?"

This is a different skill. And almost nobody is talking about it yet.

Context engineering was the right name for the wrong era

The term took hold fast. In June 2025, Tobi Lütke wrote that he preferred "context engineering" over "prompt engineering" because it described the core skill better: the art of providing all the context for a task to be plausibly solvable by the LLM. Andrej Karpathy agreed, calling context engineering "the delicate art and science of filling the context window with just the right information for the next step." Simon Willison endorsed it days later, arguing the term had sticking power because its inferred definition was much closer to the intended meaning than "prompt engineering," which most people misunderstood as typing things into a chatbot.

They were right about the skill. The practice of assembling context for agents was real, time-consuming, and poorly understood. What they were naming, though, was a symptom. The reason developers spent so much time on context assembly was that the context did not exist in any structured, queryable form. It was scattered across READMEs, Confluence pages, Slack threads, and the heads of senior engineers. Context engineering was the discipline of compensating for that absence.

The METR randomized controlled trial from July 2025 showed what that compensation cost. Sixteen experienced open-source developers using AI tools (Cursor Pro with Claude 3.5/3.7 Sonnet) completed tasks 19% slower than developers working without AI. The developers predicted a 24% speedup before starting and believed they had gotten a 20% speedup afterward. The gap between perception and reality was the context tax. The time they spent finding, validating, and feeding context to the agent consumed the time the agent saved them on implementation.

More context makes agents worse

Context windows grew. Claude reached 200K tokens, then a million. Gemini hit a million. GPT-4.1 followed. The assumption was obvious: bigger windows, better agents. Give the model everything and let it figure out what matters.

The research says the opposite.

Du et al. (2025) ran what should have been the definitive test. They gave models all the relevant information with perfect retrieval, then varied the total input length by adding irrelevant padding. Performance degraded between 13.9% and 85% as input length increased. Even when they replaced irrelevant tokens with whitespace and forced models to attend only to relevant tokens, the degradation persisted. Length alone hurts, independent of retrieval quality.

The Chroma "Context Rot" study from July 2025 tested 18 frontier models across 194,480 API calls. Every single model exhibited performance degradation as input length increased, even on trivially simple tasks like replicating a sequence of repeated words. The finding that surprised the researchers: coherent, well-organized documents paradoxically hurt performance more than shuffled ones, because organized context created stronger competing signals that pulled the model's attention away from the task.

The original "Lost in the Middle" paper by Liu et al. documented the mechanism as early as 2023. Performance follows a U-shaped curve, highest when relevant information is at the beginning or end of context, with 30%+ accuracy drops when the answer sits in the middle. Larger windows make the middle larger.

Drew Breunig catalogued four failure modes that emerge as context grows. Context poisoning, where hallucinations enter the context and get repeatedly referenced. Context distraction, where the model over-focuses on context at the expense of its training knowledge (Databricks found correctness declined around 32K tokens for Llama 3.1 405B). Context confusion, where superfluous content degrades output quality (on the Berkeley Function-Calling Leaderboard, models performed worse with more tools available; one quantized model failed with 46 tools but succeeded with 19). And context clash, where new information conflicts with existing context (Microsoft and Salesforce research, as documented by Breunig, found a 39% average performance drop when prompts were sharded across multiple turns).

In typical coding tasks, the signal-to-noise ratio can be as low as 2.5%. Five hundred relevant tokens among twenty thousand total. The rest is navigation overhead, file structure, boilerplate, and code the agent will never touch.

Anthropic's own engineering guidance on context states the principle plainly: "Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome." Not the largest. The smallest.

When identity exists upstream, the skill becomes slicing

Here is where the shift happens.

Without a DNA layer, you have no choice but to go hunting. The context an agent needs is spread across files, docs, and human memory. You find what you can, stuff it into the context window, and hope the signal-to-noise ratio is high enough for the model to produce something useful. Context engineering is the discipline of doing this hunt well.

With a DNA layer, the context already exists in a structured, queryable form. The service declares what it is, what it promises, what it depends on, what it refuses to do. Its contracts are explicit. Its boundaries are named. Its constraints are verifiable. The information that developers used to spend twenty minutes assembling is already there.
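As a concrete sketch, one service's declaration might be encoded as structured data. The schema below is hypothetical (the essay does not prescribe a format); it only illustrates what "structured and queryable" could mean in practice:

```python
from dataclasses import dataclass, field

@dataclass
class DnaDeclaration:
    """Hypothetical shape of one service's identity declaration."""
    name: str
    purpose: str                  # what it is
    promises: list[str]           # what it guarantees to consumers
    depends_on: list[str]         # upstream services it relies on
    refusals: list[str]           # what it explicitly will not do
    contracts: dict[str, str] = field(default_factory=dict)  # endpoint -> schema name

# An illustrative declaration for a made-up billing service.
billing = DnaDeclaration(
    name="billing",
    purpose="Issues and settles invoices",
    promises=["invoices are immutable once issued"],
    depends_on=["auth", "ledger"],
    refusals=["never mutates ledger entries directly"],
    contracts={"POST /invoices": "InvoiceCreateV2"},
)
```

Because the declaration is data rather than prose, an agent (or a slicing tool) can query it directly instead of reconstructing it from READMEs and Slack threads.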

But you cannot feed all of it to an agent at once. A 200-service system with full DNA declarations might represent hundreds of thousands of tokens of identity. The research is clear: dumping that into a context window will make the agent worse, not better. You need a slice.

A slice is the smallest coherent cross-section of identity that contains all the concerns relevant to a specific task. Not all the identity. Not a random sample. The specific subset where every declaration is load-bearing for the work at hand, and nothing else is included.

This is a different skill from context engineering. Context engineering asks: what does this agent need to know? Identity slicing asks: what is the smallest coherent boundary around this task's concerns?

What a slice looks like

The shape of a slice depends on the shape of the task.

If you are building a form that creates a new entity, the slice is data mutation. You need the contracts for how data enters the system, the validation rules declared in the DNA, the downstream consumers who will react to the new entity, and the constraints on what constitutes a valid state. You do not need the deployment topology, the incident runbook, or the observability configuration. Those are real identity. They are not relevant identity for this task.

An authentication bug calls for a different cross-section entirely: trust boundaries. Which services issue tokens, which consume them, what the session lifecycle looks like, what happens when auth is unavailable. The data schema and API pagination contracts are irrelevant here, even though they are real identity.

Splitting a service into two is a dependency slice. You need the contracts every consumer depends on, the boundaries that define what belongs in each new service, and the guarantees that must survive the split. You need the declared expectations other services have of you, not their implementation details.

Each slice has a natural boundary. It includes the identity of the component being changed, the identities of the components directly affected by the change, and the contracts that connect them. Everything outside that boundary is noise for this task, no matter how important it is in general.
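That boundary rule (the changed component, the components directly affected, and the contracts connecting them) can be sketched as a one-hop traversal over declarations. The data shapes here are illustrative, not a real schema:

```python
def slice_boundary(target, declarations):
    """One-hop slice: the target, services it depends on, services that
    depend on it, and only the contracts whose provider and consumer
    both fall inside that boundary."""
    neighbors = set(declarations[target]["depends_on"])
    neighbors |= {name for name, d in declarations.items() if target in d["depends_on"]}
    included = {target} | neighbors
    contracts = [
        c
        for name in included
        for c in declarations[name]["contracts"]
        if c["provider"] in included and c["consumer"] in included
    ]
    return {"services": included, "contracts": contracts}

# Illustrative declarations: billing reads the ledger, reports reads billing,
# and the frontend sits two hops away from billing.
declarations = {
    "billing":  {"depends_on": ["ledger"], "contracts": [
        {"provider": "ledger", "consumer": "billing", "name": "PostEntryV1"}]},
    "ledger":   {"depends_on": [], "contracts": []},
    "reports":  {"depends_on": ["billing"], "contracts": [
        {"provider": "billing", "consumer": "reports", "name": "InvoiceFeedV2"}]},
    "frontend": {"depends_on": ["reports"], "contracts": []},
}

boundary = slice_boundary("billing", declarations)
# frontend stays outside the slice: it is affected only transitively
```

The point of the sketch is the exclusion, not the traversal: the frontend is real identity, but for a change to billing it is noise, so the boundary never admits it.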

The analogy is surgical. A surgeon operating on a knee does not need a full-body MRI displayed during the procedure. They need the imaging of the knee, the relevant ligament attachments, and the vascular map of the immediate area. More imaging would not help. It would distract.

This is architecture, not prompt engineering

Finding the right slice requires understanding the system's structure. Which services are coupled? Which contracts are load-bearing? Which boundaries are real versus aspirational? Which concerns cross-cut, and which are isolated?

These are architectural questions. They are the same questions a staff engineer asks when scoping a project, decomposing a system, or reviewing an architecture proposal. The difference is that the answers need to be legible to machines, not just humans, and they need to fit within a token budget.

This is why identity engineering and identity slicing are complementary. You cannot slice what does not exist. Without structured DNA declarations, there is nothing to slice from. You are back to ad-hoc context assembly, hunting through files and hoping you found everything. But once declarations exist, the skill of working with agents shifts from retrieval to scoping. From finding context to finding the boundary.

The Stack Overflow 2025 Developer Survey found that 84% of developers use or plan to use AI tools, but trust dropped to 29%, down 11 points from the prior year. The top frustration, cited by 66%, was AI solutions that are "almost right, but not quite." That "almost right" gap is the signal-to-noise problem. The agent had enough context to get close but not the right context to get it right. Better slicing, not more context, is how that gap closes.

The context window will keep growing. The slice will keep shrinking.

Context windows will hit 10 million tokens within a year or two. The instinct will be to fill them. The research says that instinct is wrong. Du et al. showed that length hurts even with perfect retrieval. Chroma showed it on 18 frontier models. Every vendor's own guidance says to minimize, not maximize, context.

The right response is not to ignore larger windows. They are useful. They let you include more of the identity graph when the task requires it. A refactoring task that touches eight services needs a larger slice than a bug fix that touches one. The window creates headroom. But headroom is not the same as a mandate to fill it.

The skill that matters is knowing how much headroom you actually need. A 50-token identity declaration for a single service contract might be all an agent needs to implement a specific endpoint correctly. A 5,000-token slice covering three services and their shared contracts might be what a cross-service refactoring requires. The developer who can look at a task and identify the minimal coherent slice of DNA, not the maximal pile of context, will get better agent output with lower cost and fewer false starts.
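One way to make "headroom is not a mandate to fill it" operational is greedy packing: rank candidate declarations by relevance, then stop adding them once the budget is met, even if the window could hold more. The relevance scores below are assumed to come from elsewhere (for example, graph distance from the changed component); the numbers are illustrative:

```python
def fit_slice(candidates, budget_tokens):
    """Greedy packing: keep the highest-relevance declarations that fit.

    `candidates` are (relevance, token_cost, text) tuples. This is a sketch;
    a real tool would score relevance from the identity graph itself."""
    chosen, used = [], 0
    for relevance, cost, text in sorted(candidates, reverse=True):
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen, used

candidates = [
    (0.9, 50, "billing: POST /invoices contract"),
    (0.7, 4000, "ledger: entry schema and invariants"),
    (0.2, 9000, "platform: deployment topology"),  # real identity, irrelevant here
]
slice_text, used = fit_slice(candidates, budget_tokens=5000)
# the deployment topology is dropped even though a large window could hold it
```

Note that the budget is set by the task, not by the window: with a 10-million-token window the call above would be unchanged.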

This is why coding agents burned 60-80% of their tokens on navigation in 2025. They had no identity layer to slice from, so they spent most of their budget figuring out what the system was before they could start building. With structured DNA, that navigation cost collapses. The agent starts with the slice it needs and spends its entire budget on the actual task.

Monday morning

Next time you hand an agent a task, pause before you start feeding it files.

Ask: what is the smallest set of identity declarations that contains all the concerns for this task? Not what files does the agent need to see. What identity does it need to understand?

If the task is a form, the slice is the data mutation boundary: input contracts, validation rules, downstream consumers. If the task is a bug fix, the slice is the failure mode boundary: what the service promises, what broke, what depends on the broken guarantee. If the task is a new feature, the slice is the capability boundary: what the service does today, what it refuses to do, and which of those refusals needs to change.

Start building a vocabulary for slices on your team. Name the common ones: mutation slice, trust slice, dependency slice, contract slice. When engineers learn to scope identity the same way they scope stories, agent output improves because the context gets tighter, not larger.
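A team vocabulary like that can be as simple as a lookup table. The slice names and concern lists below are illustrative, not a standard:

```python
# Hypothetical team vocabulary: each named slice lists the concerns it pulls in.
SLICE_KINDS = {
    "mutation":   ["input contracts", "validation rules", "downstream consumers"],
    "trust":      ["token issuers", "token consumers", "session lifecycle", "auth failure modes"],
    "dependency": ["consumer contracts", "split boundaries", "surviving guarantees"],
    "contract":   ["endpoint schemas", "versioning rules", "deprecation promises"],
}

# Common task shapes mapped onto slice kinds, mirroring the examples above.
TASK_TO_SLICE = {"form": "mutation", "auth bug": "trust", "service split": "dependency"}

def scope(task_kind: str) -> list[str]:
    """Return the concerns a slice of this kind pulls in."""
    return SLICE_KINDS[TASK_TO_SLICE[task_kind]]
```

Even a table this small changes the conversation at planning time: "this is a trust slice" is a scoping decision, the same way "this is a two-point story" is.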

The bottleneck flipped. In 2025, you could not find enough context. In 2026, you have all of it. The new skill is knowing how much to leave out.


Identity slicing starts with declarations that are structured enough to query. We are building the tooling that makes DNA layers sliceable, not just readable.


The ribo.dev Team

Building the identity layer for software.
