

Context Loss in Multi-Agent Systems: Why Agent Handoffs Fail

Multi-agent LLM systems do not fail because of bad agents. They fail at the handoffs. Here is why context loss happens and what structured state to pass instead.

Multi-agent systems fail when LLM agents lose critical context between decision points. This is context loss, and it is a common cause of production failures in agentic workflows.

Single-agent systems are useful up to a point. To scale beyond that point, organizations need multi-agent systems that stay coordinated, observable, and auditable.

Usually, organizations start with ambitious goals that seem deceptively simple: create a set of agents with different roles and connect them into a reasonable workflow. Boxes, arrows, and labels come together into an impressive diagram. This is going to be great.

Then you implement it, and it does not work. The output is spotty at best and unusable at worst.

What happened?

You just got bitten by context loss, a recurring failure mode in multi-agent LLM systems. Each agent produced reasonable output in isolation, but together they lost the thread. The problem is not the agents themselves. It is the handoffs between them.

Agent handoffs vs. tool calls and why it matters

Tool calls are bounded, specific function calls to narrowly defined tools. Handoffs are broader decision points that require more care than simple tool calls.

Why? Because a handoff carries far more scope than a simple tool call. And in a real workflow, it is not one handoff that matters. It is all of them. A chain is only as strong as its weakest link, and information flow is no exception.
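A quick back-of-the-envelope sketch shows why every handoff matters. It assumes, hypothetically, that each handoff independently preserves the needed context with a fixed probability; real handoffs are rarely independent, and the 0.95 and 0.60 figures are illustrative, not measured.

```python
import math

def chain_success(per_handoff: list[float]) -> float:
    """Probability that every handoff succeeds, assuming the handoffs are independent."""
    return math.prod(per_handoff)

# Illustrative only: five handoffs, each preserving context 95% of the time.
five_steps = chain_success([0.95] * 5)
print(f"{five_steps:.3f}")  # 0.774

# Swap in one weak handoff and the whole chain drops below that link.
one_weak = chain_success([0.95, 0.95, 0.60, 0.95, 0.95])
print(f"{one_weak:.3f}")  # 0.489
```

Under this simple model the chain is actually weaker than its weakest link: 0.489 is below the 0.60 of the weak handoff, because the other four handoffs still take their cut.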

This is not a new problem. Claude Shannon founded information theory in 1948, framing communication as messages moving from a sender to a receiver through a noisy channel. The central problem was reducing the receiver's uncertainty about which message the sender sent.

In 1956, John Kelly Jr. reframed the question in a more practical way. Instead of asking how much data got through, he asked what information is needed to make a difference to future outcomes. Any other information is immaterial to the problem at hand.

This applies directly to agents. How do we determine what makes a difference to the next step in a workflow?
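One way to make Kelly's question operational is an ablation test: an item of context is material if removing it changes what the next step decides. In this sketch, `material_keys` and the rule-based `decide_refund` are hypothetical stand-ins for the next agent; with a real LLM you would compare its outputs instead, and the results would be noisier.

```python
from typing import Callable

Context = dict[str, str]

def material_keys(context: Context, decide: Callable[[Context], str]) -> list[str]:
    """Return the keys whose removal changes the downstream decision."""
    baseline = decide(context)
    material = []
    for key in context:
        ablated = {k: v for k, v in context.items() if k != key}
        if decide(ablated) != baseline:
            material.append(key)
    return material

# Hypothetical next step: approve a refund only if it is in policy and in budget.
def decide_refund(ctx: Context) -> str:
    if ctx.get("policy") == "refundable" and ctx.get("budget") == "ok":
        return "approve"
    return "escalate"

ctx = {
    "policy": "refundable",
    "budget": "ok",
    "customer_mood": "annoyed",
    "chat_transcript": "...long history...",
}
print(material_keys(ctx, decide_refund))  # ['policy', 'budget']
```

Single-key ablation misses items that only matter in combination, and with an LLM as `decide` you would need several samples per variant to separate real changes from sampling noise.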

Each agent, usually an LLM, works in relative isolation once it receives its inputs. It processes those inputs, generates output, and sends that output to the next step.

Clear roles are not enough. We have to inspect each handoff. Is the information complete? Will it help the next phase make the right decision?

One solution people try is to include more information in the message. This is a trap. Context rot happens when the context window fills with irrelevant information that the LLM has to dig through to find what matters. As with people, the more confusing the instructions, the more confused the output.

Repeated small errors add up. Eventually, noise overwhelms the signal. Without intervention, that condition is hard to recover from.

Agent handoffs are data transformations, not message passing

Can we just summarize the information?

Yes and no. Each handoff should eliminate irrelevant data, but too much compression can strip out the useful pieces. We cannot blindly summarize. We need to categorize which types of information the next stage of the pipeline needs in order to act.

"This might be nice" is not enough. The information must be actionable.
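Categorizing before compressing might look like the following sketch: each piece of information carries a category, and each downstream stage declares which categories it acts on. The category names and the `NEEDS` table are hypothetical examples, not a standard taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    category: str  # e.g. "goal", "constraint", "evidence", "chatter"
    text: str

# Hypothetical table: which categories each downstream stage acts on.
NEEDS: dict[str, set[str]] = {
    "drafting": {"goal", "constraint", "evidence"},
    "review": {"goal", "constraint", "decision", "uncertainty"},
}

def filter_for(stage: str, items: list[Item]) -> list[Item]:
    """Keep only items in categories the next stage can act on."""
    wanted = NEEDS[stage]  # KeyError for a stage with no declared needs
    return [item for item in items if item.category in wanted]

items = [
    Item("goal", "Answer the customer's billing question"),
    Item("constraint", "Do not promise refunds without approval"),
    Item("chatter", "Customer mentioned the weather"),
    Item("evidence", "Invoice shows the same charge twice"),
]
for item in filter_for("drafting", items):
    print(f"{item.category}: {item.text}")
```

The filter never rewrites text, so nothing useful is lost to summarization; what gets dropped is decided by category, which the sending stage has to assign deliberately. A stage missing from `NEEDS` raises `KeyError`, which is arguably the right behavior: a stage with no declared needs is a design gap.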

The first way to optimize information flow is to stop using free text at the top level. We want structured data, not an essay. A data structure can be validated and passed along across clear boundaries.

Structured information can, and probably should, contain text. But that text should live in a knowable, defined context. Schema-validated, typed, predictable output is what makes handoffs reliable. The receiving agent knows exactly what it is getting and where to find it.
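Here is a minimal sketch of a schema-checked handoff using only the Python standard library. The field names (`goal`, `decision`, `evidence`, `notes`) and the `parse_handoff` helper are hypothetical; in practice a library such as Pydantic or a JSON Schema validator would do this more thoroughly. Free text is still allowed, but only in the known `notes` slot.

```python
import json
from dataclasses import dataclass

@dataclass
class Handoff:
    goal: str
    decision: str
    evidence: list[str]
    notes: str = ""  # free text lives here, in a known, defined slot

REQUIRED = {"goal": str, "decision": str, "evidence": list}

def parse_handoff(raw: str) -> Handoff:
    """Parse an agent's JSON output and reject anything that breaks the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    for name, expected in REQUIRED.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    if not all(isinstance(e, str) for e in data["evidence"]):
        raise ValueError("evidence entries must be strings")
    notes = data.get("notes", "")
    if not isinstance(notes, str):
        raise ValueError("notes must be str")
    return Handoff(data["goal"], data["decision"], data["evidence"], notes)

ok = parse_handoff('{"goal": "triage ticket", "decision": "route to billing",'
                   ' "evidence": ["mentions invoice"], "notes": "customer is upset"}')
print(ok.decision)  # route to billing

try:
    parse_handoff('{"goal": "triage ticket", "decision": "route to billing"}')
except ValueError as err:
    print(err)  # missing field: evidence
```

Unknown keys are silently ignored here; a stricter version would reject them so that agents cannot smuggle unstructured context past the schema.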

Handoffs are transformations of data, not blobs of meaning. Each step should add pertinent data and remove irrelevant noise.
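Treated as a transformation, a stage takes a state in and returns a new state out. In this sketch, `State`, `research_step`, and the `.gov` relevance rule are all hypothetical stand-ins; the point is the shape: append the decision the next stage needs and drop the working notes it does not.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    goal: str
    decisions: tuple[str, ...] = ()
    scratch: tuple[str, ...] = ()  # working notes; should not survive a handoff

def research_step(state: State, sources: list[str]) -> State:
    """Hypothetical stage: record its decision, discard its scratch notes."""
    kept = [s for s in sources if s.endswith(".gov")]  # stand-in relevance rule
    decision = f"kept {len(kept)} of {len(sources)} sources"
    # Add the pertinent result; remove the noise.
    return replace(state, decisions=state.decisions + (decision,), scratch=())

start = State(goal="summarize the new regulation",
              scratch=("tried three search queries",))
after = research_step(start, ["a.gov", "blog.example", "b.gov"])
print(after.decisions)  # ('kept 2 of 3 sources',)
print(after.scratch)    # ()
```

Because `State` is frozen, each step returns a new snapshot and leaves its input intact, which makes it easy to log every intermediate state.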

Ways multi-agent systems lose context at handoffs

Multi-agent context loss tends to cluster into six recurring failure modes.

  • Goal loss: The agent does what seems locally right, but the larger purpose is lost. The agent argues technicalities instead of resolving the problem.
  • Evidence loss: Conclusions flow to the output without the evidence that supports them. The receiving agent has no way to verify the claims, so it cannot judge their accuracy.
  • Reasoning loss: The receiving agent gets the conclusion but not even the minimal reasoning that led to it. Confidence drops because the next step cannot tell why a decision was made.
  • Uncertainty loss: An uncertain fact gets passed down the chain as settled. This multiplies errors and creates overconfidence.
  • Constraint loss: Decisions are made within budgets, time limits, compliance rules, security requirements, and scope. If those constraints are lost, later agents may color outside the lines.
  • Authority loss: The agent makes decisions outside its purview. Later decision-makers become misinformed at best and irrelevant at worst.
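The six modes can be turned into mechanical checks. This sketch assumes a hypothetical dict-shaped handoff with `goal`, `decisions`, `uncertainty`, `constraints`, and `authority` keys, where each decision names its `kind` so it can be compared against the agent's authority. It only catches missing structure, not wrong content, so it complements review rather than replacing it.

```python
def audit_handoff(handoff: dict, original_goal: str) -> list[str]:
    """Flag structural gaps that match the six context-loss failure modes."""
    problems: list[str] = []
    decisions = handoff.get("decisions", [])
    authority = set(handoff.get("authority", []))

    if handoff.get("goal") != original_goal:
        problems.append("goal loss: goal missing or rewritten")
    for d in decisions:
        name = d.get("what", "<unnamed>")
        if not d.get("evidence"):
            problems.append(f"evidence loss: no evidence for {name!r}")
        if not d.get("why"):
            problems.append(f"reasoning loss: no rationale for {name!r}")
        if d.get("kind") not in authority:
            problems.append(f"authority loss: {name!r} is outside this agent's authority")
    # An explicit empty list means "no open questions"; a missing key means nobody said.
    if "uncertainty" not in handoff:
        problems.append("uncertainty loss: no open questions or risks recorded")
    if "constraints" not in handoff:
        problems.append("constraint loss: constraints not carried forward")
    return problems

handoff = {
    "goal": "Resolve the customer's double charge",
    "decisions": [
        {"what": "issue refund", "kind": "refund",
         "why": "duplicate charge", "evidence": ["invoice line 2"]},
        {"what": "close account", "kind": "account_change",
         "why": "", "evidence": []},
    ],
    "constraints": ["refunds need approval above the agent limit"],
    "authority": ["refund"],
}
for problem in audit_handoff(handoff, "Resolve the customer's double charge"):
    print(problem)
```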

What to include in every agent handoff: a state template

Everyone loves a good story. It is the primary way humans communicate. Each workflow may tell its own story, but a handoff is not a story. Treating it like one adds extra information that pollutes every later stage of the pipeline.

Keeping handoffs clean means giving each agent exactly what it needs and expects.

Human teams handle this through feedback. A bad handoff usually triggers corrective action, like asking the sender to clarify. When it does not, the system becomes slow, frustrating, and bureaucratic. The best way to avoid bad handoffs is to reduce uncertainty as much as possible.

Intent matters. Clean handoffs reduce retries and delays.

Think of each transition as delivering not a message, but state. State is the information needed to process the next link in the chain.

The exact state depends on each agent, but a useful handoff template usually includes:

  • Original goal: The task as scoped at the top of the workflow, unchanged.
  • Decisions made: What was resolved at this stage and why.
  • Supporting evidence: The inputs or data that justified each decision.
  • Active assumptions: What was treated as true but not verified.
  • Remaining uncertainty or risk: Open questions the next agent needs to account for.
  • Binding constraints: Budget, compliance, time, or scope limits still in effect.
  • Current authority: What this agent was and was not authorized to decide.
  • Next responsibility: What the receiving agent is expected to do with this state.
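The template maps naturally onto a typed record. This is a minimal sketch using Python dataclasses; the field names are our own, supporting evidence is attached to each decision, and "current authority" is split into `authorized` and `not_authorized` fields. Tuples are used so a state snapshot cannot be mutated after it is handed off.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Decision:
    what: str
    why: str
    evidence: tuple[str, ...]  # inputs or data that justified this decision

@dataclass(frozen=True)
class HandoffState:
    original_goal: str                 # task as scoped at the top, unchanged
    decisions: tuple[Decision, ...]    # what was resolved at this stage, and why
    assumptions: tuple[str, ...]       # treated as true but not verified
    open_risks: tuple[str, ...]        # remaining uncertainty the receiver must handle
    constraints: tuple[str, ...]       # budget, compliance, time, or scope limits
    authorized: tuple[str, ...]        # what this agent was allowed to decide
    not_authorized: tuple[str, ...]    # what it explicitly was not
    next_responsibility: str           # what the receiving agent should do next

    def to_json(self) -> str:
        """Serialize for logging or for the next agent's prompt."""
        return json.dumps(asdict(self), indent=2)

state = HandoffState(
    original_goal="Draft a reply to the billing complaint",
    decisions=(
        Decision(
            what="classify as duplicate charge",
            why="two identical invoice lines on the same day",
            evidence=("invoice export",),
        ),
    ),
    assumptions=("the second charge was not intentional",),
    open_risks=("the customer may have been charged on another card too",),
    constraints=("no refund promises without approval",),
    authorized=("classification",),
    not_authorized=("issuing refunds",),
    next_responsibility="Draft a reply that explains the finding; do not promise a refund.",
)
print(state.to_json())
```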

This should be stored in a persistent memory store, whether that is a database, flat files, logs, or another system that fits the workflow. You will need it when something goes wrong.
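Any durable store works. As one hedged example, a SQLite table from Python's standard `sqlite3` module can keep every handoff for a run in order, so you can replay exactly what each agent received. The table layout and the `open_store`, `record_handoff`, and `replay` helpers are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a handoff log; the helpers below only ever append to it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS handoffs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               run_id TEXT NOT NULL,
               from_agent TEXT NOT NULL,
               to_agent TEXT NOT NULL,
               created_at TEXT NOT NULL,
               state_json TEXT NOT NULL
           )"""
    )
    return conn

def record_handoff(conn: sqlite3.Connection, run_id: str,
                   from_agent: str, to_agent: str, state: dict) -> None:
    """Store exactly what the receiving agent was given."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO handoffs (run_id, from_agent, to_agent, created_at, state_json) "
            "VALUES (?, ?, ?, ?, ?)",
            (run_id, from_agent, to_agent,
             datetime.now(timezone.utc).isoformat(), json.dumps(state)),
        )

def replay(conn: sqlite3.Connection, run_id: str) -> list[tuple[str, str, dict]]:
    """Return every handoff in a run, in the order it happened."""
    rows = conn.execute(
        "SELECT from_agent, to_agent, state_json FROM handoffs "
        "WHERE run_id = ? ORDER BY id",
        (run_id,),
    ).fetchall()
    return [(src, dst, json.loads(blob)) for src, dst, blob in rows]

conn = open_store()
record_handoff(conn, "run-1", "planner", "researcher",
               {"original_goal": "summarize the regulation", "decisions": []})
record_handoff(conn, "run-1", "researcher", "writer",
               {"original_goal": "summarize the regulation", "decisions": ["kept 2 sources"]})
for src, dst, state in replay(conn, "run-1"):
    print(f"{src} -> {dst}: {state['decisions']}")
```

Replaying a run this way shows the world from each agent's point of view: not what the workflow intended to pass, but what actually arrived.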

Persistent state lets you see the world from the agent's point of view, not just the larger narrative.

Context loss can never be eliminated, but it can be minimized and mitigated. Recognizing that keeps teams from being surprised when things go south. Each step should be able to answer one clear question: what does the next decision need? Only information that affects the next decision is useful.

Multi-agent systems with human collaborators: meet the Neoteams

As agentic systems mature, the most important handoffs will not be agent-to-agent. They will be agent-to-human.

Neoteams are combinations of humans and agents. Until now, AI has mostly been discussed in terms of what individual models can do. We are entering a stage where coordinated systems of agents, humans, and automated tools pass work to each other.

Future organizations will look less like static org charts and more like neoteams.

The next frontier is coherence at scale.