NewGenApps

Single-agent vs multi-agent: when each architecture is the right call

In short: A single well-scoped agent is the correct architecture for a bounded, low-stakes job — adding agents to a task that does not warrant them is overhead, not maturity. Multi-agent earns its keep only when the work spans roles that should not grade each other and when the cost of a confidently wrong answer is high enough to justify more components, more latency, and an orchestration layer. The right question is not "how many agents" but "do the parts of this job need to be separated so they can check each other?"


We have been making early architecture calls since 2009, when we built an augmented-reality demo on the G1 — the first Android phone — before the category had a name. The recording of that demo is still public (vimeo.com/618305787). The pattern established then is the same discipline this page is built on: read the signal early, build on the raw stack, and ask how much architecture a job actually warrants before committing to complexity that outlasts the problem.

The agent architecture decision in 2025 is structurally identical. Multi-agent systems are technically possible; they are widely discussed; Gartner tracked a 1,445 percent increase in client inquiries about them from Q1 2024 to Q2 2025 (Gartner, Multiagent Systems in Enterprise AI, December 18, 2025). That same inquiry surge is paired with a Gartner prediction that over 40 percent of agentic AI projects will be canceled by the end of 2027, driven by escalating costs, unclear business value, and inadequate risk controls (Gartner press release, June 25, 2025, analyst Anushree Verma). Both data points are true at once, and together they say the same thing: the question is not whether multi-agent is a real pattern but whether a specific job warrants it.


What is the difference between a single-agent and a multi-agent system?

A single-agent system is one model loop that performs the whole job — it gathers, decides, acts, and evaluates its own work in a single thread of context. A multi-agent system splits that job across specialized agents with bounded scope, coordinated by an orchestration layer, so that the agent doing the work is not the only thing grading it. The deciding question is not "how many agents" but whether the parts of the job need to be separated so they can check each other — and whether the work decomposes into parts that can run independently at all.

Two distinctions worth being precise about:

Bounded scope is the real unit, not agent count. A second agent that shares the first one's context and goals is not a separate failure domain — it is the same loop with added latency. Multi-agent only buys you anything when each agent owns a distinct slice of the work: a role, a tool surface, a failure class that the others do not touch. Adding agents to appear more sophisticated, without that separation, is the over-engineering this page is designed to prevent.

The orchestration layer is a cost, not a capability. The moment you have more than one agent, you have to engineer coordination: routing, context-passing, conflict resolution, retry logic, time bounds, and a way to attribute failure to the right agent. Naming it as overhead, not as a headline feature, is what keeps an architecture honest. Orchestration is the mechanism that makes independent verification work; it is not the point in itself.
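To make that overhead concrete, here is a minimal sketch of what even a short pipeline has to carry once coordination is involved: context-passing, retry logic, a time bound, and failure attribution. Every name in it (`StepFailure`, `run_step`, `orchestrate`, and the toy agents) is hypothetical, and the agents are plain Python functions standing in for model calls. It is an illustration under those assumptions, not a production orchestrator.

```python
import time
from typing import Callable, List, Tuple


class StepFailure(Exception):
    """Carries enough detail to attribute a failure to one agent."""

    def __init__(self, agent: str, attempts: int, reason: str) -> None:
        super().__init__(f"{agent} failed after {attempts} attempt(s): {reason}")
        self.agent = agent
        self.attempts = attempts
        self.reason = reason


def run_step(name: str, agent: Callable[[str], str], payload: str,
             max_retries: int = 2, time_budget_s: float = 5.0) -> str:
    """Run one agent with retry logic and a time bound.

    The time bound is only checked between attempts; it does not
    interrupt a call that is already running.
    """
    deadline = time.monotonic() + time_budget_s
    attempts = 0
    reason = "no attempt made"
    while attempts <= max_retries:
        if time.monotonic() > deadline:
            reason = "time budget exceeded"
            break
        attempts += 1
        try:
            return agent(payload)
        except Exception as exc:  # any agent error counts as a failed attempt
            reason = str(exc)
    raise StepFailure(name, attempts, reason)


def orchestrate(steps: List[Tuple[str, Callable[[str], str]]], payload: str) -> str:
    """Pass the payload through each named agent in order."""
    for name, agent in steps:
        payload = run_step(name, agent, payload)
    return payload


# Toy agents: plain functions standing in for model calls.
def gather(query: str) -> str:
    return f"notes on {query}"


def make_flaky_drafter() -> Callable[[str], str]:
    """Build a drafter that fails on its first call, then succeeds."""
    calls = {"count": 0}

    def draft(notes: str) -> str:
        calls["count"] += 1
        if calls["count"] == 1:
            raise RuntimeError("transient upstream error")
        return f"draft from {notes}"

    return draft


def broken_executor(_draft: str) -> str:
    raise ValueError("handoff schema mismatch")
```

Running `orchestrate([("gather", gather), ("draft", make_flaky_drafter())], "pricing")` recovers from the transient error through a retry. Swapping in `broken_executor` raises a `StepFailure` that names the failing agent. None of this machinery exists in a single-agent loop, which is the sense in which orchestration is a cost.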

For a fuller treatment of the agentic AI category — including autonomous agents, human-in-the-loop patterns, and where multi-agent sits in the broader agentic stack — see the agentic AI overview. For the production engineering of multi-agent systems once you have decided on the architecture — bounded scope contracts, structured handoffs, and independent verifiers — see how we build reliable multi-agent systems in production.


When does a single agent win?

Single-agent is the right architecture (not a starter version to outgrow) when three things are true: the job is bounded to one role and draws on one source; a wrong output carries low stakes; and splitting the job would add latency and operational surface area without a commensurate verification gain.

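A single agent is the correct choice when all of the following hold:

The job is bounded to one role and one source.
The work is one coupled thread of decisions that must share full context.
A human reviews the output before any action is taken.
Adding agents would increase complexity without a meaningful verification gain.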

The practitioner observation that follows: adding a second agent to a coupled, bounded job does not make it more reliable. It makes it more fragile and more expensive, because you have introduced a coordination problem where none existed. The default discipline, as Anthropic described it in their December 2024 framework, is "finding the simplest solution possible, and only increasing complexity when needed" (Anthropic, Building Effective Agents, December 19, 2024). Optimize the single agent first — better tools, better retrieval, better in-context structure — before reaching for a second one.

Common workloads that are correctly single-agent: document summarization from a known source, drafting with human review, classification against a defined schema, extraction from a single structured input.


When does multi-agent win?

Multi-agent earns its overhead when work spans roles that should not grade each other — gathering, deciding, acting, and checking are steps that benefit from separation — and when the cost of a confidently wrong answer is high enough to justify the added components and latency.

Two conditions both need to hold:

The separation condition (the verification argument). The work spans roles that should not grade each other: a researcher, a drafter, a verifier, an executor. No agent should both perform a step and be the sole judge of it. The payoff of that separation is legible failure — when something goes wrong, the orchestrator can attribute the failure to a specific agent and a specific step, rather than producing one confident, fluent, wrong answer with no visible seam. The irreversibility of the downstream action matters here: when a wrong answer triggers a filed record, a sent communication, or a triggered workflow, independent verification is no longer optional.
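The separation described above can be sketched in a few lines. This is a hedged illustration, not a reference design: the refund scenario, the `Outcome` type, and the functions `verify_refund` and `run_refund` are all hypothetical, the drafters are plain functions standing in for a model call, and the verifier is deterministic code rather than a second model. The `ledger` list stands in for an irreversible downstream action.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    executed: bool
    failed_step: Optional[str] = None  # which step the failure is attributed to
    reason: Optional[str] = None


def verify_refund(invoice: dict, proposal: dict) -> Optional[str]:
    """Independent check against the source record; returns a rejection reason or None."""
    if proposal.get("invoice_id") != invoice["id"]:
        return "proposal references a different invoice"
    amount = proposal.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        return "amount must be a positive number"
    if amount > invoice["paid"]:
        return "refund exceeds amount paid"
    return None


def run_refund(invoice: dict, drafter: Callable[[dict], dict], ledger: List[dict]) -> Outcome:
    """The drafter proposes, the verifier judges, and only then does anything execute."""
    proposal = drafter(invoice)
    reason = verify_refund(invoice, proposal)
    if reason is not None:
        # Blocked before the irreversible step, with the failure attributed.
        return Outcome(executed=False, failed_step="drafter", reason=reason)
    ledger.append(proposal)  # stand-in for filing a record or sending a payment
    return Outcome(executed=True)


# Two toy drafters standing in for model calls.
def careful_drafter(invoice: dict) -> dict:
    return {"invoice_id": invoice["id"], "amount": invoice["paid"]}


def overconfident_drafter(invoice: dict) -> dict:
    return {"invoice_id": invoice["id"], "amount": invoice["paid"] * 10}
```

With `overconfident_drafter`, the wrong proposal is fluent and well-formed, but it never reaches the ledger, and the returned `Outcome` records which step produced it and why it was rejected. That is the seam a self-grading loop does not have.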

The decomposition condition (the parallelism argument). The job has to break into parts that can run independently. Anthropic reported that its multi-agent research system outperformed a single-agent baseline on internal evaluations because research is breadth-first work: many independent sub-queries across many sources, with no shared write state for sub-agents to corrupt (How we built our multi-agent research system, Anthropic, June 13, 2025). The same architecture is a poor fit for a tightly coupled planning task, where later steps depend on the full context of earlier ones.
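The contrast between breadth-first and tightly coupled work can be shown with a small sketch. It assumes a hypothetical `research_subquery` function standing in for a sub-agent's model and search calls; `fan_out` and `coupled_plan` are likewise illustrative names. It uses Python's standard `concurrent.futures.ThreadPoolExecutor`, and it is not a description of how Anthropic's system is built.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def research_subquery(query: str) -> Dict[str, str]:
    """Hypothetical sub-agent: reads only its own query, writes only its own result."""
    return {"query": query, "finding": f"summary of sources on {query}"}


def fan_out(queries: List[str], max_workers: int = 4) -> List[Dict[str, str]]:
    """Run independent sub-queries concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(research_subquery, queries))


def coupled_plan(steps: List[str]) -> List[str]:
    """A coupled task: each decision reads every earlier one, so it cannot fan out."""
    decisions: List[str] = []
    for step in steps:
        decisions.append(f"{step} | depends on {len(decisions)} earlier decision(s)")
    return decisions
```

`fan_out` works because no sub-query needs another's output and nothing is written to shared state. `coupled_plan` has no such property: splitting its loop across agents would mean each one decides without the context the others produced.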

Workloads where multi-agent commonly earns its keep: research that requires querying many independent sources; long-horizon tasks that exceed a single context window; pipelines where one agent's output becomes another agent's input and independent verification of each handoff is required; tasks that interface with multiple distinct tool surfaces or permission domains.

The broader production context: the difficulty of getting agentic systems into production makes the architecture decision consequential before any code is written. IDC's Lenovo CIO Playbook 2025 (February 2025) found that approximately 88 percent of AI proofs of concept do not reach widescale deployment. A poor architecture choice is part of why that gap exists: over-engineered systems that cannot be debugged, and debugged systems that should not have been multi-agent in the first place.

One important guard against a common error: multi-agent does not fix a retrieval failure. If a single agent is failing because the right passage never reaches it, adding agents multiplies that retrieval problem across more outputs. The architectural question and the retrieval quality question are separate.

For the measurement layer underneath this — how you confirm an agent is verifying correctly, what a confident-but-wrong rate looks like in practice — see the AI evaluation harness and the guide to evaluating an AI agent.


What does multi-agent add — and what does it cost?

Multi-agent adds independent verification and legible failure. It costs more components, higher latency, and an orchestration layer that must itself be engineered, monitored, and maintained.

The ledger:

What it adds: bounded failure domains (a mistake is attributable to one agent and contained); an independent check (a verifier that no working agent grades past); parallelism on genuinely independent sub-tasks; the ability to exceed a single context window.

What it costs: more components to build and operate; higher latency because steps are sequential or semi-sequential; materially more tokens. Anthropic measured this directly: agents use roughly four times the tokens of a chat interaction; multi-agent systems roughly 15 times, with token usage explaining approximately 80 percent of the variance in research-task performance (How we built our multi-agent research system, Anthropic, June 13, 2025). At a roughly 15 times cost multiple, the task must justify the investment.
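The multiples quoted above translate into a simple back-of-envelope estimate. The sketch below uses the rounded figures from the text (4 times and 15 times a chat interaction). The 2,000-token baseline and the per-token price passed to `estimated_cost_usd` are illustrative assumptions, not figures from any vendor's price list.

```python
from typing import Dict

# Rounded multiples relative to a chat interaction, as quoted in the text.
AGENT_MULTIPLE = 4
MULTI_AGENT_MULTIPLE = 15


def estimated_tokens(chat_tokens: int) -> Dict[str, int]:
    """Scale a chat-sized token count by the quoted multiples."""
    return {
        "chat": chat_tokens,
        "single_agent": chat_tokens * AGENT_MULTIPLE,
        "multi_agent": chat_tokens * MULTI_AGENT_MULTIPLE,
    }


def estimated_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert tokens to dollars at an assumed flat price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Assumed baseline: a 2,000-token chat interaction.
budget = estimated_tokens(2_000)
# budget == {"chat": 2000, "single_agent": 8000, "multi_agent": 30000}
```

Under these rounded multiples, a multi-agent run uses about 3.75 times the tokens of a single-agent run on the same task. The practical use of the estimate is to ask whether the value of the task clears that line before any orchestration is built.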

The payoff framed correctly: what you are buying with multi-agent is legible failure. A multi-agent system that fails visibly and attributably is operationally superior to a single-agent system that fails silently, even if the absolute error rate is similar. The ability to see which step failed, attribute it, contain it, and correct it — without re-running the whole pipeline from scratch — is the production value. For the production engineering that delivers that legibility — bounded scope contracts, structured output handoffs, time bounds, and an independent verifier no agent grades past — see multi-agent orchestration in production.

One caution when reading any benchmark: part of what a multi-agent system "wins" can be the extra computation it is given, not the coordination itself. A second agent spends more tokens and more wall-clock time, and on a task that does not decompose, that spend buys little. The case for multi-agent should not rest on coordination as a mechanism in the abstract. It rests on what the added compute buys on the specific job — independent verification and legible failure on work that genuinely breaks apart.


How do you choose? The decision rule and the comparison table

Three questions that determine your architecture

1. Does the job decompose into parts that can run independently? If no — the work is one coupled thread of decisions that must share full context — stay single-agent. Splitting a coupled thread disperses context and invites the conflicting implicit decisions Cognition documents (June 12, 2025). If yes, continue.

2. Does the work span roles that should not grade each other, and is the cost of a confidently wrong answer rising? If no — one bounded, low-stakes job where a human reviews the output before any action — single-agent is correct. It is not the less mature choice; it is the right choice. If yes, continue.

3. Is the value of the task high enough to justify the overhead? More components, more latency, a roughly 15 times token multiple in the full multi-agent case, and an orchestration layer to build, monitor, and maintain. If yes — multi-agent earns its keep; engineer it for legible, attributable, contained failure. If no — a single well-scoped agent is the right call.

The one-line version: the question is never "how many agents" — it is "do the parts of this job need to be separated so they can check each other, and do they decompose enough to run apart at all." If not, one agent is the senior choice.
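The three questions can be written down as a short function, which makes the order of the checks explicit. This is a sketch of the rule as stated here, with hypothetical names (`Job`, `choose_architecture`). The boolean inputs are judgments a person has to make about the job; the code only sequences them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Job:
    decomposes_independently: bool           # question 1
    roles_should_not_grade_each_other: bool  # question 2, first half
    wrong_answer_cost_is_high: bool          # question 2, second half
    value_justifies_overhead: bool           # question 3


def choose_architecture(job: Job) -> Tuple[str, str]:
    """Return (architecture, reason), checking the three questions in order."""
    if not job.decomposes_independently:
        return ("single-agent", "work is one coupled thread that must share full context")
    if not (job.roles_should_not_grade_each_other and job.wrong_answer_cost_is_high):
        return ("single-agent", "bounded, low-stakes job; no separation needed")
    if not job.value_justifies_overhead:
        return ("single-agent", "task value does not cover the orchestration overhead")
    return ("multi-agent", "engineer for legible, attributable, contained failure")
```

For example, `choose_architecture(Job(True, True, True, False))` returns single-agent: the work decomposes and the stakes are real, but the task does not pay for the overhead. Multi-agent is the result only when all three questions are answered yes.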

Single-agent vs multi-agent: the decision table

| Dimension | Single-agent | Multi-agent |
| --- | --- | --- |
| Best for | One bounded job, one source, one coupled thread of decisions with a clear end state | Work that decomposes into genuinely independent parts and spans roles that should not grade each other |
| Decomposition / parallelism | Nothing to parallelize; coordination is pure overhead | Independent sub-tasks run in parallel; can exceed a single context window; breadth-first research across many sources |
| Failure isolation | One bad input contaminates the whole chain; the error has no seam to inspect | Bounded scope: a failure is attributable to one agent and contained at that step |
| Verification | Self-grades; the only check is the agent's own judgment | An independent verifier (agent or deterministic code) reviews the output of each upstream step; no agent grades its own work |
| Dominant failure mode | Confident-but-wrong output with no internal check; one bad step propagates with no seam to catch it | Dispersed decisions and dropped context across agents: the coordination tax Cognition documents (Jun 12, 2025) |
| Cost / latency | Lowest: one model loop, minimal infrastructure, lowest token spend | More components, higher latency, roughly 15× tokens in the heavy case (Anthropic, Jun 13, 2025), plus an orchestration layer to engineer and operate |
| Operational legibility | One loop: hard to attribute which step failed | Each step is a discrete checkpoint; failures are visible, attributable, and containable |
| Orchestration overhead | None | Real: prompt contracts, handoff schemas, retry logic, time bounds, observability tooling, all of which must be engineered |
| When it is the WRONG choice | When verification is required that the agent cannot credibly provide for itself; when steps should not grade each other; when a wrong answer triggers an irreversible downstream action | When the task is bounded and low-stakes, or is one tightly coupled thread that loses context and gains fragility when split; when latency is the binding constraint and the added cost cannot be justified |

This table supersedes the shorter comparison in agentic AI systems, which answers "what is the difference." This table answers "which do I choose, and when is each the wrong call."

Note on the dominant-failure-mode row: the practical asymmetry is between an error that is caught and one that is not. Between independent agents with no central check, a wrong sub-output can pass downstream and compound; under an orchestrator that inspects and can reject sub-outputs before they propagate, the same error is contained at the step where it occurred. That containment is the reason to engineer the orchestration layer rather than simply add it — and the judgment call sits on top: is the task worth building that layer at all?

See how we approach judgment-not-complexity decisions for the method that informs this framework.


Is your job single-agent or multi-agent? A second opinion before you build.

The architecture decision is consequential before any code is written. Getting it wrong in either direction — over-building with multi-agent when one agent suffices, or under-building with single-agent when independent verification is required — costs time, budget, and trust in AI delivery.

A single well-scoped agent is not the beginner version of a multi-agent system. It is the correct version for a bounded job, and choosing it on purpose is the senior move. You add agents to separate doing from checking and to run independent work in parallel; you do not add them to signal that your AI work is advanced.

We are model-flexible by design — because the architecture decision should follow the workload, not the vendor. Production AI, proven, means committing to the simplest design the job warrants and then showing, through independent verification and evaluated reliability, that it works.

If you are choosing an agent architecture and want a senior second opinion before you commit to complexity, that judgment call is what an AI working session is for. Or see how we design and deliver AI systems into production in AI consulting.


Frequently asked questions

What is the difference between single-agent and multi-agent AI? A single-agent system gives one agent the entire job; it reasons, acts, and checks its own output within one context. A multi-agent system splits the job across specialized agents, each with bounded scope, coordinated by an orchestration layer, so that no agent both performs a step and serves as the sole judge of it. The core distinction is structural separation between action and verification.

When should I use a single agent instead of multiple agents? Use a single agent when the job is bounded to one role and one source, the work is one coupled thread of decisions that must share full context, a human reviews the output before any action is taken, and adding agents would increase complexity without adding a meaningful verification gain. Single-agent is not a less mature choice; it is the correct choice for a bounded, low-stakes task.

What does multi-agent AI actually add? Multi-agent adds independent verification — a separate agent reviewing work that another agent produced — and legible failure: when something goes wrong, the orchestrator can attribute it to a specific step and a specific agent. The cost is more components, higher latency, and an orchestration layer that requires engineering and maintenance. Anthropic's own measurement found multi-agent systems use roughly 15 times the tokens of a chat interaction (June 2025). The task must justify that multiple.

How many agents does a production AI system need? The right number is the minimum needed so that no agent is both performing a step and grading its own output on a consequential task. For many bounded business tasks, that is one. For tasks that span gather, decide, act, and check across high-stakes outputs with genuinely parallelizable sub-tasks, a small number of specialized agents with one orchestrator is a common pattern. More is not better; separation with a clear purpose is better.

What is agent orchestration? Agent orchestration is the coordination layer in a multi-agent system. It routes tasks to the appropriate agent, enforces output contracts between agents, manages retries and time bounds, and surfaces failures to a human operator or a downstream system. Orchestration is the mechanism that makes independent verification work in practice — not a feature in itself, but the infrastructure that makes multi-agent systems operationally legible. It is a cost that multi-agent systems carry by definition; whether the task justifies that cost is the architecture question.
