Apr 21, 2026

When Should You Use Subagents vs Single-Turn in Claude Code?

Most Claude Code users make this choice on instinct. That works until it doesn’t — until a single-turn session bloats to 200k tokens on a task that didn’t need it, or until you wait minutes for sequential execution when parallel subagents would’ve finished in seconds. According to Anthropic’s 2026 Agentic Coding Trends Report, developers use AI in roughly 60% of their work but fully delegate only 0–20% of tasks. The gap isn’t capability — it’s knowing which pattern fits which job.

This guide gives you the decision criteria. You’ll learn what separates single-turn from subagent execution, when each wins on cost and speed, and what four real patterns look like in practice.

Key Takeaways

Single-turn wins when tasks are bounded, sequential, and fit comfortably in one context window.

Subagents win when tasks are independent, parallelizable, or need isolated context to avoid cross-contamination.

Output tokens cost 3–5× as much as input tokens, and a long session re-reads its whole context on every generation; subagent isolation keeps that context from inflating (CostGoat, 2026).

A 2026 arXiv study found single-agent LLMs match or outperform multi-agent systems on reasoning tasks when token budgets are equal (arXiv 2604.02460) — parallelism isn’t always the answer.

Among the longest-running Claude Code sessions, autonomous run time nearly doubled from under 25 min to over 45 min between Oct 2025 and Jan 2026 (Anthropic Research).

What’s the Difference Between Single-Turn, Multi-Turn, and Subagents?

Before picking a pattern, you need precise definitions. These three terms get conflated constantly, and the confusion leads to over-engineered sessions.

Single-turn means one prompt, one response, done. Claude reads your request, uses whatever tools it needs, and returns a result. No conversation history carries over. This is the right default for most isolated tasks.

Multi-turn means an ongoing conversation where Claude accumulates context across multiple exchanges. You’re building state together — iterating on a design, debugging interactively, refining a draft. The same context window grows with each turn.

Subagents are separate Claude instances that your primary session spawns using the Agent tool. Each subagent starts with a blank context window and runs independently — in parallel with other subagents, or sequentially when you need ordered results. The orchestrator passes them a prompt and waits for a result; the subagent’s full transcript never pollutes the parent’s context.

The cost implication is immediate: subagents contain their token usage. A subagent that reads 50 files and writes a report contributes only its summary output to the parent — not all 50 files.
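To make the containment concrete, here is a minimal accounting sketch. The file size, prompt size, and summary size are assumed numbers for illustration, and `parent_tokens_added` is a hypothetical helper, not part of Claude Code.

```python
def parent_tokens_added(file_tokens, prompt_tokens, summary_tokens, delegated):
    """Tokens that one 'read files, write report' task adds to the PARENT context."""
    if delegated:
        # The parent only holds the prompt it sent and the summary it got back.
        return prompt_tokens + summary_tokens
    # Done inline: every file read lands in the parent's own context.
    return sum(file_tokens) + summary_tokens


files = [2_000] * 50  # assumed: 50 files at roughly 2,000 tokens each

inline = parent_tokens_added(files, prompt_tokens=300, summary_tokens=800, delegated=False)
delegated = parent_tokens_added(files, prompt_tokens=300, summary_tokens=800, delegated=True)
print(f"inline: {inline:,} tokens, delegated: {delegated:,} tokens")
```

The subagent still pays to read those files inside its own context. What changes is how much the parent session has to carry forward into every later turn.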

When Does Single-Turn Win?

Single-turn is the right call in more situations than most developers assume. Research published on arXiv in April 2026 tested single-agent LLMs against five multi-agent architectures across three model families (Qwen3, DeepSeek, Gemini) and found that single agents consistently match or outperform multi-agent systems on multi-hop reasoning tasks when reasoning token budgets are equalized. Adding agents doesn’t add intelligence — it adds coordination overhead.

Use single-turn when:

The task is bounded. “Fix this bug in auth.ts” or “write a migration for this schema change” has a clear start and end. One context window handles it cleanly.
Context needs to stay connected. Refactoring a module where changes in one function affect another requires seeing everything together. Splitting across agents introduces coordination errors.
Speed matters more than parallelism. Spawning subagents has latency. If the task finishes in 30 seconds single-turn, don’t add orchestration overhead for nothing.
The task is sequential by nature. “Read this error, find the cause, fix it, run tests.” Each step depends on the previous one. Parallelism can’t help here — it can only hurt.

Our takeaway: The arXiv study found multi-agent systems underperformed even on tasks designed to benefit from decomposition, unless agents had strictly independent subtask boundaries. Coordination overhead erodes specialization gains faster than most expect. This aligns with Claude Code’s own documentation: subagents exist for isolation, not intelligence amplification.

When Do Subagents Win?

Subagents earn their keep in three scenarios: parallelism, isolation, and scale.

According to Anthropic’s research on agent autonomy, the longest autonomous Claude Code sessions nearly doubled in duration, from under 25 minutes to over 45 minutes, between October 2025 and January 2026. That growth is consistent with users delegating longer, more complex workflows, which is where subagents tend to pay off.

Parallelism is the most obvious win. If you need to run tests on four modules simultaneously, spawn four subagents. If you need to research three competitor APIs, research them in parallel. Wall-clock time drops from n × task_time to roughly max(task_time).
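The wall-clock claim can be checked with a small simulation. This is a sketch only: `run_subagent` is a hypothetical stand-in that sleeps instead of calling a model, and the durations are made up.

```python
import asyncio
import time


async def run_subagent(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a real subagent: it just waits.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def sequential(tasks):
    # One after another: total time is the sum of the task times.
    return [await run_subagent(name, secs) for name, secs in tasks]


async def parallel(tasks):
    # All at once: total time is roughly the slowest task.
    return await asyncio.gather(*(run_subagent(name, secs) for name, secs in tasks))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


TASKS = [("auth", 0.10), ("api", 0.15), ("data", 0.05), ("ui", 0.10)]

if __name__ == "__main__":
    _, seq_t = timed(sequential(TASKS))
    _, par_t = timed(parallel(TASKS))
    print(f"sequential: {seq_t:.2f}s, parallel: {par_t:.2f}s")
```

Real subagents add spawn latency and rate-limit waits on top of this, so the parallel figure is a lower bound rather than a promise.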

Isolation matters when tasks would contaminate each other’s context. Running a security audit while writing new features in the same session means the audit’s findings bleed into the feature code’s context — Claude may start treating discovered vulnerabilities as design constraints. Isolated subagents don’t have this problem.

Scale is the breaking point for single-turn. A 300-file codebase can’t be meaningfully reviewed in one context window at acceptable quality. Subagents can own subsystems — one reviews auth, one reviews API handlers, one reviews the data layer — and the orchestrator synthesizes their reports.

Use subagents when:

Tasks are independent (no shared state required between them)
Tasks can run in parallel to reduce wall-clock time
Tasks are large enough to fill or approach a context window alone
You need context isolation to prevent cross-contamination
You’re orchestrating work across multiple tools or domains

The key test: Can I write a complete, self-contained prompt for this task without referencing results from another concurrent task? If yes, it’s a subagent candidate.
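One possible way to encode the checklist and the key test is sketched below. The field names, the 200k window, and the 50% size threshold are all assumptions chosen for illustration; the article does not prescribe them.

```python
from dataclasses import dataclass

CONTEXT_BUDGET = 200_000  # assumed context window size, in tokens


@dataclass
class Task:
    self_contained_prompt: bool  # the key test: no reliance on a concurrent task's results
    parallel_siblings: int       # other tasks that could run at the same time
    est_context_tokens: int      # rough size of what the task has to read
    needs_isolation: bool        # would its context contaminate other work?


def choose_pattern(task: Task) -> str:
    # Failing the key test rules subagents out regardless of anything else.
    if not task.self_contained_prompt:
        return "single-turn"
    has_reason = (
        task.parallel_siblings > 0
        or task.needs_isolation
        or task.est_context_tokens > 0.5 * CONTEXT_BUDGET  # assumed threshold
    )
    return "subagent" if has_reason else "single-turn"


print(choose_pattern(Task(True, 2, 30_000, False)))
print(choose_pattern(Task(False, 2, 150_000, True)))
```

The ordering is the point of the sketch: independence is a gate, while parallelism, isolation, and size are reasons, and a task with no reason stays single-turn.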

What Does the Token Cost Actually Look Like?

This is where the decision gets concrete. Claude Sonnet costs $3 per million input tokens and $15 per million output tokens (CostGoat, April 2026). Output costs five times as much as input, and each new generation also re-reads the accumulated context as input, so a bloated context window is billed again on every turn.

Figure: single-turn vs subagent cost. Single-turn cost scales with context utilization; subagent cost stays bounded to each task’s own context window. Source: Anthropic pricing / CostGoat, April 2026.

Subagent costs stay flat because each subagent only pays for its own task’s context. A single-turn session accumulating tool call results, file reads, and prior exchanges across 100k tokens pays for all of it on every generation. Subagents don’t inherit that overhead.
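A rough cost model shows why accumulation matters. The prices are the Sonnet figures quoted above; everything else (turn counts, tokens per turn, no prompt caching, no orchestrator overhead) is a simplifying assumption, so treat the output as an illustration rather than a bill.

```python
INPUT_PER_MTOK = 3.00    # USD per million input tokens (from the pricing above)
OUTPUT_PER_MTOK = 15.00  # USD per million output tokens


def session_cost(turns: int, added_per_turn: int, output_per_turn: int) -> float:
    """Cost of one session where each turn re-reads the whole context so far.

    Assumes no prompt caching, which would lower the input side in practice.
    """
    context = 0
    total = 0.0
    for _ in range(turns):
        context += added_per_turn                         # new file reads / tool results
        total += context * INPUT_PER_MTOK / 1e6           # entire context billed as input
        total += output_per_turn * OUTPUT_PER_MTOK / 1e6  # generation billed as output
        context += output_per_turn                        # the reply joins the history
    return total


# Same 40 turns of work, either in one session or split across four subagents.
one_session = session_cost(turns=40, added_per_turn=2_500, output_per_turn=500)
four_subagents = 4 * session_cost(turns=10, added_per_turn=2_500, output_per_turn=500)
print(f"one session: ${one_session:.2f}, four subagents: ${four_subagents:.2f}")
```

Under these made-up numbers the single session comes to about $7.62 and the four subagents to about $2.22, because input cost grows roughly with the square of the number of turns. A real orchestrator also pays for its own prompts and the returned summaries, which this sketch leaves out.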

According to the Stack Overflow 2025 Developer Survey (49,000+ respondents), only 31% of developers currently use AI agents, and among those who do, 69% report productivity gains. Unmanaged context costs are plausibly part of what keeps adoption low.

The Decision Framework

Run through this before choosing:

A “No” on Q1, Q2, or Q3 exits to single-turn, which likely handles the task more cheaply. Four “Yes” answers mean subagents are the right call.

Four Real Workflow Patterns

These four patterns show the decision criteria in practice, drawn from Anthropic’s published workflow guidance and common Claude Code session structures.

Pattern 1: Bug Fix (Single-Turn)

Task: Fix a null pointer exception in the payment service.

Why single-turn: The bug is contained. Claude reads the file, traces the call stack, finds the null case, and writes a fix. That’s 2–4 file reads and one edit. Context stays small and sequential — no benefit to isolation or parallelism.

→ Read error + stack trace
→ Read source files
→ Identify root cause
→ Write fix + run tests

Pattern 2: Multi-Module Security Audit (Parallel Subagents)

Task: Audit auth, API, and data layers for OWASP Top 10 vulnerabilities.

Why subagents: Three independent domains, none dependent on the others mid-audit. Parallel execution cuts time from ~45 minutes to ~15 minutes. Context isolation prevents the auth findings from influencing the data layer’s assessment.

Orchestrator
  ├── Subagent A: Audit auth module → report
  ├── Subagent B: Audit API handlers → report
  └── Subagent C: Audit data layer → report
  → Synthesise three reports

Pattern 3: Research + Implement (Mixed)

Task: Research three competing implementations of a feature, then implement the best approach.

Why mixed: Research = three parallel subagents. Implementation = single-turn with the synthesised research as context. Don’t use subagents for implementation — it needs the full research in one place.

Phase 1 (parallel subagents)
  ├── Subagent A: Research approach X
  ├── Subagent B: Research approach Y
  └── Subagent C: Research approach Z
  → Synthesise

Phase 2 (single-turn)
  → Implement best approach

Pattern 4: Large Codebase Refactor (Sequential + Parallel Subagents)

Task: Migrate a 200-file codebase from callbacks to async/await.

Why subagents: No context window handles 200 files at quality. Subagents own subsystems. Some run in parallel (unrelated modules); shared utilities run first because other modules depend on them.

Orchestrator
  Phase 1 (sequential): Subagent → refactor shared utilities
  Phase 2 (parallel)
    ├── Subagent → refactor module group A
    ├── Subagent → refactor module group B
    └── Subagent → refactor module group C
  Phase 3 (single-turn): Integration + final tests
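The phased shape above can be sketched with asyncio. `run_subagent` is a hypothetical placeholder that returns a canned string; in Claude Code the orchestrator spawns subagents itself, so this only illustrates the ordering, not a real API.

```python
import asyncio


async def run_subagent(prompt: str) -> str:
    # Hypothetical placeholder for spawning a subagent and awaiting its report.
    await asyncio.sleep(0)
    return f"report for: {prompt}"


async def migrate() -> list[str]:
    # Phase 1 (sequential): shared utilities first, because other modules depend on them.
    shared = await run_subagent("refactor shared utilities")

    # Phase 2 (parallel): unrelated module groups run at the same time.
    groups = ["module group A", "module group B", "module group C"]
    reports = await asyncio.gather(*(run_subagent(f"refactor {g}") for g in groups))

    # Phase 3 (single-turn): the orchestrator integrates with every report in hand.
    return [shared, *reports]


if __name__ == "__main__":
    for line in asyncio.run(migrate()):
        print(line)
```

Awaiting Phase 1 before building the Phase 2 coroutines is what enforces the dependency; nothing in Phase 2 can start until the shared utilities report exists.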

Frequently Asked Questions

How many subagents can Claude Code run in parallel?

There’s no hard Claude Code limit, but practical ceilings apply: API rate limits, local resource constraints, and the orchestrator’s context window (which receives all results). Most workflows run 3–6 parallel subagents effectively. Beyond that, orchestrator synthesis becomes the bottleneck, not the subagents themselves.
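If you drive subagents from your own script, a semaphore is one simple way to stay inside that practical range. This is a generic asyncio sketch: `worker` is whatever coroutine you supply, and the default limit of 4 is an assumption picked from the 3–6 band above.

```python
import asyncio

MAX_PARALLEL = 4  # assumed cap, chosen from the 3-6 range mentioned above


async def run_bounded(prompts, worker, limit: int = MAX_PARALLEL):
    """Run worker(prompt) for every prompt, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with sem:  # waits here once `limit` workers are already running
            return await worker(prompt)

    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

Capping the fan-out also caps how many reports arrive at the orchestrator at once, which is the bottleneck the answer above points to.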

Does using subagents always cost more?

Not necessarily. For large tasks, subagents can cost less because each isolated context window avoids accumulating the parent session’s full history. The savings grow as your primary session gets longer. For small tasks — single-file edits, quick lookups — subagent overhead makes them more expensive than single-turn. Gartner predicts 40% of enterprise apps will include task-specific AI agents by 2026, up from less than 5% in 2025 — that task specificity is exactly the isolation benefit subagents provide.

Can subagents spawn their own subagents?

Not according to Claude Code’s subagent documentation, which states that subagents cannot spawn other subagents; the restriction prevents unbounded nesting. When a job seems to need a second layer, have the orchestrator run it as another phase: collect the first round of reports, then spawn a fresh round of subagents with narrower prompts. Keeping coordination in the orchestrator also avoids the complexity that deep hierarchies would add.

When does multi-turn beat both?

Multi-turn wins when the task is exploratory and iterative — you don’t know the full shape of the work upfront. Debugging an intermittent race condition, designing an API incrementally, working through a novel problem where each step reveals the next. The accumulated context is the value. Subagents can’t help because there’s no decomposable independent subtask yet.

What’s the biggest subagent mistake?

Using them for tasks that look independent but aren’t. If Subagent A needs Subagent B’s output before it can finish, they’re not independent — they’re sequential with extra steps. The coordination overhead collapses the parallelism benefit and introduces error propagation. For more on orchestration, review checkpoints, and when multi-step agent workflows misfire, see What Is Agentic Development?.
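A quick way to catch hidden dependencies before spawning anything is to write them down and let a topological sort group the tasks into waves. The sketch below uses Python’s standard `graphlib`; the task names and the `plan_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within a wave have no dependency on each other.

    `deps` maps each task to the set of tasks whose output it needs.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if tasks depend on each other in a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


deps = {
    "utils": set(),
    "group_a": {"utils"},
    "group_b": {"utils"},
    "integration": {"group_a", "group_b"},
}
print(plan_waves(deps))
```

A wave with a single task gains nothing from parallelism, and a plan that is all single-task waves is a sequential job, which is the case the answer above warns about.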

The Right Tool, Not the Clever One

Most tasks don’t need subagents. That’s the most useful thing to take away. According to Anthropic’s 2026 Agentic Coding Trends Report, only 27% of AI-assisted developer work consists of tasks that “wouldn’t have been done otherwise” — the genuine agentic unlock. The other 73% is enhancement of existing work, where a well-prompted single-turn session often outperforms an over-engineered multi-agent pipeline.

Start single-turn. Add subagents when you hit a concrete reason: parallelism, isolation, or scale. The framework above gives you the test. The four patterns show you what each choice looks like in practice.

Related reading

Claude MCP Servers, Agents, and Skills Explained

What Is Agentic Development?

What Are Claude Code Agents?