AI agent teams, explained: lead, bench, specialist

When one agent isn't enough, you reach for a team, but putting more agents on one problem makes things worse. Here are the roles that work.

Stuart Leo | June 8, 2026 | 5 min read

The moment a project gets big, the instinct kicks in: if one AI agent is good, five must be five times better. Spin up a swarm, point them at the codebase, and watch it fly.

It doesn't fly. It collides. Putting more agents on one problem is one of the most reliable ways to make agentic development worse. But agent teams are real and useful when you get the roles and the rhythm right. Here's the shape that works.

Why people reach for more agents

The appeal is obvious. A single agent is a bottleneck — it does one thing at a time. Throw more agents at the work and surely it parallelises, the way adding people to a team adds throughput.

The analogy breaks because agents don't coordinate like people. They don't have a quiet word at the coffee machine. Point several at the same files and they overwrite each other, contradict each other, and leave you a tangle nobody can reconstruct. The instinct is right that one agent is limited. The fix isn't a swarm.

The multi-agent anti-pattern

Here's the trap, stated plainly: running multiple agents simultaneously on the same surface burns cost, produces contradictory output, and makes session briefs (the written record of what a session did) impossible to write. It's one of the clearest anti-patterns in agentic development, and the research on what makes multi-agent coding actually work keeps landing on the same point: coordination, not raw parallelism, is the hard part.

"Five agents on one problem isn't five times the output. It's five times the conflict, and no clean way to record what happened." (Stuart Leo)

Lead, bench, specialist

A team that works has clear, distinct roles:

  • The Lead Agent: one agent, one surface. It reads the contextbase, writes the code, manages git, extracts knowledge, and writes the session brief. One lead per codebase surface. This is the agent doing the building.
  • Bench Agents: independent review on high-stakes calls. They review, they never execute. They are invoked for a product requirements document (PRD), a security question, or an architecture decision, and they are budget-capped. Their entire value is independence: a fresh perspective the lead doesn't have.
  • Specialist Agents: scoped subagents for repeatable tasks. Each has a tightly defined role with specific tools and a documented protocol: QA review, release-note authoring, code review against a standard.

Each role is a different job, not a clone of the same job. That's why it works where a swarm doesn't.
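One way to make the three roles concrete is to write them down as data. The Python below is only a sketch under assumptions: the `AgentRole` record, its field names, the `check_team` helper, and the budget number are all invented for illustration and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical record describing one role on an agent team."""
    name: str
    owns_surface: bool                  # True only for the lead, which does the building
    duties: tuple[str, ...]             # what the role is invoked for
    budget_cap: Optional[float] = None  # spending limit in arbitrary units (assumed)


LEAD = AgentRole(
    name="lead",
    owns_surface=True,
    duties=("read contextbase", "write code", "manage git",
            "extract knowledge", "write session brief"),
)

BENCH = AgentRole(
    name="bench",
    owns_surface=False,                 # reviews, never executes
    duties=("review PRD", "review security question",
            "review architecture decision"),
    budget_cap=10.0,                    # illustrative number, not from the article
)

SPECIALIST = AgentRole(
    name="specialist",
    owns_surface=False,
    duties=("QA review", "release-note authoring",
            "code review against a standard"),
)


def check_team(roles: list[AgentRole]) -> None:
    """Raise ValueError unless exactly one role owns the codebase surface."""
    owners = [role.name for role in roles if role.owns_surface]
    if len(owners) != 1:
        raise ValueError(f"expected exactly one lead per surface, got {owners}")


check_team([LEAD, BENCH, SPECIALIST])   # passes silently: one lead, two non-owners
```

Passing two leads for the same surface to `check_team` raises a `ValueError`, which is the swarm case caught as a configuration error before any agent runs.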

Sequential, not parallel

The key insight people miss: a good agent team is a sequential practice, not a parallel one. The rhythm is lead executes → bench reviews → lead acts on the review. One after another, each step clean and recordable.

That sequence is what keeps the work coherent and lets you write a session brief at the end — because there's a single, followable story of what happened. A parallel swarm has no such story, which is exactly why it can't be captured or trusted.
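The rhythm can be sketched as a plain function that runs one step at a time and keeps a log. This is a minimal illustration, not a real orchestration API: `run_round` and the three callables are hypothetical names, and the lambdas are stubs standing in for whatever agent tooling is actually in use.

```python
from typing import Callable


def run_round(
    task: str,
    lead_execute: Callable[[str], str],
    bench_review: Callable[[str], str],
    lead_act: Callable[[str, str], str],
) -> list[str]:
    """Run lead -> bench -> lead strictly in order and log each step."""
    log: list[str] = []

    draft = lead_execute(task)            # 1. lead executes
    log.append(f"lead executed: {draft}")

    review = bench_review(draft)          # 2. bench reviews the draft, never edits it
    log.append(f"bench reviewed: {review}")

    result = lead_act(draft, review)      # 3. lead acts on the review
    log.append(f"lead acted: {result}")

    return log                            # raw material for the session brief


# Stub agents so the sketch runs on its own.
log = run_round(
    "add rate limiting",
    lead_execute=lambda task: f"draft for '{task}'",
    bench_review=lambda draft: "missing retry-after header",
    lead_act=lambda draft, review: f"revised draft, addressed: {review}",
)
for line in log:
    print(line)
```

Because each step only starts after the previous one returns, the log reads as a single ordered story, which is what makes a session brief possible to write.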

(Genuinely parallel work is possible — but only when the tasks are independent, each agent in its own isolated git worktree. That's different from a crowd on one problem.)
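For the independent-task case, here is a hedged sketch of giving each agent its own worktree. The helper only builds `git worktree add` commands; the task names, the `../worktrees` directory, the `agent/` branch prefix, and the `main` base branch are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def worktree_commands(
    tasks: list[str],
    base: str = "main",
    root: str = "../worktrees",
) -> list[list[str]]:
    """Build one `git worktree add` command per independent task."""
    if len(set(tasks)) != len(tasks):
        raise ValueError("each task needs its own worktree; duplicate task name found")

    commands = []
    for task in tasks:
        path = str(PurePosixPath(root) / task)
        # Form used: git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(["git", "worktree", "add", "-b", f"agent/{task}", path, base])
    return commands


for cmd in worktree_commands(["docs-refresh", "billing-tests"]):
    print(" ".join(cmd))
```

Each command could then be run from inside the repository, for example with `subprocess.run(cmd, check=True)`. The duplicate check is the point of the sketch: two agents never share a directory or a branch.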

The shared contextbase that holds it together

What coordinates an agent team isn't agents talking to each other. It's a shared contextbase — the version-controlled briefs, decisions and gotchas every agent reads before acting. The lead writes to it. The bench reviews against it. The specialists read it for their scoped job. One source of truth, on disk, that every agent and every human shares.

That's the difference between a team and a mob: a team reads from the same page.
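A minimal sketch of "every agent reads the contextbase before acting", assuming the contextbase is a directory of Markdown files inside the repository. The folder names `briefs/`, `decisions/` and `gotchas/`, and the helpers `load_contextbase` and `build_prompt`, are hypothetical; the article does not prescribe a layout.

```python
from pathlib import Path

SECTIONS = ("briefs", "decisions", "gotchas")  # assumed folder names


def load_contextbase(root: Path) -> dict[str, list[str]]:
    """Return the text of every Markdown file, grouped by section."""
    context: dict[str, list[str]] = {}
    for section in SECTIONS:
        folder = root / section
        # A missing folder simply yields an empty section.
        files = sorted(folder.glob("*.md")) if folder.is_dir() else []
        context[section] = [f.read_text(encoding="utf-8") for f in files]
    return context


def build_prompt(role: str, task: str, context: dict[str, list[str]]) -> str:
    """Prefix any agent's task with the same shared context."""
    parts = [f"Role: {role}", f"Task: {task}"]
    for section, docs in context.items():
        parts.append(f"## {section}")
        parts.extend(docs)
    return "\n\n".join(parts)
```

The lead, the bench and the specialists would all call `load_contextbase` on the same directory, so the only thing that differs between their prompts is the role and the task.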

Multi-agent is a sequential practice — lead executes, bench reviews, lead acts — not a swarm on one problem.

FAQ

What are the roles in an AI agent team?
Three that earn their place: the lead agent (one per codebase surface; it reads context, writes code, commits, and captures knowledge), bench agents (independent reviewers for high-stakes calls; they review, never execute), and specialist agents (scoped subagents for repeatable tasks like QA or release notes). Lead executes, bench reviews, specialists handle defined jobs.

Is running multiple agents at once a good idea?
Not on the same problem. Multiple agents on one surface produce contradictory output, fight over the same files, and make a session brief impossible to write. Multi-agent works as a sequential practice (lead executes, then a bench agent reviews, then the lead acts), not as a parallel swarm.

How do agent teams stay coordinated?
Through a shared contextbase, not constant chatter. Every agent reads the same version-controlled briefs, decisions and gotchas, so they work from one source of truth. Coordination lives in the written context, not in agents talking to each other.