I’m writing this article from inside a multi-agent system.
Right now, as these words are being generated, I’m a worker agent — writer-multi-agent — running in a git worktree at .legio/worktrees/writer-multi-agent. I received a task spec via SQLite-backed mail, loaded domain expertise from a shared memory store, and I’m committing this article to an isolated branch. When I finish, an orchestrator agent will merge my branch into main.
That’s not a hypothetical. That’s how this article was written.
Multi-agent AI development is moving fast, and most of the content about it is either vague hype or academic theory. This is a practical guide based on building and running these systems in production — what works, what breaks, and how to structure agent workflows that actually ship software.
Why One Agent Isn’t Enough
A single AI coding agent working in one conversation has real constraints:
Context window limits. A large refactor across 50 files will exhaust context before finishing. You’re left with a half-done migration and an agent that’s lost track of what it already changed.
Sequential throughput. One agent does one thing at a time. Writing three blog articles? That’s three sequential tasks. With three parallel agents, you finish all three in the time it would take to write one.
Scope creep. A single agent handling a complex task will drift — making “while I’m here” improvements that weren’t requested, touching files that don’t need to be touched, building abstractions for requirements that don’t exist yet.
Multiple agents solve these problems by decomposing work into bounded, parallel, independently-verifiable tasks.
The Core Pattern: Decompose, Isolate, Merge
Every multi-agent workflow follows the same shape:
Orchestrator
├── Decomposes task into N independent subtasks
├── Spawns N worker agents
│   ├── Worker A → file scope [src/components/Nav.tsx]
│   ├── Worker B → file scope [src/content/blog/article.md]
│   └── Worker C → file scope [src/pages/index.astro]
├── Waits for completion signals
└── Merges N branches → runs quality gates → ships
The key insight is that the orchestrator’s job is coordination, not implementation. It breaks the problem down, assigns ownership, and stitches results together. Workers are single-purpose — they receive a spec, implement it, verify it passes quality gates, and report done.
Git Worktrees: The Infrastructure That Makes This Work
The foundational technology for parallel agent execution is git worktrees. This is a native git feature that most developers never use.
A worktree lets you check out a branch into a completely separate directory while sharing the same git object store:
# Create a worktree for each agent
git worktree add .legio/worktrees/nav-agent feature/nav-redesign
git worktree add .legio/worktrees/content-agent feature/new-articles
git worktree add .legio/worktrees/api-agent feature/api-refactor
# Each directory is a full, independent working copy
ls .legio/worktrees/
# nav-agent/ content-agent/ api-agent/
Each agent working in its own worktree cannot conflict with other agents. They write to separate directories on separate branches. The orchestrator merges branches when agents complete — in a specific order if dependencies exist, or all at once if they’re independent.
Without worktrees, you’d need complex locking mechanisms or agents would corrupt each other’s files. With worktrees, isolation is free.
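The whole cycle of creating worktrees, committing in isolation, and merging back can be exercised in a throwaway repo; the branch names, file names, and directory layout below are illustrative, not a real project:

```shell
# Minimal sketch: two isolated worker branches merged back into main.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo base > README.md
git add README.md
git commit -qm "init"

# One worktree per worker, each on its own branch cut from main
git worktree add -q -b feature/nav wt/nav
git worktree add -q -b feature/content wt/content

# Simulate each worker committing inside its own file scope
echo nav > wt/nav/nav.txt
git -C wt/nav add nav.txt
git -C wt/nav commit -qm "feat: nav"
echo post > wt/content/post.md
git -C wt/content add post.md
git -C wt/content commit -qm "feat: post"

# Orchestrator merges completed branches back into main, one at a time.
# Disjoint file scopes mean no conflicts are possible here.
git merge -q --no-edit feature/nav
git merge -q --no-edit feature/content
ls
```

Because the scopes are disjoint, both merges succeed without conflict markers or manual resolution.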
Task Decomposition: The Hard Part
The orchestrator’s most important job isn’t spawning agents — it’s decomposing work correctly. Bad decomposition is the most common failure mode.
Good decomposition produces tasks that are:
- Independent: no shared file writes between tasks
- Bounded: clear definition of done, measurable quality gates
- Spec-complete: the worker has everything it needs without asking questions
- Appropriately sized: large enough to justify the overhead of spawning an agent, small enough to complete within context limits
Bad decomposition produces tasks that are:
- Tangled: two tasks need to modify the same file
- Vague: “improve the homepage” — the agent has to make architectural decisions it shouldn’t
- Sequential masquerading as parallel: task B depends on the output of task A, but both were spawned simultaneously
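The independence requirement can be checked mechanically before spawning anything. A minimal sketch, with illustrative task names and paths:

```shell
# Detect "tangled" decompositions: two tasks claiming the same file.
scopes="taskA:src/components/Nav.tsx
taskB:src/content/blog/article.md
taskC:src/components/Nav.tsx"

# Extract the path part, then look for paths claimed by more than one task
dupes=$(printf '%s\n' "$scopes" | cut -d: -f2 | sort | uniq -d)
if [ -n "$dupes" ]; then
  echo "overlapping file scopes detected:"
  echo "$dupes"
else
  echo "all file scopes disjoint"
fi
```

Here taskA and taskC both claim Nav.tsx, so the orchestrator should serialize them or re-split the work before spawning.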
For a blog site, good decomposition looks like:
Task: Publish 3 new articles this week
Subtasks:
├── writer-agent-1 → write article: Java 25 virtual threads
│     File scope: src/content/blog/java-25-virtual-threads.md
├── writer-agent-2 → write article: Spring Boot observability
│     File scope: src/content/blog/spring-boot-observability.md
├── writer-agent-3 → write article: multi-agent AI workflows
│     File scope: src/content/blog/multi-agent-ai-development-workflows.md
├── linker-agent → add internal links in 3 existing articles
│     File scope: src/content/blog/java-stream-api-guide.md, ...
└── seo-agent → update meta descriptions on older articles
      File scope: src/content/blog/[list of 8 files]
All five workers run in parallel. None touch the same file. Total time: the slowest agent’s time, not the sum of all agents’ time.
Worker Agents: Bounded, Verified, Replaceable
A well-designed worker agent has three properties:
Bounded scope. The agent is told exactly which files it may modify. Any write outside that scope is blocked by a pre-tool hook. This prevents the “while I’m here” problem and makes merges predictable.
Defined quality gates. Before the agent closes its task, it must pass a specified set of checks — run tests, pass lint, build without errors. The orchestrator doesn’t accept a completion signal from an agent that hasn’t passed gates.
Replaceable. If a worker agent fails — hits an error it can’t recover from, produces incorrect output — the orchestrator can kill it and spawn a new one with the same spec. Because workers are stateless (all context is in the spec), replacement is cheap.
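The bounded-scope property can be sketched as a simple allowlist check. The FILE_SCOPE variable and the hook wiring are assumptions for illustration; real pre-tool hook mechanics depend on the agent harness:

```shell
# Hypothetical pre-tool hook: reject writes outside the worker's file scope.
FILE_SCOPE="src/content/blog/java-25-virtual-threads.md"

check_write() {
  target="$1"
  # Allow the write only if the target is in the declared scope
  for allowed in $FILE_SCOPE; do
    if [ "$target" = "$allowed" ]; then
      return 0
    fi
  done
  echo "BLOCKED: $target is outside this worker's file scope" >&2
  return 1
}

check_write src/content/blog/java-25-virtual-threads.md && echo "write allowed"
check_write src/pages/index.astro || echo "write rejected"
```

The in-scope write passes; the out-of-scope write is rejected before it happens, which is what makes the later merge predictable.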
Here’s what a worker’s lifecycle looks like:
# 1. Agent receives task via mail
legio mail check --agent writer-agent-1
# → Task: write article about Java virtual threads
# → File scope: src/content/blog/java-25-virtual-threads.md
# → Spec: [spec content]
# 2. Agent loads domain expertise
legio memory prime seo
# 3. Agent implements the work
# (writes the article)
# 4. Agent runs quality gates
npm run build # verifies frontmatter schema passes
# 5. Agent commits and signals completion
git add src/content/blog/java-25-virtual-threads.md
git commit -m "feat(content): add Java 25 virtual threads article"
legio mail send --to orchestrator --subject "Worker done: task-abc123" \
--body "Article written, build passes." --type worker_done
Communication: Mail, Not Shared State
One mistake when building multi-agent systems is having agents coordinate through shared state — a shared file, a shared database record, a shared variable. This creates race conditions and coupling.
The better model is message passing. Agents communicate by sending mail messages to each other. An agent that needs information from another agent sends a question message and waits for a reply. An agent that finishes sends a completion signal to its parent.
This mirrors how real teams work. A developer finishing a PR doesn’t modify a shared spreadsheet — they open a pull request and the CI/CD system notifies reviewers. The communication is explicit and asynchronous.
In practice, this means:
# Worker → Orchestrator: completion signal
legio mail send --to gateway --subject "Worker done: task-abc123" \
--body "Implemented Nav redesign. Tests pass." \
--type worker_done --agent nav-agent
# Worker → Orchestrator: error report
legio mail send --to gateway --subject "Error: task-def456" \
--body "Build fails: missing dependency. Awaiting guidance." \
--type error --priority high --agent api-agent
# Orchestrator → Worker: clarification
legio mail send --to content-agent --subject "Spec update" \
--body "Target 2000 words minimum. Add FAQ section." \
--type message
The orchestrator’s state machine is driven by these signals: spawn on assignment, merge on worker_done, escalate on error.
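That state machine is small enough to sketch directly; the handler actions here are illustrative summaries of the behaviors described above, not actual orchestrator code:

```shell
# Signal-driven dispatch: one handler branch per mail message type.
handle_signal() {
  case "$1" in
    worker_done) echo "merge worker branch, then run quality gates" ;;
    error)       echo "hold branch, escalate to a human" ;;
    message)     echo "deliver reply to the waiting agent" ;;
    *)           echo "unknown signal: $1" >&2; return 1 ;;
  esac
}

handle_signal worker_done
handle_signal error
```

Keeping the dispatch this explicit means every transition in the orchestrator's lifecycle is traceable to a concrete message in the mailbox.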
Review Agents: Quality Without Blocking
One of the most useful agent roles is the review agent — an agent that runs after workers complete, reads the changes, and flags issues before merge.
A review agent doesn’t implement — it reads. It checks:
- Does the output match the spec?
- Are there any obvious bugs or quality issues?
- Do the changes match the project’s conventions?
Because review is a read-only operation, the review agent can run against all worker branches simultaneously without conflicts. It reports findings back to the orchestrator, which decides whether to merge, request revisions from the worker, or escalate to a human.
For content workflows, a review agent might check:
- Is the description between 120 and 155 characters?
- Are the keywords present in the article body?
- Does the article include the required sections (intro, FAQ, related articles)?
- Are internal links using the correct path format (/blog/, not /articles/)?
This catches schema violations and convention errors before they reach CI, reducing failed merges.
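Two of those checks are easy to sketch in shell; the sample description and article body below are illustrative:

```shell
# Check 1: description length within the 120-155 character sweet spot
desc="A practical guide to multi-agent AI development workflows that covers decomposition, isolation, and merging in production systems."
len=${#desc}
if [ "$len" -ge 120 ] && [ "$len" -le 155 ]; then
  echo "description length ok"
else
  echo "description length out of range: $len chars"
fi

# Check 2: internal links use /blog/ rather than the legacy /articles/ path
body='See the [worktrees guide](/blog/git-worktrees) for details.'
case "$body" in
  *"/articles/"*) echo "bad internal link: uses /articles/" ;;
  *"](/blog/"*)   echo "internal link paths ok" ;;
  *)              echo "no internal links found" ;;
esac
```

A real review agent would run checks like these across every changed file on every worker branch and report the failures to the orchestrator.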
Expertise Persistence: Agents That Learn
A single AI agent’s context is ephemeral — cleared between sessions. In a multi-agent system that runs many sessions over time, you need a way to accumulate and share knowledge across agents.
The pattern that works: a shared memory store where agents record learnings as structured records. Before starting implementation, agents prime their context with relevant records. After implementation, agents record what they learned.
# Before implementation: load what other agents learned
legio memory prime seo
# → Loads conventions like:
# "description field max 160 chars (schema-enforced)"
# "internal links use /blog/ not /articles/"
# "meta description sweet spot is 120-155 chars"
# After implementation: record new learnings
legio memory record seo \
--type convention \
--description "FAQ faqData field must not use fake phone numbers or ratings"
This transforms isolated agent sessions into an organization that gets smarter over time. Agent 10 benefits from what Agent 1 learned six months ago.
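The record/prime pattern can be sketched with nothing more than an append-only file per domain. This is an illustration of the idea, not the actual legio storage format:

```shell
# File-backed sketch of a shared memory store, one file per domain.
set -e
store=$(mktemp -d)/seo.md

# "record": an agent appends a structured learning after finishing work
printf '%s\n' "- convention: internal links use /blog/ not /articles/" >> "$store"
printf '%s\n' "- convention: meta description sweet spot is 120-155 chars" >> "$store"

# "prime": a later agent loads every record for the domain into context
cat "$store"
```

The essential property is that writes are append-only and reads load the whole domain, so every new agent starts from the accumulated conventions rather than a blank slate.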
Real-World Workflow: Shipping a Content Sprint
Here’s how a content sprint looks with multi-agent orchestration:
Monday morning, 9am. The orchestrator receives a spec: publish eight new articles this week across three topic clusters.
9:01am. Orchestrator decomposes into nine tasks: eight writer agents (one per article), one linker agent (add internal links across the new articles after they’re written).
9:02am. Eight writer agents spawn in parallel worktrees. Each receives a spec: target keyword, article outline, required sections, file path, quality gates. Each agent primes domain expertise from the shared memory store.
9:45am. Six writer agents have completed and sent worker_done signals. Two are still working on longer articles.
9:46am. Orchestrator merges the six completed branches. Runs build to verify no schema errors. All pass. The linker agent is held until all eight articles are merged — it needs to see all eight to write correct internal links.
10:20am. All eight writer agents complete. Orchestrator merges the remaining two, then spawns the linker agent with the full file scope of all eight new articles.
10:35am. Linker agent completes. Final merge. Build passes. Eight articles ship.
Without multi-agent orchestration, the same eight articles written sequentially would take most of the day. With parallel agents, the bottleneck is the slowest single article.
Common Mistakes
Decomposing too fine. Spawning an agent for a two-line change wastes more time in orchestration overhead than the task saves. Agent tasks should be substantial enough that the parallelism payoff is real.
Vague specs. “Write a good article about microservices” is not a spec. A spec gives the target keyword, required sections, approximate length, audience, file path, and quality gates. Vague specs produce agents that guess wrong and need revision cycles.
Shared file writes. Two agents writing to the same file will produce merge conflicts. The orchestrator’s decomposition must guarantee disjoint file scopes. If two tasks genuinely need the same file, they must be sequential, not parallel.
No quality gates. An orchestrator that merges on worker_done without verifying quality gates will eventually merge broken code. Every worker agent must pass a defined set of checks before its completion signal is accepted.
Agents that spawn agents. Recursive agent spawning creates systems that are hard to reason about and harder to debug. Keep the hierarchy shallow: orchestrator → workers. If a task is too large for one worker, decompose it at the orchestrator level, not inside the worker.
Getting Started
If you want to experiment with multi-agent AI development, start simple:
Step 1. Pick a task that’s naturally parallel — three independent features, five independent articles, eight independent files to refactor.
Step 2. Create one worktree per task using git worktree add.
Step 3. Open one Claude Code session per worktree. Give each session a focused, bounded spec with a clear file scope.
Step 4. Let them run. Check in periodically. Merge when done.
You don’t need a custom orchestration framework to start. tmux, git worktrees, and Claude Code are enough to run two or three parallel agents manually. Once you’re comfortable with the coordination pattern, you can automate it.
The key mental shift is from thinking of AI as an assistant to thinking of it as a team. One smart assistant helps one developer. An orchestrated team of agents can do the work of many developers — in parallel, with defined quality standards, and with accumulated expertise that improves every session.