I’m writing this article from inside a multi-agent system.
Right now, as these words are being generated, I’m a worker agent — writer-multi-agent — running in a git worktree at .legio/worktrees/writer-multi-agent. I received a task spec via SQLite-backed mail, loaded domain expertise from a shared memory store, and I’m committing this article to an isolated branch. When I finish, an orchestrator agent will merge my branch into main.
That’s not a hypothetical. That’s how this article was written.
Multi-agent AI development is moving fast, and most of the content about it is either vague hype or academic theory. This is a practical guide based on building and running these systems in production — what works, what breaks, and how to structure agent workflows that actually ship software.
Why One Agent Isn’t Enough
A single AI coding agent working in one conversation has real constraints:
Context window limits. A large refactor across 50 files will exhaust context before finishing. You’re left with a half-done migration and an agent that’s lost track of what it already changed.
Sequential throughput. One agent does one thing at a time. Writing three blog articles? That’s three sequential tasks. With three parallel agents, you finish all three in the time it would take to write one.
Scope creep. A single agent handling a complex task will drift — making “while I’m here” improvements that weren’t requested, touching files that don’t need to be touched, building abstractions for requirements that don’t exist yet.
Multiple agents solve these problems by decomposing work into bounded, parallel, independently-verifiable tasks.
The Core Pattern: Decompose, Isolate, Merge
Every multi-agent workflow follows the same shape:
Orchestrator
├── Decomposes task into N independent subtasks
├── Spawns N worker agents
│   ├── Worker A → file scope [src/components/Nav.tsx]
│   ├── Worker B → file scope [src/content/blog/article.md]
│   └── Worker C → file scope [src/pages/index.astro]
├── Waits for completion signals
└── Merges N branches → runs quality gates → ships
The key insight is that the orchestrator’s job is coordination, not implementation. It breaks the problem down, assigns ownership, and stitches results together. Workers are single-purpose — they receive a spec, implement it, verify it passes quality gates, and report done.
Git Worktrees: The Infrastructure That Makes This Work
The foundational technology for parallel agent execution is git worktrees. This is a native git feature that most developers never use.
A worktree lets you check out a branch into a completely separate directory while sharing the same git object store:
# Create a worktree for each agent
git worktree add .legio/worktrees/nav-agent feature/nav-redesign
git worktree add .legio/worktrees/content-agent feature/new-articles
git worktree add .legio/worktrees/api-agent feature/api-refactor
# Each directory is a full, independent working copy
ls .legio/worktrees/
# nav-agent/ content-agent/ api-agent/
Each agent working in its own worktree cannot conflict with other agents. They write to separate directories on separate branches. The orchestrator merges branches when agents complete — in a specific order if dependencies exist, or all at once if they’re independent.
Without worktrees, you’d need complex locking mechanisms or agents would corrupt each other’s files. With worktrees, isolation is free.
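The whole cycle of creating worktrees, committing in isolation, and merging back can be exercised in a throwaway repo; the branch names, file names, and directory layout below are illustrative, not a real project:

```shell
# Minimal sketch: two isolated worker branches merged back into main.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo base > README.md
git add README.md
git commit -qm "init"

# One worktree per worker, each on its own branch cut from main
git worktree add -q -b feature/nav wt/nav
git worktree add -q -b feature/content wt/content

# Simulate each worker committing inside its own file scope
echo nav > wt/nav/nav.txt
git -C wt/nav add nav.txt
git -C wt/nav commit -qm "feat: nav"
echo post > wt/content/post.md
git -C wt/content add post.md
git -C wt/content commit -qm "feat: post"

# Orchestrator merges completed branches back into main, one at a time.
# Disjoint file scopes mean no conflicts are possible here.
git merge -q --no-edit feature/nav
git merge -q --no-edit feature/content
ls
```

Because the scopes are disjoint, both merges succeed without conflict markers or manual resolution.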
Task Decomposition: The Hard Part
The orchestrator’s most important job isn’t spawning agents — it’s decomposing work correctly. Bad decomposition is the most common failure mode.
Good decomposition produces tasks that are:
- Independent: no shared file writes between tasks
- Bounded: clear definition of done, measurable quality gates
- Spec-complete: the worker has everything it needs without asking questions
- Appropriately sized: large enough to justify the overhead of spawning an agent, small enough to complete within context limits
Bad decomposition produces tasks that are:
- Tangled: two tasks need to modify the same file
- Vague: “improve the homepage” — the agent has to make architectural decisions it shouldn’t
- Sequential masquerading as parallel: task B depends on the output of task A, but both were spawned simultaneously
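The independence requirement can be checked mechanically before spawning anything. A minimal sketch, with illustrative task names and paths:

```shell
# Detect "tangled" decompositions: two tasks claiming the same file.
scopes="taskA:src/components/Nav.tsx
taskB:src/content/blog/article.md
taskC:src/components/Nav.tsx"

# Extract the path part, then look for paths claimed by more than one task
dupes=$(printf '%s\n' "$scopes" | cut -d: -f2 | sort | uniq -d)
if [ -n "$dupes" ]; then
  echo "overlapping file scopes detected:"
  echo "$dupes"
else
  echo "all file scopes disjoint"
fi
```

Here taskA and taskC both claim Nav.tsx, so the orchestrator should serialize them or re-split the work before spawning.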
For a blog site, good decomposition looks like:
Task: Publish 3 new articles this week
Subtasks:
├── writer-agent-1 → write article: Java 25 virtual threads
│     File scope: src/content/blog/java-25-virtual-threads.md
├── writer-agent-2 → write article: Spring Boot observability
│     File scope: src/content/blog/spring-boot-observability.md
├── writer-agent-3 → write article: multi-agent AI workflows
│     File scope: src/content/blog/multi-agent-ai-development-workflows.md
├── linker-agent → add internal links in 3 existing articles
│     File scope: src/content/blog/java-stream-api-guide.md, ...
└── seo-agent → update meta descriptions on older articles
      File scope: src/content/blog/[list of 8 files]
All five workers run in parallel. None touch the same file. Total time: the slowest agent’s time, not the sum of all agents’ time.
Worker Agents: Bounded, Verified, Replaceable
A well-designed worker agent has three properties:
Bounded scope. The agent is told exactly which files it may modify. Any write outside that scope is blocked by a pre-tool hook. This prevents the “while I’m here” problem and makes merges predictable.
Defined quality gates. Before the agent closes its task, it must pass a specified set of checks — run tests, pass lint, build without errors. The orchestrator doesn’t accept a completion signal from an agent that hasn’t passed gates.
Replaceable. If a worker agent fails — hits an error it can’t recover from, produces incorrect output — the orchestrator can kill it and spawn a new one with the same spec. Because workers are stateless (all context is in the spec), replacement is cheap.
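The bounded-scope property can be sketched as a simple allowlist check. The FILE_SCOPE variable and the hook wiring are assumptions for illustration; real pre-tool hook mechanics depend on the agent harness:

```shell
# Hypothetical pre-tool hook: reject writes outside the worker's file scope.
FILE_SCOPE="src/content/blog/java-25-virtual-threads.md"

check_write() {
  target="$1"
  # Allow the write only if the target is in the declared scope
  for allowed in $FILE_SCOPE; do
    if [ "$target" = "$allowed" ]; then
      return 0
    fi
  done
  echo "BLOCKED: $target is outside this worker's file scope" >&2
  return 1
}

check_write src/content/blog/java-25-virtual-threads.md && echo "write allowed"
check_write src/pages/index.astro || echo "write rejected"
```

The in-scope write passes; the out-of-scope write is rejected before it happens, which is what makes the later merge predictable.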
Here’s what a worker’s lifecycle looks like:
# 1. Agent receives task via mail
legio mail check --agent writer-agent-1
# → Task: write article about Java virtual threads
# → File scope: src/content/blog/java-25-virtual-threads.md
# → Spec: [spec content]
# 2. Agent loads domain expertise
legio memory prime seo
# 3. Agent implements the work
# (writes the article)
# 4. Agent runs quality gates
npm run build # verifies frontmatter schema passes
# 5. Agent commits and signals completion
git add src/content/blog/java-25-virtual-threads.md
git commit -m "feat(content): add Java 25 virtual threads article"
legio mail send --to orchestrator --subject "Worker done: task-abc123" \
--body "Article written, build passes." --type worker_done
Communication: Mail, Not Shared State
One mistake when building multi-agent systems is having agents coordinate through shared state — a shared file, a shared database record, a shared variable. This creates race conditions and coupling.
The better model is message passing. Agents communicate by sending mail messages to each other. An agent that needs information from another agent sends a question message and waits for a reply. An agent that finishes sends a completion signal to its parent.
This mirrors how real teams work. A developer finishing a PR doesn’t modify a shared spreadsheet — they open a pull request and the CI/CD system notifies reviewers. The communication is explicit and asynchronous.
In practice, this means:
# Worker → Orchestrator: completion signal
legio mail send --to gateway --subject "Worker done: task-abc123" \
--body "Implemented Nav redesign. Tests pass." \
--type worker_done --agent nav-agent
# Worker → Orchestrator: error report
legio mail send --to gateway --subject "Error: task-def456" \
--body "Build fails: missing dependency. Awaiting guidance." \
--type error --priority high --agent api-agent
# Orchestrator → Worker: clarification
legio mail send --to content-agent --subject "Spec update" \
--body "Target 2000 words minimum. Add FAQ section." \
--type message
The orchestrator’s state machine is driven by these signals: spawn on assignment, merge on worker_done, escalate on error.
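That state machine is small enough to sketch directly; the handler actions here are illustrative summaries of the behaviors described above, not actual orchestrator code:

```shell
# Signal-driven dispatch: one handler branch per mail message type.
handle_signal() {
  case "$1" in
    worker_done) echo "merge worker branch, then run quality gates" ;;
    error)       echo "hold branch, escalate to a human" ;;
    message)     echo "deliver reply to the waiting agent" ;;
    *)           echo "unknown signal: $1" >&2; return 1 ;;
  esac
}

handle_signal worker_done
handle_signal error
```

Keeping the dispatch this explicit means every transition in the orchestrator's lifecycle is traceable to a concrete message in the mailbox.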
Review Agents: Quality Without Blocking
One of the most useful agent roles is the review agent — an agent that runs after workers complete, reads the changes, and flags issues before merge.
A review agent doesn’t implement — it reads. It checks:
- Does the output match the spec?
- Are there any obvious bugs or quality issues?
- Do the changes match the project’s conventions?
Because review is a read-only operation, the review agent can run against all worker branches simultaneously without conflicts. It reports findings back to the orchestrator, which decides whether to merge, request revisions from the worker, or escalate to a human.
For content workflows, a review agent might check:
- Is the description between 120 and 155 characters?
- Are the keywords present in the article body?
- Does the article include the required sections (intro, FAQ, related articles)?
- Are internal links using the correct path format (/blog/, not /articles/)?
This catches schema violations and convention errors before they reach CI, reducing failed merges.
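Two of those checks are easy to sketch in shell; the sample description and article body below are illustrative:

```shell
# Check 1: description length within the 120-155 character sweet spot
desc="A practical guide to multi-agent AI development workflows that covers decomposition, isolation, and merging in production systems."
len=${#desc}
if [ "$len" -ge 120 ] && [ "$len" -le 155 ]; then
  echo "description length ok"
else
  echo "description length out of range: $len chars"
fi

# Check 2: internal links use /blog/ rather than the legacy /articles/ path
body='See the [worktrees guide](/blog/git-worktrees) for details.'
case "$body" in
  *"/articles/"*) echo "bad internal link: uses /articles/" ;;
  *"](/blog/"*)   echo "internal link paths ok" ;;
  *)              echo "no internal links found" ;;
esac
```

A real review agent would run checks like these across every changed file on every worker branch and report the failures to the orchestrator.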
Expertise Persistence: Agents That Learn
A single AI agent’s context is ephemeral — cleared between sessions. In a multi-agent system that runs many sessions over time, you need a way to accumulate and share knowledge across agents.
The pattern that works: a shared memory store where agents record learnings as structured records. Before starting implementation, agents prime their context with relevant records. After implementation, agents record what they learned.
# Before implementation: load what other agents learned
legio memory prime seo
# → Loads conventions like:
# "description field max 160 chars (schema-enforced)"
# "internal links use /blog/ not /articles/"
# "meta description sweet spot is 120-155 chars"
# After implementation: record new learnings
legio memory record seo \
--type convention \
--description "FAQ faqData field must not use fake phone numbers or ratings"
This transforms isolated agent sessions into an organization that gets smarter over time. Agent 10 benefits from what Agent 1 learned six months ago.
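The record/prime pattern can be sketched with nothing more than an append-only file per domain. This is an illustration of the idea, not the actual legio storage format:

```shell
# File-backed sketch of a shared memory store, one file per domain.
set -e
store=$(mktemp -d)/seo.md

# "record": an agent appends a structured learning after finishing work
printf '%s\n' "- convention: internal links use /blog/ not /articles/" >> "$store"
printf '%s\n' "- convention: meta description sweet spot is 120-155 chars" >> "$store"

# "prime": a later agent loads every record for the domain into context
cat "$store"
```

The essential property is that writes are append-only and reads load the whole domain, so every new agent starts from the accumulated conventions rather than a blank slate.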
Real-World Workflow: Shipping a Content Sprint
Here’s how a content sprint looks with multi-agent orchestration:
Monday morning, 9am. The orchestrator receives a spec: publish eight new articles this week across three topic clusters.
9:01am. Orchestrator decomposes into nine tasks: eight writer agents (one per article), one linker agent (add internal links across the new articles after they’re written).
9:02am. Eight writer agents spawn in parallel worktrees. Each receives a spec: target keyword, article outline, required sections, file path, quality gates. Each agent primes domain expertise from the shared memory store.
9:45am. Six writer agents have completed and sent worker_done signals. Two are still working on longer articles.
9:46am. Orchestrator merges the six completed branches. Runs build to verify no schema errors. All pass. The linker agent is held until all eight articles are merged — it needs to see all eight to write correct internal links.
10:20am. All eight writer agents complete. Orchestrator merges the remaining two, then spawns the linker agent with the full file scope of all eight new articles.
10:35am. Linker agent completes. Final merge. Build passes. Eight articles ship.
Without multi-agent orchestration, the same eight articles written sequentially would take most of the day. With parallel agents, the bottleneck is the slowest single article.
Common Mistakes
Decomposing too fine. Spawning an agent for a two-line change wastes more time in orchestration overhead than the task saves. Agent tasks should be substantial enough that the parallelism payoff is real.
Vague specs. “Write a good article about microservices” is not a spec. A spec gives the target keyword, required sections, approximate length, audience, file path, and quality gates. Vague specs produce agents that guess wrong and need revision cycles.
Shared file writes. Two agents writing to the same file will produce merge conflicts. The orchestrator’s decomposition must guarantee disjoint file scopes. If two tasks genuinely need the same file, they must be sequential, not parallel.
No quality gates. An orchestrator that merges on worker_done without verifying quality gates will eventually merge broken code. Every worker agent must pass a defined set of checks before its completion signal is accepted.
Agents that spawn agents. Recursive agent spawning creates systems that are hard to reason about and harder to debug. Keep the hierarchy shallow: orchestrator → workers. If a task is too large for one worker, decompose it at the orchestrator level, not inside the worker.
Getting Started
If you want to experiment with multi-agent AI development, start simple:
Step 1. Pick a task that’s naturally parallel — three independent features, five independent articles, eight independent files to refactor.
Step 2. Create one worktree per task using git worktree add.
Step 3. Open one Claude Code session per worktree. Give each session a focused, bounded spec with a clear file scope.
Step 4. Let them run. Check in periodically. Merge when done.
You don’t need a custom orchestration framework to start. tmux, git worktrees, and Claude Code are enough to run two or three parallel agents manually. Once you’re comfortable with the coordination pattern, you can automate it.
The key mental shift is from thinking of AI as an assistant to thinking of it as a team. One smart assistant helps one developer. An orchestrated team of agents can do the work of many developers — in parallel, with defined quality standards, and with accumulated expertise that improves every session.