Claude Code already has agents. You can spawn subagents inside a session. They do research, run commands, read files, and report back. It works.
But those agents are all Claude.
Same model. Same subscription. Same context budget. Every agent you spawn eats from the same pool of tokens. If you need five agents researching five different parts of a codebase, that is five parallel draws on the same Claude plan. That adds up fast for heavy sessions.
What if the agents did not have to be Claude?
The Idea
Use Claude Code as the orchestrator. The brain that plans, delegates, and synthesizes. But let it spawn agents that run on external models — Codex, Gemini, local Ollama models, whatever is available and cheap for the task.
Those agents do not need to be brilliant. They need to be thorough. Scan this directory. Read these files. Search for this pattern. Summarize this module. Find every place this function is called. That is grunt work. It does not require the most expensive model in the room.
The critical part is how findings come back.
Instead of piping raw text through fragile multi-model communication, each agent stores its findings directly in SynaBun memory using the remember tool. It gets back a memory ID. That ID is all it sends to Claude.
Claude's planning phase does not need to parse another model's stream. It calls recall with the IDs, gets structured findings from the memory system, and builds its plan from indexed, searchable, persistent context.
The memory system becomes the bus.
Why This Architecture Works
The usual problem with multi-model workflows is the handoff. Model A produces output. You need to get that output into Model B. So you pipe text, parse JSON, build wrapper APIs, or shove everything into a shared file. Every approach is fragile in its own special way.
SynaBun memory sidesteps that because it is already designed to be the shared context layer across different AI sessions.
When an external agent calls remember, it is not just dumping text. It is creating a categorized, tagged, importance-ranked, semantically searchable memory entry with a stable UUID. That entry lives in SynaBun's vector store. It survives the session. It can be recalled by content, by tag, by project, by category. It is not a temporary artifact. It is indexed knowledge.
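The shape of such an entry can be sketched as a small data class. Field names here are illustrative assumptions, not SynaBun's actual schema:

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical sketch of what a SynaBun memory entry carries.
# Field names are illustrative, not SynaBun's real schema.
@dataclass
class MemoryEntry:
    content: str                        # the agent's structured findings
    category: str                       # e.g. "research"
    tags: list[str] = field(default_factory=list)
    importance: int = 5                 # ranking used at recall time
    project: str = ""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

entry = MemoryEntry(
    content="auth module: exports login(), verify(); depends on jwt lib",
    category="research",
    tags=["auth", "refactor"],
    importance=7,
    project="webapp",
)
print(entry.id)  # stable UUID: the only thing the agent sends back
```

The UUID is the whole interface between models; everything else stays in the store.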
So the handoff becomes:
- Claude identifies what it needs researched
- Claude spawns an external agent (Codex, Gemini, local model) with a clear task
- The agent does its work using whatever tools it has access to
- The agent calls remember with its findings: structured content, relevant tags, related files, appropriate importance level
- The agent returns the memory ID to Claude
- Claude calls recall or reads the memory directly by ID
- Claude's planning phase uses real indexed findings instead of raw dumps
No intermediate parsing. No text format negotiation. No "here is my output as a giant string, good luck." Just a UUID pointing to structured memory.
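That handoff can be sketched with an in-memory stand-in for the store. The real calls would go through the SynaBun MCP server; the function names mirror the tools but the implementations here are stubs:

```python
import uuid

# In-memory stand-in for SynaBun's store; real calls go through MCP.
STORE: dict[str, dict] = {}

def remember(content: str, tags: list[str]) -> str:
    """What the external agent calls: store findings, get back an ID."""
    memory_id = str(uuid.uuid4())
    STORE[memory_id] = {"content": content, "tags": tags}
    return memory_id

def recall(memory_id: str) -> dict:
    """What Claude's planning phase calls with the returned ID."""
    return STORE[memory_id]

# External agent finishes its research and hands back only a UUID,
# so the planner never has to parse another model's raw text stream.
mid = remember("config module reads env vars, no validation", ["config"])
findings = recall(mid)
print(findings["content"])
```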
What the External Agent Needs
Not much. That is the point.
The external agent needs access to the SynaBun MCP server. Specifically, it needs the remember tool. If it also gets recall, it can check what previous agents already found before duplicating work. But the minimum viable surface is just remember.
The agent does not need to understand SynaBun's full tool surface. It does not need the browser, the whiteboard, the loop system, or the Discord tools. It needs to do its research task and store the result in one structured call.
That makes the integration lightweight. Any model that can use MCP tools — or any wrapper that translates tool calls to MCP — can participate.
Codex already supports MCP. Local models through OpenCode already connect to SynaBun MCP. The plumbing exists.
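Restricting the agent's tool surface might look like the following sketch. The tool names beyond remember and recall are taken from the text; the wiring is hypothetical, since real configuration would live in the MCP server setup:

```python
# Sketch: the external agent gets a restricted tool surface.
# Wiring is hypothetical; real config lives in the MCP server setup.
FULL_TOOLS = {
    "remember": lambda **kw: "uuid-placeholder",
    "recall": lambda **kw: {},
    "browser": lambda **kw: None,
    "whiteboard": lambda **kw: None,
}

def agent_toolset(minimal: bool = True) -> dict:
    """Minimum viable surface is just remember; recall is optional."""
    allowed = ["remember"] if minimal else ["remember", "recall"]
    return {name: fn for name, fn in FULL_TOOLS.items() if name in allowed}

print(sorted(agent_toolset()))                # ['remember']
print(sorted(agent_toolset(minimal=False)))   # ['recall', 'remember']
```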
What Claude Gets Back
This is the part that changes the planning phase.
Today, when Claude Code spawns a subagent, the agent's entire output comes back as a text blob in the conversation. Claude has to parse it, extract the useful parts, and hold everything in context while planning. If you spawn five agents, you get five blobs competing for context window space.
With memory-indexed agents, Claude gets back five UUIDs. Tiny. Cheap. No context pressure.
When Claude enters the planning phase, it pulls exactly what it needs:
- recall with a specific memory ID to get one agent's full findings
- recall with a semantic query to search across all agent findings
- recall with a tag filter to get findings about a specific subsystem
- recall with a project filter to get everything relevant to the current work
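Those query modes can be sketched over a stub store. SynaBun's real recall is semantic (vector search); substring matching stands in for it here:

```python
# Sketch of recall's query modes over a stub store. Substring match
# stands in for SynaBun's real semantic (vector) search.
MEMORIES = [
    {"id": "a1", "content": "auth uses jwt", "tags": ["auth"], "project": "webapp"},
    {"id": "b2", "content": "config reads env", "tags": ["config"], "project": "webapp"},
    {"id": "c3", "content": "cli parses flags", "tags": ["cli"], "project": "tooling"},
]

def recall(memory_id=None, query=None, tag=None, project=None):
    results = MEMORIES
    if memory_id:
        results = [m for m in results if m["id"] == memory_id]
    if query:  # stand-in for semantic search
        results = [m for m in results if query in m["content"]]
    if tag:
        results = [m for m in results if tag in m["tags"]]
    if project:
        results = [m for m in results if m["project"] == project]
    return results

print(len(recall(project="webapp")))   # 2
print(recall(tag="auth")[0]["id"])     # a1
```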
The planning agent is not drowning in raw output. It is querying an indexed knowledge base that was built in parallel by cheaper models doing focused research.
That is a fundamentally different shape of planning.
The Economics
This is not just about architecture elegance. It is about cost.
Claude Opus is expensive. It is worth it for planning, synthesis, architectural reasoning, and the final implementation decisions. It is arguably overkill for "read every file in this directory and tell me what each one exports."
If you can offload research and indexing to cheaper models — Codex for code understanding, local models for file scanning, lighter cloud models for pattern matching — and route only the structured findings back to Claude for the expensive planning phase, the per-session cost changes meaningfully.
You are paying Opus prices for Opus-level work: synthesis, judgment, planning. You are paying commodity prices for commodity work: reading, scanning, summarizing, cataloging.
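The shape of that split can be shown with back-of-envelope arithmetic. Every number below is made up for illustration, not real pricing:

```python
# Purely illustrative numbers: per-million-token prices and token
# counts are invented to show the shape of the split, not real pricing.
OPUS_PER_MTOK = 15.00     # hypothetical premium-model price
CHEAP_PER_MTOK = 0.50     # hypothetical commodity-model price

research_tokens = 400_000   # bulk reading/scanning across subsystems
planning_tokens = 50_000    # synthesis over recalled findings

all_opus = (research_tokens + planning_tokens) / 1e6 * OPUS_PER_MTOK
split = (research_tokens / 1e6 * CHEAP_PER_MTOK
         + planning_tokens / 1e6 * OPUS_PER_MTOK)

print(f"all-Opus: ${all_opus:.2f}")   # $6.75
print(f"split:    ${split:.2f}")      # $0.95
```

Even with the numbers invented, the structure holds: research dominates token volume, so moving it to commodity pricing is where the session cost changes.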
The memory system is the layer that makes this split possible without losing coherence.
What This Looks Like in Practice
Imagine you are starting a major refactor. You need to understand the current state of five subsystems before you can plan the changes.
Without this pattern: Claude reads all five subsystems sequentially, burning expensive tokens on file reads and summaries, compressing context as it goes, potentially losing details from the first subsystem by the time it finishes the fifth.
With this pattern:
- Claude looks at the task and identifies five research questions
- Claude spawns five agents — maybe Codex for the backend modules, a local model for the config files, Gemini for the API surface
- Each agent researches its subsystem, calls remember with structured findings (what the module does, its dependencies, its public API, known issues, relevant patterns)
- Each agent returns one memory ID
- Claude enters planning with five memory IDs
- Claude recalls each one, gets clean structured findings, and builds a refactor plan from complete indexed context
The research happened in parallel on cheaper models. The planning happens on Claude with full access to indexed, searchable results. No context overflow. No lost details. No re-reading files Claude already paid to scan.
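The parallel fan-out described above can be sketched with stub agents. The agent body is a placeholder for an external model doing real work; the only thing that flows back is the memory ID:

```python
from concurrent.futures import ThreadPoolExecutor
import uuid

STORE: dict[str, str] = {}  # stand-in for SynaBun memory

def research_agent(subsystem: str) -> str:
    """Stub for an external model: research, remember, return the ID."""
    findings = f"summary of {subsystem}"   # real research happens here
    memory_id = str(uuid.uuid4())
    STORE[memory_id] = findings            # the remember call
    return memory_id                       # all that flows back to Claude

subsystems = ["backend", "config", "api", "db", "ui"]
with ThreadPoolExecutor() as pool:
    memory_ids = list(pool.map(research_agent, subsystems))

# Planning phase: five tiny IDs in context, full findings on demand.
plan_context = [STORE[mid] for mid in memory_ids]
print(len(memory_ids), "IDs returned;", len(plan_context), "findings recalled")
```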
The Honest Gaps
This does not exist yet as a turnkey feature. The pieces are in place but the orchestration layer needs building.
Current gaps:
- Claude Code does not natively spawn non-Claude agents. The subagent system today is Claude-only. Spawning an external model as an agent requires either a wrapper, a webhook, or an external orchestrator that Claude can trigger.
- Tool-use quality varies. Not every model calls MCP tools cleanly. Codex is solid. Local models through OpenCode are improving. But some models still struggle with structured tool calls, which means the remember call might need guardrails or retry logic.
- Memory quality depends on the agent. A cheaper model might store findings that are incomplete or poorly structured. The memory is only as good as what the agent writes. This probably needs a validation step or a schema contract for what "good findings" look like.
- Coordination overhead. Deciding which agent gets which task, with which model, is itself a planning problem. Over-splitting is worse than sequential Claude if the coordination cost exceeds the savings.
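A schema contract with a retry guardrail, as the gaps above suggest, could be sketched like this. The required keys are an assumption about what a useful findings entry needs:

```python
# Sketch of a "good findings" contract with a retry guardrail.
# The required keys are an assumption, not a SynaBun schema.
REQUIRED = {"summary", "files", "public_api"}

def valid(findings: dict) -> bool:
    return REQUIRED.issubset(findings) and bool(findings["summary"])

def store_with_retry(agent_call, max_attempts: int = 3) -> dict:
    """Re-ask a flaky agent until its findings pass the contract."""
    for attempt in range(max_attempts):
        findings = agent_call(attempt)
        if valid(findings):
            return findings
    raise ValueError("agent never produced schema-valid findings")

# Flaky agent: incomplete on the first call, valid on the second.
def flaky(attempt: int) -> dict:
    if attempt == 0:
        return {"summary": ""}
    return {"summary": "auth module", "files": ["auth.py"], "public_api": ["login"]}

print(store_with_retry(flaky)["summary"])  # auth module
```

Validation at write time keeps low-quality entries out of the store, so the planning phase never has to second-guess what it recalls.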
None of these are blockers. They are engineering problems with known shapes.
Where This Is Going
The direction is clear even if the full implementation is not shipping tomorrow.
SynaBun is already the shared memory layer across Claude Code, Codex, OpenCode, and local models. The remember and recall tools already work across all of them. The category system, tagging, importance ranking, semantic search, and project scoping are already built.
The missing piece is the orchestration: letting Claude Code say "run this task on Codex, give it the SynaBun MCP connection, and bring me back the memory ID." That is a feature, not a research problem.
Once that orchestration exists, the economics of AI-assisted development shift. You stop paying premium prices for bulk research. You start paying premium prices only for the work that actually needs premium reasoning.
And the memory system stops being just a persistence layer. It becomes the communication protocol between models.
The best use of an expensive model is not making it do everything. It is making it the brain that directs cheaper models and synthesizes their work.
SynaBun memory is the layer that makes that handoff clean: structured, indexed, searchable, and persistent. Not a pipe. Not a file dump. A shared knowledge base that every agent can write to and the planning agent can query.
External models as research agents. SynaBun memory as the bus. Claude as the architect.
That is the shape of multi-model development we are building toward.