SubAgent Architecture: The Core Implementation for Context Isolation

Introduction: Why Do We Need Sub-Agents?

When building AI Agents, one unavoidable problem is context bloat. The longer an Agent converses with a user and the more tool calls it executes, the more bloated the messages array becomes. The input and output of every file read and command execution accumulates in the Agent's context, severely impacting the model's clarity of thought.

For example: when you ask an Agent "what testing framework does this project use," it might need to read five files to reach a conclusion. Without a sub-agent architecture, the complete contents of all five files pile up in the main Agent's context, creating enormous redundancy.

The Deeper Problem of Context Windows and Attention Dilution

An LLM's "Context Window" refers to the maximum number of tokens the model can process in a single inference. For example, Claude 3.5 has a context window of up to 200K tokens, while GPT-4o supports approximately 128K tokens. Although window capacity continues to grow, the problems caused by context bloat aren't just about "running out of space"—the deeper issue is Attention Dilution. Research shows that when context becomes too long, the model's attention to early critical information (such as core instructions in the System Prompt) drops significantly—a phenomenon known as the "Lost in the Middle" effect. In Agent scenarios, each tool call appends input and output to the messages array, and after prolonged operation, the model may "forget" the original task objective or get lost in a sea of intermediate information. Therefore, context management isn't just a performance issue—it's a core challenge for Agent reliability.

This is the core problem that the SubAgent solution addresses—context isolation.

SubAgent Working Principle

SubAgent Architecture Design

Division of Labor Between Parent and Child Agents

The core idea of the SubAgent approach can be analogized to a "supervisor and graduate student" relationship:

Parent Agent (Supervisor): Responsible for receiving user questions, breaking large tasks into subtasks, dispatching them to child Agents for execution, and ultimately receiving only a summary of the results
Child Agent (Graduate Student): Receives specific tasks dispatched by the parent Agent, independently executes tool calls (reading/writing files, running commands, etc.), and returns results to the parent Agent upon completion

The key point is that all intermediate information generated during the child Agent's execution (such as tool call errors, multiple retry attempts) never enters the parent Agent's messages. The parent Agent receives only a text summary, returned as an ordinary tool result.

Tool Call Mechanism and Messages Array Structure

To understand the SubAgent architecture, you first need to understand the Tool Call mechanism of modern LLMs. In OpenAI's and Anthropic's API designs, the messages array is an ordered list of conversation history, where each message contains a role (user/assistant/tool) and content field. When the model decides to call a tool, it generates an assistant message containing a tool_use block; after the tool finishes executing, the result is appended to the end of the array as a new message in tool_result format. This design means each tool call produces at least two new messages (call request + execution result), and tool outputs (such as full file contents, command output) are often large in size. In a complex programming task, an Agent might need to call tools dozens of times consecutively, and the token consumption of the messages array grows linearly or even exponentially—this is precisely the fundamental problem that the SubAgent approach solves at the architectural level.

This means that even if a child Agent runs 30 tool calls, the parent Agent's context remains clean.

How Does SubAgent Differ from TodoList and Agent Teams?

Many people easily confuse SubAgent with other Agent patterns. Here's a clear comparison:

Pattern	Core Purpose	Characteristics
TodoList	Task planning	Creates plan reminders, prevents the model from going off track, essentially a reminder
SubAgent	Context isolation	Independent messages, prevents conversation from being polluted by tool calls
Agent Teams	Multi-Agent collaboration	Multiple persistent Agents exist long-term, collaborating in parallel

A child Agent has a very short lifecycle—it exists only during task execution and is destroyed upon completion. In contrast, each Agent in Agent Teams exists long-term and is persistent.

Agent Loop Main Function

SubAgent Code Implementation Analysis

Tool Definition Differences Between Parent and Child Agents

In implementation, the parent Agent and child Agent have different tool sets:

Child Agent tools (childTools): Include basic tools like reading/writing files and running terminal commands
Parent Agent tools (parentTools): Include all child Agent tools plus an additional task tool for dispatching tasks

The elegance of this design is that the parent Agent both understands what capabilities the child Agent possesses (all basic tools) and has "leadership ability" (the task tool) to reasonably dispatch tasks.

Parent Agent Tool Calls

System Prompt Design Strategy

The System Prompts for the two roles are clearly differentiated:

Child Agent System Prompt: "You are a programming sub-agent that completes tasks assigned by the parent Agent in the specified directory and returns a summary"
Parent Agent System Prompt: "You are an agent that can use the task tool to assign exploration and subtasks in the working directory"

Engineering Significance of System Prompt Design and Role Boundaries

The System Prompt is one of the most important design decisions in LLM application engineering. It's injected before the conversation begins, defining the model's role, capability boundaries, behavioral guidelines, and output format. In multi-Agent systems, System Prompt design is particularly critical—Agents with different roles must have clear awareness of their responsibilities; otherwise, "overstepping" behavior (such as a child Agent attempting global planning) or "shirking" behavior (such as a parent Agent directly executing detailed tasks that should be delegated to a child Agent) can easily occur. From a Prompt Engineering perspective, the parent Agent's System Prompt needs to emphasize a "delegation and summarization" mindset, while the child Agent's System Prompt needs to emphasize "focused execution and refined summarization." This philosophy of role separation aligns closely with the Single Responsibility Principle in software engineering, representing a typical practice of migrating engineering design thinking to AI system architecture.

Through different system prompts, parent and child Agents each clearly understand their responsibility boundaries, avoiding role confusion.

Core Function: The run_subagent Workflow

The run_subagent function is the core of the entire SubAgent solution. Its workflow is as follows:

Receives the prompt (task instruction) passed from the parent Agent via the task tool
Creates an independent submessages array, completely isolated from the parent Agent's messages
Executes the task within a loop limited to 30 iterations (tool calls, file reads/writes, etc.)
Upon task completion, extracts text blocks from the final iteration and concatenates them into a summary using the join method
Returns the summary to the parent Agent as a tool result

Error messages and records of multiple attempts during the intermediate process no longer need to be retained once the child Agent resolves the issue, thereby achieving context streamlining.

SubAgent Result Return

Evolution of the Loop Structure

From single-Agent to SubAgent architecture, the Agent Loop structure undergoes a fundamental change:

Single-Agent architecture: A single Agent's while loop
SubAgent architecture: Parent Agent's while loop (main loop) + Child Agent's for loop (limited to 30 iterations)

Each time the parent Agent calls the task tool, it triggers the run_subagent function, creating a temporary child Agent to execute the task. Once the task is complete, the child Agent is destroyed and doesn't occupy the parent Agent's context space.

Current Limitations and Improvement Directions

Serial Execution Performance Bottleneck

The current implementation has an obvious engineering limitation: child Agents can only execute serially, not in parallel.

Specifically, the parent Agent can only run one child Agent per tool call and must wait for that child Agent to return results before making further judgments and decisions about whether to dispatch new tasks. If the parent Agent needs to call task twice, it will run two subagents serially.

Engineering Trade-offs Between Serial and Parallel Execution

The serial execution limitation of the current SubAgent approach is essentially a concurrent programming problem mapped to the AI Agent domain. In traditional software engineering, sequential tasks mean tasks must be completed in order, where the output of one task is the input of the next; parallel tasks allow multiple independent tasks to execute simultaneously, significantly improving throughput. Transforming SubAgents for parallel execution requires solving several technical challenges: First, constructing a task dependency graph—the parent Agent needs to determine which subtasks have no data dependencies and can be executed concurrently; second, result aggregation—when multiple child Agents return results simultaneously, the parent Agent needs to integrate this information in an orderly manner; finally, resource contention—when multiple child Agents read and write the same file simultaneously, locking mechanisms are needed. This is also why advanced AI coding tools like Claude Code and Devin invest substantial engineering resources in architecture—parallel multi-Agent collaboration is the key path to improving complex task processing efficiency.

This is inefficient in scenarios requiring simultaneous exploration of multiple directions, and represents a future optimization opportunity for parallel execution.

Lack of Shared Memory Between Child Agents

Another notable limitation is that multiple child Agents don't share conversation memory—they only share file system read/write access. This means one child Agent's discoveries cannot be directly passed to another child Agent; information can only be shared indirectly through the file system.

Conclusion

The core value of the SubAgent approach lies in context isolation—by delegating high-noise exploration tasks to temporarily created child Agents, it protects the parent Agent's system prompt and primary objectives from dilution. This is a highly practical Agent architecture pattern in engineering practice and an important piece for understanding the internal mechanisms of complex AI coding tools like Claude Code.

The evolution from single-Agent to parent-child Agent architecture essentially transforms "one person doing everything" into a collaborative model of "leaders dispatching tasks, subordinates executing independently." This also lays the foundation for subsequent Agent Teams multi-agent parallel collaboration.

Key Takeaways

The SubAgent approach solves the messages array bloat problem in Agent conversations through context isolation—the child Agent's intermediate processes don't pollute the parent Agent's context
The parent Agent has an additional task tool for dispatching tasks; child Agents return only text summaries as tool results after completing tasks
SubAgent is fundamentally different from TodoList (task planning to prevent going off track) and Agent Teams (multiple persistent Agents collaborating in parallel)—SubAgents have short lifecycles and focus on context isolation
Current implementation limitations include serial-only execution and no shared conversation memory between child Agents, with potential for future parallel improvements
The overall architecture evolves from a single Agent's while loop to a nested structure of the parent Agent's main loop plus child Agent for loops limited to 30 iterations