Anthropic's Official Breakdown: Three Core Components for Building AI Agents

Overview

At the Anthropic Developer Conference, Product Manager Brad Abrams delivered an in-depth breakdown of the core architecture for building next-generation AI Agents. He distilled the entire system into three pillars: Build, Connect, and Optimize, demonstrating through multiple live demos how these components work together, enabling developers to build powerful intelligent agents with minimal API calls.

The core philosophy behind this presentation is: no matter how intelligent a model is, its performance depends on how much data and how many tools you can provide it. Anthropic is building a highly composable component system that lets developers focus on business logic rather than underlying infrastructure.

And what publicly traded companies

We don't quite have the SONNET 4 scores up,

project like Asana has a complicated API structure.

Build: Code Execution Tool — Letting Claude Write and Run Its Own Code

Why Do AI Agents Need Code Execution Capabilities?

Despite large language models' ability to accomplish many remarkable tasks, certain scenarios still require traditional code execution capabilities. Brad cited several typical examples:

Advanced data analysis: Processing large spreadsheets, performing deep statistical analysis
Auditability requirements: Code is repeatable and verifiable
Precise computation: Tasks requiring deterministic results like mathematical operations and prime number calculations

This touches on a fundamental understanding in the AI field: large language models are essentially probabilistic text generation systems that produce output by predicting the next token. This means that for tasks requiring deterministic results (such as precise mathematical calculations or data aggregation statistics), the model's "reasoning" may produce approximate but imprecise results. For example, when asking an LLM to directly calculate large number multiplication, it's actually "simulating" the calculation process rather than truly performing arithmetic operations, and the error rate increases significantly with the number of digits. Code execution tools fundamentally solve this problem — letting the model handle "thinking about how to do it" while letting the computer handle "doing it precisely."

Anthropic's approach is straightforward: since Claude is already excellent at writing code, why not give it a computer to write and execute code on its own?

How the Code Execution Tool Works

The architecture design of the code execution tool is quite elegant:

The client sends a request to Claude with the code execution tool declaration
Claude analyzes the problem and decides whether code is needed
If needed, Claude writes Python code and sends it to a dedicated container
The container executes the code and returns stdout, stderr, and generated files
Claude reasons about the execution results and generates the final answer

The containerized design here deserves deeper understanding. Anthropic's containerized code execution solution is based on a modern cloud-native technology stack. Containers are a lightweight virtualization technology that achieves process isolation through Linux kernel namespace and cgroup mechanisms, offering faster startup and lower resource overhead compared to traditional virtual machines. Each organization having a dedicated isolated container means code execution environments don't interfere with each other, ensuring both security (preventing malicious code from affecting other users) and resource predictability. This design is similar to the serverless architecture philosophy of AWS Lambda, but optimized for AI Agent long-running interactive computation scenarios — containers can maintain state and support data persistence across multiple rounds of code execution.

Each organization has a dedicated isolated container, and developers have full control over container allocation strategies. Setup is extremely simple — just add a tools block to the existing Messages API.

Live Demo: A/B Test Analysis

Brad demonstrated live using Opus 4. The most impressive scenario was A/B test analysis: the model first analyzed the structure of an uploaded spreadsheet, then wrote deep analysis code, and even when unsatisfied with the initial analysis, proactively wrote additional code for deeper exploration, ultimately delivering business recommendations backed by data.

This "try again if not satisfied" behavioral pattern is known as "Self-Reflection" in AI research and is a key characteristic distinguishing advanced Agents from simple tool calls. The model can not only execute tasks but also evaluate the quality of its own output and take corrective action when necessary. This capability is typically achieved through the model's exposure to extensive code debugging and iterative optimization data patterns during training.

Shopify has already integrated this tool into its merchant assistant for helping merchants with A/B test analysis. Currently, each developer receives 50 hours of free container time.

Connect: Data Connection Layer — Bridging AI Agents with the External World

Web Search: Agentic Web Search

Model training data has a knowledge cutoff, but many application scenarios require real-time information — financial data, legal precedents, latest API documentation, etc. The Web Search tool solves exactly this problem.

Unlike traditional search, Claude's search is Agentic Search:

The model doesn't simply convert user questions into search queries
Instead, it first reasons about the overall task and decides on a search strategy
It autonomously decides how many times to search, what to search for, and when to stop
After each search, it evaluates results and decides whether deeper exploration is needed
Finally, it generates a report with complete citations and footnotes

This approach is fundamentally different from traditional Retrieval-Augmented Generation (RAG). Traditional RAG typically uses a fixed two-stage retrieve-then-generate pipeline: first converting the user query into a vector retrieval request, extracting relevant document fragments from a knowledge base, then injecting these fragments as context into the prompt. The limitation of this approach is that the retrieval strategy is static and cannot dynamically adjust based on intermediate results. Agentic search essentially models the search process as a multi-step decision problem — the model can evaluate the sufficiency of acquired information at each step, deciding whether to search again from a different angle or dive deeper in a specific direction. This approach draws on the Exploration-Exploitation Tradeoff concept from reinforcement learning, transforming information retrieval from passive response to active exploration.

All of this is accomplished in a single API call. Developers can also restrict search domains (e.g., only searching official documentation in customer service scenarios) and set maximum search rounds.

Quora is already using this feature in their consumer agents, as users frequently ask questions about current events.

MCP Connector: A Game Changer for Connecting to Remote MCP Servers

The MCP (Model Context Protocol) ecosystem is experiencing explosive growth. MCP is an open-source standardized protocol released by Anthropic in late 2024, designed to solve interoperability issues between AI models and external data sources and tools. Before MCP, every AI application needed custom integration code for each external service, creating an M×N complexity problem (M AI applications connecting to N external services). MCP reduces this complexity to M+N by defining a unified communication protocol (based on JSON-RPC 2.0) — any MCP-supporting model can call tools exposed by any MCP server. Remote MCP servers authenticate via OAuth 2.0 and support Server-Sent Events (SSE) for streaming communication. As of mid-2025, hundreds of services including GitHub, Slack, Google Drive, and Stripe have provided official MCP server implementations.

MCP Connector enables developers to directly call remote MCP servers from within their own Agents — a true game changer.

Brad demonstrated a complex multi-tool collaboration scenario:

"Based on my Asana project status, create an email with a creative motivational image and send it to the team."

This seemingly simple request required Claude to:

Call Asana MCP → Get workspace, search projects, retrieve task lists
Call image generation MCP (via Cloudflare-hosted remote MCP) → Generate motivational image
Call Zapier MCP → Combine all data and send a formatted email

Throughout the process, Claude demonstrated powerful long-horizon planning capabilities, autonomously navigating complex enterprise API structures (like Asana's multi-level API) without human intervention.

This long-horizon planning capability involves multiple frontier directions in AI research. The autoregressive generation approach of traditional LLMs naturally tends toward locally optimal decisions rather than global planning. Effective long-horizon planning typically requires: task decomposition ability (breaking complex goals into manageable subtasks), state tracking (maintaining awareness of current progress and remaining tasks during multi-step execution), and error recovery (ability to backtrack or adopt alternative approaches when a step fails). Anthropic likely enhances these capabilities through Extended Thinking, reinforcement learning training for tool-calling scenarios, and longer context windows.

Setup is equally simple: add an mcp_servers property to the Messages API, listing the MCP server URLs, names, and OAuth tokens. Multiple remote MCPs are already available including Asana, Zapier, and Cloudflare.

Ultimate Demo: Complete Agent Workflow with Four-Tool Collaboration

The most impressive demo chained all components together:

"Create an email containing a creative motivational image, Asana project status analysis (with completion percentage), related web news, and send it to the team."

Claude sequentially used in a single call: Asana MCP to get tasks → Code execution tool to calculate completion rate → Web Search to find related news → Image generation MCP to create illustrations → Zapier MCP to send the email. The entire workflow completed automatically, and the team ultimately received a beautifully formatted HTML email.

The technical implications of this demo go far beyond what's visible on the surface. It demonstrates the possibility of "tool combination explosion" — when N tools can be freely combined, the task space an Agent can handle grows exponentially. This is why Anthropic chose a highly composable API design: each tool is an independent atomic capability, but when combined, they can exhibit emergent complex behaviors far exceeding any single tool. This design philosophy is directly aligned with Unix's "small tools, big combinations" principle.

Optimize: AI Agent Performance Optimization Strategies

Prompt Caching

Prompt caching allows reuse of frequently used prompt segments, saving cost and latency. Its core principle leverages the KV Cache mechanism in the Transformer architecture. In attention computation, each inference requires calculating Key and Value matrices, which is one of the most computationally expensive parts. When multiple requests share the same prompt prefix (such as system prompts, few-shot examples, or long documents), caching the KV states corresponding to these prefixes avoids redundant computation, significantly reducing latency and cost.

Updates in this release:

Original 5-minute cache window
New 1-hour cache option, also enjoying 90% cache hit discount
Suitable for long-running Agents or scenarios where human users leave and return

The 90% cache hit discount reflects the actual computational resources saved — on cache hit, only the attention for new tokens needs to be computed, not the entire sequence. Given the 1-hour cache window design, Anthropic likely employs a distributed caching system (such as distributed memory storage based on consistent hashing), balancing GPU memory usage against user cost.

Batch Processing

Batch processing now supports Web Search, code execution, and MCP Connector, meaning it's no longer just a traditional batch processing tool but an asynchronous agent API:

50% price discount
Suitable for building asynchronous Agent workflows

Traditional batch processing in machine learning typically refers to packaging multiple inference requests together to improve GPU utilization. But Anthropic's positioning of it as an "asynchronous agent API" implies deeper architectural changes: each batch task can itself be a complete multi-step Agent execution flow (including search, code execution, MCP calls), just without requiring real-time results. This is extremely applicable for background data processing, scheduled report generation, large-scale content moderation, and similar scenarios — developers can submit thousands of complex Agent tasks, and the system automatically executes them during lower load periods and returns results.

Priority Tier

For enterprise customers requiring high reliability:

Purchase dedicated capacity on a monthly basis
99% availability guarantee
Discounts for long-term commitments

This tiered service model draws from the mature capacity reservation mechanisms of the cloud computing industry (similar to AWS Reserved Instances). For enterprises deploying AI Agents in critical business processes, API availability directly impacts business continuity. A 99% availability guarantee means at most approximately 7.2 hours of downtime per month, which is sufficient for most enterprise application scenarios while also providing Anthropic with a predictable load baseline for infrastructure planning.

Summary: The Evolution of AI Agents from Tool Calling to Autonomous Decision-Making

The component system Anthropic released demonstrates a clear product philosophy: minimal API, maximum capability. Whether it's code execution, web search, or MCP connection, everything is implemented by adding a single property to the existing Messages API. This highly composable design lets developers build complex Agent systems like assembling building blocks.

More noteworthy is Claude's improvement in long-horizon planning capabilities. From the demos, we can see the model autonomously decomposing complex tasks, selecting appropriate tools, handling multi-step dependencies, and even proactively conducting deeper exploration when results are unsatisfactory. This marks AI Agents transitioning from the "tool calling" stage toward genuine "autonomous decision-making."

From an industry development perspective, this evolutionary path is highly consistent with the academic trajectory of AI Agent research. Early tool-augmented LLMs (such as Toolformer and the ReAct framework) primarily addressed the problem of "how models know when to call tools"; the current stage's core challenge has shifted to "how models perform multi-step planning and adaptive decision-making in complex environments." By unifying tool calling, execution feedback, and planning reasoning within a single API framework, Anthropic is essentially building a general-purpose Agent Runtime, laying the infrastructure foundation for the next generation of autonomous AI systems.