Deep Dive into MCP Code Execution & Agent Skills: Compressing 150K Tokens Down to 2,000

MCP faces token overload at scale; code execution and Agent Skills are the two key solutions.
MCP faces two bottlenecks at scale: tool definition overload (150K tokens) and redundant intermediate result transmission. Anthropic proposes two solutions: exposing MCP tools as code APIs with progressive discovery to reduce token consumption by 98.7%, and introducing Agent Skills as an open standard with three-layer progressive loading to package workflows into reusable capability units. MCP handles tool connectivity while Skills handle workflow packaging, together driving Agent infrastructure from the connectivity layer to the capability layer.
MCP's Scaling Bottleneck: Tool Definition Overload and Redundant Transmission
MCP (Model Context Protocol) is an open protocol released by Anthropic in November 2024 with a simple goal—enabling AI Agents to connect to external tools in a unified way. Whether it's GitHub, Slack, Salesforce, or custom APIs, the interface looks the same to the Agent.
Notably, MCP is built on top of the JSON-RPC 2.0 protocol, using a client-server architecture. Its design philosophy draws from the success of LSP (Language Server Protocol)—LSP unified the interface for IDEs to connect with language servers for any programming language, and MCP attempts to replicate this pattern for AI Agents. The protocol defines three core primitives: Tools (tool invocation), Resources (resource reading), and Prompts (prompt templates), covering the primary scenarios for Agent interaction with the external world.
One year after launch, the community has built thousands of MCP servers. But as scale grows, two critical problems have surfaced: tool definition overload and redundant transmission of intermediate results. Anthropic published two articles proposing "Code Execution" and "Agent Skills" as two solutions, outlining MCP's evolution from tool standardization to capability unitization.

The 150K Token Tool Definition Overload Problem
When an Agent simultaneously connects to GitHub, Slack, Sentry, Grafana, Splunk, and other services, each tool's definition must be loaded into the context before a conversation begins. Real-world data shows: tool definitions alone can consume 150,000 tokens, before the task even starts.
There's a real economic cost behind this number. Taking Claude 3.5 Sonnet as an example, input tokens cost approximately $3 per million tokens, meaning 150K tokens of tool definitions generate about $0.45 in fixed costs before each conversation begins—a non-trivial expense in high-frequency scenarios. More critically, research shows that models exhibit a "Lost in the Middle" phenomenon in ultra-long contexts—attention to information at the beginning and end of the context is significantly higher than for content in the middle.
This is the most direct trigger for Context Corruption—large volumes of irrelevant tool descriptions crowd out the limited context window, causing the model's attention to truly important information to decline.
The Redundant Transmission of Intermediate Results
Imagine asking an Agent to download a two-hour meeting recording from Google Drive and attach it to a sales record in Salesforce. Under the traditional MCP calling approach, the recording transcript enters the context twice: first when the Agent reads it, and second when the Agent passes it as a parameter to the Salesforce tool.
Two transmissions, two rounds of token consumption, when all you really need is for data to move from A to B.
The Core Solution: Exposing MCP Tools as Code APIs
The solution is unexpectedly elegant: expose MCP tools as code APIs, letting the Agent write code to call them instead of invoking tools directly.

The specific approach is to map each MCP server's tool functions into a set of TypeScript files within a Servers/ directory in the Agent's execution environment. When the Agent needs to call a tool, it first browses the file system to find the relevant tool definition, then writes code to invoke it directly.
The improvement is orders of magnitude: for the same task, the traditional approach uses 150K tokens while the code execution approach uses only 2,000 tokens—a 98.7% reduction.
Why is the gap so large? The core design principle here is Progressive Disclosure—a concept originating from UX design, where the core idea is to present information on demand rather than all at once. In software engineering, this manifests as Lazy Loading and Dynamic Import. Applied to Agent systems, it essentially shifts tool discovery from "compile time" to "runtime": the Agent no longer needs to preload all tool definitions. It first sees only the file directory, and only when it determines a tool is useful does it read the specific function signature. This shares a striking similarity with operating system virtual memory management—rather than loading all programs into physical memory, pages are swapped in on demand.
Four Major Capability Improvements from Code Execution
1. Execution-Layer Data Filtering
For example, you need to find pending orders in a spreadsheet with 10,000 rows. The traditional approach pushes all 10,000 rows into the context for the Agent to search through. With code execution, the Agent writes code to filter out completed rows in the execution environment, returning only the 5 matching results. The model sees 5 rows, not 10,000.
2. Code-Level Control Flow
Polling, loops, conditional logic—all of these can be written directly in code and executed without requiring a model inference call for each step.

The article gives an example: waiting for a specific message to appear in a Slack channel. The traditional approach requires repeatedly calling tools and having the model evaluate the result; the code approach is simply a while loop that only returns when the condition is met, dramatically reducing unnecessary model calls and first-token latency.
3. PII Data Protection
Within the execution environment, sensitive data in intermediate results (email addresses, phone numbers, names) can be automatically replaced with placeholders like [EMAIL], [PHONE]—only substituted back when actual data flow requires it. The model sees only de-identified data throughout.
This solves a real enterprise compliance challenge. PII (Personally Identifiable Information) protection is a core compliance requirement for enterprise AI deployment—both GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) explicitly mandate that personal data processing must follow the principle of data minimization. In AI Agent scenarios, the Agent needs access to PII-containing data to complete tasks, but raw PII should not be exposed to the language model since inference logs carry data leakage risks. Automatic de-identification at the execution layer is a natural extension of bringing data tokenization practices into Agent architecture.
4. Code as Skills—Reusable Workflows
The code itself can be saved as a reusable skill. After an Agent successfully implements a workflow, it can save that code as a TypeScript file with an accompanying Skill.md description. Next time a similar task arises, it can be reused directly without re-reasoning.
This naturally leads to the second core concept—Agent Skills.
Agent Skills: Giving Agents Muscle Memory
Skills are a capability packaging standard released by Anthropic. The definition is simple: a directory containing a Skill.md file, which can include instructions, scripts, and other resources.

Think of it as a "job training manual" for the Agent—rather than building an entirely new specialized Agent, you equip a general-purpose Agent with a new skill pack that enables it to handle specific domain tasks. From a software engineering perspective, Skills are closer to a Plugin System than a traditional function library: each Skill is a self-contained capability unit carrying its own documentation, scripts, and metadata, following the Separation of Concerns design principle.
Three-Layer Progressive Loading Design
The most critical design aspect of Skills is three-layer progressive loading:
- Layer 1: Metadata—the skill's name and a one-line description, loaded into the system prompt at Agent startup
- Layer 2: Full Skill.md—loaded in full only when the Agent determines a skill is relevant to the current task
- Layer 3: Referenced files—read only when the Agent reaches a step that requires them
The elegance of this design is: no matter how many skills you equip an Agent with, the Layer 1 overhead is fixed—just one line of description per skill. This mechanism is similar to HTTP Content Negotiation—metadata is exchanged first to confirm capabilities, then full content is transmitted on demand. This allows the number of Skills to grow linearly without incurring quadratic context overhead, making it a key architectural decision for solving Agent capability scalability.
Example: PDF Operations Skill
The article provides a concrete example—a PDF skill. Claude can inherently understand PDF content, but lacks the ability to directly manipulate PDFs (such as filling PDF forms). The PDF skill packages this capability:
- The core
Skill.mdcontains basic PDF operation instructions - A separate
Forms.mdfile specifically covers form filling - An accompanying Python script precisely extracts form fields

This is far faster and more accurate than having the language model guess. When the Agent needs to fill a PDF form, it reads Skill.md, then Forms.md, and executes the Python script—throughout the process, only necessary context is loaded.
The Relationship Between MCP and Skills: Tentacles and Muscle Memory
The positioning of Skills is clear: it's not a competitor to MCP, but a layer above MCP.
- MCP handles tool connectivity—standardizing external service interfaces
- Skills handle workflow packaging—encapsulating complete task execution capabilities
A complete skill can simultaneously contain instructions on "how to call external services via MCP" as well as local scripts for processing results. Using a vivid metaphor: MCP extends the Agent's tentacles, while Skills give the Agent muscle memory.
The Three-Layer Evolution Logic of the MCP Ecosystem
Putting these two articles together reveals a clear three-layer evolution logic for the MCP ecosystem:
- Protocol Standardization—MCP unified tool interfaces, solving the connectivity problem
- Code Execution—By exposing tools as code APIs, it solves efficiency problems at scale, achieving a 98.7% token reduction
- Capability Unitization—Skills package workflows into composable capability units, enabling knowledge and experience reuse
Anthropic has released Agent Skills as an open standard, meaning the entire ecosystem can build reusable Agent capabilities around this framework. From "what tools can be connected" to "what tasks can be accomplished"—this is a critical step for Agent infrastructure moving from the connectivity layer to the capability layer.
For developers, the core insight is: don't make the Agent rediscover the world in every conversation—let it stand on the shoulders of existing skills. Progressive discovery, on-demand loading, code-based execution—these three design principles deserve serious consideration in the architecture design of any Agent system.
Key Takeaways
- Large-scale MCP deployment faces two major problems: tool definition overload (150K tokens) and redundant transmission of intermediate results
- Exposing MCP tools as code APIs enables progressive discovery, reducing token consumption from 150K to 2,000—a 98.7% reduction
- Code execution brings four capability improvements: data filtering, control flow optimization, PII protection, and skill reuse
- Agent Skills employ a three-layer progressive loading design, packaging workflows into composable capability units
- MCP handles tool connectivity (tentacles), Skills handle workflow packaging (muscle memory)—they complement rather than compete with each other
Related articles
Deep DivesDeep Dive into How OpenClaw (Open-Source Crayfish) AI Agent Works
Deep analysis of OpenClaw AI Agent internals: System Prompt, tool calling, SubAgents, Skill system, memory, and Context Engineering explained.
Deep DivesDemystifying Transformer: A Word-Continuation Function, Deconstructed
Understand Transformer through the lens of word continuation. Breaking down language generation into Embedding, Transformer Block, and Probability output modules for intuitive understanding.
Deep DivesFive Core Differences Between Claude Code and Regular AI Chat
A detailed comparison of Claude Code vs regular AI chat across five dimensions: interaction, context understanding, execution, memory, and tool integration.