Deep Dive into MCP Code Execution & Agent Skills: Compressing 150K Tokens Down to 2,000

MCP's Scaling Bottleneck: Tool Definition Overload and Redundant Transmission

MCP (Model Context Protocol) is an open protocol released by Anthropic in November 2024 with a simple goal—enabling AI Agents to connect to external tools in a unified way. Whether it's GitHub, Slack, Salesforce, or custom APIs, the interface looks the same to the Agent.

Notably, MCP is built on top of the JSON-RPC 2.0 protocol, using a client-server architecture. Its design philosophy draws from the success of LSP (Language Server Protocol)—LSP unified the interface for IDEs to connect with language servers for any programming language, and MCP attempts to replicate this pattern for AI Agents. The protocol defines three core primitives: Tools (tool invocation), Resources (resource reading), and Prompts (prompt templates), covering the primary scenarios for Agent interaction with the external world.

One year after launch, the community has built thousands of MCP servers. But as scale grows, two critical problems have surfaced: tool definition overload and redundant transmission of intermediate results. Anthropic published two articles proposing "Code Execution" and "Agent Skills" as two solutions, outlining MCP's evolution from tool standardization to capability unitization.

MCP Scaling Bottleneck Analysis

The 150K Token Tool Definition Overload Problem

When an Agent simultaneously connects to GitHub, Slack, Sentry, Grafana, Splunk, and other services, each tool's definition must be loaded into the context before a conversation begins. Real-world data shows: tool definitions alone can consume 150,000 tokens, before the task even starts.

There's a real economic cost behind this number. Taking Claude 3.5 Sonnet as an example, input tokens cost approximately $3 per million tokens, meaning 150K tokens of tool definitions generate about $0.45 in fixed costs before each conversation begins—a non-trivial expense in high-frequency scenarios. More critically, research shows that models exhibit a "Lost in the Middle" phenomenon in ultra-long contexts—attention to information at the beginning and end of the context is significantly higher than for content in the middle.

This is the most direct trigger for Context Corruption—large volumes of irrelevant tool descriptions crowd out the limited context window, causing the model's attention to truly important information to decline.

The Redundant Transmission of Intermediate Results

Imagine asking an Agent to download a two-hour meeting recording from Google Drive and attach it to a sales record in Salesforce. Under the traditional MCP calling approach, the recording transcript enters the context twice: first when the Agent reads it, and second when the Agent passes it as a parameter to the Salesforce tool.

Two transmissions, two rounds of token consumption, when all you really need is for data to move from A to B.

The Core Solution: Exposing MCP Tools as Code APIs

The solution is unexpectedly elegant: expose MCP tools as code APIs, letting the Agent write code to call them instead of invoking tools directly.

Architecture diagram of exposing MCP tools as code APIs

The specific approach is to map each MCP server's tool functions into a set of TypeScript files within a Servers/ directory in the Agent's execution environment. When the Agent needs to call a tool, it first browses the file system to find the relevant tool definition, then writes code to invoke it directly.

The improvement is orders of magnitude: for the same task, the traditional approach uses 150K tokens while the code execution approach uses only 2,000 tokens—a 98.7% reduction.

Why is the gap so large? The core design principle here is Progressive Disclosure—a concept originating from UX design, where the core idea is to present information on demand rather than all at once. In software engineering, this manifests as Lazy Loading and Dynamic Import. Applied to Agent systems, it essentially shifts tool discovery from "compile time" to "runtime": the Agent no longer needs to preload all tool definitions. It first sees only the file directory, and only when it determines a tool is useful does it read the specific function signature. This shares a striking similarity with operating system virtual memory management—rather than loading all programs into physical memory, pages are swapped in on demand.

Four Major Capability Improvements from Code Execution

1. Execution-Layer Data Filtering

For example, you need to find pending orders in a spreadsheet with 10,000 rows. The traditional approach pushes all 10,000 rows into the context for the Agent to search through. With code execution, the Agent writes code to filter out completed rows in the execution environment, returning only the 5 matching results. The model sees 5 rows, not 10,000.

2. Code-Level Control Flow

Polling, loops, conditional logic—all of these can be written directly in code and executed without requiring a model inference call for each step.

Code-level control flow reducing model calls

The article gives an example: waiting for a specific message to appear in a Slack channel. The traditional approach requires repeatedly calling tools and having the model evaluate the result; the code approach is simply a while loop that only returns when the condition is met, dramatically reducing unnecessary model calls and first-token latency.

3. PII Data Protection

Within the execution environment, sensitive data in intermediate results (email addresses, phone numbers, names) can be automatically replaced with placeholders like [EMAIL], [PHONE]—only substituted back when actual data flow requires it. The model sees only de-identified data throughout.

This solves a real enterprise compliance challenge. PII (Personally Identifiable Information) protection is a core compliance requirement for enterprise AI deployment—both GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) explicitly mandate that personal data processing must follow the principle of data minimization. In AI Agent scenarios, the Agent needs access to PII-containing data to complete tasks, but raw PII should not be exposed to the language model since inference logs carry data leakage risks. Automatic de-identification at the execution layer is a natural extension of bringing data tokenization practices into Agent architecture.

4. Code as Skills—Reusable Workflows

The code itself can be saved as a reusable skill. After an Agent successfully implements a workflow, it can save that code as a TypeScript file with an accompanying Skill.md description. Next time a similar task arises, it can be reused directly without re-reasoning.

This naturally leads to the second core concept—Agent Skills.

Agent Skills: Giving Agents Muscle Memory

Skills are a capability packaging standard released by Anthropic. The definition is simple: a directory containing a Skill.md file, which can include instructions, scripts, and other resources.

Agent Skills directory structure

Think of it as a "job training manual" for the Agent—rather than building an entirely new specialized Agent, you equip a general-purpose Agent with a new skill pack that enables it to handle specific domain tasks. From a software engineering perspective, Skills are closer to a Plugin System than a traditional function library: each Skill is a self-contained capability unit carrying its own documentation, scripts, and metadata, following the Separation of Concerns design principle.

Three-Layer Progressive Loading Design

The most critical design aspect of Skills is three-layer progressive loading:

Layer 1: Metadata—the skill's name and a one-line description, loaded into the system prompt at Agent startup
Layer 2: Full Skill.md—loaded in full only when the Agent determines a skill is relevant to the current task
Layer 3: Referenced files—read only when the Agent reaches a step that requires them

The elegance of this design is: no matter how many skills you equip an Agent with, the Layer 1 overhead is fixed—just one line of description per skill. This mechanism is similar to HTTP Content Negotiation—metadata is exchanged first to confirm capabilities, then full content is transmitted on demand. This allows the number of Skills to grow linearly without incurring quadratic context overhead, making it a key architectural decision for solving Agent capability scalability.

Example: PDF Operations Skill

The article provides a concrete example—a PDF skill. Claude can inherently understand PDF content, but lacks the ability to directly manipulate PDFs (such as filling PDF forms). The PDF skill packages this capability:

The core Skill.md contains basic PDF operation instructions
A separate Forms.md file specifically covers form filling
An accompanying Python script precisely extracts form fields

PDF skill form field extraction workflow

This is far faster and more accurate than having the language model guess. When the Agent needs to fill a PDF form, it reads Skill.md, then Forms.md, and executes the Python script—throughout the process, only necessary context is loaded.

The Relationship Between MCP and Skills: Tentacles and Muscle Memory

The positioning of Skills is clear: it's not a competitor to MCP, but a layer above MCP.

MCP handles tool connectivity—standardizing external service interfaces
Skills handle workflow packaging—encapsulating complete task execution capabilities

A complete skill can simultaneously contain instructions on "how to call external services via MCP" as well as local scripts for processing results. Using a vivid metaphor: MCP extends the Agent's tentacles, while Skills give the Agent muscle memory.

The Three-Layer Evolution Logic of the MCP Ecosystem

Putting these two articles together reveals a clear three-layer evolution logic for the MCP ecosystem:

Protocol Standardization—MCP unified tool interfaces, solving the connectivity problem
Code Execution—By exposing tools as code APIs, it solves efficiency problems at scale, achieving a 98.7% token reduction
Capability Unitization—Skills package workflows into composable capability units, enabling knowledge and experience reuse

Anthropic has released Agent Skills as an open standard, meaning the entire ecosystem can build reusable Agent capabilities around this framework. From "what tools can be connected" to "what tasks can be accomplished"—this is a critical step for Agent infrastructure moving from the connectivity layer to the capability layer.

For developers, the core insight is: don't make the Agent rediscover the world in every conversation—let it stand on the shoulders of existing skills. Progressive discovery, on-demand loading, code-based execution—these three design principles deserve serious consideration in the architecture design of any Agent system.

Key Takeaways

Large-scale MCP deployment faces two major problems: tool definition overload (150K tokens) and redundant transmission of intermediate results
Exposing MCP tools as code APIs enables progressive discovery, reducing token consumption from 150K to 2,000—a 98.7% reduction
Code execution brings four capability improvements: data filtering, control flow optimization, PII protection, and skill reuse
Agent Skills employ a three-layer progressive loading design, packaging workflows into composable capability units
MCP handles tool connectivity (tentacles), Skills handle workflow packaging (muscle memory)—they complement rather than compete with each other