Understanding Function Calling and MCP Through Cursor's System Prompt

Introduction

In AI Agent development, tool calling (Function Calling) is one of the core capabilities. This article examines how Function Calling and MCP (Model Context Protocol) work by analyzing the Cursor editor's system prompt, and tests the Agent capabilities of models with different parameter counts.

bilibili source

The Basic Flow of Function Calling

Tool Definition and Calling Mechanism

An Agent's tool is essentially a function. Defining an Agent requires three elements: model instantiation (name, temperature, etc.), tool definitions (Tools), and a system prompt.

Function Calling was first introduced by OpenAI in June 2023 for the GPT API. Its design philosophy is that the LLM doesn't execute code directly, but generates structured descriptions of calling intent. This design completely decouples "decision-making" (model's responsibility) from "execution" (program's responsibility), allowing LLMs to safely interact with external systems without needing actual execution permissions. In practice, tool definitions are typically passed as a JSON object array to the API's tools parameter, with each tool containing four fields: type (currently fixed as "function"), function.name, function.description, and function.parameters.
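As a rough illustration of the four-field layout described above, a tool definition might look like the following. The `calculate` tool, its description, and its single `expression` parameter are assumptions made up for this sketch; only the field names come from the text.

```python
import json

# Hypothetical "calculate" tool, written in the four-field layout named above.
calculate_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression and return the result.",
        "parameters": {  # JSON Schema describing the arguments
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Arithmetic expression, e.g. '100+100'.",
                }
            },
            "required": ["expression"],
        },
    },
}

# The array that would be passed as the API's `tools` parameter.
tools = [calculate_tool]

if __name__ == "__main__":
    print(json.dumps(tools, indent=2))
```

The model never runs anything in this structure. It only reads the name, description, and parameter schema to decide whether and how to request a call.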

The workflow is straightforward:

User asks a question (e.g., "What's 100+100?")
The system prompt instructs the model to output a string in a specific format (e.g., {"name": "calculate", "arguments": {"expression": "100+100"}})
The program parses this string, calls the matching function with the given arguments, and collects the result
The model receives the result and formulates a natural language response

The key point: the system prompt must strictly regulate the AI's output format. Otherwise the calling program cannot parse the output or match it to a registered function, and the call fails, for example with an "unknown tool" error.
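The parse-and-execute step can be sketched as follows, assuming the model emits valid JSON with `name` and `arguments` keys. The `dispatch` helper, the `TOOLS` registry, and the small arithmetic evaluator are all hypothetical names for this illustration, not part of any particular framework.

```python
import ast
import json
import operator

# Operators the hypothetical `calculate` tool accepts.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _eval_node(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")


def calculate(expression: str):
    return _eval_node(ast.parse(expression, mode="eval").body)


# Registry mapping tool names to the functions that implement them.
TOOLS = {"calculate": calculate}


def dispatch(model_output: str):
    """Parse the model's tool-call string and run the named tool."""
    call = json.loads(model_output)  # raises if the model broke the format
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

Both failure modes mentioned above surface here: malformed output fails in `json.loads`, and a well-formed call to an unregistered name fails with the "unknown tool" error.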

Core Differences Between Regular Tools and MCP Tools

Regular tools use synchronous calling, store tool descriptions in plain dictionaries, and are suited for local use. They must wait for one tool execution to complete before making the next call.

MCP tools differ in two major ways:

Fully adhere to JSON Schema standard format definitions
Support asynchronous message processing (for example over the stdio transport), allowing multiple users to access the same tool simultaneously

MCP (Model Context Protocol) is an open protocol released by Anthropic in November 2024, designed to establish a unified communication standard between AI models and external data sources and tools. It draws on the design of LSP (Language Server Protocol): just as LSP lets any editor connect to any language server, MCP lets any AI application connect to any tool service. JSON Schema, mentioned above, is a specification for describing JSON data structures (published as a series of IETF Internet-Drafts rather than a finalized RFC). It describes each parameter's type, required status, value ranges, and other constraints, so different systems can understand a tool's input and output formats without additional negotiation.
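To make the role of JSON Schema concrete, here is a minimal sketch of an MCP-style tool description and a hand-rolled argument check. The `calculate` tool and the `check_arguments` helper are hypothetical, and the check covers only `required` and a few basic `type` values. A real system would use a full JSON Schema validator instead.

```python
# Hypothetical MCP-style tool description: the argument shape lives in `inputSchema`.
calculate_tool = {
    "name": "calculate",
    "description": "Evaluate a basic arithmetic expression.",
    "inputSchema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

# Maps JSON Schema type names to Python types (only the ones this sketch needs).
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}


def check_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            continue  # this sketch ignores unknown keys
        expected = _PY_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"{key} should be of type {spec['type']}")
    return problems
```

Because the schema is plain data, any client that understands JSON Schema can run the same check before sending a call, without knowing anything else about the tool.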

MCP's asynchronous design follows from its intended use: a tool service may be shared, so several clients can call the same tool at the same time, and a strictly synchronous approach would make each caller wait for the others. The original MCP specification defined two transports: stdio (standard input/output, suitable for local inter-process communication) and HTTP+SSE (Server-Sent Events, suitable for remote services); later revisions of the specification replace HTTP+SSE with Streamable HTTP. Communication is based on JSON-RPC 2.0, where each request carries a unique ID that the matching response echoes back. Responses can therefore arrive out of order and still be paired with the right request, which is what makes concurrent calls possible.
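The ID matching can be sketched with `asyncio` futures. The `PendingCalls` class is a hypothetical helper, no real transport is involved, and the `result` payload is simplified to a bare number for readability. The point is only to show how responses that arrive in a different order still reach the right caller.

```python
import asyncio
import json


class PendingCalls:
    """Match JSON-RPC 2.0 responses to requests by `id` (hypothetical helper)."""

    def __init__(self):
        self._next_id = 0
        self._futures = {}

    def new_request(self, method: str, params: dict):
        """Build a request message and a future that will hold its result."""
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._futures[self._next_id] = future
        message = {"jsonrpc": "2.0", "id": self._next_id, "method": method, "params": params}
        return json.dumps(message), future

    def on_response(self, raw: str):
        """Resolve the future whose request id matches the response id."""
        response = json.loads(raw)
        future = self._futures.pop(response["id"])
        future.set_result(response["result"])


async def demo():
    calls = PendingCalls()
    _, first = calls.new_request("tools/call", {"name": "calculate", "arguments": {"expression": "1+1"}})
    _, second = calls.new_request("tools/call", {"name": "calculate", "arguments": {"expression": "2**8"}})
    # Simulate the server answering the second request before the first.
    calls.on_response(json.dumps({"jsonrpc": "2.0", "id": 2, "result": 256}))
    calls.on_response(json.dumps({"jsonrpc": "2.0", "id": 1, "result": 2}))
    return await first, await second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each caller awaits only its own future, so a slow tool call does not block the others.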

Cursor System Prompt Structure Analysis

Core Components of the Prompt

Cursor's system prompt contains these core parts:

Identity definition: You are an assistant running in Cursor
Tool calling guidelines (Tool Calling): e.g., "Don't tell the user which tool you're calling"
Tool list and parameter definitions: e.g., search_and_reading, make_code_change, etc.
User-defined Rules: Injected into the system prompt
Attached Files: User-attached file contents

Each tool has detailed definitions of description, required parameters, parameter types, etc.—essentially telling the model "what this tool is and what input it needs."

The Relationship Between System Prompts and Tools

System prompts and tool definitions have a one-to-one, mutually dependent relationship. The prompt needs to include:

Usage examples for tools (Few-shot)
Strict output format requirements ("Only generate JSON format, don't explain steps")
Supported operation descriptions

Few-shot here refers to providing a small number of examples (typically 2-5) in the prompt to guide the model toward correct output patterns—a prompt engineering technique that adapts models to specific tasks without fine-tuning. For tool calling scenarios, few-shot examples typically demonstrate "user input → correct tool call JSON" mappings, helping the model understand when to call which tool and how to fill parameters.
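A few-shot block of this kind could be assembled as in the sketch below. The example pairs, the prompt wording, and the `build_system_prompt` helper are assumptions chosen for illustration; they are not taken from Cursor's actual prompt.

```python
import json

# Hypothetical few-shot pairs: user input -> the tool call the model should emit.
FEW_SHOT = [
    ("What's 100+100?", {"name": "calculate", "arguments": {"expression": "100+100"}}),
    ("What is 2 to the power of 8?", {"name": "calculate", "arguments": {"expression": "2**8"}}),
]


def build_system_prompt(examples) -> str:
    """Combine format rules and few-shot examples into one system prompt."""
    lines = [
        "You can call the tool `calculate`.",
        "Only generate JSON format, don't explain steps.",
        "",
        "Examples:",
    ]
    for user_input, call in examples:
        lines.append(f"User: {user_input}")
        lines.append(f"Assistant: {json.dumps(call)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(FEW_SHOT))
```

Generating the examples with `json.dumps` keeps them in exactly the format the parser expects, so the prompt and the tool code cannot drift apart.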

Testing Agent Capabilities Across Different Model Sizes

Basic Tool Calling Test

Given the question "What is 2 to the power of 8?", both the small model (~1-2B) and the 4B model output well-formatted tool-call strings. The 4B model deliberates between 2**8 and 2^8 in its chain of thought, but which notation is accepted is a matter for the tool's parser, not the model.

The Trust Problem with Tool Return Results

An interesting experiment: the tool result passed back to the model claimed that 2^8 equals 200 (a deliberately wrong answer). After lengthy deliberation, the model chose to trust its own knowledge (256) rather than the tool's returned result. This shows the system prompt needs a stronger instruction to "always trust tool return results."

This phenomenon involves the "grounding" problem in AI—what should the model use as its factual basis. LLMs have memorized vast world knowledge during pre-training, and when external tool results conflict with internal knowledge, the model faces a dilemma. In practical Agent systems, real-time data from tools should typically take priority over the model's static knowledge (since model knowledge has a training cutoff date), so instructions like "Always trust tool results over your own knowledge" need to be explicitly written into the system prompt.

Performance Differences in Complex Scenarios

Testing with Cursor's complete system prompt:

Small model: Unable to accurately locate and complete tasks within lengthy prompts
4B model: Successfully identified attached file content, followed custom rules (Spanish language response), and output code modification instructions in the required format

The 4B model's output matched the format Cursor requires: a natural language explanation for the user on top, and structured data below that is actually passed to the function. This "dual output" design is common in Agent architectures: the user-facing natural language part provides readability and transparency, while the system-facing structured part ensures reliable program parsing. Small models likely fail at these tasks because they make poor use of long contexts. Although they technically support long context windows, precisely locating the relevant instructions within thousands of tokens of system prompt while satisfying multiple constraints at once places high demands on the model's attention mechanism and instruction-following ability.
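Separating the two parts of such a reply might look like the sketch below. The article does not describe the exact delimiter Cursor uses, so this assumes a simple convention where the structured part is a JSON object on the last line. The `split_dual_output` helper and the `make_code_change` arguments in the usage are hypothetical.

```python
import json


def split_dual_output(reply: str):
    """Split a reply into (explanation, tool_call), assuming the last line is JSON."""
    text, _, last_line = reply.rstrip().rpartition("\n")
    try:
        call = json.loads(last_line)
    except json.JSONDecodeError:
        return reply.strip(), None  # no structured part found
    if not isinstance(call, dict):
        return reply.strip(), None  # valid JSON, but not a tool-call object
    return text.strip(), call


if __name__ == "__main__":
    reply = (
        "Voy a actualizar el archivo.\n"
        '{"name": "make_code_change", "arguments": {"file": "app.py"}}'
    )
    explanation, tool_call = split_dual_output(reply)
    print(explanation)  # shown to the user
    print(tool_call)    # passed to the function
```

Only the explanation is shown to the user, while the parsed object goes to the dispatcher, which mirrors the split between readability and reliable parsing described above.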

Conclusion

The essence of Function Calling is constraining the LLM via system prompts to output specifically formatted strings, which are then parsed and executed by corresponding functions. MCP adds standardized definitions and asynchronous processing capabilities on top of this. A model's instruction-following ability directly determines Agent reliability, which is why model selection is crucial in Agent development.

Key Takeaways

Function Calling essentially constrains LLMs via system prompts to output specifically formatted strings, parsed and executed by functions
The core differences between MCP and regular tools lie in JSON Schema standardization and asynchronous message processing
System prompts and tool definitions must correspond one-to-one and are mutually indispensable
Model parameter count affects instruction-following ability in complex scenarios; in this test, the 4B model handled Cursor-level complex tool calls while the smaller model did not
When tool results conflict with the model's own knowledge, priority must be explicitly defined in the prompt