Deep Dive into Cursor Skills: From Function Call to Practice
Analyzing the evolution from Function Call to MCP to Skills in AI tool invocation with practical examples
This article starts from the foundational Function Call mechanism, then dissects the MCP protocol and the technical essence of Skills. Function Call solves how LLMs call external tools, and the MCP protocol standardizes where to call. Skills are Markdown files loaded on demand (Sub-Agents) that solve efficient orchestration of complex tasks and avoid the prompt bloat of Workflow Agents. The article also demonstrates how to implement Skill functionality with any LLM through Spring AI Alibaba.
Introduction
AI programming tools are evolving rapidly, and Cursor's Skills feature has become a focal point for developers. Many know how to use it, but few understand how it works under the hood. This article starts from the foundational Function Call (Tools), then dissects the MCP protocol and the Workflow Agent, and finally examines the essence of Skills. A practical case study with Spring AI Alibaba shows how to implement Skill functionality with any large language model.
Starting with Tools: How LLMs Call External Capabilities
The Core Mechanism of Function Call
To understand Skills, you must first understand their foundation — Function Call (also known as Tools).
Function Call was officially introduced by OpenAI in June 2023 with the GPT-3.5/GPT-4 API update, and quickly became the de facto standard for LLM interaction with the external world. Its core concept derives from RPC (Remote Procedure Call) in programming languages, but with a critical layer of abstraction: the LLM doesn't execute code directly. Instead, it generates structured output conforming to JSON Schema to "describe" the function it wants to call and its parameters. This design elegantly decouples the LLM's natural language understanding capability from deterministic program execution, avoiding the security risks and uncertainty of having the LLM directly generate and run code.
When we ask an LLM to query "Beijing's weather," the model itself doesn't have real-time information retrieval capabilities. What it does is: after reasoning, it discovers there's a Tool specifically for weather queries, then returns a structured JSON message containing the Tool's method name and required parameters (such as the query location "Beijing").
The application recognizes this JSON and uses reflection to locate and execute the corresponding method. The reflection mechanism here refers to the ability in programming languages like Java to dynamically discover and invoke methods at runtime — the application uses reflection to locate the actual function implementation based on the method name string returned by the LLM and passes in the parameters for execution. Therefore, the essence of Tools is converting unstructured natural language into processable structured JSON information, allowing the LLM to indirectly call methods in the application.
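The round trip described above can be sketched as follows. This is a minimal illustration, not any vendor's SDK: the `get_weather` tool, its schema, and the hard-coded `model_reply` string are hypothetical stand-ins for a real weather service and a real model response. Python's `getattr` plays the role that reflection plays in Java.

```python
import json


class WeatherTools:
    """Hypothetical tool container; a real one would call a weather API."""

    def get_weather(self, location: str) -> str:
        # Canned answer so the sketch runs offline.
        return f"Sunny, 25 degrees Celsius in {location}"


# JSON Schema description sent to the model alongside the user prompt.
TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Query the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# Assumed shape of the structured output the model returns instead of prose.
model_reply = '{"name": "get_weather", "arguments": {"location": "Beijing"}}'


def dispatch(tools: object, raw_call: str) -> str:
    """Parse the model's JSON and invoke the matching method by name."""
    call = json.loads(raw_call)
    name = call["name"]
    if name.startswith("_"):
        raise ValueError(f"refusing to call private attribute: {name}")
    method = getattr(tools, name, None)  # runtime lookup by method name
    if not callable(method):
        raise ValueError(f"unknown tool: {name}")
    return method(**call["arguments"])


result = dispatch(WeatherTools(), model_reply)
print(result)
```

The model never runs `get_weather` itself. It only names the method and supplies arguments, and the application decides whether and how to execute the call.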
Tools solve the "how to call" problem, but immediately face the "where to call" problem.
MCP Protocol: A Unified Standard for External Calls
When we need the LLM to query third-party services (such as blog content, map locations, etc.), declaring a Tools method for each third-party service has extremely high implementation costs. Even worse, if there are multiple AI applications, each one needs to redundantly implement these Tools methods — because they can't be shared.
MCP (Model Context Protocol) was born to solve this exact problem. It was officially released and open-sourced by Anthropic in November 2024, aiming to become the "USB-C port" for AI applications connecting to external data sources and tools. Before MCP, each AI application required a customized integration solution with each external service, creating an M×N complexity problem. MCP simplifies this to M+N: service providers only need to implement an MCP Server once, and all AI applications supporting MCP (MCP Clients) can call it directly.
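The M×N versus M+N argument is simple arithmetic, sketched below. The function names and the counts of 5 applications and 20 services are illustrative assumptions, not figures from any real deployment.

```python
def integrations_without_mcp(apps: int, services: int) -> int:
    # Every application needs its own adapter for every service.
    return apps * services


def integrations_with_mcp(apps: int, services: int) -> int:
    # Each application implements one MCP client, each service one MCP server.
    return apps + services


apps, services = 5, 20  # illustrative numbers, not measurements
print(integrations_without_mcp(apps, services))  # 100
print(integrations_with_mcp(apps, services))     # 25
```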
MCP provides two unified calling methods:
- STDIO method: Standard input/output. The MCP Server runs as a subprocess, exchanging JSON-RPC messages through standard I/O streams. It has extremely low latency but is limited to local use, suitable for local inter-process communication scenarios.
- HTTP method: Includes the SSE and Streamable HTTP variants. SSE (Server-Sent Events) is based on long-lived HTTP connections and supports real-time streaming from server to client, which suits remote calls. Streamable HTTP is a later revision of the protocol's HTTP transport that resolves SSE compatibility issues in certain network environments and supports more flexible request-response patterns.
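To make the STDIO method concrete, here is a minimal sketch of how one JSON-RPC message could be framed as a single line of text. It assumes newline-delimited framing and a `tools/call` request for a hypothetical `get_weather` tool. An in-memory `StringIO` stands in for the subprocess pipe, so no real MCP server is involved.

```python
import io
import json


def encode_message(message: dict) -> str:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def read_messages(stream) -> list:
    """Read newline-delimited JSON-RPC messages until the stream ends."""
    return [json.loads(line) for line in stream if line.strip()]


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"location": "Beijing"}},
}

# StringIO stands in for the subprocess's stdin pipe.
pipe = io.StringIO()
pipe.write(encode_message(request))
pipe.seek(0)

received = read_messages(pipe)
print(received[0]["method"])
```

Because `json.dumps` escapes newlines inside strings, each message occupies exactly one line, which is what lets the reader split the stream safely.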

Through the MCP protocol, third-party services can declare and expose Tools methods on their own, and the LLM calls these external tools remotely via a unified protocol. However, it's important to note that MCP fundamentally still relies on Function Call — from the LLM's perspective, it doesn't distinguish between internal Tools and external Tools; they're all just "tools."
The Birth of Skills: From Workflow to On-Demand Loading
Pain Points of Workflow Agent
LLM calling scenarios are becoming increasingly complex, often requiring more than a single Tool call to complete. For example:
- Having the LLM open a browser and search Baidu for specific information
- Having the LLM retrieve file information from the desktop and classify it
This is the so-called Agent in Workflow mode. Workflow Agent is one of the mainstream architectures in the current AI Agent field, with typical implementation frameworks including LangChain's Agent module, AutoGPT, and Microsoft's AutoGen. We need to tell the LLM through extensive prompts: how to break down steps for different tasks and how to handle each step. For example, "search for web information" needs to be broken down into: open browser → enter search keywords → retrieve webpage content → reason and return results.
The problem with this approach is that the prompts become extremely large and must be sent to the LLM all at once, which is inefficient and hard to maintain. In traditional Workflow mode, the processing logic for every possible task is pre-injected into the LLM's context window as System Prompts. As capabilities grow, these prompts can balloon to tens or even hundreds of thousands of tokens. That consumes a large share of the context window (even Claude 3.5's 200K window becomes strained) and scatters the LLM's attention, reducing reasoning precision for the current task. This is related to the so-called "Lost in the Middle" problem, where an LLM's use of information placed in the middle of a very long context drops significantly.
The Design Philosophy of Skills
Anthropic (the developer of the Claude LLM) recognized this problem and introduced the Skill mechanism. Anthropic was founded in 2021 by a group of former OpenAI employees led by siblings Dario Amodei (previously OpenAI's VP of Research) and Daniela Amodei, and is currently one of the most influential companies in the AI field. Its flagship Claude model series is known for its focus on safety and its long-context processing capabilities.
The Skill mechanism's design is deeply influenced by "Separation of Concerns" and "Lazy Loading" concepts from software engineering. A Skill is essentially a Markdown file containing two core parts:
- Metadata: Describes the current Skill's purpose, such as "web search," "file processing," etc.
- Instructions: Detailed orchestration of each execution step, including which Tools to call, which scripts (Python/JS) to execute, etc.
The choice of Markdown as the Skill's carrier format is deliberate: Markdown is one of the most common structured text formats in LLM training corpora, so LLMs have excellent parsing and comprehension capabilities for it. At the same time, Markdown has good readability and a low editing barrier — even non-technical personnel can write and maintain Skill files.
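As an illustration of the two-part structure, the sketch below parses a small Skill file into metadata and instructions. The `---` delimited header, the `name` and `description` keys, and the `web-search` content are assumptions chosen for the example. The parser handles only flat `key: value` lines and is not a full YAML implementation.

```python
SKILL_TEXT = """---
name: web-search
description: Search the web and summarize the results.
---
1. Open the browser.
2. Enter the search keywords.
3. Retrieve the page content and summarize it.
"""


def parse_skill(text: str) -> tuple:
    """Split a Skill file into (metadata dict, instructions string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' metadata block")
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("metadata block is not closed")
    metadata = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    instructions = "\n".join(lines[end + 1:]).strip()
    return metadata, instructions


metadata, instructions = parse_skill(SKILL_TEXT)
print(metadata["description"])
print(instructions)
```

Keeping the description separate from the body is what later allows the application to send only the short description to the LLM until the Skill is actually needed.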

The core advantage of Skills lies in on-demand loading. The complete workflow is:
- The system sends all Skills' metadata (description information only) to the LLM
- The LLM reasons based on the user's request, selects the matching Skill, and returns a JSON (containing the callSkill method and Skill name)
- The application reads the corresponding Markdown file based on the Skill name
- The complete instructions of that Skill are sent to the LLM
- The LLM executes step by step according to the instructions
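The steps above can be sketched end to end with a stubbed model. Everything here is hypothetical: the two skills, the `skill` argument name, and `fake_model_select`, which uses a keyword check where a real LLM would reason over the catalog. Only the overall shape (catalog first, full instructions on request) follows the workflow described.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical skills: name -> (one-line description, full instructions).
SKILLS = {
    "web-search": ("Search the web for information.",
                   "1. Open the browser.\n2. Enter keywords.\n3. Summarize."),
    "file-sort": ("Classify files on the desktop.",
                  "1. List the files.\n2. Group them by type."),
}


def write_skills(root: Path) -> None:
    """Store each skill's instructions as a Markdown file under root."""
    for name, (_, instructions) in SKILLS.items():
        (root / f"{name}.md").write_text(instructions, encoding="utf-8")


def build_catalog() -> str:
    """Step 1: only names and descriptions go into the first prompt."""
    return "\n".join(f"- {name}: {desc}" for name, (desc, _) in SKILLS.items())


def fake_model_select(catalog: str, user_request: str) -> str:
    """Step 2 stand-in: a real LLM would pick a skill from the catalog."""
    name = "web-search" if "search" in user_request.lower() else "file-sort"
    return json.dumps({"name": "callSkill", "arguments": {"skill": name}})


def call_skill(root: Path, raw_call: str) -> str:
    """Steps 3 and 4: read the chosen file so its text can be sent back."""
    skill = json.loads(raw_call)["arguments"]["skill"]
    if skill not in SKILLS:  # also blocks path tricks such as "../secret"
        raise ValueError(f"unknown skill: {skill}")
    return (root / f"{skill}.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_skills(root)
    call = fake_model_select(build_catalog(), "Search for protein folding papers")
    loaded_instructions = call_skill(root, call)

print(loaded_instructions)
```

Only the `web-search` instructions are read from disk. The `file-sort` instructions never enter the conversation.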
The key difference from Workflow Agent is: you don't need to send all processing capability prompts to the LLM at once — only the Skill instructions for the currently needed capability. This is also why Skills are called Sub-Agents — they're just one sub-component within a larger intelligent agent. The Sub-Agent concept originates from Multi-Agent Systems, where each Sub-Agent focuses on tasks in a specific domain, dynamically dispatched by an Orchestrator as needed. This architecture demonstrates far greater flexibility and scalability than a single agent for complex task processing.
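A rough size comparison shows why on-demand loading matters. The sketch assumes a hypothetical library of 50 skills, each with an 80-character description and 4,000 characters of instructions, and uses character counts as a crude proxy for tokens. The numbers are illustrative, not measurements.

```python
# Hypothetical skill library: every skill has a short description and
# long instructions. Lengths are in characters as a rough proxy for tokens.
skills = {
    f"skill-{i}": {"description": "d" * 80, "instructions": "x" * 4000}
    for i in range(50)
}


def workflow_prompt_size(skills: dict) -> int:
    """Workflow style: every skill's full instructions are sent up front."""
    return sum(len(s["instructions"]) for s in skills.values())


def skills_prompt_size(skills: dict, selected: str) -> int:
    """Skill style: all descriptions, plus instructions for one skill."""
    catalog = sum(len(s["description"]) for s in skills.values())
    return catalog + len(skills[selected]["instructions"])


print(workflow_prompt_size(skills))           # 200000
print(skills_prompt_size(skills, "skill-7"))  # 8000
```

Under these assumptions the Workflow prompt grows with the total size of all instructions, while the Skill prompt grows only with the number of descriptions plus one instruction body.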
Skills Are Still Fundamentally Function Calls
At this point, you'll realize that Skills still use the Function Call mechanism. The host application exposes a built-in function (such as the callSkill method) that reads the text of the requested Skill and returns it to the LLM for reasoning and execution. Therefore, you need an LLM that supports Function Call in order to use Skills.
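One way to picture this is to declare callSkill like any other tool. The sketch below builds such a declaration. The `skill` parameter name, the description text, and the use of an `enum` to restrict valid names are assumptions for illustration, not the format used by Cursor, Claude, or Spring AI Alibaba.

```python
import json


def build_call_skill_tool(skill_names: list) -> dict:
    """Describe callSkill as an ordinary function-call tool."""
    return {
        "name": "callSkill",
        "description": "Load the full instructions of one Skill by name.",
        "parameters": {
            "type": "object",
            "properties": {
                # Restrict the argument to skills that actually exist.
                "skill": {"type": "string", "enum": sorted(skill_names)},
            },
            "required": ["skill"],
        },
    }


tool = build_call_skill_tool(["web-search", "file-sort"])
print(json.dumps(tool, indent=2))
```

From the model's side this declaration looks no different from a weather or search tool, which is why Function Call support is the only hard requirement.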
Practice: Implementing Skills with Any LLM via Spring AI Alibaba
Although Skills were originally introduced with the Claude model, since we understand the principles, we can use Spring AI Alibaba combined with Tools to integrate with any LLM for the same functionality.
Code Structure Analysis

Spring AI Alibaba is an extension project developed by Alibaba on top of the Spring AI framework, designed to give the Java ecosystem seamless integration with Chinese LLMs (such as the Qwen series). Spring AI itself is an AI application development framework launched by the Spring team in 2023. It provides a unified API abstraction layer that lets developers talk to different LLM providers (OpenAI, Anthropic, Ollama, etc.) through the same code interface, similar to how Spring Data abstracts different databases.
In the latest version of Spring AI Alibaba, the core implementation is very concise:
- Define Skill Agent: The framework has built-in encapsulation — you only need to provide the root directory path for Skill files, and the framework automatically reads all Skill Markdown files
- Shell command execution capability: Since a Skill may need to run scripts written in languages such as Python, the agent needs a tool that can execute shell commands
- Python tool support: Relies on third-party libraries provided by GraalVM to execute Python code. GraalVM is a high-performance polyglot virtual machine developed by Oracle. Its Polyglot feature allows direct execution of Python, JavaScript, Ruby, and other language code within the JVM without launching independent interpreter processes, avoiding cross-process communication overhead while providing sandbox-level security isolation.
What developers actually need to do is just two things: provide the Skill Markdown files and the corresponding Python scripts.
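The shell-command route can be sketched as a small tool that runs a script in a child process and returns its output to the agent. This is not Spring AI Alibaba's API and it does not use GraalVM. It relies on Python's standard `subprocess` module, and the `search.py` script and `run_python_script` helper are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python_script(script: Path, args: list, timeout: float = 30.0) -> dict:
    """Run a script in a child process and capture what it prints."""
    completed = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout.strip(),
        "stderr": completed.stderr.strip(),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical Skill script; a real one might query arXiv.
    script = Path(tmp) / "search.py"
    script.write_text("import sys\nprint('query=' + sys.argv[1])\n", encoding="utf-8")
    outcome = run_python_script(script, ["protein folding"])

print(outcome)
```

Returning the exit code and stderr along with stdout gives the LLM enough information to notice a failure and decide what to try next.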
Actual Running Results
Using "search for the latest papers on protein folding prediction" as an example:

The system's execution process demonstrates Skill's intelligent orchestration capabilities:
- The LLM reads the corresponding Skill Markdown file
- It discovers a Python script needs to be executed, first checking if the Python environment is ready
- After confirming the environment, it runs the Python script to access arXiv for paper searches
- The initial search results are unsatisfactory, and the LLM automatically reasons about possible causes
- It checks the script logic, reruns it, and adjusts the search strategy (such as using category-based search)
- Finally successfully retrieves five relevant papers
Throughout the process, the LLM automatically decides the execution approach for each step based on the Skill description file, including error handling and strategy adjustment, fully demonstrating the Agent's autonomous reasoning capability. This "encounter problem → analyze cause → adjust strategy → retry" closed loop is the core characteristic that distinguishes Agents from simple API calls, and is a vivid manifestation of the ReAct (Reasoning + Acting) paradigm in real-world scenarios.
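The retry loop can be reduced to a small sketch. In a real agent the LLM chooses the next strategy after reading the previous result. Here a fixed list of strategies and a hypothetical `search_papers` stub (keyword search returns nothing, category search returns five items) stand in for that reasoning.

```python
def search_papers(query: str, strategy: str) -> list:
    """Hypothetical search tool: only the category strategy finds results."""
    if strategy == "category":
        return [f"paper-{i} on {query}" for i in range(1, 6)]
    return []


def run_with_retries(query: str, strategies: list) -> tuple:
    """Act, observe the result, and switch strategy until results appear."""
    trace = []
    for strategy in strategies:
        results = search_papers(query, strategy)  # act
        trace.append((strategy, len(results)))    # observe
        if results:                               # decide: good enough?
            return results, trace
    return [], trace


papers, trace = run_with_retries("protein folding", ["keyword", "category"])
print(trace)
print(papers)
```

The trace records each attempt and its outcome, mirroring the failed first search and the successful adjusted search in the run described above.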
Skill Ecosystem and Resources
There are already abundant ready-made Skills available for use. Through the skills.sh website, you can search over 40,000 community-contributed Skills covering web search, file processing, code analysis, and various other scenarios. Developers can select them as needed.
skills.sh operates similarly to npm for JavaScript or PyPI for Python — providing a centralized discovery, sharing, and distribution channel for AI Skills. These Skills cover scenarios ranging from simple file operations and network requests to complex database management, CI/CD pipeline operations, and Kubernetes cluster management in DevOps scenarios, forming an increasingly rich capability ecosystem. Skill Markdown files are typically only a few KB to tens of KB in size. Compared to traditional plugin systems (which require compilation, packaging, and installation), their distribution and update costs are virtually zero — this lightweight characteristic is an important reason why the Skill ecosystem can grow rapidly.
The Markdown file format of Skills is naturally suited for transmission and sharing, which is another major advantage over traditional Workflow Agents.
Summary
From Function Call to MCP to Skills, the evolution path of AI tool invocation is very clear:
- Function Call solves the "how to call" problem
- MCP Protocol solves the "where to call" problem (unified external calling protocol)
- Skills solve the "how to efficiently orchestrate complex tasks" problem (on-demand loading Sub-Agents)
These three are not replacements for each other, but progressive capability layers. Understanding this technical lineage enables you to truly comprehend the "why" behind things, whether you're using Cursor or building your own AI applications. It's worth noting that this evolution path also mirrors the software architecture trend from monolith to microservices to Serverless — moving from tight coupling to loose coupling, from static configuration to dynamic orchestration, ultimately achieving on-demand loading and elastic scaling of capability delivery.