How AI Coding Skills Work: From Function Call to Sub-Agent Implementation

This article progressively deconstructs the technical essence of Cursor Skills starting from Function Call. Function Call solves "how to call," MCP solves "where to call," and Skills solve "how to orchestrate complex tasks." Skills are essentially a clever encapsulation of Function Call, using Markdown files for modular prompt management and on-demand loading, enabling LLMs to handle multi-step tasks. Spring AI Alibaba has built-in Skills support for integrating with any LLM.
Introduction
In the AI coding space, Cursor's Skills feature is transforming how developers interact with large language models. Many people know Skills are useful, but few understand the underlying mechanics. This article starts from Function Call, progressively breaks down the technical essence of Skills, and demonstrates how to implement Skills functionality by integrating with any LLM through Spring AI Alibaba.

Tools: Solving the "How to Call" Problem
The Essence of Function Call
LLMs cannot access real-time information on their own. When we ask an LLM to check the weather in Beijing, it goes through a reasoning process to determine whether there's an available Tool to handle the current conversation. Once a matching Tool is found, the LLM returns a structured JSON payload containing the Tool's method name and required parameters (e.g., query location "Beijing").
Function Call is a capability first introduced by OpenAI into the GPT API in June 2023, and has since been widely adopted by major model providers. The core idea is to let the LLM, during its reasoning process, generate a JSON output conforming to a predefined schema—rather than a natural language response—when it determines that external information or a specific action is needed. This JSON contains the function name and parameters, and the client application is responsible for the actual execution.
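As a rough illustration of the schema-driven exchange described above, the sketch below declares one tool and checks a model's JSON output against it. The tool name `get_weather`, the `location` parameter, and the `name`/`arguments` envelope are assumptions made for this example; real providers wrap the same information in their own response formats.

```python
import json

# Hypothetical tool declaration in the JSON Schema style used by
# function-calling APIs. The name and fields are illustrative only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def parse_tool_call(raw: str, tool: dict) -> dict:
    """Parse the model's JSON output and check it against the declaration."""
    call = json.loads(raw)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unexpected tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    missing = [key for key in tool["parameters"]["required"] if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return args


# What a model might emit instead of a natural-language answer.
model_output = '{"name": "get_weather", "arguments": {"location": "Beijing"}}'
print(parse_tool_call(model_output, WEATHER_TOOL))  # {'location': 'Beijing'}
```

The model only produces the payload; validating it and deciding whether to run anything remains the client application's job.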
When the application detects this JSON, it uses reflection to locate and invoke the corresponding method. Reflection is a technique in languages like Java that retrieves class information and invokes methods at runtime, so the application can resolve the function name string returned by the LLM to an actual method and execute it. The essence of Tools is transforming unstructured natural language into structured JSON that a program can process, enabling the LLM to indirectly call methods within the application.
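The name-based lookup can be sketched in a few lines. The example uses Python's `getattr`, which plays roughly the role that reflection plays in Java; the `WeatherTools` class and its canned return value are hypothetical and exist only to make the sketch runnable.

```python
import json


class WeatherTools:
    """Methods the application exposes to the model as Tools."""

    def get_weather(self, location: str) -> str:
        # Placeholder result; a real tool would query a weather service.
        return f"Sunny in {location}"


def dispatch(tools: object, raw_call: str) -> str:
    """Resolve the method named in the model's JSON and invoke it."""
    call = json.loads(raw_call)
    name = call["name"]
    # Refuse private or dunder names so the model can only reach public tools.
    if name.startswith("_"):
        raise LookupError(f"tool name not allowed: {name}")
    method = getattr(tools, name, None)
    if not callable(method):
        raise LookupError(f"no such tool: {name}")
    return method(**call.get("arguments", {}))


payload = '{"name": "get_weather", "arguments": {"location": "Beijing"}}'
print(dispatch(WeatherTools(), payload))  # Sunny in Beijing
```

Because the method is chosen by a string the model produced, the lookup is restricted to public names and fails loudly on anything it cannot resolve.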
MCP: Solving the "Where to Call" Problem
When we need to query GitHub project information, blog content, map locations, and other third-party services, declaring a separate Tool method for each service becomes extremely costly. The problems are:
- The sheer number of third-party services makes implementing corresponding Tool methods a massive undertaking
- Tool methods cannot be shared across multiple AI applications, requiring redundant implementations
MCP (Model Context Protocol) was introduced to solve this problem. Released by Anthropic in November 2024, MCP is an open standard protocol inspired by LSP (Language Server Protocol)—LSP unified the interaction between IDEs and programming language services, while MCP unifies the interaction between AI applications and external tool services.
MCP provides two communication methods: STDIO and HTTP (including SSE and Streamable HTTP). STDIO communicates between local processes via standard input/output streams and suits local tools. HTTP with SSE (Server-Sent Events) supports remote service calls, with the server able to push a continuous stream of events to the client. Streamable HTTP is a more flexible transport introduced in a later revision of the specification; it supports both stateless and stateful modes. Whichever transport is used, LLMs can interface with shared Tools from third-party services in a unified way.
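Under either transport, MCP messages are JSON-RPC 2.0 objects, and a tool invocation uses the `tools/call` method. The sketch below builds such a request and frames it as one newline-terminated line, which is how the STDIO transport delimits messages. The tool name `search_repositories` and its arguments are made up for the example, and no real server is contacted.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call_request(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def frame_for_stdio(message: dict) -> bytes:
    """Serialize one message as a single newline-terminated line."""
    # json.dumps escapes line breaks inside strings, so the only raw
    # newline in the frame is the delimiter appended here.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def read_frame(line: bytes) -> dict:
    """Decode one line received from the peer back into a message."""
    return json.loads(line.decode("utf-8"))


request = tools_call_request("search_repositories", {"query": "spring-ai\nalibaba"})
frame = frame_for_stdio(request)
print(frame.decode("utf-8"), end="")
```

A client would write `frame` to the server process's standard input and read the matching response, identified by the same `id`, from its standard output.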
However, it's important to note that MCP still relies on Function Call—from the LLM's perspective, it doesn't distinguish between external and internal Tools; they're all just tools.
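One way to picture this is a single registry in which local functions and MCP-backed tools sit side by side. In the sketch below, `read_file`, `github-mcp`, and `search_repositories` are hypothetical names, and the remote call is a placeholder rather than a real MCP request.

```python
from typing import Callable, Dict, List

Tool = Callable[..., str]


def read_file(path: str) -> str:
    """A local Tool implemented inside the application."""
    return f"(contents of {path})"


def remote_tool(server: str, name: str) -> Tool:
    """Wrap a tool hosted on an MCP server as an ordinary callable."""

    def call(**arguments: str) -> str:
        # Placeholder for sending a tools/call request to the server.
        return f"{server}:{name} called with {sorted(arguments.items())}"

    return call


REGISTRY: Dict[str, Tool] = {
    "read_file": read_file,
    "search_repositories": remote_tool("github-mcp", "search_repositories"),
}


def tool_names() -> List[str]:
    """The flat list shown to the model; a tool's origin is not part of it."""
    return sorted(REGISTRY)


print(tool_names())  # ['read_file', 'search_repositories']
```

The model picks a name from the flat list; only the application knows which entries are forwarded to an MCP server.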
Skills: Sub-Agents in Workflow Mode
Why Skills Are Needed
When an LLM's task goes beyond calling a single Tool and requires a series of workflow steps to complete, the traditional approach demands extensive prompts telling the LLM how to break down tasks and handle each step. For example, having an LLM search for web information requires decomposition into: open browser → enter search keywords → retrieve webpage content → reason and return results. Once every such workflow is spelled out up front, the prompt becomes very large.
Anthropic, the company behind Claude, recognized this problem and introduced Skills.
The Structure of Skills
A Skill is a Markdown file consisting of two parts:
- Metadata: Defines the purpose of the current Skill (e.g., "web search", "file processing")
- Instructions: Detailed orchestration of each execution step, including which Tools to call, which scripts to run (Python/JS), etc.
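A minimal sketch of this two-part layout follows, assuming the metadata is written as a front-matter block delimited by `---` lines with `name` and `description` keys. The sample Skill text, including the `scripts/extract.py` path, is invented for illustration, and the hand-rolled parser only handles simple `key: value` lines.

```python
from dataclasses import dataclass

# Invented sample Skill: metadata between the --- lines, instructions after.
SKILL_TEXT = """\
---
name: web-search
description: Search the web and summarize the most relevant results.
---
1. Call the browser tool with the user's keywords.
2. Run scripts/extract.py on each result page.
3. Summarize the findings.
"""


@dataclass
class Skill:
    name: str
    description: str
    instructions: str


def parse_skill(text: str) -> Skill:
    """Split a Skill file into front-matter metadata and instruction body."""
    # maxsplit=2 keeps any later '---' inside the body untouched.
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], body.strip())


skill = parse_skill(SKILL_TEXT)
print(skill.name)
print(skill.description)
```

Only `name` and `description` need to reach the model up front; `instructions` stays on disk until the Skill is actually selected.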
How Skills Work
The core advantage of Skills is on-demand loading, rather than sending all prompts to the LLM at once:
- First, the metadata (i.e., purpose descriptions) of each Skill is sent to the LLM
- The LLM reasons about which Skill is needed based on the user's request
- It returns a JSON payload that calls CoreSkill, carrying the Skill name
- The CoreSkill method in the application reads the corresponding Markdown file based on the name
- The Markdown content is sent to the LLM, which then reasons through and executes the specific steps
Key Insight: Skills fundamentally still use the Function Call mechanism—they simply provide a built-in Function Call internally to read the corresponding Skill text, which is then returned to the LLM for reasoning. Therefore, a model that supports Function Call is required to support Skills.
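The loading sequence can be sketched end to end without a real model. Everything here is an assumption made for illustration: the one-file-per-Skill layout, the `core_skill` function standing in for the built-in CoreSkill tool, and the hard-coded `tool_call` dictionary standing in for the model's decision.

```python
import tempfile
from pathlib import Path


def write_skill(root: Path, name: str, description: str, steps: str) -> None:
    """Create a sample Skill file so the sketch is self-contained."""
    (root / f"{name}.md").write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n{steps}\n",
        encoding="utf-8",
    )


def skill_catalog(root: Path) -> dict:
    """Collect only the one-line descriptions; this is all the model sees first."""
    catalog = {}
    for path in sorted(root.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("description:"):
                catalog[path.stem] = line.split(":", 1)[1].strip()
                break
    return catalog


def core_skill(root: Path, name: str) -> str:
    """The built-in tool: return the full Markdown of one Skill by name."""
    # Accepting only catalog names also blocks path tricks such as '../x'.
    if name not in skill_catalog(root):
        raise KeyError(f"unknown skill: {name}")
    return (root / f"{name}.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_skill(root, "web-search", "Search the web.", "1. Open the browser.")
    write_skill(root, "file-processing", "Convert files.", "1. Read the file.")

    catalog = skill_catalog(root)  # step 1: metadata only
    # Steps 2-3: a real model would choose; here the choice is hard-coded.
    tool_call = {"name": "CoreSkill", "arguments": {"skill": "web-search"}}
    instructions = core_skill(root, tool_call["arguments"]["skill"])  # step 4

print(sorted(catalog))
print(instructions)
```

The full text of `file-processing` never enters the conversation, which is the saving that on-demand loading is meant to deliver.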
Skills are also called Sub-agents. They are components within a larger agent system and are very easy to transfer and share. The Sub-agent concept originates from Multi-Agent System architecture, where a main agent (Orchestrator) is responsible for understanding user intent and distributing tasks, while multiple sub-agents each handle task execution in specific domains. This design pattern is widely adopted in frameworks like AutoGPT, CrewAI, and MetaGPT. The advantage of Skills as sub-agents lies in their lightweight nature—a single Markdown file defines a complete workflow for a specialized domain, greatly simplifying things compared to the complex configurations of traditional multi-agent frameworks.
Spring AI Alibaba in Practice: Integrating with Any LLM
Code Implementation
Although Skills were originally introduced for Claude, understanding the principles allows us to implement them with Spring AI Alibaba and Tools, and so integrate with any LLM that supports Function Call. Spring AI Alibaba is an extension project developed by Alibaba on top of the Spring AI framework, designed to give Java developers a convenient AI application development experience. Spring AI itself is an AI integration framework launched within the Spring ecosystem in late 2023, providing a unified API abstraction layer for interfacing with different LLM providers. Spring AI Alibaba builds on this foundation with native support for models from Chinese providers, such as Qwen (Tongyi Qianwen), and integrates newer capabilities like MCP and Skills, so enterprise Java applications can adopt them quickly.
The latest version of Spring AI Alibaba ships with a built-in Skills implementation. Setting it up involves three pieces:
- Define Skill Agent Hook: Specify the root directory path for Skill files; the framework automatically reads all Skill Markdown files
- Define Shell Command Execution Hook: Required because Skills may need to run scripts in languages like Python
- Configure Python Tool Support: Execute Python code through third-party libraries provided by GraalVM
Demo Results
Using "search for the latest papers on protein folding prediction" as an example, the entire execution flow is as follows:
- The LLM reads the corresponding Skill's Markdown file
- It discovers that a Python script needs to be executed
- It automatically checks whether the Python environment is ready
- It runs the Python script to search for papers on the arXiv website
- When initial search results are unsatisfactory, it automatically reasons about the cause
- It checks script syntax, reruns, and adjusts the search strategy
- Finally, it finds five relevant papers through category-based search
The entire process demonstrates Skills' autonomous reasoning and error recovery capabilities—the LLM decides its execution strategy step by step based on the Skill description file.
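The script used in the demo is not shown, so the sketch below only illustrates the "adjust the search strategy" step under stated assumptions. It builds query URLs for the public arXiv API, tries a keyword query first, and falls back to a category query when too few results come back. The category `q-bio.BM`, the threshold of five, and the injected `fetch` function are choices made for this example; `fetch` is stubbed so nothing touches the network.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(search_query: str, max_results: int = 5) -> str:
    """Build an arXiv API URL that returns the newest matches first."""
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def search_with_fallback(strategies, fetch, min_results=5):
    """Try each query in order until one returns enough results."""
    for query in strategies:
        results = fetch(arxiv_query_url(query))
        if len(results) >= min_results:
            return query, results
    return None, []


def stub_fetch(url: str) -> list:
    # Stand-in for an HTTP call: pretend only the category query succeeds.
    return ["paper"] * 5 if "cat%3A" in url else ["paper"]


strategies = ['all:"protein folding prediction"', "cat:q-bio.BM"]
query, papers = search_with_fallback(strategies, stub_fetch)
print(query, len(papers))  # cat:q-bio.BM 5
```

In the demo the model chose the fallback itself after inspecting the first result; the fixed `strategies` list here simply makes that decision explicit.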
Skills Ecosystem and Resources
There is already a rich collection of ready-made Skills available. Through the Skills.sh website, you can search over 40,000 Skills, and developers can select them as needed without writing from scratch. This sharing mechanism significantly lowers the barrier to AI application development.
Conclusion
From a technical evolution perspective, Function Call, MCP, and Skills each solve one core problem in turn: "how to call," "where to call," and "how to orchestrate complex tasks." Skills are not an entirely new technical paradigm but rather a clever encapsulation of Function Call: they use Markdown files to achieve modular management and on-demand loading of prompts, enabling LLMs to handle complex multi-step tasks while remaining easy to share and maintain.