How AI Coding Skills Work: From Function Call to Sub-Agent Implementation

Introduction

In the AI coding space, Cursor's Skills feature is transforming how developers interact with large language models. Many people know Skills are useful, but few understand the underlying mechanics. This article starts from Function Call, breaks down the technical essence of Skills step by step, and shows how to implement Skills for any LLM using Spring AI Alibaba.

bilibili source

Tools: Solving the "How to Call" Problem

The Essence of Function Call

LLMs cannot access real-time information on their own. When we ask an LLM to check the weather in Beijing, it goes through a reasoning process to determine whether there's an available Tool to handle the current conversation. Once a matching Tool is found, the LLM returns a structured JSON payload containing the Tool's method name and required parameters (e.g., query location "Beijing").

Function Call is a capability first introduced by OpenAI into the GPT API in June 2023, and has since been widely adopted by major model providers. The core idea is to let the LLM, during its reasoning process, generate a JSON output conforming to a predefined schema—rather than a natural language response—when it determines that external information or a specific action is needed. This JSON contains the function name and parameters, and the client application is responsible for the actual execution.
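To make the "JSON instead of natural language" idea concrete, here is a minimal sketch of a tool definition and of checking a model's structured output against it. The tool name `get_weather`, the `location` parameter, and the exact payload shape are illustrative assumptions; each provider wraps these fields slightly differently.

```python
import json

# Hypothetical tool definition, written in the JSON Schema style that
# function-calling APIs commonly accept.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def parse_tool_call(raw: str, tool: dict) -> dict:
    """Parse the model's JSON output and check it against the tool definition."""
    call = json.loads(raw)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unexpected tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return args


# Simulated model output: structured JSON rather than a natural-language answer.
model_output = '{"name": "get_weather", "arguments": {"location": "Beijing"}}'
args = parse_tool_call(model_output, GET_WEATHER_TOOL)
print(args)
```

The model only produces the payload; nothing runs until the client application decides to act on `args`.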

When the application detects this JSON, it uses reflection to locate and invoke the corresponding method. Reflection, available in languages such as Java, retrieves class information and invokes methods at runtime, so the application can find and execute a method from the function name string the LLM returned. In essence, Tools turn unstructured natural language into structured JSON that a program can process, which lets the LLM call methods inside the application indirectly.
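The lookup-by-name step can be sketched as follows. The article describes Java reflection; this sketch uses Python's `getattr` as the closest analog, and the `WeatherTools` class and its canned reply are hypothetical stand-ins for a real tool implementation.

```python
import json


class WeatherTools:
    """Hypothetical tool class; a real one would call a weather service."""

    def get_weather(self, location: str) -> str:
        return f"Sunny in {location}"  # canned reply for illustration only


def dispatch(tools: object, raw_call: str) -> str:
    """Find a method by the name in the model's JSON and invoke it."""
    call = json.loads(raw_call)
    name = call["name"]
    # Reject private and dunder names so the model cannot reach arbitrary attributes.
    if name.startswith("_"):
        raise ValueError(f"illegal tool name: {name!r}")
    method = getattr(tools, name, None)
    if not callable(method):
        raise ValueError(f"unknown tool: {name!r}")
    return method(**call.get("arguments", {}))


result = dispatch(
    WeatherTools(),
    '{"name": "get_weather", "arguments": {"location": "Beijing"}}',
)
print(result)
```

Because the name comes from model output, the sketch validates it before the lookup instead of trusting it blindly.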

MCP: Solving the "Where to Call" Problem

When we need to query GitHub project information, blog content, map locations, and other third-party services, declaring a separate Tool method for each service becomes extremely costly. The problems are:

The sheer number of third-party services makes implementing corresponding Tool methods a massive undertaking
Tool methods cannot be shared across multiple AI applications, requiring redundant implementations

MCP (Model Context Protocol) was introduced to solve this problem. Released by Anthropic in November 2024, MCP is an open standard protocol inspired by LSP (Language Server Protocol)—LSP unified the interaction between IDEs and programming language services, while MCP unifies the interaction between AI applications and external tool services.

MCP defines two kinds of transport: STDIO and HTTP (first HTTP with SSE, later Streamable HTTP). STDIO exchanges messages between local processes over standard input and output, which suits local tools. HTTP with SSE (Server-Sent Events) supports remote calls and lets the server push a continuous stream of events to the client. Streamable HTTP, introduced later as a more flexible replacement, supports both stateless and stateful modes. Either way, LLM applications can connect to Tools shared by third-party services in a uniform manner.
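Underneath either transport, MCP messages are JSON-RPC 2.0. The sketch below builds `tools/list` and `tools/call` requests and frames them the way the STDIO transport expects, one JSON message per line. The tool name and arguments are illustrative assumptions, and a real session would begin with an `initialize` handshake that is omitted here.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids, unique within this session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def frame_for_stdio(msg: dict) -> bytes:
    """Serialize one message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the line stays intact.
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


# Ask the server which tools it offers, then call one of them.
list_req = jsonrpc_request("tools/list")
call_req = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"location": "Beijing"}},  # hypothetical tool
)
frame = frame_for_stdio(call_req)
print(frame.decode("utf-8"), end="")
```

Over HTTP the same JSON objects travel in request and response bodies instead of over pipes.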

However, it's important to note that MCP still relies on Function Call—from the LLM's perspective, it doesn't distinguish between external and internal Tools; they're all just tools.
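One way to picture this is an application that converts the tool descriptors returned by an MCP server into the same function definitions it uses for local Tools, then hands the model a single merged list. This is a sketch under assumptions: the tools `search_repositories` and `get_weather` are hypothetical, and the target shape follows the widely used OpenAI-style `function` format.

```python
# Descriptor in the shape an MCP server returns from tools/list.
mcp_tools = [
    {
        "name": "search_repositories",  # hypothetical remote tool
        "description": "Search GitHub repositories.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

# Locally declared tool, already in function-calling form.
local_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical local tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]


def mcp_tool_to_function(tool: dict) -> dict:
    """Map an MCP tool descriptor onto a function-calling definition."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def build_tool_list(local: list, remote: list) -> tuple:
    """Return the merged list for the model and a routing table for the application."""
    merged = list(local)
    routes = {t["function"]["name"]: "local" for t in local}
    for tool in remote:
        if tool["name"] in routes:
            raise ValueError(f"duplicate tool name: {tool['name']!r}")
        merged.append(mcp_tool_to_function(tool))
        routes[tool["name"]] = "mcp"
    return merged, routes


tools_for_model, routes = build_tool_list(local_tools, mcp_tools)
print(routes)
```

Only the routing table knows where a tool lives; the list sent to the model carries no such distinction.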

Skills: Sub-Agents in Workflow Mode

Why Skills Are Needed

When an LLM's task goes beyond calling a single Tool and requires a series of workflow steps to complete, the traditional approach demands extensive prompts telling the LLM how to break down tasks and handle each step. For example, having an LLM search for web information requires decomposition into: open browser → enter search keywords → retrieve webpage content → reason and return results. These prompts become enormously large.

Anthropic, the team behind Claude, recognized this problem and introduced Skills.

The Structure of Skills

A Skill is a Markdown file consisting of two parts:

Metadata: Defines the purpose of the current Skill (e.g., "web search", "file processing")
Instructions: Detailed orchestration of each execution step, including which Tools to call, which scripts to run (Python/JS), etc.
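A minimal sketch of splitting such a file into its two parts is shown below. It assumes the metadata sits in a block delimited by `---` lines at the top of the file, with `name` and `description` keys, which is a common layout for Skill files; the sample Skill and the `scripts/search.py` path are made up for illustration, and the parser handles only simple `key: value` lines, not full YAML.

```python
from dataclasses import dataclass

# Made-up Skill file for illustration.
SKILL_TEXT = """\
---
name: web-search
description: Search the web and summarize the results.
---
1. Run scripts/search.py with the user's keywords.
2. Read the returned pages and summarize them.
"""


@dataclass
class Skill:
    name: str
    description: str
    instructions: str


def parse_skill(text: str) -> Skill:
    """Split a Skill file into metadata and instructions."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' metadata block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the metadata block
    except ValueError:
        raise ValueError("unterminated metadata block") from None
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


skill = parse_skill(SKILL_TEXT)
print(skill.name, "|", skill.description)
```

Only `name` and `description` need to reach the model up front; `instructions` can stay on disk until the Skill is selected.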

How Skills Work

The core advantage of Skills is on-demand loading, rather than sending all prompts to the LLM at once:

First, the metadata (i.e., purpose descriptions) of each Skill is sent to the LLM
The LLM reasons about which Skill is needed based on the user's request
It returns a JSON tool call to CoreSkill that carries the Skill name
The CoreSkill method in the application reads the corresponding Markdown file based on the name
The Markdown content is sent to the LLM, which then reasons through and executes the specific steps
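The steps above can be sketched in a few lines. This is an illustrative outline, not the implementation of any particular product: `core_skill` stands in for the CoreSkill method, the one-file-per-Skill layout is an assumption, and the Skill contents are dummy text written to a temporary directory.

```python
import tempfile
from pathlib import Path


def build_catalog(descriptions: dict) -> str:
    """Step 1: only names and one-line descriptions go into the initial prompt."""
    return "\n".join(f"- {name}: {desc}" for name, desc in sorted(descriptions.items()))


def core_skill(root: Path, name: str) -> str:
    """Steps 3-4: the built-in tool the model calls to fetch one Skill's full text."""
    path = (root / f"{name}.md").resolve()
    # Refuse names such as "../secrets" that would leave the skills directory.
    if root.resolve() not in path.parents:
        raise ValueError(f"skill name escapes the skills directory: {name!r}")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Dummy Skill bodies, repeated to mimic long instruction files.
    (root / "web-search.md").write_text("Step: run search.py\n" * 50, encoding="utf-8")
    (root / "file-processing.md").write_text("Step: open the file\n" * 50, encoding="utf-8")
    descriptions = {
        "web-search": "Search the web",
        "file-processing": "Read and convert files",
    }

    catalog = build_catalog(descriptions)    # sent with every request
    loaded = core_skill(root, "web-search")  # sent only after the model asks for it

print(len(catalog), "characters up front;", len(loaded), "characters loaded on demand")
```

The catalog grows by one short line per Skill, while the full instructions cost context only for the Skill actually selected.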

Key Insight: Skills fundamentally still use the Function Call mechanism. The application simply exposes one built-in function that reads the requested Skill's text and returns it to the LLM for reasoning. A model must therefore support Function Call in order to support Skills.

Skills are sometimes described as sub-agents: components within a larger agent system that are very easy to transfer and share. The sub-agent concept originates from Multi-Agent System architecture, where a main agent (Orchestrator) is responsible for understanding user intent and distributing tasks, while multiple sub-agents each handle task execution in a specific domain. This design pattern is widely adopted in frameworks like AutoGPT, CrewAI, and MetaGPT. The advantage of Skills as sub-agents lies in their lightweight nature: a single Markdown file defines a complete workflow for a specialized domain, which is far simpler than the complex configuration that traditional multi-agent frameworks require.

Spring AI Alibaba in Practice: Integrating with Any LLM

Code Implementation

Although Skills were originally introduced for Claude, once the principle is understood we can implement them with Spring AI Alibaba and Tools for any LLM. Spring AI Alibaba is an extension project that Alibaba built on the Spring AI framework to give Java developers a convenient way to develop AI applications. Spring AI itself is an AI integration framework that the Spring ecosystem launched in late 2023; it provides a unified API abstraction layer over different LLM providers. On top of that foundation, Spring AI Alibaba adds native support for Chinese models such as Qwen (Tongyi Qianwen) and integrates newer capabilities such as MCP and Skills, so enterprise Java applications can adopt AI features quickly.

The latest version of Spring AI Alibaba ships with a built-in Skills implementation. Setting it up involves three pieces:

Define Skill Agent Hook: Specify the root directory path for Skill files; the framework automatically reads all Skill Markdown files
Define Shell Command Execution Hook: Because Skills may need to execute scripting languages like Python
Configure Python Tool Support: Execute Python code through the libraries provided by GraalVM
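To illustrate what a script-execution hook has to do, independent of any framework, here is a minimal sketch that runs a Python snippet in a separate interpreter and packages the outcome for the model. This is not Spring AI Alibaba's API; the function name `run_python` and the result keys are assumptions made for this example.

```python
import subprocess
import sys


def run_python(code: str, timeout_s: float = 10.0) -> dict:
    """Run a snippet in a separate interpreter and capture its result for the model."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


good = run_python("print(1 + 1)")
bad = run_python("raise SystemExit('boom')")  # exits non-zero, message goes to stderr
print(good["ok"], bad["ok"])
```

Returning stderr and the success flag, rather than raising, gives the model the information it needs to diagnose a failed step. A production hook would also need sandboxing, which this sketch leaves out.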

Demo Results

Using "search for the latest papers on protein folding prediction" as an example, the entire execution flow is as follows:

The LLM reads the corresponding Skill's Markdown file
It discovers that a Python script needs to be executed
It automatically checks whether the Python environment is ready
It runs the Python script to search for papers on the arXiv website
When initial search results are unsatisfactory, it automatically reasons about the cause
It checks script syntax, reruns, and adjusts the search strategy
Finally, it finds five relevant papers through category-based search

The entire process demonstrates Skills' autonomous reasoning and error recovery capabilities—the LLM decides its execution strategy step by step based on the Skill description file.
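The recovery behavior can be pictured as a loop in which each tool result, including failures, is fed back before the next decision. In the sketch below both the model and the search script are scripted stand-ins, and the command strings are hypothetical; it mirrors the shape of the demo, not its actual code.

```python
def scripted_model(history: list) -> dict:
    """Stand-in for the LLM: picks the next step from what happened so far."""
    if not history:
        return {"type": "tool", "command": "search --keyword 'protein folding'"}
    last = history[-1]
    if not last["ok"]:
        # The previous attempt failed, so change strategy instead of repeating it.
        return {"type": "tool", "command": "search --category q-bio.BM"}
    return {"type": "final", "answer": last["output"]}


def fake_search(command: str) -> tuple:
    """Stand-in for the script runner: keyword search fails, category search works."""
    if "--category" in command:
        return True, "5 papers found"
    return False, "0 results"


def agent_loop(next_action, run_tool, max_steps: int = 5) -> tuple:
    """Alternate between model decisions and tool runs until a final answer."""
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action["type"] == "final":
            return action["answer"], history
        ok, output = run_tool(action["command"])
        history.append({"command": action["command"], "ok": ok, "output": output})
    raise RuntimeError("step budget exhausted without a final answer")


answer, history = agent_loop(scripted_model, fake_search)
print(answer, "after", len(history), "tool runs")
```

The step budget matters in practice: without it, a model that keeps retrying a failing strategy would loop indefinitely.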

Skills Ecosystem and Resources

There is already a rich collection of ready-made Skills available. Through the Skills.sh website, you can search over 40,000 Skills, and developers can select them as needed without writing from scratch. This sharing mechanism significantly lowers the barrier to AI application development.

Conclusion

In terms of technical evolution, the progression Function Call → MCP → Skills solves three core problems in turn: "how to call," "where to call," and "how to orchestrate complex tasks." Skills are not an entirely new technical paradigm but a clever encapsulation of Function Call: Markdown files provide modular management and on-demand loading of prompts, which lets LLMs handle complex multi-step tasks while remaining easy to share and maintain.