How AI Coding Skills Work: From Function Call to Sub-Agent Implementation

Introduction

In the AI coding space, Cursor's Skills feature is changing how developers work with LLMs. Many developers know Skills are useful, but few understand how they work under the hood. This article starts from Function Call, breaks Skills down step by step to their technical core, and then shows how to implement Skills with any LLM using Spring AI Alibaba.

Source: bilibili

Tools: Solving the "How to Call" Problem

The Essence of Function Call

LLMs cannot access real-time information on their own. When we ask an LLM to check Beijing's weather, it first reasons about whether any available Tool can handle the request. If it finds a matching Tool, the LLM returns structured JSON containing the Tool's method name and the required parameters (e.g., the query location "Beijing") instead of answering directly.

Function Call was first introduced by OpenAI in June 2023 for the GPT API and has since been widely adopted by major model providers. The core idea is that when an LLM determines it needs external information or needs to perform a specific action during reasoning, it generates a JSON output conforming to a predefined schema instead of a natural language response. This JSON contains the function name and parameters, and the client application handles the actual execution.
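To make this concrete, here is a minimal Python sketch of the two JSON documents involved: a tool definition in the OpenAI-style `tools` format (a JSON Schema describing the parameters), and the kind of tool call a model might return for the Beijing weather question. The function name `get_weather` and its `location` parameter are hypothetical, and the exact response envelope differs between providers.

```python
import json

# Tool definition the application sends along with the conversation.
# The name "get_weather" and the "location" parameter are hypothetical.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}

# Instead of prose, the model answers with a structured call like this
# (OpenAI-style: the arguments arrive as a JSON-encoded string).
model_tool_call = {
    "name": "get_weather",
    "arguments": "{\"location\": \"Beijing\"}",
}


def parse_tool_call(call: dict, tool: dict) -> tuple[str, dict]:
    """Decode the model's arguments and check that required fields are present."""
    spec = tool["function"]
    if call["name"] != spec["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    args = json.loads(call["arguments"])
    missing = [k for k in spec["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call["name"], args


print(parse_tool_call(model_tool_call, WEATHER_TOOL))
# prints ('get_weather', {'location': 'Beijing'})
```

The schema is what lets the model produce machine-checkable output; validating the decoded arguments before executing anything guards against malformed or incomplete calls.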

The application recognizes this JSON, looks up the corresponding method via reflection, and invokes it. Reflection, available in languages like Java, lets a program inspect classes and invoke methods at runtime, so the application can find and execute a method from the function name string the LLM returns. The method's result is then sent back to the LLM, which uses it to compose the final answer. In essence, Tools turn unstructured natural language into structured JSON the application can process, which lets the LLM call application methods indirectly.
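Java does this with `Class.getMethod(...)` and `Method.invoke(...)`; the Python equivalent is `getattr`. The sketch below assumes a hypothetical `WeatherTools` class whose `get_weather` method stands in for a real API call, and dispatches the function name string returned by the model to it at runtime.

```python
import json


class WeatherTools:
    """Hypothetical tool class; a real one would call a weather API."""

    def get_weather(self, location: str) -> str:
        return f"Sunny in {location}"  # canned answer for the sketch


def dispatch(tools: object, name: str, arguments_json: str) -> str:
    """Look up a method by the name the LLM returned and invoke it."""
    method = getattr(tools, name, None)
    # Refuse private names and non-callables so the model cannot reach internals.
    if name.startswith("_") or not callable(method):
        raise AttributeError(f"no tool named {name!r}")
    return method(**json.loads(arguments_json))


result = dispatch(WeatherTools(), "get_weather", '{"location": "Beijing"}')
print(result)  # Sunny in Beijing
```

The returned string would be appended to the conversation as the tool result. Restricting which names can be dispatched matters because the name comes from model output, not from trusted code.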

MCP: Solving the "Where to Call" Problem

When we need to query GitHub project info, blog content, map locations, and other third-party services, declaring separate Tool methods for each service is extremely costly. The problems are:

The sheer number of third-party services makes implementing corresponding Tool methods a massive effort
Tool methods cannot be shared across multiple AI applications, requiring redundant implementations

MCP (Model Context Protocol) solves this problem. Released by Anthropic in November 2024, MCP is an open standard protocol inspired by LSP (Language Server Protocol) — LSP unified the interaction between IDEs and language services, while MCP unifies the interaction between AI applications and external tool services.

MCP defines two kinds of transport: STDIO and HTTP. STDIO connects local processes through standard input and output streams and suits local tools. For remote servers, the original HTTP transport used SSE (Server-Sent Events), which lets the server push a continuous stream of events to the client; Streamable HTTP, introduced later, replaces it and supports both stateless and stateful operation. Either way, LLM applications can connect to shared third-party Tools in a uniform way.
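Whatever the transport, MCP messages are JSON-RPC 2.0. As a rough sketch based on the MCP specification (not on any particular SDK), this is how a client could frame a `tools/call` request for the stdio transport, where each message is a single line of JSON with no embedded newlines. The tool name `get_weather` is hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tools_call_message(tool_name: str, arguments: dict) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for MCP's stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so one message is one line.
    return json.dumps(request) + "\n"


# "get_weather" is a hypothetical tool name.
line = tools_call_message("get_weather", {"location": "Beijing"})
print(line, end="")
```

A real client first performs the `initialize` handshake and usually discovers tools via `tools/list`. Over Streamable HTTP, the same JSON body is sent in an HTTP POST instead of being written to the server's stdin.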

Note, however, that MCP still relies on Function Call: the LLM does not distinguish between external and internal Tools. To the model, they are all just tools.
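One way to picture this: the application converts the tools an MCP server advertises (its `tools/list` result gives each tool a `name`, a `description`, and a JSON Schema `inputSchema`) into the same function-calling format it uses for local tools, then sends one combined list to the model. The sketch assumes the OpenAI-style `tools` format, and both tool names are hypothetical.

```python
def mcp_to_function_tool(mcp_tool: dict) -> dict:
    """Map one MCP tool descriptor onto an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool["inputSchema"],
        },
    }


# A locally implemented tool (hypothetical).
local_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]},
    },
}]

# What a hypothetical GitHub MCP server might return from tools/list.
mcp_listed = [{
    "name": "search_repositories",
    "description": "Search GitHub repositories",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]},
}]

# The model receives one flat list; nothing marks which tools are remote.
all_tools = local_tools + [mcp_to_function_tool(t) for t in mcp_listed]
print([t["function"]["name"] for t in all_tools])
# ['get_weather', 'search_repositories']
```

When the model calls `search_repositories`, the application routes the call to its MCP client rather than to a local method; that routing is invisible to the model.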

Skills: Sub-Agents in Workflow Mode

Why Skills Are Needed

When an LLM's task goes beyond calling a single Tool and requires a multi-step workflow, the traditional approach needs lengthy prompts telling the LLM how to decompose the task and handle each step. For example, having an LLM search the web requires breaking the task down into: open browser → enter search keywords → fetch page content → reason and return results. Such prompts grow very large, and they are sent with every request whether or not the current task needs them.

The Claude team recognized this problem and introduced Skills.

Skills Structure

At its core, a Skill is a Markdown file (named SKILL.md in Anthropic's format) containing two parts:

Metadata: Defines the Skill's purpose (e.g., "web search", "file processing")
Instructions: Detailed orchestration of each execution step, including which Tools to call, which scripts (Python/JS) to execute, etc.
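In Anthropic's format, the metadata is a YAML frontmatter block with `name` and `description` fields, followed by the Markdown instructions. The sketch below splits such a file into its two parts. The `web-search` Skill and its script path are made up, and the hand-rolled `key: value` parsing is a simplification; a real loader would use a YAML library.

```python
# Hypothetical Skill file content, for illustration only.
SKILL_MD = """---
name: web-search
description: Search the web and summarize the top results
---
# Web search

1. Run `python scripts/search.py "<query>"` to fetch result pages.
2. Read the returned text and answer the user's question.
"""


def split_skill(text: str) -> tuple[dict, str]:
    """Return (metadata, instructions) from a SKILL.md-style document."""
    if not text.startswith("---\n"):
        raise ValueError("missing frontmatter")
    header, _, body = text[len("---\n"):].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # simplified: only flat "key: value" pairs are supported
            meta[key.strip()] = value.strip()
    return meta, body.lstrip("\n")


meta, instructions = split_skill(SKILL_MD)
print(meta["name"], "->", meta["description"])
# web-search -> Search the web and summarize the top results
```

Keeping the two parts separate is what makes on-demand loading possible: only `meta` needs to be shown to the model up front.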

How Skills Work

The core advantage of Skills is on-demand loading, rather than sending all prompts to the LLM at once:

First, the application sends each Skill's metadata (its purpose description) to the LLM
The LLM reasons about which Skill the user's request needs
It returns JSON calling CoreSkill, with the Skill name as the argument
The application's CoreSkill method reads the corresponding Markdown file by name
The Markdown content is returned to the LLM, which then reasons through the specific steps
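The on-demand loading flow above can be sketched in a few lines of Python. The names are assumptions for illustration: an in-memory dict stands in for a directory of Skill files, and the hypothetical `core_skill` function plays the role of CoreSkill. Only the short descriptions go into the first prompt; a Skill's full body is returned only when the model asks for it by name.

```python
# Hypothetical in-memory stand-in for a skills directory:
# name -> (short description, full Markdown body).
SKILLS = {
    "web-search": ("Search the web and summarize results",
                   "# Web search\n1. Run scripts/search.py with the query.\n2. Summarize."),
    "pdf-extract": ("Extract text and tables from PDF files",
                    "# PDF extraction\n1. Run scripts/extract.py on the file.\n2. Report."),
}


def skills_catalog() -> str:
    """Only names and one-line descriptions go into the initial prompt."""
    lines = [f"- {name}: {desc}" for name, (desc, _) in SKILLS.items()]
    return "Available skills (call core_skill to load one):\n" + "\n".join(lines)


def core_skill(name: str) -> str:
    """The tool the model calls; returns the named Skill's full instructions."""
    if name not in SKILLS:
        return f"Unknown skill {name!r}. Choose one of: {', '.join(SKILLS)}"
    return SKILLS[name][1]


# The model's call {"name": "core_skill", "arguments": {"name": "web-search"}}
# is executed, and the returned text is added to the conversation as the tool result.
print(skills_catalog())
print(core_skill("web-search"))
```

Returning a helpful message for an unknown name, instead of raising, lets the model correct itself on the next turn.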

Key Insight: Skills still run on the Function Call mechanism. The application exposes one built-in function that reads the requested Skill's text and returns it to the LLM for reasoning. Consequently, Skills require a model that supports Function Call.

Skills are also called Sub-agents: components within a larger agent system that are easy to transfer and share. The Sub-agent concept comes from Multi-Agent System architecture, where a main agent (the Orchestrator) understands user intent and distributes tasks, while multiple sub-agents each handle tasks in a specific domain. This design pattern is widely used in frameworks such as AutoGPT, CrewAI, and MetaGPT. The advantage of Skills as sub-agents is that they are lightweight: a single Markdown file defines a complete workflow for a specialized domain, which is far simpler than the complex configuration traditional multi-agent frameworks require.

Hands-On with Spring AI Alibaba: Integrating Any LLM

Code Implementation

Although Skills were originally introduced by Anthropic for Claude, once the principles are clear we can implement them with any LLM by combining Spring AI Alibaba with Tools. Spring AI Alibaba is an extension of the Spring AI framework developed by Alibaba, designed to give Java developers a convenient way to build AI applications. Spring AI itself is an AI integration framework launched by the Spring team in late 2023 that offers a unified API abstraction layer for connecting to different LLM providers. Spring AI Alibaba adds native support for Chinese models such as Qwen and integrates newer capabilities such as MCP and Skills, so enterprise Java applications can adopt AI features quickly.

Recent versions of Spring AI Alibaba include a built-in Skills implementation, which is set up in three steps:

Define a Skill Agent Hook: specify the root directory for Skill files; the framework automatically reads all Skill Markdown files
Define a Shell Command Execution Hook: needed because Skills may run scripts in Python or other languages
Configure Python Tool support: execute Python code through a library provided by GraalVM
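Spring AI Alibaba's own hook and tool classes are not reproduced here. As a rough illustration of what a command-execution tool has to do, the Python sketch below runs a command with a timeout and reports the exit code, stdout, and stderr as plain text the model can reason about. A non-zero exit code or error output is what lets the model notice a failure and retry. The helper `run_command` is hypothetical, and a production version would need sandboxing.

```python
import subprocess
import sys


def run_command(argv: list[str], timeout_s: float = 60.0) -> str:
    """Hypothetical helper: run a command and describe the outcome as text for the LLM.

    The command is passed as an argument list (no shell), which avoids
    shell-injection issues from model-generated strings.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        return f"error: executable not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s}s"
    return (f"exit_code: {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\n"
            f"stderr:\n{proc.stderr}")


# Roughly what "check whether the Python environment is ready" amounts to.
print(run_command([sys.executable, "--version"]))
```

Reporting failures as text rather than raising keeps the agent loop running: the model sees the error and can decide what to change before the next attempt.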

Demo Results

Using "search for the latest papers on protein folding prediction" as an example, the execution flow is:

The LLM reads the corresponding Skill's Markdown file
Discovers it needs to execute a Python script
Automatically checks if the Python environment is ready
Runs the Python script to search for papers on the arXiv website
When initial results are unsatisfactory, automatically reasons about the cause
Checks script syntax, re-runs, and adjusts the search strategy
Finally finds five relevant papers through category-based search
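The Skill's actual script is not shown in the source. As a hypothetical reconstruction of the kind of request such a script might send, the sketch below builds query URLs for arXiv's public export API: first a plain keyword search, then a narrower search restricted to one category. Choosing `q-bio.BM` (Quantitative Biology, Biomolecules) is an assumption about which category fits protein folding; the demo does not say which categories it used.

```python
from typing import Optional
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(terms: list[str], category: Optional[str] = None,
                    max_results: int = 5) -> str:
    """Build an arXiv API query URL; terms are ANDed, optionally within one category."""
    clauses = [f"all:{t}" for t in terms]
    if category:
        clauses.append(f"cat:{category}")
    params = {
        "search_query": " AND ".join(clauses),
        "sortBy": "submittedDate",  # newest submissions first
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# First attempt: keywords only.
print(arxiv_query_url(["protein", "folding", "prediction"]))
# Fallback after weak results: restrict to a (hypothetically chosen) category.
print(arxiv_query_url(["protein", "folding"], category="q-bio.BM"))
```

Fetching either URL (for example with `urllib.request.urlopen`) returns an Atom feed that the script would then parse into titles and links.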

The entire process demonstrates Skills' autonomous reasoning and error recovery capabilities — the LLM decides its execution strategy step by step based on the Skill description file.

Skills Ecosystem and Resources

There is already a rich collection of ready-made Skills available. Through the Skills.sh website, you can search over 40,000 Skills, and developers can pick what they need without writing from scratch. This sharing mechanism significantly lowers the barrier to AI application development.

Summary

From the technology evolution perspective: Function Call → MCP → Skills, solving three core problems respectively: "how to call", "where to call", and "how to orchestrate complex tasks". Skills are not an entirely new paradigm but rather a clever wrapper around Function Call — using Markdown files to achieve modular prompt management and on-demand loading, enabling LLMs to handle complex multi-step tasks while maintaining good shareability and maintainability.