How AI Coding Skills Work: From Function Call to Sub-Agent Implementation

From Function Call to Skills: breaking down the technical essence of Cursor Skills and their implementation.
This article progressively deconstructs Cursor Skills starting from Function Call. Function Call solves "how to call," MCP solves "where to call," and Skills solve "how to orchestrate complex tasks." Skills are essentially a clever wrapper around Function Call, using Markdown files for modular prompt management and on-demand loading, enabling LLMs to handle multi-step tasks. Spring AI Alibaba has built-in Skills support for integrating with any LLM.
Introduction
In the AI coding space, Cursor's Skills feature is changing how developers interact with LLMs. Many people know Skills are useful, but few understand the underlying mechanics. This article starts from Function Call, progressively breaks down the technical essence of Skills, and demonstrates how to implement Skills functionality by integrating with any LLM through Spring AI Alibaba.

Tools: Solving the "How to Call" Problem
The Essence of Function Call
LLMs cannot access real-time information on their own. When we ask an LLM to check Beijing's weather, it reasons through whether there's an available Tool to handle the current conversation. Once a matching Tool is found, the LLM returns structured JSON containing the Tool's method name and required parameters (e.g., query location "Beijing").
Function Call was first introduced by OpenAI in June 2023 for the GPT API and has since been widely adopted by major model providers. The core idea is that when an LLM determines it needs external information or needs to perform a specific action during reasoning, it generates a JSON output conforming to a predefined schema instead of a natural language response. This JSON contains the function name and parameters, and the client application handles the actual execution.
The application identifies this JSON, locates the corresponding method via reflection, and executes the call. Reflection is a technique in languages like Java that dynamically retrieves class information and invokes methods at runtime, allowing the application to dynamically locate and execute methods based on the function name string returned by the LLM. The essence of Tools is converting unstructured natural language into processable structured JSON, enabling the LLM to indirectly call methods in the application.
MCP: Solving the "Where to Call" Problem
When we need to query GitHub project info, blog content, map locations, and other third-party services, declaring separate Tool methods for each service is extremely costly. The problems are:
- The sheer number of third-party services makes implementing corresponding Tool methods a massive effort
- Tool methods cannot be shared across multiple AI applications, requiring redundant implementations
MCP (Model Context Protocol) solves this problem. Released by Anthropic in November 2024, MCP is an open standard protocol inspired by LSP (Language Server Protocol) — LSP unified the interaction between IDEs and language services, while MCP unifies the interaction between AI applications and external tool services.
MCP provides two communication methods: STDIO and HTTP (including SSE and Streamable). STDIO communicates between local processes via standard input/output streams, suitable for local tools. HTTP SSE (Server-Sent Events) supports remote service calls where the server can continuously push events to the client. Streamable HTTP is a more flexible transport method introduced later, supporting both stateless and stateful modes. This enables LLMs to connect with shared Tools from third-party services in a unified way.
However, it's important to note that MCP still relies on Function Call — from the LLM's perspective, it doesn't distinguish between external and internal Tools; they're all just tools.
Skills: Sub-Agents in Workflow Mode
Why Skills Are Needed
When an LLM's task goes beyond calling a single Tool and requires a series of workflow steps, the traditional approach demands extensive prompts telling the LLM how to decompose tasks and handle each step. For example, having an LLM search the web requires breaking it down into: open browser → enter search keywords → fetch page content → reason and return results. These prompts become extremely large.
The Claude team recognized this problem and introduced Skills.
Skills Structure
A Skill is a Markdown file containing two parts:
- Metadata: Defines the Skill's purpose (e.g., "web search", "file processing")
- Instructions: Detailed orchestration of each execution step, including which Tools to call, which scripts (Python/JS) to execute, etc.
How Skills Work
The core advantage of Skills is on-demand loading, rather than sending all prompts to the LLM at once:
- First, send each Skill's metadata (purpose description) to the LLM
- The LLM reasons which Skill is needed based on the user's request
- Returns JSON calling
CoreSkill, carrying the Skill name - The
CoreSkillmethod in the application reads the corresponding Markdown file by name - Sends the Markdown content to the LLM, which then reasons through the specific steps
Key Insight: Skills fundamentally still use the Function Call mechanism — they simply provide a built-in Function Call internally to read the corresponding Skill text, then return it to the LLM for reasoning. Therefore, a model that supports Function Call is required to support Skills.
Skills are also called Sub-agents — they are components within a larger agent system and are very easy to transfer and share. The Sub-agent concept originates from Multi-Agent System architecture, where a main agent (Orchestrator) understands user intent and distributes tasks, while multiple sub-agents each handle specific domain tasks. This design pattern is widely used in frameworks like AutoGPT, CrewAI, and MetaGPT. The advantage of Skills as sub-agents lies in their lightweight nature — a single Markdown file defines a complete workflow for a specialized domain, greatly simplifying compared to the complex configurations of traditional multi-agent frameworks.
Hands-On with Spring AI Alibaba: Integrating Any LLM
Code Implementation
Although Skills were originally introduced by Claude, understanding the principles allows us to implement them with any LLM using Spring AI Alibaba combined with Tools. Spring AI Alibaba is an extension project developed by Alibaba based on the Spring AI framework, designed to provide Java developers with a convenient AI application development experience. Spring AI itself is an AI integration framework launched by the Spring ecosystem in late 2023, offering a unified API abstraction layer for connecting to different LLM providers. Spring AI Alibaba adds native support for domestic models like Qwen and integrates cutting-edge capabilities such as MCP and Skills, enabling enterprise Java applications to quickly incorporate AI capabilities.
In the latest version of Spring AI Alibaba, the framework has built-in Skills implementation:
- Define Skill Agent Hook: Specify the root directory path for Skill files; the framework automatically reads all Skill Markdown files
- Define Shell Command Execution Hook: Because Skills may need to execute scripts in Python or other languages
- Configure Python Tool Support: Execute Python code through a third-party library provided by RawVM
Demo Results
Using "search for the latest papers on protein folding prediction" as an example, the execution flow is:
- The LLM reads the corresponding Skill's Markdown file
- Discovers it needs to execute a Python script
- Automatically checks if the Python environment is ready
- Runs the Python script to search for papers on the arXiv website
- When initial results are unsatisfactory, automatically reasons about the cause
- Checks script syntax, re-runs, and adjusts the search strategy
- Finally finds five relevant papers through category-based search
The entire process demonstrates Skills' autonomous reasoning and error recovery capabilities — the LLM decides its execution strategy step by step based on the Skill description file.
Skills Ecosystem and Resources
There is already a rich collection of ready-made Skills available. Through the Skills.sh website, you can search over 40,000 Skills, and developers can pick what they need without writing from scratch. This sharing mechanism significantly lowers the barrier to AI application development.
Summary
From the technology evolution perspective: Function Call → MCP → Skills, solving three core problems respectively: "how to call", "where to call", and "how to orchestrate complex tasks". Skills are not an entirely new paradigm but rather a clever wrapper around Function Call — using Markdown files to achieve modular prompt management and on-demand loading, enabling LLMs to handle complex multi-step tasks while maintaining good shareability and maintainability.
Related articles
TutorialsCursor + Codex Dual-IDE Collaboration: A Practical Methodology for Open-Source Project Customization
A complete methodology for open-source project customization based on real-world experience, detailing the Cursor+Codex dual-IDE workflow, seven-stage process, MVP validation, and AI source code reading techniques.
TutorialsCursor Multi-Agent in Practice: Building a Full-Stack Next.js Blog in 50 Minutes
Build a full-stack blog in 50 minutes using Cursor IDE's multi-Agent mode with Next.js, Clerk auth, and Supabase. Learn the 4-phase AI Agent workflow and key integration pitfalls.
TutorialsBuilding an AI Software Factory from Scratch: A Cursor Engineer's Hands-On Experience with Multi-Agent Collaboration
Cursor engineer Eric shares practical insights on building an AI software factory: automation levels, guardrail design, parallel Agent management, and scaling to 1000+ Agents for 24/7 development.