CherryStudio + MCP: Tutorial for Building Automated AI Agents and Local Knowledge Bases

Introduction

As AI tools become increasingly abundant, many users are wondering how to extend large model capabilities from simple conversations to automated task execution. Cherry Studio, a free and open-source AI desktop client, supports MCP (Model Context Protocol), which makes it easy for ordinary users to build personalized, automated AI agents.

This article provides a complete walkthrough—from environment configuration to practical applications—covering the two core functionalities of Cherry Studio + MCP: automated tasks and local knowledge bases.

Cherry Studio Basic Configuration

Installation and Model Integration

Cherry Studio supports Windows, Mac, and Linux. You can download the installation package for your system from its GitHub project page. After installation, the first priority is configuring the AI model.

The client uses SiliconFlow's model service by default, but also supports connecting to DeepSeek and other models. Using DeepSeek as an example, the configuration steps are:

Find the DeepSeek option in Settings
Go to the DeepSeek API platform to create an API Key
Paste the Key into Cherry Studio and click Test
Select the model: DeepSeek Chat corresponds to the V3 model, and DeepSeek Reasoner corresponds to the R1 model

An important note: at the time of writing, DeepSeek R1 did not support Function Call. Since Cherry Studio relies on Function Call to invoke MCP Servers, you must select the DeepSeek V3 (i.e., DeepSeek Chat) model when using MCP features.

Function Call is the core mechanism that lets large models interact with external tools. When a user makes a request, the model analyzes the intent and decides whether an external tool is needed. If so, it generates a structured JSON call request containing the tool name and parameters. The client receives this request, executes the corresponding tool operation, and returns the result to the model, which integrates it into the final answer. R1 is a reasoning model optimized for long chains of reasoning, and its API did not offer Function Call at the time; this is why the V3 model must be used in MCP scenarios.
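The round trip described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cherry Studio's implementation: the model's reply is hard-coded, the JSON shape {"name": ..., "arguments": {...}} is a simplification (each provider's exact format differs), and the get_weather tool is hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool. In Cherry Studio, tools would come from connected MCP Servers.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder result, not real data

# Tool registry: tool name -> Python function that implements it.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def handle_model_output(raw: str) -> Dict[str, Any]:
    """Parse a structured tool-call request emitted by the model and execute it.

    The returned message carries the tool result; a client would send it back
    to the model so the model can compose its final answer.
    """
    call = json.loads(raw)  # simplified shape: {"name": ..., "arguments": {...}}
    name = call["name"]
    if name not in TOOLS:
        return {"role": "tool", "name": name, "content": f"error: unknown tool {name}"}
    result = TOOLS[name](**call["arguments"])
    return {"role": "tool", "name": name, "content": result}

if __name__ == "__main__":
    # Simulated model output: the model decided a tool is needed and produced JSON.
    model_output = '{"name": "get_weather", "arguments": {"city": "Beijing"}}'
    print(handle_model_output(model_output))
```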

MCP Protocol Explained and Environment Preparation

What is MCP

MCP (Model Context Protocol) is an open standard protocol introduced by Anthropic, designed to provide large models with a standardized way to connect to external data and tools. Think of it as the "USB port" of the AI world—allowing AI models to access various services and resources in a unified way, including querying databases, calling APIs, reading and writing files, etc.

From a technical architecture perspective, MCP adopts the classic Client-Server pattern. The MCP Client (such as Cherry Studio) handles interaction with the large model and forwards tool call requests, while each MCP Server provides specific functionality, encapsulating a set of capabilities called Tools. The two communicate using the JSON-RPC 2.0 message format.

Compared to traditional API integration, MCP's key difference is its unified interface specification: developers write a Server once following the MCP standard, and any MCP-compatible client can call it without a separate adapter for each AI application. This greatly reduces the complexity of tool integration and has allowed the ecosystem to expand quickly. Currently, an MCP Server can expose three main types of primitives: Tools (functions the model can actively call), Resources (data sources the model can read), and Prompts (predefined interaction templates).

The core advantage of this architecture lies in its modular design: users can flexibly combine different MCP Servers like building blocks, constructing highly customized AI assistants based on their needs.
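For a concrete feel of the JSON-RPC 2.0 messages involved, the sketch below builds the two requests a client typically sends to discover and invoke Tools. The method names tools/list and tools/call come from the MCP specification; the read_file tool and its path argument are illustrative and depend on the Server you connect.

```python
import json
from typing import Any, Dict, Optional

def jsonrpc_request(req_id: int, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object; "params" is omitted when not given."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# Ask a Server which Tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one Tool by name. "read_file" and "path" are illustrative only.
call_tool = jsonrpc_request(2, "tools/call", {"name": "read_file", "arguments": {"path": "notes.txt"}})

if __name__ == "__main__":
    print(json.dumps(list_tools))  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    print(json.dumps(call_tool))
```

Because every Server speaks this same message format, a client only needs one implementation of it, which is the "write once, call from any client" property described above.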

Installing Environment Dependencies

Using MCP features requires installing the following tools:

1. UV and Bun (Built-in Tools)

UV: A tool for managing Python environments
Bun: A JavaScript/TypeScript runtime with a built-in package manager
Cherry Studio only uses its own built-in copies; even if UV or Bun is already installed on your system, you need to install them again from within the software
Since they're downloaded from GitHub, downloads may be slow; you can download them manually and place them in the specified path

UV is a next-generation Python package manager written in Rust by Astral, which Astral reports to be 10-100x faster than traditional pip. In MCP scenarios, UV's core role is to quickly create isolated virtual environments and install dependencies for Python-based MCP Servers, avoiding package version conflicts between different Servers. Bun is a JavaScript/TypeScript runtime written in Zig that combines a bundler, transpiler, package manager, and runtime in one tool, and it typically starts faster than Node.js. Cherry Studio uses Bun to run JavaScript/TypeScript-based MCP Servers; many community-contributed Servers are written in TypeScript, and Bun can execute .ts files directly without a separate compilation step.

2. Node.js Environment

Some MCP services are built on Node.js; download and install the latest stable version from the official website
After installation, verify success with node -v and npm -v commands

After installation, be sure to restart Cherry Studio for the tools to take effect.
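Before restarting Cherry Studio, you may want to confirm which of these tools are visible on your system PATH. The following Python sketch runs "<tool> --version" for each one. Keep in mind that Cherry Studio uses its own built-in UV and Bun, so this only checks system-wide installs such as Node.js; the function names are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, Iterable, Optional

def tool_version(cmd: str) -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if cmd is missing or fails to run."""
    path = shutil.which(cmd)  # searches PATH (and PATHEXT on Windows)
    if path is None:
        return None
    try:
        out = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    text = (out.stdout or out.stderr).strip()
    return text.splitlines()[0] if text else None

def check_environment(tools: Iterable[str] = ("node", "npm", "uv", "bun")) -> Dict[str, Optional[str]]:
    """Map each tool name to its reported version, or None if not found on PATH."""
    return {t: tool_version(t) for t in tools}

if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'not found on PATH'}")
```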

MCP Server Configuration in Practice

Two Configuration Methods

Cherry Studio provides two MCP configuration methods:

JSON Import: uses the same JSON format as Claude Desktop's configuration, suitable for copying ready-made configurations from GitHub
Quick Create: Configuration via command-line parameters, suitable for custom scenarios

Configuring the File System MCP Server

Find the File System Server in the official MCP servers repository on GitHub (modelcontextprotocol/servers) and copy its JSON configuration into Cherry Studio. The key step is changing the path in the configuration to your actual local directory (e.g., D:\\ABC; backslashes in Windows paths must be doubled inside JSON strings). This path defines the directory scope that the MCP Server is allowed to access and operate on.
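If you prefer to generate the configuration rather than edit it by hand, the sketch below builds a Claude Desktop-style mcpServers entry in Python. It assumes the official File System Server is launched as npx -y @modelcontextprotocol/server-filesystem followed by the allowed directories, as its README showed at the time of writing; letting json.dumps write the JSON also takes care of escaping Windows backslashes.

```python
import json

def filesystem_server_config(*allowed_dirs: str) -> dict:
    """Build an "mcpServers" entry for the File System Server.

    Each directory passed in is one the Server will be allowed to access.
    The package name is taken from the official README; verify it before use.
    """
    return {
        "mcpServers": {
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", *allowed_dirs],
            }
        }
    }

if __name__ == "__main__":
    # The raw string holds a single backslash; json.dumps writes it as "D:\\ABC".
    print(json.dumps(filesystem_server_config(r"D:\ABC"), indent=2))
```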

The File System MCP Server's permission design follows the principle of least privilege: it can only read and write within the directories explicitly specified in the configuration and cannot access files elsewhere on the system. This sandboxing is implemented by the Server itself rather than enforced by the MCP protocol, but it illustrates a key part of MCP's security model: each Server should expose only the access it needs, which keeps the AI from accidentally modifying or deleting critical system files during task execution.
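The directory restriction boils down to a containment check on every requested path. The function below is a simplified sketch of such a check, not the File System Server's actual code; is_within_allowed is a hypothetical name. Resolving the path first is the important part, so that tricks like sandbox/../secret.txt are judged by where they really point.

```python
import os
from typing import Iterable

def is_within_allowed(target: str, allowed_dirs: Iterable[str]) -> bool:
    """Return True if target resolves to a location inside one of allowed_dirs.

    realpath() collapses ".." segments and follows symlinks; normcase() makes
    the comparison case-insensitive on Windows (it is a no-op on POSIX).
    """
    real_target = os.path.normcase(os.path.realpath(target))
    for d in allowed_dirs:
        root = os.path.normcase(os.path.realpath(d))
        try:
            # commonpath compares whole path components, not raw string prefixes.
            if os.path.commonpath([real_target, root]) == root:
                return True
        except ValueError:  # e.g. paths on different Windows drives
            continue
    return False

if __name__ == "__main__":
    print(is_within_allowed("sandbox/report.md", ["sandbox"]))
    print(is_within_allowed("sandbox/../secret.txt", ["sandbox"]))
```

Comparing resolved paths component by component also avoids a naive string-prefix bug in which sandbox2/ would be accepted as being inside sandbox/.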

Configuring the Web Scraping Tool

Similarly, copy Firecrawl's configuration from its GitHub project page. Before use, obtain an API key from the Firecrawl website and replace the placeholder API key value in the JSON with it.

Firecrawl is a web scraping service designed specifically for AI applications. Unlike traditional crawling tools, it can convert complex web content (including pages rendered dynamically with JavaScript) into clean Markdown or structured data, making it particularly suitable as input for large models. It provides multiple capabilities, including scrape (single page), crawl (entire site), and search. At the time of writing, the free tier offered 500 credits, which is enough for light personal use; check the official pricing page, as quotas change.
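The Firecrawl entry differs from the File System one mainly in its env block, which carries the API key. The sketch below assumes the firecrawl-mcp npm package and the FIRECRAWL_API_KEY variable described in the Firecrawl MCP Server README at the time of writing; verify both against the current README. Reading the key from an environment variable keeps it out of scripts you might share.

```python
import json
import os

def firecrawl_server_config(api_key: str) -> dict:
    """Build an "mcpServers" entry for the Firecrawl MCP Server.

    Package name and variable name are assumptions taken from its README.
    """
    return {
        "mcpServers": {
            "firecrawl-mcp": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": {"FIRECRAWL_API_KEY": api_key},
            }
        }
    }

if __name__ == "__main__":
    # Fall back to an obvious placeholder if the variable is not set.
    key = os.environ.get("FIRECRAWL_API_KEY", "fc-YOUR-API-KEY")
    print(json.dumps(firecrawl_server_config(key), indent=2))
```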

Configuring the Shell Controller

Configure MCP Shell via the Quick Create method:

Name: Custom (e.g., mcp-shell)
Type: STDIO (Standard Input/Output)
Command: Choose NPX (for Node.js/TypeScript packages) or UVX (for Python packages)
Arguments: -y and @anthropic/mcp-shell

STDIO (Standard Input/Output) is a local transport defined by the MCP protocol. In this mode, the MCP Client launches the MCP Server as a subprocess, and the two communicate through the standard input (stdin) and standard output (stdout) pipes. Each message is a complete JSON-RPC request, response, or notification on a single line, and messages are delimited by newlines. Compared to HTTP-based transports (originally HTTP with Server-Sent Events, replaced by Streamable HTTP in newer revisions of the specification), which suit remote servers, STDIO requires no network configuration and has lower latency, making it ideal for local tools. NPX is Node.js's package runner, which can temporarily download and run npm packages without a global installation; UVX provides the same capability for Python packages as part of UV.
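To make the newline-delimited framing concrete, here is a self-contained Python sketch: it launches a tiny stand-in server as a subprocess and exchanges one JSON-RPC message in each direction over stdin/stdout. The stand-in server is hypothetical and only echoes the method name; a real MCP Server would also expect an initialize handshake first.

```python
import json
import subprocess
import sys
from typing import Any, Dict, Optional

# Hypothetical stand-in server: reads one JSON-RPC request per line from stdin
# and writes one JSON-RPC response per line to stdout.
FAKE_SERVER = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"], "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""

def send(proc: subprocess.Popen, message: Dict[str, Any]) -> None:
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    proc.stdin.write((json.dumps(message) + "\n").encode("utf-8"))
    proc.stdin.flush()

def receive(proc: subprocess.Popen) -> Dict[str, Any]:
    # Read exactly one newline-terminated message.
    return json.loads(proc.stdout.readline())

def roundtrip(req_id: int, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start the stand-in server, send one request, and return its response."""
    proc = subprocess.Popen([sys.executable, "-c", FAKE_SERVER],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        request: Dict[str, Any] = {"jsonrpc": "2.0", "id": req_id, "method": method}
        if params is not None:
            request["params"] = params
        send(proc, request)
        return receive(proc)
    finally:
        proc.stdin.close()  # EOF on stdin ends the server's read loop
        proc.wait()
        proc.stdout.close()

if __name__ == "__main__":
    print(roundtrip(1, "tools/list"))  # {'jsonrpc': '2.0', 'id': 1, 'result': {'echo': 'tools/list'}}
```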

After configuration, you need to manually activate each MCP Server—a green button indicates successful activation.

Three Automated Application Scenario Demos

Scenario 1: Scraping Web Data to Generate CSV

Task: Scrape all data from an AI model leaderboard webpage and organize it into CSV format.

The AI automatically calls Firecrawl's scrape tool to fetch the web content and presents the structured data in CSV format. The entire process requires no manual crawler code—the AI independently completes data extraction and format conversion.

This scenario demonstrates MCP's "tool chain orchestration" capability. Under the hood, the large model first analyzes the user's intent and identifies that two steps are needed: first, calling the web scraping tool to obtain raw data; second, using its own text processing ability to convert unstructured content into CSV format. The entire decision-making process is completed autonomously by the model—users only need to describe the final goal in natural language.
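Because the model writes the CSV itself, it is worth checking the result before relying on it. The Python sketch below (function names are hypothetical) parses the model's CSV text with the standard csv module and reports data rows whose column count differs from the header, a common symptom when scraped values contain unquoted commas.

```python
import csv
import io
from typing import Dict, List

def _rows(text: str) -> List[List[str]]:
    # csv.reader returns [] for blank lines; drop them.
    return [row for row in csv.reader(io.StringIO(text.strip())) if row]

def column_mismatches(text: str) -> List[int]:
    """Return 1-based data-row numbers whose column count differs from the header's."""
    rows = _rows(text)
    if not rows:
        return []
    width = len(rows[0])
    return [i for i, row in enumerate(rows[1:], start=1) if len(row) != width]

def parse_csv_output(text: str) -> List[Dict[str, str]]:
    """Parse model-generated CSV into dicts keyed by the header, rejecting ragged rows."""
    bad = column_mismatches(text)
    if bad:
        raise ValueError(f"data rows with wrong column count: {bad}")
    rows = _rows(text)
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]

if __name__ == "__main__":
    # Placeholder sample, not real leaderboard data.
    sample = "model,score\nmodel-a,90\nmodel-b,85"
    print(parse_csv_output(sample))
```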

Scenario 2: Searching Information and Generating Analysis Reports

Task: Search for NVIDIA RTX 50 series GPU information, write a Chinese analysis report in Markdown format, and save it locally.

The AI first uses the firecrawl-search tool to gather relevant information, then uses the file system functionality to create a Markdown document. The generated report covers GPU architecture, core technologies, performance parameters, and more—the information compilation is quite comprehensive.

This scenario demonstrates the power of multiple MCP Servers working together. In a single conversation, the AI chains calls to two different MCP Servers: first using Firecrawl for information retrieval, then using File System to write results to a local file. This cross-Server task orchestration capability is the core value of MCP's modular architecture—each Server focuses on its own capability domain, with the large model serving as the "dispatch center" for unified coordination.
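The "dispatch center" role can be illustrated with a toy router. In this Python sketch the two servers and their tools are placeholders (plain lambdas, not real Firecrawl or File System calls), and the plan is hard-coded, whereas in practice the model decides each step and writes the report content from the search results.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical servers, each exposing named tools. Placeholders only.
SERVERS: Dict[str, Dict[str, Callable[..., str]]] = {
    "firecrawl": {"search": lambda query: f"results for {query!r}"},
    "filesystem": {"write_file": lambda path, content: f"wrote {len(content)} chars to {path}"},
}

def build_routing_table(servers: Dict[str, Dict[str, Callable[..., str]]]) -> Dict[str, str]:
    """Map each tool name to the server that provides it, refusing ambiguous names."""
    table: Dict[str, str] = {}
    for server, tools in servers.items():
        for tool in tools:
            if tool in table:
                raise ValueError(f"tool name collision: {tool}")
            table[tool] = server
    return table

def run_plan(plan: List[Tuple[str, dict]], servers=SERVERS) -> List[str]:
    """Execute (tool, arguments) steps in order, routing each one to its server."""
    table = build_routing_table(servers)
    return [servers[table[tool]][tool](**args) for tool, args in plan]

if __name__ == "__main__":
    plan = [
        ("search", {"query": "RTX 50 series"}),
        ("write_file", {"path": "report.md", "content": "# Report"}),
    ]
    for output in run_plan(plan):
        print(output)
```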

Scenario 3: Executing Shell Commands

Task: Query the local Docker version number.

The AI executes the docker --version command through MCP Shell and returns the correct version number. This demonstrates the AI's ability to execute system commands directly.

Security Warning: When using MCP Shell, you must strictly limit permissions to ensure only preset tasks are executed, avoiding accidental operations that could terminate running tasks or cause data loss.

MCP Shell essentially grants the AI the ability to execute arbitrary system commands, which makes it a double-edged sword. In production environments, it's recommended to restrict the range of executable commands with an allowlist, or to run the Shell Server in a Docker container or another isolated environment so that potentially destructive operations stay inside a sandbox. Where the Shell Server supports it, explicitly declare the allowed commands in its configuration to prevent the model from running dangerous operations (such as rm -rf) because of a "hallucination."
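Here is a minimal sketch of the allowlist idea in Python, assuming you wrap command execution yourself; ALLOWED_COMMANDS, is_allowed, and run_allowed are hypothetical names, not part of any MCP Shell Server. Commands are split with shlex and run without a shell, so shell operators cannot chain extra commands.

```python
import shlex
import subprocess
from typing import FrozenSet

# Hypothetical allowlist: the only executables this wrapper will launch.
ALLOWED_COMMANDS: FrozenSet[str] = frozenset({"docker", "git"})

def is_allowed(command_line: str) -> bool:
    """True if the command's executable (its first token) is on the allowlist."""
    argv = shlex.split(command_line)
    return bool(argv) and argv[0] in ALLOWED_COMMANDS

def run_allowed(command_line: str, timeout: float = 30.0) -> str:
    """Run an allowlisted command without a shell and return its stdout."""
    if not is_allowed(command_line):
        raise PermissionError(f"command not allowed: {command_line!r}")
    # Passing a list with the default shell=False means ';', '&&', '|' and '>'
    # reach the program as literal arguments instead of being interpreted.
    result = subprocess.run(shlex.split(command_line), capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout

if __name__ == "__main__":
    print(is_allowed("docker --version"))  # True
    print(is_allowed("rm -rf /"))          # False
```

Checking only the executable is coarse: docker rm -f <container> would pass this check. A stricter version would match whole command patterns, for example allowing only docker --version and docker ps.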

Building a Local Knowledge Base

Deploying Local Models with Ollama

The core advantage of a local knowledge base is data privacy—all operations are executed locally without sending sensitive information to external servers.

Deployment Steps:

Download the installation package from the Ollama official website and install with one click
Verify the installation with the ollama --version command
Choose an appropriate model version (e.g., Qwen3 0.6B for demonstration)
Execute the corresponding command to download and run the model

Ollama is an open-source framework for running large language models locally. It packages model downloading, quantization, and inference into a concise command-line tool. Its inference engine has historically been built on llama.cpp and supports GGUF-format quantized models. Quantization is a model compression technique that reduces model weights from 16-bit floating point (the precision most open models are released in) to 8-bit or 4-bit integer representations, dramatically reducing model size and memory usage while preserving inference quality as far as possible. For example, a 70B parameter model at 16-bit precision needs about 140GB just for its weights, but after 4-bit quantization it needs only about 35GB, bringing it within reach of high-end consumer hardware.

For personal computers, it's recommended to start with smaller parameter versions. Large parameter models (such as the 235B flagship version at 142GB) require powerful computing resources that typically exceed personal computer capabilities.

A rule of thumb for hardware requirements: the memory/VRAM needed to run a model is roughly the quantized model file size plus some headroom for the context. With GPU inference you need that much VRAM; with CPU inference you need that much system RAM, but inference is significantly slower. For a 7B parameter 4-bit quantized model, you typically need about 6-8GB of available memory/VRAM, a configuration most modern computers can meet.
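The arithmetic behind these figures is simple: weight memory is roughly the parameter count times the bits per weight, divided by 8 to get bytes. Here is a quick Python sketch that covers weights only; the helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Actual usage is higher: the context (KV) cache, runtime buffers, and
    quantization formats that store slightly more than the nominal bits per
    weight all add overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    print(weight_memory_gb(70e9, 16))  # 140.0  (70B at 16-bit)
    print(weight_memory_gb(70e9, 4))   # 35.0   (70B at 4-bit)
    print(weight_memory_gb(7e9, 4))    # 3.5    (7B at 4-bit)
```

The 3.5GB of weights for a 7B 4-bit model is below the 6-8GB rule of thumb; the difference covers the context cache, runtime buffers, quantization metadata, and the operating system's own needs.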

Deploying the Embedding Model

The knowledge base also requires an embedding model (such as bge-m3), which converts text into vector representations so computers can understand text semantics and perform fast matching retrieval. It can also be downloaded and deployed via Ollama commands.
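Cherry Studio calls the embedding model for you, but it can help to see what such a request looks like. The Python sketch below targets Ollama's local REST API, assuming the default address http://localhost:11434 and the /api/embed endpoint of recent Ollama versions (older versions used /api/embeddings with a "prompt" field); the model name must match what ollama list shows.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumption)

def build_embed_request(texts: List[str], model: str = "bge-m3") -> urllib.request.Request:
    """Build a POST request asking Ollama to embed each text."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts: List[str], model: str = "bge-m3") -> List[List[float]]:
    """Return one vector per input text. Requires a running Ollama with the model pulled."""
    with urllib.request.urlopen(build_embed_request(texts, model)) as resp:
        return json.loads(resp.read())["embeddings"]

if __name__ == "__main__":
    vectors = embed(["Cherry Studio supports MCP."])
    print(len(vectors), len(vectors[0]))  # number of texts, vector dimension
```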

The Embedding Model is the foundational infrastructure of the entire knowledge base system. It works by mapping text of arbitrary length to a fixed-dimensional numerical vector (e.g., bge-m3 outputs 1024-dimensional vectors). Texts with similar semantics are close in vector space, allowing computers to judge semantic relevance between two text passages by calculating cosine similarity between vectors. bge-m3 is a multilingual embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI), supporting over 100 languages, particularly excelling at Chinese semantic understanding, and supporting input lengths up to 8192 tokens—making it ideal for processing Chinese documents.
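Cosine similarity is the dot product of two vectors divided by the product of their lengths. It is 1 for vectors pointing in the same direction and 0 for orthogonal ones. Here is a plain-Python sketch:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In practice a vector database computes this, or an equivalent measure, over many stored vectors at once. When vectors are normalized to unit length beforehand, the dot product alone gives the same ranking.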

The entire knowledge base workflow follows the RAG (Retrieval-Augmented Generation) architecture. During the indexing phase, documents are split into small passages (chunks), and each chunk is converted to a vector by the embedding model and stored in a vector database. During the Q&A phase, the user's question is converted to a vector in the same way, the system finds the most relevant chunks through vector similarity search, and these chunks are injected into the large model's prompt as context, so the model answers from this real information. This architecture substantially reduces the large model's "hallucination" problem and mitigates knowledge timeliness issues.
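The indexing and Q&A phases can be sketched end to end in Python. To stay self-contained, this sketch swaps the real embedding model for a toy word-count vector (a hypothetical toy_embed), which only works for space-separated languages such as English; bge-m3 would instead return dense 1024-dimensional vectors. Chunk size and overlap are arbitrary example values, a Python list stands in for the vector database, and the sample documents are made up.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def chunk(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (one simple chunking strategy)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs: List[str]) -> List[Tuple[str, Counter]]:
    """Indexing phase: chunk every document and store (chunk, vector) pairs."""
    return [(c, toy_embed(c)) for d in docs for c in chunk(d)]

def retrieve(index: List[Tuple[str, Counter]], question: str, k: int = 2) -> List[str]:
    """Q&A phase: rank chunks by similarity to the question and keep the top k."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(question: str, contexts: List[str]) -> str:
    """Inject the retrieved chunks into the prompt as context."""
    ctx = "\n---\n".join(contexts)
    return f"Answer using only the context below.\n\nContext:\n{ctx}\n\nQuestion: {question}"

if __name__ == "__main__":
    docs = ["The cat was vaccinated on March 3.", "The dog likes long walks in the park."]
    index = build_index(docs)
    question = "When was the cat vaccinated?"
    print(build_prompt(question, retrieve(index, question, k=1)))
```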

Creating the Knowledge Base

Enable Ollama in Cherry Studio settings and add the two downloaded models (note: names must be complete and correct—use ollama list to verify)
Click the Knowledge Base option, create a new knowledge base, and select the locally deployed bge-m3 as the embedding model
Supports multiple data formats including files, directories, URLs, and notes
Drag and drop files to upload—a green checkmark indicates successful loading
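Name mismatches are a common reason a model fails to appear or respond. The sketch below reads the installed model names from Ollama's /api/tags endpoint (the same list ollama list prints, assuming the default local address) and reports which requested names do not match exactly. Note that a model pulled without an explicit tag is listed with a :latest suffix, such as bge-m3:latest. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, List

def installed_models(base_url: str = "http://localhost:11434") -> List[str]:
    """List model names known to a local Ollama instance (requires Ollama running)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return [m["name"] for m in json.loads(resp.read())["models"]]

def missing_models(wanted: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return the names in `wanted` that do not exactly match an installed model name."""
    have = set(installed)
    return [w for w in wanted if w not in have]

if __name__ == "__main__":
    print(missing_models(["qwen3:0.6b", "bge-m3:latest"], installed_models()))
```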

Knowledge Base Q&A Testing

On the chat page, select the local model and associate the knowledge base to conduct Q&A based on knowledge base content. Testing shows that AI can accurately cite specific information from the knowledge base (such as pet vaccination dates, birth dates, etc.). After disassociating the knowledge base, AI cannot answer related questions, verifying the knowledge base's effectiveness.

Analysis of Local Deployment Pros and Cons

Advantages:

No network connection required—works normally even offline
Data privacy and security, suitable for handling confidential files and sensitive data
Ideal for enterprise internal confidential documents or personal private data scenarios

Disadvantages:

Model performance may not match cloud-based online services
Large parameter models have high hardware requirements

From practical experience, locally deployed small parameter models (such as 7B-14B) perform quite well on specific tasks like knowledge base Q&A, because the RAG architecture shifts most of the "knowledge memory" burden to the retrieval system—the model only needs good reading comprehension and information integration abilities. However, in scenarios requiring the model's own capabilities, such as complex reasoning and creative writing, there remains a noticeable gap between local small models and cloud-based large models (like GPT-4, Claude 3.5). A compromise approach is: use local models for privacy-sensitive data processing, and call cloud APIs for non-sensitive tasks to get better results.

When choosing a deployment method, consider your actual needs, hardware resources, and different priorities regarding performance and privacy.

Conclusion

The combination of Cherry Studio + MCP + local large models provides individual users with a complete automated AI solution. MCP's modular design makes feature expansion simple and flexible, while local deployment ensures data security. It's recommended to explore more practical Servers in the MCP marketplace based on your actual needs, gradually building your own AI workflow.