DeepSeek V4 Flash Free Usage Guide: Configuration for Cherry Studio and CC Switch

Introduction

The OpenModel platform recently launched an attractive promotion: DeepSeek V4 Flash is completely free to use, with zero charges for both input and output tokens, and no deductions from your account balance. For developers who regularly use AI coding tools, this means access to a high-quality backup model at zero cost. This article covers the details of this promotion and how to configure it in Cherry Studio and CC Switch.

Promotion Details and Model Features

Free Policy Overview

The OpenModel platform currently offers DeepSeek V4 Flash at no cost:

Input tokens: Free
Output tokens: Free
Rate limits: 10 requests per minute (10 RPM), 100K tokens per minute (TPM)
End date: Not officially specified; users will be notified when it ends

OpenModel platform rate limit details

Overall, the 10 RPM and 100K TPM limits are reasonable and more than sufficient for personal daily use. Here's what these metrics mean: RPM (Requests Per Minute) limits the number of requests within a time window, regardless of how many tokens each request consumes; TPM (Tokens Per Minute) limits the total tokens processed per minute, including both input and output tokens combined. 10 RPM means you can make at most one request every 6 seconds, and 100K TPM means approximately 100,000 tokens can be processed per minute. For reference, OpenAI's free tier typically offers 3 RPM, and Anthropic's free tier is around 5 RPM, so OpenModel's 10 RPM is quite generous among free services. For non-real-time batch processing scenarios, developers can optimize throughput under rate limits using request queues and exponential backoff strategies. No top-up or card binding is required — just register and start using it.

Model Positioning: A Cost-Effective Backup Model

While DeepSeek V4 Flash doesn't match top-tier models like Claude Opus 4 or GPT-5.5 in core coding capabilities, it has several key advantages:

1M (one million) context window: Consistent with the DeepSeek V4 Pro series, capable of handling extremely long texts
Deep thinking support: Equipped with chain-of-thought reasoning capabilities
Tool calling support: Compatible with tools like MCP

Regarding the technical significance of the 1M context window: a context window refers to the maximum number of tokens a large language model can process in a single inference. A 1M (approximately 1 million) token context window represents the current industry pinnacle. For intuitive reference, 1M tokens is roughly equivalent to 7.5 million English words or about 50-70 average books; for Chinese, it's approximately 3-5 million characters. Achieving this capability typically relies on sparse attention mechanisms, RoPE positional encoding extrapolation, and hierarchical caching techniques. In practice, ultra-long context enables the model to analyze an entire code repository, complete technical documentation sets, or perform cross-file code refactoring in a single pass without splitting content across multiple conversations. However, it's worth noting that larger context windows generally increase inference latency and cost, and the model's information retrieval accuracy in the middle portions of very long texts (the "needle in a haystack" capability) may decrease.

Regarding the technical background of the DeepSeek V4 series: DeepSeek V4 is the next-generation large language model series released by DeepSeek in 2025. DeepSeek is known for architectural innovation — its predecessor DeepSeek V3 employed a Mixture of Experts (MoE) architecture with 671B total parameters but only activating approximately 37B parameters per inference, achieving a balance between performance and efficiency. The V4 series continues this technical approach with further optimizations. V4 Flash is positioned as a lightweight, high-efficiency fast inference version emphasizing response speed and cost-effectiveness, while V4 Pro targets complex tasks with higher performance. DeepSeek has also introduced important training innovations including Multi-head Latent Attention (MLA) and fine-grained expert allocation strategies in DeepSeekMoE, enabling superior performance under equivalent computational resources. The Flash version typically undergoes additional optimization through model distillation, quantization, or architectural streamlining to achieve faster inference speeds and lower deployment costs.

DeepSeek V4 Flash performance as a backup model

The recommended usage strategy is: Assign core coding tasks to your primary model (such as Claude Opus 4 or GPT-5.5), while delegating "miscellaneous work" like processing spreadsheets, organizing documents, and retrieving information to DeepSeek V4 Flash. This ensures quality for critical tasks while saving on API costs.

Important Limitation: Anthropic Protocol Only

Currently, DeepSeek V4 Flash on the OpenModel platform only supports the Anthropic (Claude Code) compatible SDK mode and does not support OpenAI's standard Chat Completions format. This means you must select the Anthropic protocol type during configuration, or the API calls will fail.

It's worth explaining the differences between these two mainstream API protocols. OpenAI's Chat Completions API uses the /v1/chat/completions endpoint, passing conversation history via a messages array with roles divided into system, user, and assistant. Anthropic's Messages API uses the /v1/messages endpoint, where the system prompt is passed as a separate top-level parameter, and supports a richer content blocks structure including text, image, tool_use, tool_result, and other types. Authentication also differs: OpenAI uses the Authorization: Bearer header, while Anthropic uses the x-api-key header. This protocol difference means client tools must explicitly select the corresponding protocol type for correct communication.

The API Base URL is: https://api.openmodel.ai/v1

Registration and API Key Setup

Step 1: Register an OpenModel Account

Visit the OpenModel website
Click "Quick Start" to enter the console
Sign in with your Google account to complete registration

Step 2: Create an API Key

After logging into the console, navigate to the API key management page:

Click "Create Key"
Enter any name (other fields can be left blank)
Click create to obtain your API Key

Creating an API key

Make sure to save the generated Key — you'll need it for all subsequent configurations. In the console's usage statistics, you'll see request counts increasing while spending remains at zero.

Configuring in Cherry Studio

Add an Anthropic-Type Provider

Open Cherry Studio and go to the settings page
Click "Add Provider Type"
Critical step: The type must be set to Anthropic model type (because OpenModel is compatible with Claude Code's API protocol)

Enter API Configuration

API Request URL: https://api.openmodel.ai/v1
Key: Enter the API Key you created in the console
Click "Fetch Model List" to pull all available models

Select the DeepSeek V4 Flash Model

Search for and select deepseek-v4-flash in the model list:

Selecting the DeepSeek V4 Flash model

Once configured, return to the Cherry Studio home page and select OpenModel's DeepSeek V4 Flash model when starting a new chat. Testing confirms it supports thinking capabilities and tool calling (such as EXA's MCP web search), performing impressively as a backup model.

MCP (Model Context Protocol) mentioned here is an open standard protocol introduced by Anthropic in late 2024, designed to provide large language models with a unified way to connect to external tools and data sources. In the MCP architecture, AI applications act as MCP Clients, while various tools and services act as MCP Servers, communicating through standardized JSON-RPC protocol. MCP's core value lies in solving the M×N integration problem where every AI application previously needed custom integrations for each external service, simplifying it to an M+N standardized approach. The EXA MCP web search mentioned in this article is a typical MCP Server implementation that allows models to call EXA's search engine for real-time web information retrieval through a standard interface. The MCP ecosystem currently covers hundreds of tool services including file system operations, database queries, browser control, and code execution.

Configuring in CC Switch

If you regularly use Claude Code or Claude Desktop, you can configure this with one click using the CC Switch tool.

CC Switch is a community-developed Claude Code configuration management tool whose core function is enabling users to quickly switch between multiple API service endpoints without manually modifying environment variables or configuration files each time. Claude Code is Anthropic's command-line AI programming assistant that runs directly in the terminal, capable of reading/writing files, executing commands, and performing code search and refactoring. Claude Desktop is its desktop client version with a graphical interface. Both tools connect to Anthropic's official API by default, but can be pointed to any Anthropic-protocol-compatible third-party endpoint through environment variable configuration. CC Switch leverages this mechanism — through its model mapping feature, users can route a model name displayed in the Claude interface to a completely different backend model, enabling flexible model resource scheduling without changing usage habits.

Here are the specific configuration steps:

Open CC Switch
Click the plus button and select "Custom Configuration"
Service endpoint type: Select "Claude-compatible API service endpoint"
API Key: Enter your OpenModel key
Name can be anything; website link can be left blank

Model Mapping Setup

Since DeepSeek V4 Flash is not a native Claude model, you need to set up model mapping:

You can map lower-capability models like Claude Haiku to DeepSeek V4 Flash
You can also map models like Claude 4.6 to it

This way, when you select the corresponding model in Claude Code or Claude Desktop, it actually calls the free DeepSeek V4 Flash.

Usage Recommendations and Summary

Recommended Use Cases

Scenario	Recommended Model
Core coding tasks	Claude Opus 4 / GPT-5.5
Document organization, spreadsheet processing	DeepSeek V4 Flash (free)
Information retrieval, web queries	DeepSeek V4 Flash + MCP
Long text analysis (ultra-long context)	DeepSeek V4 Flash (1M context)

Important Notes

The promotion end date is undetermined — follow OpenModel's official announcements
Only Anthropic-compatible protocol is supported; OpenAI standard format is not supported
The free tier has rate limits (10 RPM / 100K TPM); be mindful during high-frequency usage
Coding capabilities are limited; core development tasks should still use your primary model

Overall, the limited-time free promotion for DeepSeek V4 Flash provides developers with a zero-cost, high-quality backup model option. The 1M context window, deep thinking, and tool calling support make it excellent for handling everyday miscellaneous tasks. While the promotion is still active, get it configured now.