Claude Code Model Configuration & Cost Comparison: A Practical Money-Saving Guide

Introduction: The Hidden Costs of AI Programming Tools

When using AI programming tools like Claude Code, many developers overlook a critical issue — the costs driven by token consumption. Tokens are the basic units that large language models use to process text, and they don't simply equate to a single character or word. For English, one token corresponds to roughly 4 characters or 0.75 words; for Chinese, a single character is typically encoded as 1-2 tokens. In programming scenarios, tokenization is even more complex — variable names, function signatures, indentation, comments, and more are all broken down into tokens. The reason tools like Claude Code consume tokens at an alarming rate is that they need to send entire project code files, directory structures, dependency configurations, and other context to the model simultaneously. A medium-sized project's context alone can easily consume tens or even hundreds of thousands of tokens.

This article is based on the real-world experience of a Bilibili content creator, breaking down the actual costs of various models in Claude Code and providing the most cost-effective configuration strategies.

Claude Code Tutorial

Mainstream Model Price Comparison

Anthropic Official Model Pricing

Among the high-end models currently supported by Claude Code, both Sonnet and Opus support a 1M (one million) context window with impressive capabilities, but their prices are equally substantial. The context window refers to the maximum number of tokens a model can process in a single inference. A 1M context window means the model can "read" approximately 750,000 English words or hundreds of thousands of lines of code in one go. This capability is crucial for programming scenarios because understanding a function's bug might require tracing the entire call chain, related type definitions, and configuration files. However, the larger the context window, the more tokens are sent per request, and the higher the cost — this explains why programming tools cost far more than regular chat. A normal conversation might only need a few hundred tokens of context, while code analysis routinely requires tens of thousands.

Specific pricing:

Sonnet: $3 per million input tokens, $15 per million output tokens
Opus: $5 per million input tokens, $25 per million output tokens
Haiku (lightweight version): Approximately 1/5 the price of the above models, suitable for simple tasks

These numbers may seem abstract, but in real programming scenarios, token consumption far exceeds regular conversations due to large context windows and numerous code files. It's worth noting that output token prices are typically 3-5x higher than input tokens because generating text (output) requires more computational resources than understanding text (input) — each generated token requires a complete forward inference pass.

Lessons Learned from Domestic Models

The content creator shared a real case: he once configured the Qwen Coder model in Claude Code with a balance of 37 RMB, and it was completely depleted after just two conversations. Qwen Coder's tiered pricing charges approximately 24 RMB per million tokens, and slightly larger projects burn through credits rapidly.

Usage Cost Explanation

Key lesson: Token consumption in programming tools is far higher than regular chat because large amounts of code context must be read. A seemingly simple request like "help me fix this bug" might involve sending dozens of code files, configuration files, and dependency declarations to the model, easily generating 100,000+ tokens of consumption.

DeepSeek V4 Pro: Currently the Best Value

DeepSeek is a large language model series developed by DeepSeek AI, a company incubated by the quantitative fund High-Flyer, with powerful computing resources and R&D capabilities. The DeepSeek model series is known for its open-source strategy and extreme inference efficiency. Its MoE (Mixture of Experts) architecture allows the model to significantly reduce inference costs while maintaining high performance. The core idea of MoE architecture is that the model contains multiple "expert" sub-networks, and only a small subset of experts is activated for each inference task rather than engaging all parameters — this significantly reduces computational consumption per inference without sacrificing model capacity. V4 Pro, as its latest flagship version, excels in code generation, mathematical reasoning, and other tasks.

DeepSeek V4 Pro is currently running a 75% off promotion:

Input: ~3 RMB per million tokens
Output: ~6 RMB per million tokens (original price 24 RMB)

DeepSeek V4 Pro Pricing

The content creator noted that a 12 RMB top-up can cover approximately two medium-complexity projects, with each conversation costing just a few cents (RMB). However, be aware that prices may increase after the promotional period ends. Since DeepSeek operates domestically in China, users can call the official API directly without proxies, resulting in lower latency and better stability — a key reason for its rapid adoption among Chinese developers.

Real-World Cost Reference

Based on the content creator's long-term usage experience, the approximate cost per conversation for each model in Claude Code is:

Model	Cost Per Conversation	Notes
Qwen Coder	15-20 RMB	Extremely expensive, pay-per-use not recommended
Sonnet (via proxy)	1-2 RMB	Stable performance
DeepSeek V4 Pro	0.1-0.5 RMB	Current promotional period, best value

It's important to note that a "single conversation" in Claude Code is not a simple question-and-answer exchange. A complete programming conversation may involve multiple rounds of interaction: the model first reads the project structure, then analyzes relevant code files, generates a modification plan, and finally outputs complete code — each step generates token consumption that adds up to a considerable amount.

Money-Saving Strategy: Coding Plan vs Pay-Per-Use

For high-frequency users, prioritize a Coding Plan (programming subscription). A Coding Plan is essentially a "high-quota monthly subscription" pricing strategy, similar to a mobile data plan. For heavy users of AI programming tools, the pay-as-you-go model makes monthly costs highly unpredictable since every code generation, refactoring, and debugging session produces massive token consumption. A subscription locks costs at a fixed amount, allowing users to work freely without constantly worrying about their balance.

Specific plan references:

Qwen offers a 200 RMB/month Coding Plan with 6,000 calls per 5-hour window, with quota automatically refreshing after use
Compared to pay-per-use (37 RMB for two conversations), the monthly plan is clearly more economical

Using Qwen as an example, if you use pay-per-use 2-3 times daily, monthly costs could exceed 1,000 RMB — the subscription's cost advantage is obvious. The key to choosing between subscription and pay-per-use is evaluating your usage frequency — if you use Claude Code more than 10 times per week, a subscription is almost certainly more cost-effective.

Configuration Tutorial: How to Connect Third-Party Models in Claude Code

Environment Variable Configuration

No need to configure within the Claude Code interface — set it directly in system environment variables. Environment variables are OS-level key-value pair configurations that applications can read at runtime to obtain settings. Claude Code reads the ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL environment variables to determine API authentication credentials and service endpoint addresses. This design follows the "Twelve-Factor App" configuration management principle — separating configuration from code so users can switch between different model providers without modifying the application itself.

# Set API Token
export ANTHROPIC_API_KEY=your_token

# Set Base URL (provided by proxy service)
export ANTHROPIC_BASE_URL=proxy_service_url

Environment Variable Configuration

Windows Users Note: The export commands above are for Linux/macOS bash/zsh terminals. Windows users need to use the setx command (e.g., setx ANTHROPIC_API_KEY "your_token") or manually add variables through "System Properties → Advanced → Environment Variables" panel. You'll need to restart your terminal after making changes.

Choosing Between Proxy and Direct Connection

Risks of Direct Connection:

OpenAI/Anthropic strictly block Chinese IP addresses
Even with identity verification completed, IP changes can trigger account bans
The content creator mentioned that 4 accounts purchased through overseas friends were all banned

Pros and Cons of Proxy Services:

Proxy services are essentially API relay providers. They deploy servers overseas with legitimate Anthropic or OpenAI accounts, then forward API requests from domestic users to official service endpoints. Users simply point their API Base URL to the proxy server address to indirectly access official models. Proxy services profit by purchasing API quotas in bulk at discounts and reselling to end users at a markup.

Pros: Stable and reliable, no ban risk
Cons: Slightly slower (due to the additional network hop), need to vet reliable providers
Privacy Warning: All request content passes through the proxy's servers, so exercise extra caution with projects involving sensitive code or trade secrets
Recommendation: Available on Taobao, but choose vendors with good reputations

DeepSeek V4 Pro Configuration Example

If you choose DeepSeek (a domestic provider), configuration is relatively simple — use the official API directly without any proxy:

export ANTHROPIC_API_KEY=your_DeepSeek_API_Key
export ANTHROPIC_BASE_URL=DeepSeek_official_API_address

Model name: deepseek-v4-pro

Since DeepSeek's API servers are deployed domestically in China, latency for Chinese users is typically 50-200ms — far lower than accessing overseas models through proxies (usually 300-800ms). This difference is very noticeable in programming scenarios that require frequent interaction.

Programming Performance Evaluation

The content creator gave a clear ranking of each model's programming capabilities:

DeepSeek V4 Pro > Sonnet ≈ GPT series > MiniMax > Others

He specifically emphasized that DeepSeek V4 Pro excels in code generation accuracy and logical coherence. Combined with its current promotional pricing, it's the top configuration choice for Claude Code users.

It's worth adding that model programming capability evaluations are often closely tied to specific task types. For common tasks like frontend development and script writing, the gap between mainstream models may not be significant. However, in high-difficulty scenarios involving complex algorithm design, large-scale project refactoring, and multi-file coordinated modifications, the differences between models truly emerge. The creator's ranking primarily reflects the overall experience in daily medium-complexity project development — readers should conduct their own testing based on their tech stack and project characteristics.

Summary and Recommendations

Beginners: Start with DeepSeek V4 Pro — low cost, great results
Sufficient Budget: Consider Sonnet via proxy — strongest overall capabilities
High-Frequency Users: Always choose a Coding Plan subscription to avoid "sky-high bills" from pay-per-use
Avoid Pitfalls: Don't attempt direct connections to overseas APIs — ban probability is extremely high
Cost Monitoring: Enable usage alerts on your API platform, set daily or monthly spending caps to avoid unexpected charges from forgotten sessions or code stuck in loop calls

Claude Code Model Configuration & Cost Comparison: A Practical Money-Saving Guide

Introduction: The Hidden Costs of AI Programming Tools

Mainstream Model Price Comparison

Anthropic Official Model Pricing

Lessons Learned from Domestic Models

DeepSeek V4 Pro: Currently the Best Value

Real-World Cost Reference

Money-Saving Strategy: Coding Plan vs Pay-Per-Use

Configuration Tutorial: How to Connect Third-Party Models in Claude Code

Environment Variable Configuration

Choosing Between Proxy and Direct Connection

DeepSeek V4 Pro Configuration Example

Programming Performance Evaluation

Summary and Recommendations

Related articles

Cursor + Codex Dual-IDE Collaboration: A Practical Methodology for Open-Source Project Customization

Cursor Multi-Agent in Practice: Building a Full-Stack Next.js Blog in 50 Minutes

Building an AI Software Factory from Scratch: A Cursor Engineer's Hands-On Experience with Multi-Agent Collaboration