Microsoft Copilot Cowork Introduces DeepSeek: Multi-Model Architecture and Enterprise AI Agent Strategy Explained

Microsoft launches Copilot Cowork with multi-model architecture, potentially integrating DeepSeek V4 for cost efficiency.
Microsoft's Copilot Cowork is now globally available as an agentic AI platform for complex enterprise tasks. It features a multi-model architecture routing tasks to OpenAI, Anthropic, DeepSeek, or Microsoft's own Cowork 1 model based on complexity and cost. With credit-based usage pricing, the new WebIQ real-time search system, and deep Azure integration, Microsoft is building a complete enterprise AI agent platform while navigating complex geopolitical dynamics between U.S. and Chinese AI ecosystems.
Copilot Cowork Officially Launches: DeepSeek May Become a Low-Cost Option
Microsoft recently announced the global launch of its agentic Copilot product — Copilot Cowork. Unlike the standard Copilot chat assistant, this product is designed to handle complex tasks: the AI can take on an entire job, break it down into multiple steps, leverage company data, invoke tools, collaborate across files, and continue running in the cloud until it delivers the final result.
It's important to understand the fundamental difference between AI agents and traditional chat assistants. Traditional AI chat assistants operate on a "request-response" model: the user inputs a question, the model returns an answer, and the interaction ends. AI agents, on the other hand, are capable of autonomous planning, tool invocation, and continuous execution. They can decompose a complex goal into multiple subtasks, independently determine the execution order, call external tools during the process (such as database queries, API calls, and file read/write operations), and dynamically adjust strategies based on intermediate results. This architecture means that completing a single task may involve dozens or even hundreds of model inference calls — which is the fundamental reason why its compute costs far exceed those of ordinary chat interactions.
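To make the cost argument concrete, here is a minimal sketch of an agent loop that counts model calls. It is not Cowork's implementation: `call_model`, `run_agent`, the two toy tools, and the keyword rule for picking a tool are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentRun:
    """Counters for one agent task (illustrative only)."""
    goal: str
    model_calls: int = 0
    tool_calls: int = 0
    log: List[str] = field(default_factory=list)


def call_model(run: AgentRun, prompt: str) -> str:
    """Stand-in for a real model API call; it only counts and echoes."""
    run.model_calls += 1
    return f"response to: {prompt}"


def run_agent(goal: str, subtasks: List[str],
              tools: Dict[str, Callable[[str], str]]) -> AgentRun:
    run = AgentRun(goal=goal)
    call_model(run, f"plan: {goal}")  # one call to plan the work
    for task in subtasks:
        call_model(run, f"choose a tool for: {task}")  # one call to decide
        # Hypothetical rule standing in for the model's tool choice.
        tool_name = "search" if "find" in task else "files"
        observation = tools[tool_name](task)
        run.tool_calls += 1
        call_model(run, f"check result: {observation}")  # one call to verify
        run.log.append(f"{task} -> {tool_name}")
    call_model(run, f"write final answer for: {goal}")  # one call to finish
    return run


tools = {
    "search": lambda q: f"search hits for '{q}'",
    "files": lambda q: f"file contents for '{q}'",
}
run = run_agent(
    "summarize stalled deals",
    ["find stalled deals", "read account notes", "find churn signals"],
    tools,
)
print(run.model_calls, run.tool_calls)
```

Even this toy task with three subtasks makes eight model calls (one to plan, two per subtask, one to finish), whereas a request-response chat turn would make one. Real tasks with dozens of subtasks and retries scale the same way.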
Even more noteworthy, according to Axios, Microsoft is considering adopting a fine-tuned DeepSeek V4 model as a low-cost alternative for Cowork. DeepSeek is a large language model series developed by the Chinese company of the same name, which rose to prominence in late 2024 and early 2025 with its DeepSeek-R1 reasoning model. Much of its cost efficiency comes from a Mixture of Experts (MoE) architecture, which activates only a subset of parameters for each token during inference, sharply reducing compute costs while maintaining high performance. DeepSeek V4, as its latest-generation foundation model, continues this high cost-efficiency approach.
The term "fine-tuned DeepSeek V4" refers to Microsoft further training the original model on enterprise task data to better adapt it to Copilot Cowork's work scenarios. This practice, known in the industry as domain-specific fine-tuning, is a common strategy for enterprises deploying open-source or third-party models. It also means that one of the largest Western tech companies may be incorporating a Chinese AI model into its core enterprise product line.
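The Mixture of Experts idea mentioned above can be sketched in a few lines. This is a generic top-k routing illustration, not DeepSeek's actual architecture: the four scaling "experts" and the router scores are made up, and in a real model both the router and the experts are learned neural network layers.

```python
import math
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert index, weight) for the k highest-scoring experts."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize over chosen experts
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x: float, gate_logits: List[float],
                experts: List[Callable[[float], float]], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Four toy "experts"; each just scales its input by a different factor.
experts = [lambda x, s=s: x * s for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token

routes = top_k_route(gate_logits, k=2)
active_fraction = len(routes) / len(experts)  # 2 of 4 experts run
output = moe_forward(1.0, gate_logits, experts, k=2)
```

Only two of the four experts are evaluated for this input, so roughly half of the expert compute is skipped. That is the mechanism behind the cost savings, although production models use many more experts and a much smaller active fraction.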
Microsoft states that more than half of the Fortune Global 500 companies used Cowork during the preview period. Real-world use cases include: engineering teams batch-editing work orders and automatically generating dependency diagrams, comparing nearly 4,000 cross-version files (work that would have previously taken weeks of manual effort), and analyzing stalled sales pipelines to generate churn risk ranking tables.

Usage-Based Pricing: The Cost Dilemma and Solution for AI Agents
Why Must Agentic AI Shift to Usage-Based Pricing?
Agentic AI is fundamentally different from ordinary chat interactions. It's no longer about sending one message and getting one answer — it repeatedly calls models to retrieve context, use tools, search files, generate outputs, check results, and run for extended periods. Charles Lamanna, Microsoft's Executive Vice President of Platform, stated bluntly: testing showed that Cowork could no longer sustain unlimited usage, as some users were running hundreds of tasks per week, making compute costs extremely high.
This reveals the core contradiction in today's AI agent economy: the better the tool, the more frequently users use it; but each task involves multiple model calls, tool invocations, and retrieval steps, causing costs to skyrocket.
Copilot Cowork Pricing System Explained
Cowork uses a credit-based usage pricing model at one cent per credit, with costs determined by four dimensions: model calls, context retrieval, tool invocations, and runtime. This credit-based pricing is a landmark move in the transition of AI agent products from subscription to consumption-based models. Traditional SaaS subscription models (such as Microsoft 365's fixed per-user monthly fee) assume relatively uniform resource consumption across users, but AI agents break this assumption — a high-frequency user may consume over a hundred times more compute than a low-frequency user. The essence of the credit system is to transparently pass underlying GPU compute costs through to users, enabling enterprises to manage AI spending with the same granularity as cloud computing resources.
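As a rough illustration of how the four dimensions could combine into a per-task charge, consider the sketch below. Only the one-cent-per-credit price comes from the announcement. The per-unit credit rates in `RATES` and the example usage numbers are hypothetical, since the source does not give Microsoft's actual rate card.

```python
from dataclasses import dataclass

CENTS_PER_CREDIT = 1  # from the article: one cent per credit

# Hypothetical credits charged per unit of each billing dimension.
RATES = {
    "model_calls": 2,
    "context_retrievals": 1,
    "tool_invocations": 1,
    "runtime_minutes": 3,
}


@dataclass
class TaskUsage:
    model_calls: int
    context_retrievals: int
    tool_invocations: int
    runtime_minutes: int


def task_credits(usage: TaskUsage) -> int:
    """Sum the credits consumed across the four dimensions."""
    return (
        usage.model_calls * RATES["model_calls"]
        + usage.context_retrievals * RATES["context_retrievals"]
        + usage.tool_invocations * RATES["tool_invocations"]
        + usage.runtime_minutes * RATES["runtime_minutes"]
    )


def credits_to_dollars(credits: int) -> float:
    return credits * CENTS_PER_CREDIT / 100


# A made-up medium-sized task.
usage = TaskUsage(model_calls=40, context_retrievals=25,
                  tool_invocations=15, runtime_minutes=10)
credits = task_credits(usage)         # 80 + 25 + 15 + 30 = 150 credits
cost = credits_to_dollars(credits)    # 150 credits at one cent each
```

Under these assumed rates the task costs 150 credits, or $1.50. A user running hundreds of such tasks per week would cost hundreds of dollars, which illustrates why flat per-seat pricing breaks down.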
Tasks are divided into three tiers:
- Lightweight: Minimal knowledge base calls, simple reasoning logic, single output
- Medium: Multi-source data calls, structured reasoning, more outputs
- Heavyweight: Extensive data aggregation and deep reasoning, large volumes of output
Microsoft also segments users into four categories (enterprise knowledge workers, management, frontline business staff, and technical personnel) to help organizations estimate costs based on headcount, frequency, and task types. This classification is also giving rise to new enterprise IT governance needs: budget allocation, usage monitoring, and cost attribution are becoming indispensable management dimensions in enterprise AI deployment.
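An organization-level estimate along these lines might look like the following sketch. The average credits per tier, the headcounts, and the monthly task counts are all hypothetical placeholders. Only the one-cent credit price and the tier and segment names come from the article.

```python
CENTS_PER_CREDIT = 1  # from the article: one cent per credit

# Hypothetical average credits per task in each tier.
CREDITS_PER_TASK = {"lightweight": 10, "medium": 50, "heavyweight": 250}

# Hypothetical headcount and monthly tasks per user for two of the segments.
SEGMENTS = [
    {"name": "knowledge workers", "headcount": 200,
     "tasks": {"lightweight": 40, "medium": 10, "heavyweight": 1}},
    {"name": "technical personnel", "headcount": 50,
     "tasks": {"lightweight": 20, "medium": 20, "heavyweight": 5}},
]


def segment_credits(segment: dict) -> int:
    """Monthly credits for one segment: per-user usage times headcount."""
    per_user = sum(count * CREDITS_PER_TASK[tier]
                   for tier, count in segment["tasks"].items())
    return per_user * segment["headcount"]


def monthly_budget_dollars(segments: list) -> float:
    total_credits = sum(segment_credits(s) for s in segments)
    return total_credits * CENTS_PER_CREDIT / 100


for seg in SEGMENTS:
    print(seg["name"], segment_credits(seg))
print(monthly_budget_dollars(SEGMENTS))
```

With these placeholder inputs, the smaller technical group accounts for more than a third of the spend because of its heavyweight tasks. This is the kind of cost attribution the governance needs described above are meant to surface.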
Multi-Model Architecture Explained: DeepSeek's Strategic Positioning
Not Replacing OpenAI, but Task-Based Tiered Routing
To be clear, Microsoft's introduction of DeepSeek is not about replacing OpenAI. The reality is that Microsoft is transitioning Copilot to a multi-model architecture, calling different models for different tasks. Multi-model architecture is the mainstream trend in enterprise AI deployment in 2025, built on the core principle that "no single model fits all tasks." Different AI models exhibit significant differences in reasoning capability, response speed, context window length, and cost. Through an intelligent routing layer that automatically assesses task complexity and dispatches requests to the most suitable model, enterprises can achieve optimal performance-cost balance at the system level while reducing dependency risk on any single model provider.
Specifically, Cowork's model tiering strategy looks like this:
- Frontier exploration tasks: Anthropic Opus 4.8
- Routine processing tasks: Anthropic Sonnet 4.6
- Low-cost everyday tasks: GPT 5.5 or the in-house Cowork 1
- Extremely cost-sensitive tasks: DeepSeek on Azure
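The tiering above amounts to a lookup from task class to model. The sketch below shows one way such a routing layer could be written. The model names come from the list, but the `classify` heuristic, its thresholds, and the fallback to the low-cost tier when DeepSeek is not enabled are assumptions for illustration, not Microsoft's published logic.

```python
from enum import Enum
from typing import Dict, Tuple


class Tier(Enum):
    FRONTIER = "frontier exploration"
    ROUTINE = "routine processing"
    LOW_COST = "low-cost everyday"
    COST_SENSITIVE = "extremely cost-sensitive"


# Candidate models per tier, as listed in the article.
ROUTES: Dict[Tier, Tuple[str, ...]] = {
    Tier.FRONTIER: ("Anthropic Opus 4.8",),
    Tier.ROUTINE: ("Anthropic Sonnet 4.6",),
    Tier.LOW_COST: ("GPT 5.5", "Cowork 1"),
    Tier.COST_SENSITIVE: ("DeepSeek on Azure",),
}


def classify(steps: int, open_ended: bool, cost_sensitive: bool) -> Tier:
    """Hypothetical heuristic; a real router would use richer signals."""
    if open_ended:
        return Tier.FRONTIER
    if cost_sensitive:
        return Tier.COST_SENSITIVE
    if steps > 10:
        return Tier.ROUTINE
    return Tier.LOW_COST


def pick_model(tier: Tier, deepseek_opt_in: bool = False) -> str:
    """Return the first candidate for the tier.

    Assumption: without opt-in, cost-sensitive tasks fall back to the
    low-cost tier instead of DeepSeek.
    """
    if tier is Tier.COST_SENSITIVE and not deepseek_opt_in:
        tier = Tier.LOW_COST
    return ROUTES[tier][0]


tier = classify(steps=3, open_ended=False, cost_sensitive=True)
default_choice = pick_model(tier)                       # no opt-in
opted_in_choice = pick_model(tier, deepseek_opt_in=True)
```

Keeping the routing table separate from the classification rule means a provider can be added, removed, or gated behind a customer setting without touching the rest of the pipeline.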

Microsoft also stated that even if DeepSeek is selected, the feature will not be enabled by default — it will be offered as an opt-in option for customers. The model will be fully hosted on Azure, customer data will remain within Microsoft's cloud, and it will be covered by Azure's enterprise-grade security and compliance protections.
Cowork 1: Microsoft's In-House Low-Cost AI Model
Beyond external models, Microsoft is also set to release Cowork 1 — a proprietary, safety-tuned model designed specifically for handling Cowork tasks. Microsoft claims that through post-training, it can dramatically reduce task processing costs, making it an excellent choice for everyday Copilot tasks, especially in cost-sensitive business scenarios. "Post-training" refers to the process of optimizing a model's performance in specific scenarios after pre-training is complete, through instruction tuning, Reinforcement Learning from Human Feedback (RLHF), or further training on task-specific data. The advantage of Microsoft's in-house model is that it can be deeply optimized for Cowork's task patterns — for example, reducing unnecessary reasoning steps and compressing output length — thereby significantly lowering token consumption per call while maintaining task completion quality.
Microsoft's AI Strategy in China: Playing the Bridge Between East and West
According to Bloomberg, Microsoft has built a massive business selling AI models to Chinese companies. ByteDance is Microsoft's largest AI customer in China, on track to spend over $1 billion annually on Microsoft's AI cloud services. Ant Group, Meituan, and Tencent are also major customers of Azure AI models.

Judson Althoff, then Microsoft's Chief Commercial Officer, revealed at an internal sales meeting that Azure AI revenue growth in China was faster than in any other market, tripling during the fiscal year after quadrupling the year before. One of his statements captured the essence of Microsoft's strategy: "The world's most cutting-edge AI solutions are being born on the West Coast of the United States and the East Coast of China, and Microsoft is playing the role of the bridge connecting the two."
Intellectual Property Disputes and Gray Areas
This bridge role also brings controversy. Microsoft's AI business in China operates in an extremely complex geopolitical environment. Since 2022, the U.S. government has continuously tightened AI chip export controls to China, restricting companies like NVIDIA from selling high-end GPUs to Chinese buyers. However, cloud-based AI model services (i.e., accessed via API calls rather than local deployment) have not yet been explicitly included in export control regulations, creating a policy gray area for Microsoft to provide AI model services to Chinese customers through Azure.
Reportedly, OpenAI has privately expressed dissatisfaction to Microsoft, arguing that Microsoft is not doing enough to prevent Chinese companies from using its models to replicate or optimize their own models. While Microsoft employs automated monitoring to prevent customers from using AI models to develop competing products, the line between normal usage and model optimization is often blurry. This tension reflects a deeper conflict: model developers want to protect intellectual property, while cloud service providers tend to maximize customer reach.
A notable detail: these Chinese companies are competitors as well as customers. ByteDance has launched Doubao, Ant Group is developing its own AI models, and neither relies on external models for its core products. This kind of "co-opetition" is not uncommon in tech history, but it is particularly sensitive against the backdrop of the current U.S.-China tech decoupling.
WebIQ: A Real-Time Search System Built for AI Agents
Microsoft also launched WebIQ, a new real-time grounding system based on Bing and built specifically for AI agents. Rather than presenting search results to humans, it supplies agents with up-to-date web information they can use to reason and to verify their conclusions.

The launch of WebIQ represents a major evolution in Retrieval-Augmented Generation (RAG) technology — from human-facing search to AI agent-facing search. Traditional RAG systems typically perform a single retrieval: converting the user's question into a query, extracting relevant passages from a knowledge base, and then feeding them to the model for answer generation. But agents search in fundamentally different ways from humans: they issue multiple queries, retrieve passages, compare sources, conduct divergent searches across multiple topics, and then synthesize the results. This requires the underlying search system to have extremely low latency (since each task may trigger hundreds of search calls), highly structured return results (designed for model parsing rather than human reading), and real-time freshness (ensuring information timeliness).
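The difference between single-shot retrieval and agent-style search can be illustrated with a toy example. The three-document corpus and the keyword-overlap `search` function below are invented for the sketch and bear no relation to how WebIQ or Bing actually rank results.

```python
from typing import Dict, List

# Tiny in-memory corpus standing in for the web.
CORPUS: Dict[str, str] = {
    "doc1": "credit pricing for agent tasks",
    "doc2": "azure hosting and compliance",
    "doc3": "agent tasks call tools and search",
}


def search(query: str) -> List[str]:
    """Toy keyword search: ids of documents sharing a word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items()
            if words & set(text.split())]


def single_shot(question: str) -> List[str]:
    """Traditional RAG: one query, one retrieval."""
    return search(question)


def agentic(queries: List[str]) -> Dict[str, int]:
    """Agent-style search: several queries, with per-document hit counts."""
    hits: Dict[str, int] = {}
    for query in queries:
        for doc_id in search(query):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return hits


one = single_shot("pricing")
many = agentic(["agent pricing", "azure compliance", "tools search"])
```

The single query surfaces one document, while the three agent queries reach all three and record which document is supported by more than one query. That cross-checking is what drives up the number of search calls per task.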
Microsoft claims WebIQ is more than 2.5x faster than the best comparable solutions. In agent scenarios this advantage compounds: when a complex task requires hundreds of sequential search calls, latency savings of even a few hundred milliseconds per call add up to tens of seconds or minutes per task, which is critical for the usability of enterprise-grade agent workflows.
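A quick back-of-the-envelope calculation shows how per-call latency accumulates. The 2.5x figure is Microsoft's claim. The 500 ms baseline latency and the 300 calls per task are hypothetical numbers chosen for illustration, and the sketch assumes the calls run one after another and do not overlap.

```python
def total_seconds(calls: int, latency_ms: float) -> float:
    """Wall-clock time for sequential search calls, in seconds."""
    return calls * latency_ms / 1000


SPEEDUP = 2.5        # from Microsoft's claim quoted in the article
BASELINE_MS = 500.0  # hypothetical per-call latency of a slower system
CALLS = 300          # hypothetical number of searches in one task

baseline = total_seconds(CALLS, BASELINE_MS)          # slower system
faster = total_seconds(CALLS, BASELINE_MS / SPEEDUP)  # 2.5x faster system
saved = baseline - faster

print(f"baseline {baseline:.0f}s, faster {faster:.0f}s, saved {saved:.0f}s")
```

Under these assumptions the task spends 150 seconds on search with the slower system and 60 seconds with the faster one. The ratio is still 2.5x, but the absolute saving of 90 seconds per task is what a user waiting on an agent would notice.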
The Big Picture: Microsoft's Complete Enterprise AI Agent Platform Architecture
Taken together, Microsoft's ambition extends far beyond a chat assistant. It is building a complete enterprise-grade AI agent platform:
- Model Layer: Multi-model architecture, matching the optimal model to each task
- Search Layer: WebIQ providing real-time web grounding
- Enterprise Data Layer: Integration with the Microsoft 365 ecosystem
- Security & Governance Layer: Permission management, budget controls, audit logs
- Billing System: Usage-based pricing with credit-based settlement
- Cloud Runtime Environment: Fully managed on Azure
The design philosophy behind this architecture is worth noting: it essentially treats AI agents as a new type of cloud computing resource to be managed. Just as enterprises shifted from buying servers to buying cloud computing resources a decade ago, they are now shifting from buying software licenses to buying AI compute. The multi-model strategy at the model layer corresponds to instance type selection in cloud computing (choosing compute resources of different performance and price tiers on demand), the credit-based pricing corresponds to cloud computing's pay-per-use model, and the security governance layer corresponds to a cloud platform's IAM (Identity and Access Management) framework.
DeepSeek's integration into Copilot isn't just about saving money — it signals that the AI market is becoming more pragmatic, more inclined toward multi-model coexistence, and geopolitically more complex than ever. Microsoft finds itself in a delicate position: on one hand, selling Western AI models to Chinese companies, and on the other, considering incorporating a Chinese AI model into products aimed at Western enterprises. This bidirectional flow is a microcosm of the current global AI industry landscape.