Microsoft Build 2026: In-Depth Analysis of the In-House Reasoning Model MAI Thinking-E and the Full AI Product Suite

Microsoft launches its first in-house reasoning model MAI Thinking-E and a full AI model suite at Build 2026.
At Build 2026, Microsoft unveiled MAI Thinking-E, its first proprietary reasoning model, which has about 1T total parameters, uses an MoE architecture, and excels in math and science benchmarks. Alongside it, Microsoft released 6 vertical models for image, voice, and transcription, plus MAI Code 1 for GitHub Copilot. The article also covers OpenAI's major service outage, Qwen's open ecosystem strategy, and Gemini Agent's desktop client launch.
Microsoft Build 2026: First In-House Reasoning Model MAI Thinking-E Officially Unveiled
Microsoft Build 2026, the developer conference held on June 4, saw the release of multiple new AI products and technologies. The most notable announcement was Microsoft's first in-house advanced reasoning model, MAI Thinking-E (MAI Thinking-1 series). This marks a pivotal shift—Microsoft is no longer relying solely on OpenAI's model capabilities and is now making a serious push into proprietary large model development.
According to official disclosures, MAI Thinking-E has approximately 1T (trillion) total parameters with roughly 35B active parameters, built on a MoE (Mixture of Experts) architecture. MoE is one of the mainstream technical approaches in the large model space today. Its core idea is to split the model into multiple "expert" sub-networks, activating only a subset of experts to process each input during inference rather than engaging all parameters simultaneously. This explains why MAI Thinking-E has a total parameter count of 1T but only about 35B active parameters—during actual inference, the model uses a Gating Network to dynamically select the most relevant expert modules, maintaining the knowledge capacity of a massive model while significantly reducing computational costs. Well-known models like Google's Switch Transformer, Mixtral, and DeepSeek V3 all employ similar architectures, and this design makes the commercial deployment of ultra-large-scale models feasible.
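The routing idea described above can be illustrated with a minimal sketch. It assumes top-k routing with a softmax over the selected experts, a scheme used by some published MoE models; Microsoft has not disclosed how MAI Thinking-E actually routes. The experts, the `gate_weights` values, and the function names are all hypothetical toy stand-ins.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, gate_weights, k=2):
    """Score every expert with a toy linear gate, then run only the top k."""
    gate_logits = [w * x for w in gate_weights]  # stand-in for a learned gating network
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Toy setup (assumption): 8 experts, each a simple scalar function.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_weights = [0.3, -1.2, 0.9, 0.1, -0.4, 1.5, 0.0, 0.7]

output, used = moe_forward(2.0, experts, gate_weights, k=2)

# Share of parameters active per token, using the figures quoted in the article.
active_fraction = 35e9 / 1e12
print(used, round(output, 3), f"{active_fraction:.1%}")
```

Only two of the eight toy experts are evaluated for this input, which is the source of the compute savings. With the article's figures, about 3.5% of the parameters are active for any given token.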
Microsoft specifically emphasized that the model was not distilled from any third-party model and was trained entirely on clean training data. This statement carries special significance in the current industry context. Knowledge distillation means using the outputs of a large "teacher" model to train a smaller "student" model, so that the student approaches the teacher's capability at lower cost. Several companies were previously reported to be using outputs from leading models such as OpenAI's as training data to boost their own models, sparking widespread controversy over intellectual property and technological independence. Microsoft's explicit "clean training" declaration is both a demonstration of technical prowess and a clear line drawn on commercial compliance: it asserts that the model's capabilities come entirely from in-house R&D and showcases Microsoft's independent capabilities in foundational AI research.
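For readers unfamiliar with distillation, the sketch below shows the classic logit-matching form: the student is penalized by the KL divergence between temperature-softened teacher and student distributions. This is a generic textbook illustration, not anything Microsoft or another vendor is known to use. Training on a teacher's generated text is a related variant that is not shown here. The logits and the temperature value are made-up examples.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits softened by a temperature (higher = flatter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 4.0, 1.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student that already agrees with the teacher incurs a loss near zero, while one that disagrees is pushed toward the teacher's distribution. This is why a model trained this way inherits its teacher's behavior.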

Based on benchmark data, MAI Thinking-E shows a notably uneven performance profile: it only matches DeepSeek V3.2's level on two coding benchmarks, but performs impressively in math and science, both reasoning-intensive domains, demonstrating strong logical reasoning capabilities.
It's worth noting that MAI Thinking-E, as a "reasoning model," works differently from a traditional large language model. A traditional language model generates its answer directly by predicting the next token, while a reasoning model builds on this with a Chain-of-Thought mechanism, carrying out multi-step logical reasoning before producing a final answer. OpenAI's o1/o3 series and DeepSeek R1 fall into this category. Reasoning models typically outperform traditional models significantly on tasks requiring multi-step logical deduction, such as mathematical proofs and scientific analysis, but at the cost of higher inference latency and greater computational expense.
MAI Thinking-E's strong performance in math and science but relative weakness in coding also reflects an industry-wide challenge: reasoning capabilities don't transfer uniformly across domains. Overall, the model is closing in on Anthropic's Claude Sonnet series. There is still a gap, but as a "first step" for Microsoft's in-house models, it's an impressive achievement.
Microsoft's Full AI Suite: 6 Vertical Models Covering Image, Voice, and Transcription
Beyond the flagship reasoning model, Microsoft also released 6 vertical domain models in one sweep, covering image generation, voice synthesis, transcription, and more:
- MAI Image 2.5 and Flash version: Image generation models
- MAI Voice 2 and Flash version: Voice generation models
- MAI Transcribe 1.5: Speech transcription model
According to Microsoft, these vertical models have each achieved top-three performance in their respective domains. Users can try them out on the Microsoft Foundry platform.

Additionally, Microsoft launched the coding model MAI Code 1, emphasizing high speed and stability, positioned against competitors like Codestral 4.5. This model will be integrated into GitHub Copilot and VS Code, directly serving developers' daily coding workflows. GitHub Copilot is one of Microsoft's most successful AI commercial products—since its launch in 2021, it has accumulated over a million paying users. It embeds into developer workflows as a VS Code plugin, offering code completion, code generation, and code explanation features. Previously, Copilot relied primarily on OpenAI's Codex and GPT series models, and the launch of MAI Code 1 signals that Microsoft is beginning to inject its own model capabilities into this core product. Its competitor, Codestral 4.5, is a dedicated coding model from Mistral AI known for strong code generation speed and accuracy. By integrating its in-house coding model directly into Copilot and VS Code—products with massive user bases—Microsoft's "model-as-a-service" distribution strategy can rapidly capture real user feedback and iterate, creating a data flywheel effect. This move signifies that Microsoft is building a complete AI technology stack from foundational models to the application layer.
OpenAI's Major Service Outage and Codex Feature Updates
Just before the Microsoft conference, OpenAI experienced a severe service incident. From the evening of June 3 to the morning of June 4, ChatGPT, Codex, and API services all suffered large-scale outages and errors, with some users completely unable to access services for hours. Even more surprisingly, the GPT-MH2 model disappeared entirely from Codex. OpenAI's team intervened urgently, and most services weren't restored until around 4 PM.
This large-scale service outage is not an isolated incident. As ChatGPT's monthly active users surpass hundreds of millions and API call volumes continue to climb, OpenAI's infrastructure faces unprecedented pressure. Its service architecture is heavily dependent on GPU compute clusters provided by Microsoft's Azure cloud platform, meaning any underlying hardware failure, network fluctuation, or software update can trigger a cascade of issues. The disappearance of the GPT-MH2 model from Codex suggests potential model deployment or version management problems rather than simple traffic overload. This also illustrates from another angle why Microsoft is developing its own models—the risks of over-reliance on a single model provider are becoming increasingly apparent.

That said, OpenAI also rolled out several important Codex updates during the same period:
6 Domain Plugins
Covering sales, stocks, banking, design, creative, and data analysis domains, each plugin integrates multiple applications and skills for an out-of-the-box experience.
Annotations Feature
Users can use annotations on AI-generated content to make targeted edits or ask questions, dramatically improving the precision of human-AI collaboration. The design philosophy behind this feature is similar to the annotation and comment mechanisms in document collaboration tools, but applied to AI-generated content. It enables users to provide precise feedback and adjustment instructions for specific paragraphs, sentences, or even individual words, rather than re-describing the entire requirement—significantly reducing the communication cost of iterative revisions.

Sites: Interactive Website Design
This feature allows users to transform ideas into hosted interactive websites or applications, accessible and shareable via URL. These features will be rolled out first to Business and Enterprise users.
Qwen App's Open Ecosystem and Gemini Agent Desktop Client
In China, Alibaba's Qwen App announced full openness to third-party Agents and Skills. All enterprises can now integrate Skills, and in the future they'll be able to operate enterprise-exclusive Agents on the Qwen platform. This initiative essentially replicates the "app store" model from the mobile internet era. In the AI Agent context, an Agent is an intelligent entity capable of autonomous planning, tool invocation, and complex task completion, while Skills are the specific capability modules an Agent can call upon (such as querying flights, booking hotels, or analyzing data). Qwen's open strategy is similar in concept to OpenAI's GPT Store and ByteDance's Coze platform, all of which attempt to become the "super entry point" of the AI era, where users can access various third-party services through a single unified AI assistant without switching between multiple apps. The core competitive advantage of this platform strategy lies in user scale and ecosystem richness, with first-mover advantage and network effects being the decisive factors. The first batch of enterprises has already begun offering services, and the Agent and Skill integration platform will go live soon. Qwen aims to create a "universal AI assistant" for users through its open ecosystem.
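The Agent/Skill split can be pictured with a small sketch. It is purely illustrative: the `Skill` and `Agent` classes, the keyword-matching planner, and the three example skills are hypothetical and say nothing about how the Qwen platform is actually implemented. A production agent would typically let a language model choose which skills to call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Skill:
    """A capability module a third party could register (hypothetical shape)."""
    name: str
    keywords: List[str]
    run: Callable[[str], str]

class Agent:
    """Toy agent: plans by keyword match, then invokes the chosen skills."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def plan(self, request: str) -> List[Skill]:
        # Naive planner: select every skill with a keyword in the request.
        text = request.lower()
        return [s for s in self.skills.values() if any(k in text for k in s.keywords)]

    def handle(self, request: str) -> List[str]:
        return [skill.run(request) for skill in self.plan(request)]

agent = Agent()
agent.register(Skill("flights", ["flight", "fly"], lambda r: "flights: searched"))
agent.register(Skill("hotels", ["hotel"], lambda r: "hotels: searched"))
agent.register(Skill("analysis", ["analyze", "report"], lambda r: "analysis: done"))

results = agent.handle("Find a flight to Hangzhou and book a hotel nearby")
print(results)
```

The user talks to one assistant and the registry decides which third-party capabilities to invoke, which is the "single entry point" behavior the platform strategy depends on.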
Meanwhile, Google's Gemini Agent officially released a native desktop client. The frontend design is minimalist, the client doesn't yet support Chinese, and it will be available for macOS, Windows, and Linux. Users can download and install it from the official page. The launch is noteworthy: compared to web-based interfaces, a desktop client can integrate more deeply with the operating system, enabling file management, application invocation, system-level shortcuts, and other features. This is crucial for AI Agents executing complex cross-application tasks and signals that AI assistants are evolving from "chat windows" to "OS-level companions."
Industry Observation: From Single-Model Competition to Full-Stack Capability Competition
Microsoft's concentrated release of its in-house model lineup sends a clear signal: even while maintaining a deep partnership with OpenAI, Microsoft is actively building its own AI model capabilities. While MAI Thinking-E still has room for improvement in coding, its performance in reasoning and scientific computation proves Microsoft's accumulated technical expertise.
From a broader perspective, the AI industry is shifting from "single-model competition" to "full-stack capability competition." Microsoft is simultaneously investing in reasoning models, vertical models, and coding models, distributing them through developer tools like GitHub Copilot and VS Code. This "model + platform + tools" trinity strategy may hold more commercial value than simply chasing benchmark scores. This trend already has precedents in the industry: Google forms a closed loop through Gemini models + Android/Chrome ecosystem + Google Cloud, while Meta builds differentiated advantages through Llama open-source models + social platforms + advertising systems. What makes Microsoft unique is its ownership of the world's largest enterprise software ecosystem (Office 365, Azure, GitHub, LinkedIn, etc.), meaning its AI models have access to massive enterprise application scenarios and distribution channels from day one. This "ecosystem moat" is a competitive barrier that pure-play model companies find extremely difficult to replicate.