MiniMax M3 Hands-On Comparison with GPT, Claude, and Gemini: Who Wins Across Five Real-World Tasks?

A Bilibili creator tested MiniMax M3 against Claude, GPT Codex, and Gemini across five real tasks: web page generation, coding, Apple earnings analysis, video understanding, and Computer Use. M3 decisively beat Gemini 3.1 Pro in document analysis by catching hidden data errors, showed unique native video understanding, and matched top models in code generation via its Agent Team feature — all at a fraction of the price.
Introduction: Can a Chinese AI Model Compete with the Top Tier?
When the same Apple earnings report image was handed to two different AI models, one discovered a hidden data error in the chart and proactively flagged it, while the other copied the error straight into its spreadsheet. The one that caught the mistake? MiniMax M3, a newly released Chinese AI model.
MiniMax promotes three selling points for M3: native multimodal capabilities built on context, cutting-edge code generation, and positioning that directly targets GPT and Claude. To verify its real-world performance, Bilibili creator "Digital Fun" (数码趣) designed five practical, everyday tasks and ran identical prompts through M3 and the "Big Three" (Claude, GPT, and Gemini), one opponent per task.
Web Page Generation: M3 vs Claude — Two Design Philosophies
The first test was web page generation, one of M3's officially promoted strengths. The task: invent a fictional brand called "DXU" and, referencing Nike's official website layout, color scheme, typography hierarchy, and hover interactions, generate a webpage that opens directly in a browser.
The opponent was Claude running its latest Opus 4.8 model (Max tier). Claude took around ten minutes, while M3 needed about half an hour.

Viewed individually, both outputs had impressive levels of completeness, and both implemented animations. But comparing them side by side revealed clear differences in approach:
- M3: Generated a longer page with more complete content and even auto-sourced images — it tends to deliver the most comprehensive version possible even from a one-line prompt
- Claude: Produced a shorter, cleaner, more streamlined page — strictly following the requirements without unnecessary additions
The verdict for this round: both delivered solid aesthetics, but reflected fundamentally different design philosophies. With enough iteration time, either could produce excellent results.
Coding & Development: M3 Agent Team vs OpenAI Codex
The second test chose a task nearly impossible to complete in one shot: build a video editing tool that can recognize speech in talking-head videos, automatically cut out repeated takes and silent segments, keep only the clean content and stitch it back together, with preview support.
The opponent was OpenAI's Codex, running a GPT 5.5-level model. M3's side used its official Agent Team tool for execution.
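The silence-cutting part of this task comes down to interval arithmetic: given the spans detected as silent, keep everything else. Below is a minimal sketch of that one step, not the code either model produced. The function name `keep_segments` and the `min_keep` threshold are assumptions for illustration; detecting the silences and cutting the video (for example with FFmpeg) would happen elsewhere.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def keep_segments(
    duration: float,
    silences: List[Interval],
    min_keep: float = 0.2,  # hypothetical threshold: drop kept spans shorter than this
) -> List[Interval]:
    """Return the parts of [0, duration] not covered by any silence interval."""
    kept: List[Interval] = []
    cursor = 0.0  # end of the last silence processed so far
    for start, end in sorted(silences):
        # Clamp each silence to the video's bounds.
        start = max(0.0, min(start, duration))
        end = max(0.0, min(end, duration))
        if start - cursor >= min_keep:
            kept.append((cursor, start))
        # max() handles overlapping or nested silence intervals.
        cursor = max(cursor, end)
    if duration - cursor >= min_keep:
        kept.append((cursor, duration))
    return kept
```

For a 10-second clip with silences at 2.0 to 3.0 s and 5.0 to 6.5 s, the function keeps three spans: 0 to 2.0 s, 3.0 to 5.0 s, and 6.5 to 10.0 s. Removing repeated takes would need a separate pass over the transcript.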

OpenAI Codex's Performance
Codex delivered a working HTML application in about 20 minutes, with a preview interface that could call a local speech recognition model to transcribe the audio. The editing interactions had some issues, however. Overall, it was a complete framework that needed polish on the details.
M3 + Agent Team's Performance
A single instruction set up the development team, splitting the work across five Agents responsible for audio transcription, content analysis, video editing, UI, and integration. The entire development process took nearly three hours and ultimately delivered a solution combining Python + speech recognition + FFmpeg + a desktop GUI.
The most impressive aspect was M3's collaborative process: a unified coordinator directed multiple Agents that passed data to each other and took turns executing their respective tasks. This multi-agent approach seems better suited to long-running, large-scale projects, while Codex prioritizes getting something working quickly.
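The coordinator pattern described here can be illustrated with a toy pipeline. This is a sketch of the general idea only and says nothing about how Agent Team is actually implemented; every name below (`run_team`, `transcription_agent`, `analysis_agent`, the `State` dictionary) is hypothetical, and the "agents" are plain functions with hard-coded stand-in behavior.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]          # shared data handed from agent to agent
Agent = Callable[[State], State]   # an agent reads the state and returns an updated copy


def transcription_agent(state: State) -> State:
    # Stand-in for speech recognition: the transcript is hard-coded here.
    return {**state, "transcript": ["hello", "hello", "welcome", "back"]}


def analysis_agent(state: State) -> State:
    # Stand-in for repeated-take detection: drop consecutive duplicate entries.
    clean: List[str] = []
    for line in state["transcript"]:
        if not clean or clean[-1] != line:
            clean.append(line)
    return {**state, "clean": clean}


def run_team(agents: List[Tuple[str, Agent]], state: State) -> State:
    """Coordinator: run each agent in turn, handing the shared state along."""
    history: List[str] = []
    for name, agent in agents:
        state = agent(state)
        history.append(name)
    return {**state, "history": history}


result = run_team(
    [("transcription", transcription_agent), ("analysis", analysis_agent)],
    {"video": "talk.mp4"},
)
```

Each agent only sees the shared state, so stages can be added or swapped without touching the others. A real system would also need retries, parallel branches, and a way for the coordinator to reject bad output.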
AI Agent Collaboration: A Clear Trend
Multi-agent collaboration has become a clear trend. It breaks complex projects into parallel workflows where different roles collaborate, can run continuously for hours or even days, and handles a level of complexity that a single Agent can't manage in one pass. This is a direction every AI user should be paying attention to.
Earnings Report Analysis: M3 Decisively Beats Gemini 3.1 Pro
This was the most convincing round of the entire test. The material was a visualization of Apple's Q2 2026 earnings report. The image was generated by an Image2 model and was later found to contain a hidden data error. Both models were asked to identify all information in the chart and organize it into an Excel spreadsheet.
The opponent was Gemini 3.1 Pro — the model the creator had used most over the past six months.

Gemini 3.1 Pro's Performance
It finished quickly with a neat, fully populated table — but essentially just "copied" the chart without performing any data validation.
MiniMax M3's Performance
M3 took nearly 20 minutes rather than delivering immediately. It spent the extra time cross-checking numbers, flagging suspicious data points, and deliberating repeatedly, and it ultimately marked the chart's error explicitly in its output, filling in the correct value through logical analysis.
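The video does not spell out which check exposed the error. One common kind of cross-check on a financial table is confirming that the parts add up to the reported total, and backing out a suspect value from the rest. The sketch below shows that idea with made-up numbers; the function names and the tolerance are assumptions, and none of the figures are Apple's.

```python
from typing import Dict, List


def find_inconsistencies(
    total: float, parts: Dict[str, float], tolerance: float = 0.05
) -> List[str]:
    """Flag a reported total that does not match the sum of its parts."""
    parts_sum = sum(parts.values())
    if abs(parts_sum - total) > tolerance:
        return [f"parts sum to {parts_sum:.1f}, but the reported total is {total:.1f}"]
    return []


def implied_part(total: float, other_parts: Dict[str, float]) -> float:
    """Back out one missing or suspect part from the total and the remaining parts."""
    return total - sum(other_parts.values())


# Hypothetical figures for illustration only, not Apple's actual results.
reported_total = 100.0
segments = {"A": 50.0, "B": 30.0, "C": 10.0}

issues = find_inconsistencies(reported_total, segments)
corrected_c = implied_part(reported_total, {"A": 50.0, "B": 30.0})
```

Here the three segments sum to 90.0 against a reported total of 100.0, so the check raises one issue, and if segment C is the suspect value, the implied figure is 20.0. A check like this cannot tell which number is wrong on its own; that still takes judgment or a second source.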
The differences went beyond accuracy:
- M3 produced 7 sheets; Gemini produced 4
- M3 additionally calculated year-over-year changes, profit margins, and breakdowns by product and region
- M3 essentially re-audited the entire report based on practical analytical needs
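The derived metrics in the list above follow standard formulas. As a minimal sketch, assuming the usual definitions (the function names are illustrative, not taken from M3's spreadsheet):

```python
def yoy_change_pct(current: float, previous: float) -> float:
    """Year-over-year change as a percentage of the prior-period value."""
    if previous == 0:
        raise ValueError("previous-period value must be non-zero")
    return (current - previous) * 100.0 / previous


def margin_pct(profit: float, revenue: float) -> float:
    """Profit as a percentage of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return profit * 100.0 / revenue
```

For example, revenue moving from 100 to 110 is a 10% year-over-year increase, and a profit of 25 on revenue of 100 is a 25% margin.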
For further verification, the creator had Claude act as a judge using Apple's original earnings data. The final conclusion: M3's version was the clear recommendation. In multimodal document understanding and structured data conversion scenarios, M3 decisively outperformed Gemini 3.1 Pro.
Video Understanding: M3's Unique Native Multimodal Capability
This test showcased an ability unique to M3 — directly understanding long-form video content. It was given a 12-minute NVIDIA keynote video and asked to watch it, then generate an HTML-formatted news briefing with accompanying images.
The key point: M3 actually "watches" the video itself rather than relying on speech-to-text transcription.

The final output was a fully formatted news page that distilled the keynote highlights into a briefing with images. After manual verification, the content extraction was accurate and the briefing structure was objective and clear. The creator called this "the most impressive segment of the entire test besides the earnings analysis."
Computer Use: An Unfinished Challenge for the Entire Industry
The final test evaluated M3's new Computer Use feature, tasking it with directly operating a computer to navigate to a webpage and read a Xiaohongshu (RED) account's follower count. M3 did accomplish the task, but the process was slow and the experience is still far from polished.
However, this problem isn't unique to M3: the Computer Use features in Codex and Claude are also still being refined. It remains a shared challenge across the entire industry.
MiniMax M3 Pricing: A Clear Cost Advantage
Based on official pricing, M3 costs roughly 20% of Claude Sonnet's price and just over 10% of Opus's. The API currently offers a 50% discount, bringing the price to 2.1 RMB per million input tokens and 8.4 RMB per million output tokens.
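To see what these rates mean for a given workload, here is a small cost estimator using the discounted prices quoted above. It assumes simple per-token billing with no tiers, caching discounts, or minimum charges, which may not match the actual billing rules; the function name is illustrative.

```python
# Discounted API prices quoted in the article, in RMB per million tokens.
INPUT_RMB_PER_MILLION = 2.1
OUTPUT_RMB_PER_MILLION = 8.4


def estimate_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in RMB, assuming flat per-token pricing."""
    return (
        input_tokens * INPUT_RMB_PER_MILLION
        + output_tokens * OUTPUT_RMB_PER_MILLION
    ) / 1_000_000
```

Under these assumptions, a job that consumes 2 million input tokens and 500,000 output tokens would cost about 8.4 RMB.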
On the subscription side, the lowest tier starts at 49 RMB/month and includes not just M3's text capabilities but also quotas for image, voice, and music models. Higher tiers even include daily video generation. While going toe-to-toe with the Big Three in capability, M3's pricing is remarkably accessible.
Conclusion: MiniMax M3 Is Ready to Compete with the Top Tier
MiniMax M3 doesn't comprehensively surpass the Big Three — the company itself acknowledges a gap with Claude's strongest model on the most challenging autonomous research tasks. But based on these five real-world tests, the following conclusions are clear:
- Multimodal Document Understanding: M3 demonstrated greater rigor than Gemini 3.1 Pro, proactively catching data errors
- Code Generation: Competitive with Claude and Codex in different ways — Agent Team mode is well-suited for complex projects
- Native Video Understanding: A current standout feature — directly watching video rather than relying on speech transcription
- Price Advantage: API and subscription pricing far below overseas competitors, ideal for heavy AI users
Can a Chinese AI model compete at the top tier? This test gives a definitive yes. For users with limited budgets who still need high-quality AI output, MiniMax M3 is a choice worth serious consideration.