MiniMax M3 Hands-On Comparison with GPT, Claude, and Gemini: Who Wins Across Five Real-World Tasks?

Introduction: Can a Chinese AI Model Compete with the Top Tier?

When the same Apple earnings report image was handed to two different AI models, one discovered a hidden data error in the chart and proactively flagged it, while the other copied the error straight into its spreadsheet. The one that caught the mistake? MiniMax M3, a newly released Chinese AI model.

MiniMax promotes M3 on three points: native multimodal capabilities built on context, cutting-edge code generation, and positioning that directly targets GPT and Claude. To check how it performs in practice, Bilibili creator "Digital Fun" (数码趣) designed five practical, everyday tasks and used identical prompts to pit M3 against each of the "Big Three" (Claude, GPT, and Gemini) in turn.

Web Page Generation: M3 vs Claude — Two Design Philosophies

The first test was web page generation, one of M3's officially promoted strengths. The task: invent a fictional brand called "DXU" and, referencing Nike's official website layout, color scheme, typography hierarchy, and hover interactions, generate a webpage that opens directly in a browser.

The opponent was Claude, running its latest Opus 4.8 model (Max tier). Claude took around ten minutes, while M3 needed about half an hour.

Figure: MiniMax M3 vs Claude web page generation comparison

Viewed individually, both outputs had impressive levels of completeness, and both implemented animations. But comparing them side by side revealed clear differences in approach:

M3: Generated a longer page with more complete content and even auto-sourced images — it tends to deliver the most comprehensive version possible even from a one-line prompt
Claude: Produced a shorter, cleaner, more streamlined page — strictly following the requirements without unnecessary additions

The verdict for this round: both delivered solid aesthetics, but reflected fundamentally different design philosophies. With enough iteration time, either could produce excellent results.

Coding & Development: M3 Agent Team vs OpenAI Codex

The second test chose a task nearly impossible to complete in one shot: build a video editing tool that can recognize speech in talking-head videos, automatically cut out repeated takes and silent segments, keep only the clean content and stitch it back together, with preview support.

The opponent was OpenAI's Codex, running a GPT 5.5-level model. M3's side used its official Agent Team tool for execution.

Figure: MiniMax M3 Agent Team collaborative development workflow

OpenAI Codex's Performance

In about 20 minutes, it delivered a working HTML application with a preview interface that could call a local speech recognition model to transcribe text. However, there were some issues with the editing interactions. Overall, it was a complete framework that needed polish on the details.

M3 + Agent Team's Performance

A single instruction set up the development team, splitting the work across five Agents responsible for audio transcription, content analysis, video editing, UI, and integration. The entire development process took nearly three hours and ultimately delivered a solution combining Python + speech recognition + FFmpeg + a desktop GUI.
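The editing logic behind such a tool can be sketched in a few lines. The following is only an illustration of the idea, not the code M3 or Codex produced: it assumes a speech recognizer has already returned timestamped segments, and the function name `plan_keep_intervals` and the thresholds `max_pause` and `repeat_ratio` are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def plan_keep_intervals(segments, max_pause=0.6, repeat_ratio=0.8):
    """Return (start, end) intervals worth keeping.

    segments: list of (start_sec, end_sec, text), sorted by start time,
    as a speech recognizer with timestamps might produce (assumed input).
    """
    # Step 1: drop repeated takes. If a segment is nearly identical to the
    # one kept just before it, assume the speaker re-recorded the line and
    # keep only the later take.
    takes = []
    for seg in segments:
        if takes:
            similarity = SequenceMatcher(
                None, takes[-1][2].lower(), seg[2].lower()
            ).ratio()
            if similarity >= repeat_ratio:
                takes[-1] = seg
                continue
        takes.append(seg)

    # Step 2: merge neighbors separated by short pauses. Longer silences
    # fall between intervals and are therefore cut.
    intervals = []
    for start, end, _text in takes:
        if intervals and start - intervals[-1][1] <= max_pause:
            intervals[-1][1] = end
        else:
            intervals.append([start, end])
    return [tuple(pair) for pair in intervals]


# Made-up example transcript: the first line is a flubbed take of the second.
example = [
    (0.0, 2.0, "Hello everyone welcome back"),
    (2.5, 4.5, "Hello everyone, welcome back"),
    (4.8, 7.0, "Today we test a new model"),
    (10.0, 12.0, "Let's get started"),
]
print(plan_keep_intervals(example))  # [(2.5, 7.0), (10.0, 12.0)]
```

Each resulting interval could then be handed to a tool such as FFmpeg for trimming and stitching; that step is left out here.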

The most impressive aspect was M3's collaborative process: a single coordinator directed multiple Agents, which passed data to one another and took turns executing their respective tasks. This agentic approach seems better suited to long-term, large-scale projects, while Codex prioritizes getting something working quickly.
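As a rough mental model of that coordinator pattern, the sketch below runs five stub agents in sequence and lets each one read what the previous ones wrote into a shared context. It is a toy illustration under stated assumptions, not MiniMax's Agent Team API: every name here (`run_team`, `TEAM`, the agent functions) is hypothetical, and real agents would call a model rather than return fixed strings.

```python
from typing import Callable, Dict, List, Tuple

# An agent reads the shared context and returns the new entries it produced.
Agent = Callable[[Dict], Dict]


def transcription_agent(ctx: Dict) -> Dict:
    return {"transcript": f"transcript of {ctx['video']}"}


def analysis_agent(ctx: Dict) -> Dict:
    return {"cuts": f"cut plan from {ctx['transcript']}"}


def editing_agent(ctx: Dict) -> Dict:
    return {"edited_video": f"edited using {ctx['cuts']}"}


def ui_agent(ctx: Dict) -> Dict:
    return {"ui": "preview window"}


def integration_agent(ctx: Dict) -> Dict:
    return {"deliverable": f"{ctx['ui']} + {ctx['edited_video']}"}


# The five roles mirror the division of labor described in the test.
TEAM: List[Tuple[str, Agent]] = [
    ("audio transcription", transcription_agent),
    ("content analysis", analysis_agent),
    ("video editing", editing_agent),
    ("UI", ui_agent),
    ("integration", integration_agent),
]


def run_team(team: List[Tuple[str, Agent]], task: Dict) -> Dict:
    """Coordinator: run agents in turn, merging each result into the context."""
    ctx = dict(task)
    ctx["log"] = []
    for name, agent in team:
        ctx.update(agent(ctx))    # pass data on to later agents
        ctx["log"].append(name)   # record who ran, in order
    return ctx


result = run_team(TEAM, {"video": "talk.mp4"})
print(result["log"])
print(result["deliverable"])
```

A production system would add parallel execution, retries, and review steps; the point here is only the hand-off of data through a single coordinator.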

AI Agent Collaboration: A Clear Trend

Agent collaboration has become a clear trend. It breaks a complex project into parallel workflows in which different roles collaborate, can run continuously for hours or even days, and handles a level of complexity that a single Agent cannot manage in one pass. This is a direction every AI user should be paying attention to.

Earnings Report Analysis: M3 Decisively Beats Gemini 3.1 Pro

This was the most convincing round of the entire test. The material was a visualization of Apple's Q2 2026 earnings report, generated by an Image2 model and later found to contain a hidden data error. Both models were asked to identify all information in the chart and organize it into an Excel spreadsheet.

The opponent was Gemini 3.1 Pro — the model the creator had used most over the past six months.

Figure: MiniMax M3 vs Gemini earnings analysis comparison

Gemini 3.1 Pro's Performance

It finished quickly with a neat, fully populated table — but essentially just "copied" the chart without performing any data validation.

MiniMax M3's Performance

It took nearly 20 minutes and chose not to deliver immediately. Instead, it spent extra time cross-checking numbers, flagged suspicious data points, deliberated repeatedly, and ultimately marked the chart's error explicitly in its output, filling in the correct value through logical analysis.

The differences went beyond accuracy:

M3 produced 7 sheets; Gemini produced 4
M3 additionally calculated year-over-year changes, profit margins, and breakdowns by product and region
M3 essentially re-audited the entire report based on practical analytical needs
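The kind of cross-check that separates "copying" a chart from auditing it can be illustrated with a small sketch. The numbers below are invented placeholders, not Apple's figures and not the values from the test image, and the helper names (`audit`, `yoy_change`, `margin`) are hypothetical. The idea is simply that segment figures should add up to the reported total, and derived metrics follow from the raw values.

```python
def audit(report, tolerance=0.05):
    """Flag a reported total that disagrees with the sum of its parts."""
    issues = []
    segment_sum = sum(report["segments"].values())
    if abs(segment_sum - report["total_revenue"]) > tolerance:
        issues.append(
            f"total_revenue {report['total_revenue']} != "
            f"sum of segments {segment_sum}"
        )
    return issues


def yoy_change(current, prior):
    """Year-over-year change in percent."""
    return (current - prior) / prior * 100


def margin(profit, revenue):
    """Profit margin in percent."""
    return profit / revenue * 100


# Placeholder data with a deliberate inconsistency: segments sum to 105.0.
chart = {
    "total_revenue": 100.0,
    "segments": {"Product A": 50.0, "Product B": 30.0, "Services": 25.0},
}

for issue in audit(chart):
    print("Suspicious:", issue)
```

A mismatch alone does not say which figure is wrong. Deciding that requires further evidence, such as prior-period values or other totals in the same report, which is presumably the kind of reasoning M3 spent its extra time on.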

For further verification, the creator had Claude act as a judge using Apple's original earnings data. The final conclusion: M3's version was the clear recommendation. In multimodal document understanding and structured data conversion scenarios, M3 decisively outperformed Gemini 3.1 Pro.

Video Understanding: M3's Unique Native Multimodal Capability

This test showcased an ability unique to M3 — directly understanding long-form video content. It was given a 12-minute NVIDIA keynote video and asked to watch it, then generate an HTML-formatted news briefing with accompanying images.

The key point: M3 actually "watches" the video itself rather than relying on speech-to-text transcription.

Figure: MiniMax M3 video understanding news briefing generation

The final output was a fully formatted news page that distilled the keynote highlights into a briefing with images. After manual verification, the content extraction was accurate and the briefing structure was objective and clear. The creator called this "the most impressive segment of the entire test besides the earnings analysis."

Computer Use: An Unfinished Challenge for the Entire Industry

The final test evaluated M3's new Computer Use feature, tasking it with directly operating a computer to navigate to a webpage and read a Xiaohongshu (RED) account's follower count. M3 did accomplish the task, but the process was slow and the experience was still far from polished.

However, this problem isn't unique to M3: the Computer Use features in Codex and Claude are likewise still being refined. It remains a shared challenge across the entire industry.

MiniMax M3 Pricing: A Clear Cost Advantage

Based on official pricing, M3 costs roughly 20% as much as Claude Sonnet and just over 10% as much as Opus. The API is currently offered at a 50% discount, which brings the price to 2.1 RMB per million input tokens and 8.4 RMB per million output tokens.
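To get a feel for what those rates mean for a given workload, a quick estimate can be computed as follows. The two rates are the discounted prices quoted above; the function name `estimate_cost_rmb` and the token counts in the example are made up for illustration.

```python
# Discounted API prices quoted in the article, in RMB per million tokens.
INPUT_RMB_PER_MILLION = 2.1
OUTPUT_RMB_PER_MILLION = 8.4


def estimate_cost_rmb(input_tokens, output_tokens):
    """Estimate the API cost in RMB for a given number of tokens."""
    input_cost = input_tokens / 1_000_000 * INPUT_RMB_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_RMB_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 3 million input tokens and 500,000 output tokens.
cost = estimate_cost_rmb(3_000_000, 500_000)
print(f"Estimated cost: {cost:.2f} RMB")  # Estimated cost: 10.50 RMB
```

If the 50% discount applies uniformly to both rates, the implied undiscounted prices would be 4.2 and 16.8 RMB per million tokens respectively.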

On the subscription side, the lowest tier starts at 49 RMB/month and includes not just M3's text capabilities but also quotas for image, voice, and music models. Higher tiers even include daily video generation. While going toe-to-toe with the Big Three in capability, M3's pricing is remarkably accessible.

Conclusion: MiniMax M3 Is Ready to Compete with the Top Tier

MiniMax M3 doesn't comprehensively surpass the Big Three — the company itself acknowledges a gap with Claude's strongest model on the most challenging autonomous research tasks. But based on these five real-world tests, the following conclusions are clear:

Multimodal Document Understanding: M3 demonstrated greater rigor than Gemini 3.1 Pro, proactively catching data errors
Code Generation: Competitive with Claude and Codex in different ways — Agent Team mode is well-suited for complex projects
Native Video Understanding: A current standout feature — directly watching video rather than relying on speech transcription
Price Advantage: API and subscription pricing far below overseas competitors, ideal for heavy AI users

Can a Chinese AI model compete at the top tier? This test gives a definitive yes. For users with limited budgets who still need high-quality AI output, MiniMax M3 is a choice worth serious consideration.