MiniMax M3 Hands-On Review: 7 Hardcore Tasks Reveal Its True Coding Ability

MiniMax M3 aces the theory but fails the engineering, scoring 58.3 across 7 hardcore coding tasks.
MiniMax M3's full release was tested across 7 challenging tasks including 3D scene reconstruction, physics simulations, optical refraction, and frontend development. While the model demonstrated strong theoretical understanding — correctly implementing Snell's law, SPH equations, and rigid body physics — it consistently failed at the engineering level with broken rendering, unusable default parameters, and missing visual output. The average score of 58.3 reveals a model that can write the formulas but can't deliver working demos.
Overview
MiniMax M3 launched its full public release today, touting native multimodal capabilities and ultra-long context support. The official report uses bold language, implying performance on par with top-tier models like Gemini, Claude, and GPT. But marketing is marketing — how does it actually perform? A Bilibili content creator put it through a systematic battery of 7 high-difficulty tasks covering 3D scene reconstruction, physics simulation, frontend development, and more, ultimately arriving at an average score of just 58.3.
What does that score mean? In short: The formulas are beautiful, but the demos are broken. M3 demonstrates solid theoretical understanding, but repeatedly falls apart at the engineering level.



Native Multimodal: Vibes Are Right, Details Are Gone
M3's core selling point is its native multimodal capability. The tester provided a landscape photo and asked it to recreate the 3D scene using Three.js.
The results were mixed. M3 correctly identified the three-layer composition (castle, walls, river valley, and rows of buildings), and the overall palette was close to the original, with the cream-white walls, gray stone slabs, and warm lighting largely preserved. The problem: the river, one of the most prominent features of the original image, simply vanished, along with the water reflections, atmospheric perspective, flags, cars, and other details.
Final score: 62. The verdict: "Atmosphere right, geometry right, details terrible." Native multimodal is clearly not just a gimmick, but it's far from impressive — it gets the big picture, but the fine details still need human intervention.
Physics Simulation Triple Feature: Formulas All Correct, Visuals All Broken
Rotating Hexagonal Bouncing Ball
The task: implement a bouncing ball inside a rotating hexagon using a single HTML file with no external libraries. M3 understood the core concept of "edges are rotating" and applied rigid body rotation formulas, precisely calculating linear velocity at the contact point — a clear improvement over M2.7, which treated the boundaries as stationary bricks.
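The idea M3 got right is that a rotating wall has its own velocity at the contact point, v_wall = ω × r, measured from the hexagon's center. Here is a minimal 2D sketch of that collision response. It reflects the ball's velocity *relative to the moving wall*. It is written in Python rather than the single-file JavaScript the task required, and the function names, restitution value, and counter-clockwise sign convention are illustrative assumptions, not M3's code. Tangential friction is ignored.

```python
def wall_velocity(omega, contact, center):
    """Velocity of a wall point rotating at `omega` rad/s (counter-clockwise positive)
    about `center`. In 2D, omega x r = (-omega * ry, omega * rx)."""
    rx, ry = contact[0] - center[0], contact[1] - center[1]
    return (-omega * ry, omega * rx)


def bounce(ball_v, normal, wall_v, restitution=0.8):
    """Reflect the ball's velocity relative to a moving wall.

    `normal` is the unit wall normal pointing into the hexagon.
    """
    # Work in the wall's frame of reference.
    rel = (ball_v[0] - wall_v[0], ball_v[1] - wall_v[1])
    vn = rel[0] * normal[0] + rel[1] * normal[1]
    if vn >= 0:  # already separating from the wall: no impulse
        return ball_v
    # Flip the normal component of the relative velocity, scaled by restitution.
    k = (1.0 + restitution) * vn
    rel = (rel[0] - k * normal[0], rel[1] - k * normal[1])
    # Convert back to the world frame.
    return (rel[0] + wall_v[0], rel[1] + wall_v[1])


# A ball at rest touching a vertical wall at x = 5, while the hexagon spins at 1 rad/s:
# the wall point (5, 2) is moving inward, so it knocks the ball toward the center.
print(bounce((0.0, 0.0), (-1.0, 0.0), wall_velocity(1.0, (5.0, 2.0), (0.0, 0.0))))
```

Setting `omega` to 0 reduces this to an ordinary stationary-wall bounce, which is the simplification the review attributes to M2.7.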
But the default parameters were a disaster. Air resistance defaulted to 0.06, which removes only about 5.8% of the ball's speed per second, so the ball effectively bounces forever. For comparison, Kimi's solution used a friction coefficient of 0.985, and its wall collisions dissipate energy realistically. M3 does expose these settings to the user, which is flexible, but "works out of the box" is not in its vocabulary.
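The "bounces forever" arithmetic is easy to check. The review gives only the numbers, so this sketch makes two assumptions:

- the 0.06 is a continuous drag rate, so speed decays as e^(-0.06 t);
- Kimi's 0.985 is a velocity multiplier applied once per frame at 60 fps.

Under those assumptions, this compares how much speed survives one second:

```python
import math


def speed_retained_continuous(drag_rate, seconds=1.0):
    """Fraction of speed left under exponential drag dv/dt = -drag_rate * v."""
    return math.exp(-drag_rate * seconds)


def speed_retained_per_frame(multiplier, fps=60, seconds=1.0):
    """Fraction of speed left when velocity is multiplied by `multiplier` every frame."""
    return multiplier ** (fps * seconds)


m3 = speed_retained_continuous(0.06)      # about 0.942, i.e. ~5.8% lost per second
kimi = speed_retained_per_frame(0.985)    # about 0.404, i.e. ~60% lost per second
print(f"M3 default: {1 - m3:.1%} of speed lost per second")
print(f"0.985 per frame at 60 fps: {1 - kimi:.1%} of speed lost per second")
```

If the 0.985 is instead applied only on wall hits, the comparison changes, but the M3 figure alone already explains why the ball never settles.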
3D Cone Spinning Top
This was the most popular challenge in the test series, requiring simultaneous handling of moment of inertia, friction, and gravity. M3's physics derivation was correct — contact angle and no-sleep physics were both properly implemented.
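To make "moment of inertia, friction, and gravity" concrete, consider the textbook fast-top approximation. It gives the precession rate as Omega = m·g·d / (I3·ω). For a solid cone about its symmetry axis, I3 = (3/10)·m·R², and d = 3H/4 is the distance from the tip to the center of mass. The review does not say which model M3 used. This sketch, with made-up dimensions, only illustrates the quantities a correct solution has to combine:

```python
def cone_axial_inertia(mass, radius):
    """Moment of inertia of a solid cone about its symmetry axis: (3/10) m R^2."""
    return 0.3 * mass * radius ** 2


def precession_rate(mass, radius, height, spin, g=9.81):
    """Fast-top (gyroscopic) approximation: Omega = m g d / (I3 * spin).

    d = 3H/4 is the tip-to-center-of-mass distance of a solid cone.
    Only valid when `spin` is much larger than the resulting precession rate.
    """
    d = 0.75 * height
    return mass * g * d / (cone_axial_inertia(mass, radius) * spin)


# Hypothetical top: 0.1 kg, 3 cm base radius, 5 cm tall, spinning at 300 rad/s.
omega_p = precession_rate(0.1, 0.03, 0.05, 300.0)
print(f"precession: {omega_p:.2f} rad/s")
```

The mass cancels out. The precession rate depends only on the cone's shape and on how fast it spins.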
On screen, however, the cone simply didn't render. All that was visible was a red dot, a white line, and a trajectory circle bouncing along the ground. The cause was in the geometry code. The base center was positioned correctly, but the base ring was placed at H=280, which left the base center 76 units beyond the ring. That stretched what should have been a flat base into an elongated conical surface. Combined with p5.js's default back-face culling, the entire side surface was clipped away.
The physics was right, but what you saw was an invisible spinning top. Score: 50.
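The failure is easier to see with the geometry spelled out. A cone mesh needs two things:

- the base center must lie in the same plane as the base ring, or the "flat" cap turns into a second cone;
- every triangle must be wound so its normal points outward.

A renderer that culls back faces discards any triangle whose normal points away from the camera. The sketch below is a generic Python illustration, not p5.js code, and its vertex layout (apex at the origin, base at z = height) is a hypothetical choice. It builds such a mesh and applies the standard back-face test:

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def cone_triangles(radius, height, segments=24):
    """Apex at the origin, base plane at z = height (hypothetical layout)."""
    apex = (0.0, 0.0, 0.0)
    base_center = (0.0, 0.0, height)  # same z as the ring, so the cap stays flat
    ring = [(radius * math.cos(2 * math.pi * i / segments),
             radius * math.sin(2 * math.pi * i / segments),
             height) for i in range(segments)]
    tris = []
    for i in range(segments):
        a, b = ring[i], ring[(i + 1) % segments]
        tris.append((apex, b, a))         # side triangle, wound so its normal points outward
        tris.append((base_center, a, b))  # base cap triangle, normal points along +z
    return tris


def normal(tri):
    """Unnormalized face normal from the triangle's winding order."""
    return cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))


def is_front_facing(tri, eye):
    """Back-face test: the triangle is visible only if its normal points toward the eye."""
    return dot(normal(tri), sub(eye, tri[0])) > 0


tris = cone_triangles(50.0, 100.0, 24)
# Seen from far beyond the base, only the 24 cap triangles face the camera.
print(sum(is_front_facing(t, (0.0, 0.0, 1000.0)) for t in tris))
```

If the winding is flipped, or the cap is stretched into a cone, the faces the camera should see fail this test and are culled. That matches the "invisible top" symptom.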
SPH Particle Fluid
SPH (Smoothed Particle Hydrodynamics) sounds intimidating, but M3's output was laughably bad: 720 particles all piled up at the bottom of the screen in a thin layer, like a handful of scattered millet. 80% of the screen was black — no water column, no splashing, no density gradient, no surface ripples.
At the code level, every physics formula was correct. The problem was initialization. The particles started directly at the bottom, with gravity set to 280 from the first frame, so they settled completely within 3 seconds. There was no inflow source and no initial kinetic energy, so no bulk flow ever developed.
As the tester put it: "Wrote the paper, but couldn't even draw the bowl right for the demo." Score: 50.
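One common way to get a lively SPH demo is a "dam break": start the particles as a raised block with some initial velocity, not as a settled layer on the floor. That setup is an assumption here, not something the review prescribes. The sketch below builds 720 particles that way, using y-up coordinates with the floor at y = 0. It then evaluates densities with the standard 2D poly6 kernel, W(r, h) = 4 / (π·h⁸) · (h² − r²)³ for r < h. The spacing, mass, smoothing length, and velocity are made-up values.

```python
import math


def dam_break_particles(cols, rows, spacing, origin, initial_velocity):
    """Place particles on a raised grid block, all with the same starting velocity."""
    ox, oy = origin
    positions = [(ox + i * spacing, oy + j * spacing)
                 for j in range(rows) for i in range(cols)]
    velocities = [initial_velocity] * len(positions)
    return positions, velocities


def poly6_2d(r2, h):
    """2D poly6 smoothing kernel, given the squared distance r2."""
    if r2 >= h * h:
        return 0.0
    return 4.0 / (math.pi * h ** 8) * (h * h - r2) ** 3


def densities(positions, mass, h):
    """Brute-force O(n^2) SPH density: rho_i = sum_j m * W(|x_i - x_j|, h)."""
    rho = []
    for xi, yi in positions:
        total = 0.0
        for xj, yj in positions:
            total += mass * poly6_2d((xi - xj) ** 2 + (yi - yj) ** 2, h)
        rho.append(total)
    return rho


# 24 x 30 = 720 particles starting 200 units above the floor and thrown sideways
# at 80 units/s, so the block collapses and splashes instead of lying flat.
pos, vel = dam_break_particles(24, 30, 6.0, (20.0, 200.0), (80.0, 0.0))
rho = densities(pos, mass=1.0, h=12.0)
```

A real simulation would replace the O(n²) loop with a spatial grid. The point here is only that the starting state already contains potential and kinetic energy for gravity to turn into flow.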
Optical Refraction: The Most Ironic Failure
The refraction ray tracing test was the most ironic failure of the entire evaluation. The task: render a glass sphere showing the refracted background visible through the sphere.
M3 instead produced a dark metallic sphere that showed only a checkerboard reflection on its lower half. None of the expected inverted, refracted background was visible. What should have been a glass ball came out as a chrome steel bearing.
Code analysis showed that Snell's law itself was implemented perfectly. However, the parameters weighted the result almost entirely toward reflection, and the refracted rays ended up hitting the dark sky background instead of passing through to the ground.
Physics all correct, optics all wrong. Score: 38 — the lowest of the entire test.
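For reference, the vector form of Snell's law and Schlick's approximation of Fresnel reflectance fit in a few lines. The part that matters for this failure is the weighting. At an air-to-glass interface (n ≈ 1.5), reflectance at normal incidence is only about 4%, so a glass sphere should look mostly transmissive, not mirror-like. The helper names below are illustrative, not taken from M3's output. For simplicity, `schlick` uses the incident angle on the air side.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def refract(incident, normal, eta):
    """Snell's law in vector form.

    `incident` and `normal` are unit vectors, with `normal` facing the incoming ray.
    `eta` = n1 / n2. Returns None on total internal reflection.
    """
    cos_i = -dot(normal, incident)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return add(scale(incident, eta), scale(normal, eta * cos_i - cos_t))


def schlick(cos_i, n1, n2):
    """Schlick's approximation of the Fresnel reflectance."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_i) ** 5


def glass_color(reflected_rgb, refracted_rgb, cos_i, n1=1.0, n2=1.5):
    """Blend the reflected and refracted colors by their Fresnel weights."""
    r = schlick(cos_i, n1, n2)
    return tuple(r * a + (1.0 - r) * b for a, b in zip(reflected_rgb, refracted_rgb))


# Looking straight at the sphere, about 96% of the color should come from the refracted ray.
print(glass_color((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), cos_i=1.0))
```

If the reflected term dominates, as it did in M3's output, a glass sphere reads as polished metal no matter how accurate the refraction math is.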
Highlight Projects: Boids Flocking & Frontend Kanban
Boids Flocking — Best of Show
M3 used Three.js to upgrade the classic Boids flocking algorithm to 3D, with over 100 birds moving in real time. Five forces were implemented: separation, alignment, cohesion, plus boundary and mouse attraction forces. OrbitControls dragging and velocity-to-hue color mapping were both included, making this the most visually complete project of the entire test.
The only issue was the mouse-attraction "gravity field." The force grows as distance shrinks, with no minimum-distance cutoff. As a result, birds near the cursor overshoot, swing back, and oscillate in accelerating spirals: the classic gravity-well trap. Overall, though, it scored 78, M3's best performance.
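The review names the missing piece as a minimum-distance cutoff. A closely related fix, swapped in here, is Plummer-style softening. It adds a constant to the squared distance so the pull stays finite at the cursor, and it caps the resulting force. In this sketch, `strength`, `softening`, and `max_force` are illustrative tuning values, not parameters from M3's code.

```python
import math


def mouse_attraction(bird_pos, mouse_pos, strength=500.0, softening=20.0, max_force=5.0):
    """Softened inverse-square pull toward the mouse.

    The magnitude is strength * d / (d^2 + eps^2)^(3/2). That is roughly 1/d^2 far away,
    but it falls to zero at the cursor instead of blowing up. It is also capped at max_force.
    """
    dx = mouse_pos[0] - bird_pos[0]
    dy = mouse_pos[1] - bird_pos[1]
    dz = mouse_pos[2] - bird_pos[2]
    d2 = dx * dx + dy * dy + dz * dz
    scale = strength / (d2 + softening * softening) ** 1.5
    fx, fy, fz = dx * scale, dy * scale, dz * scale
    mag = math.sqrt(fx * fx + fy * fy + fz * fz)
    if mag > max_force:  # cap so a single frame cannot fling the bird far past the cursor
        k = max_force / mag
        fx, fy, fz = fx * k, fy * k, fz * k
    return (fx, fy, fz)


# A bird 100 units to the right of the cursor is pulled gently to the left.
print(mouse_attraction((100.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

Birds can still circle the cursor. But because the pull no longer grows without bound at close range, the existing separation forces and any speed limit can keep the flock from spiraling out of control.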
Drag-and-Drop Kanban Board
Functionally adequate: To Do / In Progress / Done columns with full create, read, update, delete, and drag operations all working, LocalStorage persistence, smooth dragging, and a UI style reminiscent of Linear / Notion.
But production-level issues abound:

- The requirements were written in a Chinese prompt, yet the UI is entirely in English.
- The Add Column feature hardcodes every new column's title as "New Column" and doesn't accept Chinese input.
- Columns are fixed at 208px wide, so a board with many columns forces horizontal scrolling once it exceeds the screen width. There is no responsive design at all.
Drag logic: full marks. i18n: zero. Responsive design: zero. It runs, but it's not production-ready. Score: 60.
Conclusion: Strong in Theory, Weak in Engineering
| Test Project | Score | Core Issue |
|---|---|---|
| Multimodal Scene Reconstruction | 62 | All details lost |
| Rotating Bouncing Ball | - | Default parameters unusable |
| Boids Flocking | 78 | Gravity field bug |
| 3D Cone Spinning Top | 50 | Geometry rendering absent |
| Optical Refraction | 38 | Glass turned to steel |
| SPH Fluid | 50 | All particles sank to bottom |
| Drag-and-Drop Kanban | 60 | Missing i18n/responsive design |
Average score: 58.3. MiniMax M3 has made clear progress in theoretical derivation and algorithm comprehension, but it still has significant shortcomings in parameter tuning, geometry rendering, and engineering details.
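The table's numbers can be cross-checked. The six scores listed in the table average about 56.3. So the reported 58.3 only works out if the rotating bouncing ball, whose score is not listed there, scored 70. This assumes whole-number scores and is an inference, not a figure stated in the review.

```python
scores = {
    "Multimodal Scene Reconstruction": 62,
    "Boids Flocking": 78,
    "3D Cone Spinning Top": 50,
    "Optical Refraction": 38,
    "SPH Fluid": 50,
    "Drag-and-Drop Kanban": 60,
}

listed_total = sum(scores.values())
listed_mean = listed_total / len(scores)  # 338 / 6, about 56.3

# Whole-number ball scores that make the seven-task mean round to the reported 58.3.
implied = [s for s in range(101) if round((listed_total + s) / 7, 1) == 58.3]
print(f"six-task mean: {listed_mean:.1f}, implied ball score: {implied}")
```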
M3's typical pattern: the formulas are correct, but the demos are broken. It can write Snell's law, SPH equations, and rigid body rotation formulas, but parameter initialization, boundary conditions, and rendering pipelines — these "last mile" engineering problems — repeatedly prove fatal.
Can it really compete with Gemini, Claude, and GPT? Based on this hands-on testing, MiniMax M3 still has considerable ground to cover on complex programming tasks. To be fair, its foundational capabilities, native multimodality and ultra-long context, are genuinely solid. It's just that between "can understand" and "can execute well," a lot of engineering polish is still needed.