MiniMax M3 Hands-On Review: 7 Hardcore Tasks Reveal Its True Coding Ability

Overview

MiniMax M3 launched its full public release today, touting native multimodal capabilities and ultra-long context support. The official report uses bold language, implying performance on par with top-tier models like Gemini, Claude, and GPT. But marketing is marketing — how does it actually perform? A Bilibili content creator put it through a systematic battery of 7 high-difficulty tasks covering 3D scene reconstruction, physics simulation, frontend development, and more, ultimately arriving at an average score of just 58.3.

What does that score mean? In short: The formulas are beautiful, but the demos are broken. M3 demonstrates solid theoretical understanding, but repeatedly falls apart at the engineering level.

[Image: The cone never showed up at all]

[Image: It wrote out Snell's law]

[Image: Cut straight off the screen]

Native Multimodal: Vibes Are Right, Details Are Gone

M3's core selling point is its native multimodal capability. The tester provided a landscape photo and asked it to recreate the 3D scene using Three.js.

The results were mixed. The three-layer composition was identified correctly (castle, walls, river valley, rows of buildings), and the overall palette was close to the original, with the cream-white walls, gray stone slabs, and warm lighting largely preserved. The problem: the river, the most prominent feature of the original image, simply vanished, along with the water reflections, atmospheric perspective, flags, cars, and other details.

Final score: 62. The verdict: "Atmosphere right, geometry right, details terrible." M3's native multimodality is clearly not just a gimmick, but it is far from impressive: it gets the big picture, while the fine details still need human intervention.

Physics Simulation Triple Feature: Formulas All Correct, Visuals All Broken

Rotating Hexagonal Bouncing Ball

The task: implement a bouncing ball inside a rotating hexagon using a single HTML file with no external libraries. M3 understood the core concept of "edges are rotating" and applied rigid body rotation formulas, precisely calculating linear velocity at the contact point — a clear improvement over M2.7, which treated the boundaries as stationary bricks.
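The key idea M3 got right is that a rotating wall is itself moving. A wall point at position r from the rotation center moves with velocity ω × r, which in 2D is ω·(−r_y, r_x). The minimal sketch below handles a frictionless 2D collision, assuming the hexagon spins about the origin and using a hypothetical `restitution` parameter. It is not M3's code. It reflects the ball in the wall's own frame and then converts back:

```python
def wall_velocity(omega, px, py):
    """Velocity of the wall point (px, py) on a body spinning at omega rad/s
    about the origin: v = omega x r, which in 2D is omega * (-py, px)."""
    return (-omega * py, omega * px)


def bounce(vx, vy, px, py, nx, ny, omega, restitution=0.9):
    """Reflect the ball's velocity off a rotating wall (frictionless).

    (px, py) is the contact point and (nx, ny) the unit wall normal pointing
    into the hexagon. The reflection happens in the wall's frame: subtract
    the wall velocity, flip and damp the normal component, add it back.
    """
    wx, wy = wall_velocity(omega, px, py)
    rx, ry = vx - wx, vy - wy          # ball velocity relative to the wall
    vn = rx * nx + ry * ny             # normal component (negative = approaching)
    if vn >= 0:                        # already separating: leave velocity unchanged
        return vx, vy
    rx -= (1 + restitution) * vn * nx
    ry -= (1 + restitution) * vn * ny
    return rx + wx, ry + wy            # back to the world frame


# Ball at rest, struck by a wall segment at x = 5 while the hexagon spins at 1 rad/s.
print(bounce(0.0, 0.0, 5.0, 2.0, -1.0, 0.0, omega=1.0, restitution=1.0))  # (-4.0, 0.0)
```

With omega = 0 this reduces to an ordinary reflection, which is the stationary-wall behavior the article attributes to M2.7.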

But the default parameters were a disaster. Air resistance was set to 0.06, so the ball's speed decays by only 5.8% per second and it effectively bounces forever. For comparison, Kimi's solution used a friction coefficient of 0.985, with wall collisions that dissipate energy realistically. M3 does expose these settings to the user, which is flexible, sure, but "works out of the box" is not in its vocabulary.
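The 5.8% figure is consistent with a continuous drag model, dv/dt = −k·v. After one second with k = 0.06, about e^(−0.06) ≈ 0.942 of the speed remains. The sketch below compares that with a per-frame velocity multiplier. Reading Kimi's 0.985 as a per-frame factor at 60 fps is an assumption, since the article does not say how it is applied, and the function names are hypothetical:

```python
import math


def retained_after_one_second_continuous(k):
    """Fraction of speed left after 1 s under dv/dt = -k * v."""
    return math.exp(-k)


def retained_after_one_second_per_frame(factor, fps=60):
    """Fraction of speed left after 1 s if v is multiplied by `factor` every
    frame (assumes a fixed frame rate)."""
    return factor ** fps


def apply_drag(v, k, dt):
    """Frame-rate-independent drag: exact solution of dv/dt = -k * v over dt."""
    return v * math.exp(-k * dt)


print(f"{1 - retained_after_one_second_continuous(0.06):.1%} lost per second")  # 5.8%
print(f"{1 - retained_after_one_second_per_frame(0.985):.1%} lost per second")  # 59.6%
```

Under that assumption, the per-frame version sheds roughly ten times more speed per second, which is why the ball eventually settles. `apply_drag` keeps the behavior independent of frame rate.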

3D Cone Spinning Top

This was the most popular challenge in the test series, requiring simultaneous handling of moment of inertia, friction, and gravity. M3's physics derivation was correct — contact angle and no-sleep physics were both properly implemented.
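For reference, the textbook quantities behind a cone top are compact. For a uniform solid cone, the axial moment of inertia is (3/10)·M·R², the center of mass sits 3H/4 from the tip, and a fast top precesses at roughly Ω ≈ M·g·d / (I₃·ω). The sketch below assumes a uniform solid cone spinning fast on its tip and uses the gyroscopic approximation. The article does not say which model M3 used, and the function names and example numbers are illustrative:

```python
def cone_axial_inertia(mass, radius):
    """Moment of inertia of a uniform solid cone about its symmetry axis."""
    return 0.3 * mass * radius ** 2


def cone_com_from_tip(height):
    """Distance from the tip to the center of mass of a uniform solid cone."""
    return 0.75 * height


def precession_rate(mass, radius, height, spin, g=9.81):
    """Fast-top (gyroscopic) approximation: Omega = M g d / (I3 * omega).
    Only meaningful when the spin rate is large."""
    i3 = cone_axial_inertia(mass, radius)
    d = cone_com_from_tip(height)
    return mass * g * d / (i3 * spin)


# Illustrative 50 g cone, 2 cm radius, 4 cm tall, spun at 150 rad/s: about 16.35 rad/s.
print(precession_rate(mass=0.05, radius=0.02, height=0.04, spin=150.0))
```

In this approximation, precession slows as spin increases. A correct simulation should show the same trend.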

However, on screen the cone simply didn't render. All you could see was a red dot, a white line, and a bouncing trajectory circle on the ground. The bug was in the geometry code. The base center was positioned correctly, but the base ring was placed at H=280, leaving the base center 76 units beyond the ring. Instead of a flat disc, the base became an elongated conical surface. Combined with p5.js's default back-face culling, the entire side surface was clipped away.
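The failure described above, a base center and base ring that disagree about where the base is, disappears if both are derived from the same height. The sketch below builds a closed cone mesh as plain vertex and triangle-index lists rather than p5.js calls. It assumes the apex sits at the origin with the axis along z, and the function name is hypothetical. The triangles are wound so their right-hand-rule normals point outward. Which winding a renderer treats as front-facing is a renderer setting, so check it before relying on culling:

```python
import math


def cone_mesh(radius, height, segments=32):
    """Closed cone: apex at the origin, base disc at z = height.

    The base center and every base-ring vertex share the same z, so the cap
    is a flat disc rather than a second, stretched cone.
    """
    apex = (0.0, 0.0, 0.0)
    base_center = (0.0, 0.0, height)      # same height as the ring below
    ring = [
        (radius * math.cos(2 * math.pi * i / segments),
         radius * math.sin(2 * math.pi * i / segments),
         height)
        for i in range(segments)
    ]
    vertices = [apex, base_center] + ring  # 0: apex, 1: base center, 2..: ring
    triangles = []
    for i in range(segments):
        a = 2 + i
        b = 2 + (i + 1) % segments
        triangles.append((0, b, a))        # side face, normal points away from the axis
        triangles.append((1, a, b))        # base cap, normal points along +z
    return vertices, triangles


verts, tris = cone_mesh(radius=1.0, height=2.0, segments=16)
print(len(verts), len(tris))  # 18 32
```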

The physics was right, but what you saw was an invisible spinning top. Score: 50.

SPH Particle Fluid

SPH (Smoothed Particle Hydrodynamics) sounds intimidating, but M3's output was laughably bad: 720 particles all piled up at the bottom of the screen in a thin layer, like a handful of scattered millet. 80% of the screen was black — no water column, no splashing, no density gradient, no surface ripples.

At the code level, every physics formula was correct. But the initialization placed the particles directly on the bottom with gravity at 280 from the first frame, so everything settled within 3 seconds. There was no inflow source, no initial kinetic energy, and no bulk flow.
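The fix is in the initial conditions rather than the formulas. A common way to start an SPH demo is a "dam break": a block of fluid that begins well above the floor, so gravity produces a collapsing column and a splash. The minimal sketch below assumes a y-up world with the floor at y = 0. The layout, spacing, and velocity are illustrative values chosen only to match the 720-particle count, not taken from M3's code, and the helper name is hypothetical:

```python
def dam_break_particles(n_cols, n_rows, spacing, origin, velocity=(0.0, 0.0)):
    """Lay particles out as a raised rectangular block of fluid.

    origin is the block's lower-left corner in a y-up world whose floor is
    y = 0. Returns (x, y, vx, vy) tuples; a nonzero velocity gives the fluid
    initial kinetic energy instead of starting it at rest.
    """
    x0, y0 = origin
    vx, vy = velocity
    return [
        (x0 + c * spacing, y0 + r * spacing, vx, vy)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


# 24 x 30 = 720 particles, held 200 units above the floor and pushed sideways.
particles = dam_break_particles(24, 30, 8.0, origin=(40.0, 200.0), velocity=(60.0, 0.0))
print(len(particles))  # 720
```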

As the tester put it: "Wrote the paper, but couldn't even draw the bowl right for the demo." Score: 50.

Optical Refraction: The Most Ironic Failure

The refraction ray tracing test was the most ironic failure of the entire evaluation. The task: render a glass sphere showing the refracted background visible through the sphere.

What M3 produced was a dark metallic sphere that only showed a checkerboard reflection on the lower half, with none of the expected inverted, refracted background image visible. What should have been a glass ball, M3 turned into a chrome steel bearing.

Code analysis revealed that the Snell's law formula was implemented perfectly, but the parameter configuration resulted in almost entirely reflective components, with refracted rays actually hitting the dark sky background instead of transmitting through to the ground.
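A correctly configured glass sphere needs two pieces: the transmitted direction from Snell's law and a reflect/refract split. For glass with a refractive index of about 1.5, the Fresnel reflectance at normal incidence is only about 4%. Most of what the viewer sees should therefore be transmitted light, the opposite of the chrome result described above. The sketch below uses Schlick's approximation. That is a common choice in ray tracers but an assumption here, since the article does not say what M3 used:

```python
import math


def refract(d, n, eta):
    """Vector form of Snell's law.

    d: unit incident direction; n: unit surface normal facing against d;
    eta: n1 / n2 (e.g. 1 / 1.5 for air into glass).
    Returns the unit transmitted direction, or None on total internal reflection.
    """
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))


def schlick(cos_theta, n1, n2):
    """Schlick's approximation to Fresnel reflectance. For rays leaving the
    denser medium (n1 > n2), pass the cosine of the transmitted angle."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5


print(schlick(1.0, 1.0, 1.5))  # about 0.04: head-on, glass reflects roughly 4%
```

The reflectance only approaches 1 at grazing angles, so a sphere that looks almost entirely reflective points to a weighting or configuration problem rather than a wrong Snell's law formula.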

Physics all correct, optics all wrong. Score: 38 — the lowest of the entire test.

Highlight Projects: Boids Flocking & Frontend Kanban

Boids Flocking — Best of Show

M3 used Three.js to upgrade the classic Boids flocking algorithm to 3D, with over 100 birds moving in real time. Five forces were implemented: separation, alignment, cohesion, plus boundary and mouse attraction forces. OrbitControls dragging and velocity-to-hue color mapping were both included, making this the most visually complete project of the entire test.
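The three classic Boids rules are easy to state in code. The minimal sketch below is plain Python, not the Three.js version M3 wrote. The neighbor radii, weights, and speed cap are illustrative assumptions, and the boundary and mouse forces are omitted:

```python
import math


def boids_step(pos, vel, dt, radius=5.0, sep_radius=1.5,
               w_sep=1.5, w_ali=1.0, w_coh=0.8, max_speed=4.0):
    """One update of the three classic Boids rules in 3D (O(n^2) neighbor search).

    pos, vel: lists of [x, y, z]. Boundary and mouse forces are left out.
    """
    n = len(pos)
    new_vel = []
    for i in range(n):
        sep, ali, coh = [0.0] * 3, [0.0] * 3, [0.0] * 3
        count = 0
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in d))
            if dist >= radius:
                continue
            count += 1
            for k in range(3):
                ali[k] += vel[j][k]
                coh[k] += pos[j][k]
                if 0.0 < dist < sep_radius:
                    sep[k] -= d[k] / (dist * dist)  # push away, harder when closer
        v = list(vel[i])
        if count:
            for k in range(3):
                steer_ali = ali[k] / count - vel[i][k]  # match neighbors' heading
                steer_coh = coh[k] / count - pos[i][k]  # move toward local center
                v[k] += dt * (w_sep * sep[k] + w_ali * steer_ali + w_coh * steer_coh)
        speed = math.sqrt(sum(c * c for c in v))
        if speed > max_speed:
            v = [c * max_speed / speed for c in v]      # cap the speed
        new_vel.append(v)
    new_pos = [[pos[i][k] + dt * new_vel[i][k] for k in range(3)] for i in range(n)]
    return new_pos, new_vel
```

The double loop is O(n²), which is fine for around 100 birds. Much larger flocks usually need a spatial grid for the neighbor search.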

The only issue was the mouse-attraction field. The force grows without bound as distance shrinks, so birds overshoot the cursor and then oscillate around it in accelerating spirals. This is the classic gravity-well trap, caused by the missing minimum-distance cutoff. Overall, though, it scored 78, M3's best performance.
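A common remedy is to stop the force from growing without bound near the target. The sketch below clamps the distance from below and caps the magnitude. Both constants and the function name are illustrative assumptions. "Softening" the distance as sqrt(r² + ε²) is an equally common alternative:

```python
import math


def mouse_attraction(p, target, strength=50.0, min_dist=2.0, max_force=10.0):
    """Inverse-square pull toward a target point, with two safeguards:
    the distance is clamped from below (min_dist), and the magnitude is capped."""
    d = [t - c for t, c in zip(target, p)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        return (0.0, 0.0, 0.0)               # exactly on the target: no direction
    d_eff = max(dist, min_dist)              # minimum-distance cutoff
    mag = min(strength / (d_eff * d_eff), max_force)
    return tuple(mag * c / dist for c in d)  # unit direction times magnitude
```

Inside `min_dist` the pull stays constant instead of exploding. A bird that overshoots is therefore no longer flung back harder on each pass, and a little velocity damping near the target can help the swirl settle.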

Drag-and-Drop Kanban Board

Functionally adequate: To Do / In Progress / Done columns with full create, read, update, delete, and drag operations all working, LocalStorage persistence, smooth dragging, and a UI style reminiscent of Linear / Notion.

But production-level issues abound. The requirements were written in Chinese, yet the UI is entirely in English. The Add Column feature hardcodes every new column's title as "New Column" and doesn't accept Chinese input. Columns are also fixed at 208px wide, so once they exceed the screen width the board just scrolls horizontally, with no responsive design at all.
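Making the board responsive mostly means letting column width follow the viewport instead of hardcoding it. The arithmetic behind a wrapping layout is sketched below with a hypothetical `column_layout` helper and an assumed 16px gap. In a browser, CSS Grid with `repeat(auto-fit, minmax(208px, 1fr))` does roughly the same job declaratively:

```python
def column_layout(viewport_px, n_cols, min_col_px=208, gap_px=16):
    """Return (columns_per_row, column_width_px) for a board whose columns
    stretch to fill each row and wrap onto new rows instead of overflowing.
    On a viewport narrower than min_col_px, a single column fills the width."""
    fits = (viewport_px + gap_px) // (min_col_px + gap_px)  # minimum-width columns that fit
    per_row = max(1, min(n_cols, fits))
    width = (viewport_px - gap_px * (per_row - 1)) / per_row
    return per_row, width


print(column_layout(1280, 3))  # (3, 416.0): three wide columns, no scrolling
print(column_layout(390, 5))   # (1, 390.0): phone width, columns stack vertically
```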

Drag logic: full marks. i18n: zero. Responsive design: zero. It runs, but it's not production-ready. Score: 60.

Conclusion: Strong in Theory, Weak in Engineering

Test Project	Score	Core Issue
Multimodal Scene Reconstruction	62	All details lost
Rotating Bouncing Ball	-	Default parameters unusable
Boids Flocking	78	Gravity field bug
3D Cone Spinning Top	50	Geometry rendering absent
Optical Refraction	38	Glass turned to steel
SPH Fluid	50	All particles sank to bottom
Drag-and-Drop Kanban	60	Missing i18n/responsive design

Average score: 58.3 across all seven tasks (the bouncing-ball task's individual score is not listed). The result shows that MiniMax M3 has made clear progress in theoretical derivation and algorithm comprehension, but still has significant shortcomings in parameter tuning, geometry rendering, and engineering details.

M3's typical pattern: the formulas are correct, but the demos are broken. It can write Snell's law, SPH equations, and rigid body rotation formulas, but parameter initialization, boundary conditions, and rendering pipelines — these "last mile" engineering problems — repeatedly prove fatal.

Can it really compete with Gemini, Claude, and GPT? Based on current hands-on testing, MiniMax M3 still has considerable ground to cover on complex programming tasks. To be fair, though, its foundational native-multimodal and ultra-long-context capabilities are genuinely solid. Between "can understand" and "can execute well," there is still a lot of engineering polish needed.