Claude Sonnet 4.5 vs GPT-5 Codex: A Hands-On Soft-Body Physics Programming Showdown

A Hardcore AI Programming Showdown

Now that AI coding assistants can easily build web apps, a more interesting question emerges: can they handle truly hardcore programming tasks? A Chinese tech YouTuber recently published a highly technical comparison test, pitting Anthropic's newly released Claude Sonnet 4.5 against OpenAI's GPT-5 Codex high-performance model in a head-to-head battle. The task: recreate the soft-body physics driving simulation from the 1996 classic game Terep 2 (Deformers) in C++.

bilibili source

This classic game was famous for its stunning soft-body physics vehicle deformation effects, which remain impressive even by today's standards. The brilliance of choosing this task lies in its complexity: it requires not just graphics rendering, but the coordinated work of multiple complex subsystems — mass-spring physics, terrain generation, collision detection, and more — making it far more challenging than typical web development tests.

C++ in Practice: From Compilation Errors to Barely Running

Initial Code Generation Phase

The tester used identical prompts, asking both models to create a game in C++ featuring soft-body physics, simple vehicle deformation, and an off-road track, generating all necessary files. Claude was set to thinking mode (Sonnet 4.5), GPT-5 Codex was set to high-performance mode, and both were kicked off simultaneously.

Initial project build phase

Both models quickly began planning their project structures. Claude's output was noticeably more thorough — not only generating extensive code but also helpfully explaining how to handle different Linux distributions. Codex demonstrated a methodical approach to designing the C++ mass-spring physics system. However, neither model's initial code could compile and run directly; both hit dependency and compilation errors.

Iterative Fix Process

The tester pasted error messages back to each model, letting them fix their own issues rather than searching for solutions the traditional way. This setup was crucial — it simulated a real-world scenario where an amateur developer relies entirely on an AI coding assistant.

Initial results after fixing compilation issues

After multiple rounds of fixes, both models produced compilable, runnable programs, though the initial results were underwhelming:

Claude Sonnet 4.5: Decent terrain rendering with an off-road feel, but the vehicle shape was subpar and movement controls didn't work
GPT-5 Codex: WASD vehicle controls worked, but terrain rendering had issues

The tester admitted disappointment that neither model could draw a decent vehicle model, especially since in browser-based JS tests, they could at least render basic vehicle shapes.

Final C++ Results Comparison

After further feedback iterations, both models showed significant improvement:

Claude's strengths: Superior terrain with peaks and valleys, better suited for off-road tracks; vehicle correctly followed terrain contours; faster working speed and quicker bug fixes
Codex's strengths: Better vehicle model; more pronounced soft-body physics effects with the vehicle bouncing like jelly, closer to the Terep 2 tech demo feel

Side-by-side comparison of both models' C++ implementations

Switching to Web Tech Stack: JS and HTML Performance

Out of curiosity, the tester had both models complete the same task using HTML, JS, and CSS. As expected, development with the web tech stack was much faster, with Claude finishing first once again.

Web tech stack test

Interestingly, regardless of programming language, both models exhibited similar bug patterns:

Claude's web version: Terrain and graphical style looked great, but improperly tuned spring forces caused the vehicle to constantly sink into the ground or shake apart explosively. Even after four iterations, it couldn't perfectly balance drivability and physics stability
Codex's web version: Delivered what was arguably the best result across all four tests — the soft-body physics deformation was impressive, and the way the vehicle deformed on impact closely resembled the feel of the original game

Key Findings and In-Depth Analysis

Programming Capability Profiles of Both Models

This test revealed distinct differences in programming capabilities between Claude and GPT-5:

Claude Sonnet 4.5 excels at UI polish and overall visual presentation, generates more natural terrain, provides more thorough code documentation, and responds faster. However, it struggled with physics parameter tuning, failing to balance the spring system's various parameters even after multiple iterations.

GPT-5 Codex may have the edge in mathematical and physics calculations, producing more realistic soft-body deformation effects and more reasonable collision responses. However, it fell slightly short in visual polish and terrain design.

Implications for Developers

You might not have noticed, but the tester specifically emphasized that Sonnet is not Anthropic's top-tier model (that would be Opus), and considering this, Sonnet 4.5's performance was already quite impressive. This hints at an important trend: AI coding assistants are rapidly evolving from "can build web pages" to "can handle complex systems-level programming."

That said, this test also exposed several current limitations of AI programming:

Physics parameter tuning remains a challenge: Both models struggled to balance spring forces, damping coefficients, and other parameters
Cross-language consistency issues: Similar bugs appeared in both C++ and JS implementations, indicating the problems stem from algorithmic understanding rather than language-specific issues
Human feedback loops are still necessary: Neither model could produce usable results on the first try; both required multiple iterations

Conclusion: No Clear Winner, But Each Has Its Strengths

This Claude vs GPT-5 programming showdown is hard to call with a definitive winner. If you prioritize development efficiency and visual quality, Claude Sonnet 4.5 is the better choice; if you care more about physics simulation accuracy and mathematical computation, GPT-5 Codex might be more suitable.

For everyday developers, the more important signal is this: AI coding assistants can now handle fairly complex graphics and physics programming tasks in systems-level languages like C++. While the results are far from perfect, considering this was a from-scratch process requiring almost no manual coding, the progress is exciting enough.