Replicating a 3D Personal Homepage with Codex: A Hands-On Comparison of Multiple AI Coding Tools

Replicating a 3D homepage reveals the massive capability gap between AI coding tools.
A content creator attempted to replicate a complex 3D gamified personal homepage using multiple AI coding tools. OpenAI Codex succeeded on the first try and proactively identified missing architecture, while free tools completely failed—MiniMax 2.7 couldn't even comprehend the code. The experiment reveals a cliff-like gap between top-tier and free models, suggesting a two-phase strategy: top-tier models for core challenges, free tools for routine modifications.
Introduction: What Does the World's Coolest Personal Homepage Look Like?
According to the creator, asking large language models "What's the coolest personal homepage in the world?" tends to produce the same top answer: a personal homepage that turns a traditional flat webpage into a 3D gamified experience. On entering, visitors see a map and drive a car with the keyboard to explore different locations, clicking on spots along the way to view contact information, Bilibili links, podcasts, and more.
Although this project is open source, its complexity is extremely high. A Bilibili content creator attempted to replicate this homepage using multiple AI coding tools, and the results revealed the enormous capability gap between current AI coding tools.

Project Complexity Analysis: Why Is This 3D Homepage So Hard to Replicate?
The complexity of this open-source personal homepage manifests on three levels:
Fragmented Code Architecture
The project author didn't originally maintain it as a standard open-source project—the code is scattered across multiple independent repositories. Downloading the main repository doesn't give you a runnable application; you need to locate multiple scattered sub-projects and assemble them. This places extremely high demands on an AI tool's code comprehension ability. In modern software engineering, this kind of "multi-repo architecture" is fairly common in large teams but typically comes with comprehensive documentation and build scripts. This personal project lacks such guidance, meaning AI tools must possess a senior developer's "code archaeology" ability—inferring complete dependency relationships and build processes from scattered clues.
Game-Level Frontend Complexity
This isn't an ordinary web application—it's a complete browser-based 3D game. It involves map rendering, vehicle control, interactive exploration, and other features, making its tech stack far more complex than typical frontend projects.
To understand this complexity, consider the layers of a web-based 3D stack. Regular web development relies mainly on HTML, CSS, and JavaScript for two-dimensional layout and interaction, while 3D web games require WebGL (Web Graphics Library), a low-level browser API that gives JavaScript access to GPU-accelerated rendering. In practice, developers usually use wrapper libraries such as Three.js or Babylon.js to simplify WebGL calls, but they still have to handle scene graph management, shader programming, physics engine integration, collision detection, camera control, and other challenges specific to game development. In addition, vehicle control involves physics simulation (acceleration, steering, friction), and map exploration can involve spatial indexing and LOD (Level of Detail) optimization, areas that traditional frontend developers rarely encounter. An AI tool therefore needs to understand not just frontend development but game development as well.
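To make the vehicle-control part concrete, here is a minimal sketch of a per-frame update with acceleration, steering, and friction. It is not taken from the project: the `Car` class, the `step` function, and every constant are hypothetical, and a real 3D game would normally delegate this to a physics engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Car:
    x: float = 0.0        # position on the map plane
    y: float = 0.0
    heading: float = 0.0  # radians, 0 points along +x
    speed: float = 0.0    # units per second along the heading


def step(car: Car, throttle: float, steer: float, dt: float,
         accel: float = 8.0, friction: float = 1.5,
         turn_rate: float = 2.0, max_speed: float = 12.0) -> Car:
    """Advance the car by dt seconds (all tuning constants are illustrative).

    throttle and steer are expected in [-1, 1], e.g. derived from arrow keys.
    """
    # Input accelerates the car; friction grows with the current speed.
    speed = car.speed + (throttle * accel - friction * car.speed) * dt
    speed = max(-max_speed, min(max_speed, speed))
    # Steering has no effect while the car is standing still.
    heading = car.heading + steer * turn_rate * (speed / max_speed) * dt
    return Car(
        x=car.x + math.cos(heading) * speed * dt,
        y=car.y + math.sin(heading) * speed * dt,
        heading=heading,
        speed=speed,
    )
```

Calling `step` once per rendered frame with `dt` set to the frame time moves the car; releasing the throttle lets friction bring it gradually to rest.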
Hidden Backend Services
The open-source portion only includes frontend code, but actual operation requires a backend server—for example, the in-game leaderboard feature. The backend service code is not open-sourced, requiring AI tools to identify this gap and proactively propose solutions. This situation is not uncommon in the open-source community: many projects only publish client-side code while keeping server-side code involving data storage, user authentication, and API interfaces private. Whether an AI tool can infer the existence and general functionality of backend services from clues like API calls, environment variable configurations, and network request patterns in the frontend code is an important indicator of its "full-stack comprehension."
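The kind of clue-gathering described above can be approximated mechanically. The sketch below scans frontend source text for network calls and environment variable references that hint at a backend. The regular expressions are hypothetical and deliberately incomplete, and the sample URL and variable name are invented for illustration; they are not from the project.

```python
import re

# Common ways frontend code reaches a server: fetch("..."), axios.get("..."),
# new WebSocket("..."). Illustrative patterns only, not an exhaustive list.
CALL_PATTERN = re.compile(
    r"""(?:fetch|axios\.(?:get|post|put|delete)|new\s+WebSocket)\(\s*["'`]([^"'`]+)["'`]"""
)
# Build-time configuration often names the server it expects.
ENV_PATTERN = re.compile(r"(?:process\.env|import\.meta\.env)\.([A-Z0-9_]+)")


def find_backend_hints(source: str) -> dict:
    """Return URLs and environment variable names that hint at a backend."""
    return {
        "urls": sorted(set(CALL_PATTERN.findall(source))),
        "env_vars": sorted(set(ENV_PATTERN.findall(source))),
    }


if __name__ == "__main__":
    sample = (
        'fetch("/api/leaderboard");\n'
        "const ws = new WebSocket(`wss://example.invalid/socket`);\n"
        "const url = import.meta.env.VITE_SERVER_URL;\n"
    )
    print(find_backend_hints(sample))
```

A non-empty result is only a hint that a server exists. Working out what that server must do still requires reading how the responses are used.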
Hands-On Comparison Results of Multiple AI Coding Tools
Free Coding Tool (Unnamed) — Complete Failure
The content creator first tried a free-tier coding tool about a month ago. After providing the prompt "help me replicate an identical website," the tool claimed it was done, but the site threw errors on load and was completely non-functional.
More critically, the tool never mentioned that the project requires a backend service, and when it pulled the sub-projects together, the assembled code did not even compile. This points to a complete lack of understanding of the project's overall architecture.
OpenAI Codex — Nailed the Core Replication in One Shot
Codex's performance was impressive. Using the same prompt, it successfully replicated the project on the first attempt. But Codex's strength lies not just in execution—it's in its "intelligence":
- Proactively identified architectural gaps: Codex not only completed the frontend replication but also proactively informed the creator that "this project requires a backend service" and offered to help create another project to write the backend service code.
- Deep understanding of project structure: It was able to identify dependencies scattered across multiple repositories and correctly assemble them.
It's important to distinguish OpenAI Codex from ordinary code completion tools. Codex is not simply "auto-complete for code"; it is an AI programming system with full Agent capabilities. Its cloud version runs each task in an isolated sandbox environment, where it can autonomously execute terminal commands, read and write files, install dependencies, run tests, and iteratively correct its work based on the results. This "plan-execute-verify-correct" closed loop lets it handle complex engineering tasks that require multi-step reasoning, rather than merely generating code line by line. Under the hood, Codex relies on OpenAI's reasoning models (the o3 and o4-mini family at the time of its launch), which are optimized for code comprehension and logical reasoning.
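The "plan-execute-verify-correct" loop can be sketched in a few lines. This is a simplified illustration, not how Codex is implemented: `agent_loop` and the `propose_fix` callback are hypothetical names, and in a real agent the callback would ask a model to edit files based on the error output.

```python
import subprocess
from typing import Callable, List


def agent_loop(command: List[str],
               propose_fix: Callable[[str], None],
               max_attempts: int = 3) -> bool:
    """Run `command`; on failure, pass stderr to `propose_fix` and try again.

    Returns True as soon as the command exits with status 0, or False once
    `max_attempts` runs have all failed.
    """
    for _ in range(max_attempts):
        # Execute: run the build or test command and capture its output.
        result = subprocess.run(command, capture_output=True, text=True)
        # Verify: a zero exit status counts as success in this sketch.
        if result.returncode == 0:
            return True
        # Correct: let the caller change something before the next attempt.
        propose_fix(result.stderr)
    return False
```

A tool that only generates code stops before the first `subprocess.run`. The value of the loop is that failures feed back into the next attempt.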
However, Codex's biggest weakness for this workflow is its very small free quota, which is allocated weekly. After the successful replication, the quota ran out after just a few changes in the modification phase. Codex gives access to a top-tier model but cannot sustain long-running work. This reflects a common dilemma across the AI industry: inference on top reasoning models is expensive (a single complex task can plausibly consume dollars' worth of compute), so the free tier can only offer a limited trial, essentially a "try before you buy" funnel.
Kilo + MiniMax 2.7 (NVIDIA Free Model) — Overwhelmed
After exhausting Codex's quota, the creator switched to the Kilo editor paired with the MiniMax 2.7 model, which is available for free through NVIDIA. The results were disappointing:
- Excessively long code reading time: After about three hours, it still had not finished reading the project's code structure
- Unable to understand source code: It couldn't even achieve basic code comprehension, let alone modification and optimization
- Clearly insufficient capability: Completely unable to handle a project of this complexity
MiniMax 2.7's failure exposes the fundamental limitations of free models when processing large codebases. First is the Context Window issue: the amount of text a model can simultaneously "see" and process is limited. A complex 3D game project may contain tens or even hundreds of thousands of lines of code, far exceeding most models' effective processing range. Even if a model claims to support ultra-long contexts, its actual "effective attention" often degrades significantly as input length increases—the model may have "read" the code but cannot establish effective logical connections across such a large volume of information. Second, code comprehension ability is directly related to the quality and scale of the model's pre-training data; freely available models typically fall short of closed-source top-tier models in the breadth and depth of code training data. NVIDIA offers free API access to these models through its NIM (NVIDIA Inference Microservices) platform to promote its inference infrastructure ecosystem, but the model's inherent capability ceiling means it's better suited for simple tasks rather than complex engineering.
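A back-of-the-envelope check shows why context size matters. The sketch below assumes roughly 4 characters per token, a common rule of thumb rather than a real tokenizer, and the window sizes used are illustrative, not the limits of any particular model.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is an assumption."""
    return int(len(text) / chars_per_token)


def fits_in_context(files: dict, context_window: int,
                    reserve: int = 8_000) -> tuple:
    """Return (fits, total_tokens) for a mapping of path -> source text.

    `reserve` keeps room for instructions and the model's own output.
    """
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= context_window - reserve, total


if __name__ == "__main__":
    # 100,000 lines at about 40 characters each is 4,000,000 characters,
    # which is about 1,000,000 tokens under the assumption above.
    codebase = {"all_sources.js": "x" * 4_000_000}
    print(fits_in_context(codebase, context_window=200_000))
```

Under these assumptions a codebase of that size cannot be read in one pass, so the tool has to choose which files to load, and that choice itself depends on understanding the architecture.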
Practical Strategy: A Two-Phase AI Coding Tool Combination
Based on hands-on experience, the creator summarized a pragmatic strategy for using AI coding tools:
Phase 1: Top-Tier Models for Core Challenges
Leverage Codex's weekly free quota or Google's free quota to use top-tier models for the most complex and critical parts of a project. These tools have strong Agent capabilities, making them suitable for architecture-level challenges.
The "Agent capability" mentioned here is the key concept that differentiates AI coding tool tiers. Traditional AI code assistants (like early GitHub Copilot) are essentially "completion tools"—they predict the next segment of content as you write code but lack the ability to think and act independently. AI coding tools with Agent capabilities are fundamentally different: they can understand high-level task objectives, autonomously formulate execution plans, execute commands in virtual environments and observe results, and independently debug and correct when encountering errors. Behind this capability is the breakthrough of "Reasoning Models"—these models engage in long chains of internal reasoning before generating answers, similar to how human developers analyze complex problems. Google's Gemini series and OpenAI's o-series models both belong to this category. Currently, free quotas for these top-tier models are typically limited by "task count" or "token consumption." The free Gemini API offered by Google through AI Studio and the Codex functionality offered by OpenAI through ChatGPT are both free entry points developers can leverage.
Phase 2: Free Tools for Routine Modifications
Once core problems are solved, small-scale code adjustments (like modifying styles or logic at specific locations) are frequent but not particularly difficult—ideally these should be handled with completely free tools.
But the current problem is: Phase 2 free tools are clearly not capable enough. MiniMax 2.7 can't even comprehend the code, making it unable to handle even simple modification tasks. This means the market currently lacks a daily coding assistant that is both free and possesses sufficient code comprehension ability.
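The two-phase strategy amounts to a simple routing rule. The sketch below is one hypothetical way to express it; the `Task` fields, the tool labels, and the idea of counting remaining quota as an integer are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    architectural: bool  # True for cross-repo assembly, missing backends, etc.


def pick_tool(task: Task, premium_quota_left: int) -> str:
    """Route hard tasks to the scarce top-tier quota, everything else to free tools."""
    if task.architectural and premium_quota_left > 0:
        return "top-tier agent"
    if task.architectural:
        # No free tool in this experiment handled architecture-level work.
        return "wait for quota reset"
    return "free tool"
```

The rule only pays off if the "free tool" branch can actually complete routine edits, which is exactly the gap the experiment exposed.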
Key Takeaways: Recommendations for Choosing AI Coding Tools
This hands-on comparison reveals several realities in the AI coding tool landscape:
- There is a cliff-like gap between top-tier models and free models, especially in architectural understanding of complex projects. The gap likely stems not only from differences in parameter scale but also from differences in training data quality, RLHF (Reinforcement Learning from Human Feedback) tuning depth, and inference-time compute.
- An Agent's "intelligence" is reflected not just in execution ability but in whether it can proactively identify problems and propose solutions. This "metacognitive" ability—knowing what it doesn't know, knowing what the project is missing—is a hallmark of today's most advanced AI systems.
- Free quota limitations mean no single tool can cover the entire development workflow; combining tools is the current practical reality.
- Replicating complex open-source projects is an excellent litmus test for AI coding tool capabilities. Compared to simple algorithm problems or standalone function writing, complete project replication requires AI to simultaneously possess code comprehension, architecture analysis, dependency management, environment configuration, error diagnosis, and other multi-dimensional capabilities.
For regular developers, the most pragmatic approach currently is: use Codex or Google's free top-tier models to solve key challenges, then find suitable everyday tools for subsequent iterations. This field is still evolving rapidly and is worth continued attention.