Vibe Coding in Practice: Three Strategies for Building an English Learning Game with Dramatically Different Results

Introduction: A Vibe Coding Challenge for an English Learning Game

Recently, a Bilibili creator named Liu Yi (an AI PhD student and indie developer) published an exceptionally insightful hands-on Vibe Coding tutorial. His goal was to develop a "visual novel game for learning English," and through three controlled experiments, he clearly demonstrated how different Vibe Coding strategies can dramatically impact the quality of the final output.

The core thesis of the video is crystal clear: Vibe Coding isn't about "turning a single sentence into gold" — it's a systematic engineering methodology. Even when using the same AI model (OpenAI Codex), different levels of resource preparation and tool selection lead to vastly different results.

Two key concepts are worth explaining here. OpenAI Codex is an AI agent designed for software engineering tasks, built on the codex-1 model. It can autonomously execute multi-step coding tasks in a cloud sandbox environment, including writing functional code, fixing bugs, and running tests. Unlike traditional code completion tools, Codex is closer to a "virtual developer": you give it a task description, and it plans the execution steps, reads and writes files, installs dependencies, debugs, and ultimately delivers working code.

"Vibe Coding," meanwhile, is a term coined by AI researcher Andrej Karpathy in early 2025. It refers to a development paradigm where you describe requirements in natural language and let AI handle most of the coding work. The developer plays the role of "director" rather than "actor": you describe what you want, and AI figures out how to build it. The value of this video lies in its use of controlled experiments to show that Vibe Coding isn't as simple as "say one sentence and get a finished product." It requires systematic preparation and strategy.

Version 1: Pure Prompt, Zero Preparation — How Far Can AI Go?

The first experiment fully simulated a "casual user" approach — giving the AI a single simple instruction:

"Make me an English learning game with a black-haired, round-glasses, beautiful anime-style female character. Build a five-minute demo."

Version 1 generated result

Codex completed the development in 8 minutes and 40 seconds, consuming 18% of the five-hour Token quota. The result was a pure HTML/CSS web interactive page with a randomly AI-generated character. The creator admitted that while the completion level was "already quite impressive" for this level of Prompt, it was essentially just a web interactive page — not really a game.

One interesting detail: Codex drew on the creator's previous development history. The generated character and interaction patterns closely resembled a galgame (a Japanese-style visual novel) he had built before. This reveals an important phenomenon: AI agents get "trained" by their users' habits. Over time, programmers and non-programmers using the same tool will produce significantly different results.

Version 2: Providing Assets to Reduce AI's Cognitive Load

Core Principle: Let AI Focus on Assembly, Not Creation

The key change in the second experiment was preparing all assets in advance, letting the AI focus on "assembly" rather than "creation."

Pre-prepared assets

The creator did the following preparation:

Character art: Generated two character images with different expressions using AI image generation tools, then manually made the backgrounds transparent in GIMP (AI-generated images often have "fake Alpha" rather than true transparency)
Background images: Scene backgrounds in a matching art style
BGM: Background music sourced from royalty-free music websites
Outline: A game content outline pre-generated using an LLM

Regarding the character art processing, it's worth elaborating on a technical detail that's extremely common in AI-assisted development but often overlooked. The Alpha channel is the fourth channel in an image and controls transparency (alongside the red, green, and blue color channels). True transparency means the Alpha value in background areas is 0. However, mainstream AI image generation models (such as Stable Diffusion, DALL-E, and Midjourney) typically output plain RGB images; their training data consists largely of images without an Alpha channel, and common formats such as JPEG do not support one at all. As a result, when asked for a transparent background, these models usually paint pure white or a checkerboard pattern to imitate the look of transparency, while every pixel stays fully opaque (Alpha 255). In game development, if you overlay such "fake Alpha" images on a background as character sprites, the white areas block the background and ruin the visual effect. GIMP (GNU Image Manipulation Program) is a free, open-source image editor that can fix this: add an Alpha channel to the layer, select the background with the "Select by Color" tool, delete it, and export the result as a PNG with true transparency.
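The difference between "fake Alpha" and true transparency can be sketched in a few lines of Python. This is a minimal illustration, not the creator's workflow: pixels are modeled as plain (red, green, blue, alpha) tuples rather than loaded from a real image file, and the function names and the threshold of 250 are assumptions chosen for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (red, green, blue, alpha), each 0-255


def has_real_transparency(pixels: List[Pixel]) -> bool:
    """Return True if at least one pixel is not fully opaque."""
    return any(alpha < 255 for (_, _, _, alpha) in pixels)


def white_to_transparent(pixels: List[Pixel], threshold: int = 250) -> List[Pixel]:
    """Set alpha to 0 for near-white pixels; leave all other pixels unchanged."""
    result = []
    for r, g, b, a in pixels:
        if r >= threshold and g >= threshold and b >= threshold:
            result.append((r, g, b, 0))
        else:
            result.append((r, g, b, a))
    return result


# A tiny hypothetical sprite: three white background pixels and one dark hair
# pixel, all fully opaque. This is the "fake Alpha" situation.
fake_alpha_sprite = [
    (255, 255, 255, 255), (255, 255, 255, 255),
    (20, 20, 30, 255), (255, 255, 255, 255),
]

print(has_real_transparency(fake_alpha_sprite))  # False
fixed = white_to_transparent(fake_alpha_sprite)
print(has_real_transparency(fixed))              # True
```

A color threshold like this also clears white areas inside the character, such as eye highlights, which is one reason manual selection in an editor like GIMP can give cleaner results.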

The underlying principle is analogous to human cognition: if you stuff too much irrelevant information into the Context Window, the model's performance degrades, just as a person whose mind is cluttered with trivial matters makes worse decisions. By completing non-essential asset preparation in advance, you preserve the AI's thinking precision and depth for the truly important logic development.

The technical principle here deserves deeper understanding. The Context Window is one of the core concepts of large language models: it is the total amount of text the model can "see" and process in a single inference, typically measured in Tokens. A Token is the smallest unit of text the model processes. Depending on the tokenizer, one English word corresponds to roughly 1-2 Tokens, and one Chinese character to roughly 1-2 Tokens. When the Context Window is filled with low-value information, the model's "attention" gets diluted, which relates to the attention mechanism in the Transformer architecture. The attention mechanism computes relevance weights between each Token and the other Tokens in the input sequence, and those weights are normalized to sum to 1. When irrelevant information is excessive, key information can receive lower attention weights, causing the model to perform worse at critical decision points. This is why preparing assets in advance is so important: by removing images, music, outlines, and other "already-decided content" from the AI's cognitive burden, you allow the limited Context Window and computational resources to focus on core logic reasoning and code generation.
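The dilution effect can be illustrated with a toy softmax calculation. This is a simplified sketch under stated assumptions: a single query, made-up relevance scores (2.0 for the key Token, 0.0 for each irrelevant one), and hypothetical function names. Real models use many attention heads and layers with learned scores, so this shows the arithmetic behind the claim, not actual model behavior.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_key_token(key_score: float, noise_score: float, noise_count: int) -> float:
    """Weight the key token receives when noise_count other tokens compete with it."""
    scores = [key_score] + [noise_score] * noise_count
    return softmax(scores)[0]


# Hypothetical scores: the key token scores 2.0, each irrelevant token 0.0.
for noise_count in (10, 100, 1000):
    weight = weight_on_key_token(2.0, 0.0, noise_count)
    print(f"{noise_count:>5} irrelevant tokens -> key weight {weight:.3f}")
```

Because the weights must sum to 1, every additional irrelevant Token takes a small share away from the key Token, even though the key Token's own score never changes.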

Result: A Quantum Leap in Quality, Nearly Identical Token Cost

Codex hard at work

Using the same Prompt with one change, "generate new characters" replaced by "use the provided assets," Codex performed like a completely different tool. Development time rose from 8 minutes 40 seconds to 13 minutes. The model clearly entered a "serious mode," implementing features such as three-choice quizzes, instant error correction, and automatic note archiving. It even added mobile responsiveness unprompted.

Token consumption rose only from 18% to 19% of the quota, a single percentage point, yet output quality improved markedly. The game now had dynamic interactive effects, BGM, phrase review functionality, and a level of completeness far beyond Version 1.

Version 3: Introducing the Godot Game Engine — Stop Reinventing the Wheel

Why Godot?

The third experiment built on Version 2 with just one additional instruction: "Use the Godot engine."

Just one added instruction: use Godot engine

Godot is a completely free, open-source (MIT license) cross-platform game engine supporting both 2D and 3D game development. It has rapidly gained traction in the indie game development community in recent years and is considered an important alternative to Unity and Unreal Engine. Godot uses its own GDScript scripting language (with Python-like syntax and a gentle learning curve) while also supporting C# and C++. It's particularly well-suited for AI-assisted development for several key reasons: First, Godot uses a "scene-node" tree architecture where every game element is a node — this highly structured organization aligns well with AI's reasoning patterns. Second, GDScript's concise and intuitive syntax means AI-generated code has a lower error rate. Third, as an open-source project, Godot's documentation and community code are extensively represented in AI training data, giving models good "memory" of its APIs and best practices.
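To make the "scene-node" idea concrete, here is a small tree modeled in plain Python. It is only a sketch of the structure: the Node class and the scene layout are hypothetical, and although the kind labels mirror Godot class names, none of this is the Godot API or the project the creator built.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A named scene element that can own child elements."""
    name: str
    kind: str
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so it can be extended further."""
        self.children.append(child)
        return child

    def tree_lines(self, depth: int = 0) -> List[str]:
        """Return an indented outline of this node and everything below it."""
        lines = [f"{'  ' * depth}{self.name} ({self.kind})"]
        for child in self.children:
            lines.extend(child.tree_lines(depth + 1))
        return lines


# Hypothetical layout for one visual-novel lesson scene.
scene = Node("LessonScene", "Node2D")
scene.add_child(Node("Background", "Sprite2D"))
scene.add_child(Node("Character", "Sprite2D"))
scene.add_child(Node("Bgm", "AudioStreamPlayer"))
ui = scene.add_child(Node("Ui", "CanvasLayer"))
ui.add_child(Node("DialogueBox", "Label"))
ui.add_child(Node("ChoiceButtons", "VBoxContainer"))

print("\n".join(scene.tree_lines()))
```

Each prepared asset from Version 2 maps onto one node here, which suggests why an engine with this structure leaves the AI mostly with the job of filling in content.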

The creator's reasoning was crystal clear: HTML is too lightweight to handle anything complex. Every time you use HTML to develop a game, the AI has to start from a blank slate — first building an "engine," then filling in game content. With a mature game engine like Godot, the AI can jump straight to filling in content without worrying about the underlying architecture. By contrast, when developing games with pure HTML/CSS/JavaScript, there's no unified architectural constraint. The AI must design state management, render loops, collision detection, and other foundational systems on its own. This "reinventing the wheel" work not only consumes massive amounts of Tokens but also easily introduces architectural-level defects.

The creator used a brilliant analogy: It's like taking a school exam where the teacher hands out printed test papers and you just answer the questions. But if the teacher writes the questions on the blackboard, you have to copy them down before you can even start working.

Result: Fastest, Most Token-Efficient, Highest Quality

The data from this version is stunning:

Metric	Version 1 (Pure Prompt)	Version 2 (+Assets)	Version 3 (+Assets+Godot)
Dev Time	8 min 40 sec	13 min	8 min 57 sec
Token Consumption (% of five-hour quota)	18%	19% (+1 point)	8% (lowest)
Output Quality	Web interactive page	High-completion web game	Near-production game quality

Version 3's low Token consumption likely comes from the engine: when building games with HTML, the AI has to construct an "engine" from scratch every time, whereas Godot already provides a complete framework, so the AI only needs to focus on the game content itself.

The data behind this experiment reflects an increasingly important dimension in Vibe Coding practice: Token economics. Codex usage is metered, either billed per Token or capped by a quota that grants a fixed allowance within a given time window. In this experiment, the five-hour Token quota was the developer's "budget." Version 1 consumed 18%, Version 2 consumed 19%, and Version 3 with the Godot engine consumed only 8%, so the same budget could support more than twice as many runs under the third strategy. As AI programming tools become mainstream, Token cost is emerging as a new cost category in software development. Choosing the right framework and toolchain affects not only code quality but also the economic efficiency of development.
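The budget arithmetic can be worked through directly from the reported percentages. The sketch below assumes every run costs the same as the single measured run for that version, which one experiment per version cannot guarantee; the function names are made up for the example.

```python
# Share of the five-hour quota each version consumed, taken from the table above.
quota_used_percent = {
    "Version 1": 18,
    "Version 2": 19,
    "Version 3": 8,
}


def full_runs_per_window(percent_per_run: int) -> int:
    """Complete runs that fit in one quota window, if every run costs the same."""
    return 100 // percent_per_run


def relative_change_percent(old: float, new: float) -> float:
    """Change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


for version, percent in quota_used_percent.items():
    runs = full_runs_per_window(percent)
    print(f"{version}: {percent}% per run, {runs} full runs per window")

v1 = quota_used_percent["Version 1"]
v1_to_v2 = relative_change_percent(v1, quota_used_percent["Version 2"])
v1_to_v3 = relative_change_percent(v1, quota_used_percent["Version 3"])
print(f"Version 1 -> Version 2: {v1_to_v2:+.1f}% Token use")
print(f"Version 1 -> Version 3: {v1_to_v3:+.1f}% Token use")
```

Under that assumption, Versions 1 and 2 each allow 5 full runs per window and Version 3 allows 12. In relative terms, Version 2 used about 5.6% more Tokens than Version 1, and Version 3 about 55.6% fewer.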

The final game featured complete popup feedback, a notes system, BGM toggle controls, and a level of polish very close to a formally released indie game.

Three Core Vibe Coding Principles

From these three controlled experiments, we can distill three key principles:

1. Your Requirements Must Be Real

The first principle of Vibe Coding is that your requirements must be genuine. Real requirements enable effective judgment and iteration, rather than aimlessly asking AI to "just make something." Real requirements mean you have clear quality standards and use cases for the final product, which in turn guide better decisions at every step — what assets to choose, what framework to use, which features to prioritize.

2. Reduce AI's Burden by Preparing Assets in Advance

Don't make the AI handle both "creative design" and "engineering implementation" simultaneously. Assets, outlines, reference images — anything that can be prepared in advance should be prepared in advance, so the AI's Context Window and reasoning capacity can focus on core logic. The essence of this principle is cognitive load management — whether for humans or AI, handling too many heterogeneous tasks simultaneously degrades the quality of each individual task.

3. Leverage Mature Frameworks — Don't Reinvent the Wheel

Make good use of mature open-source tools and frameworks. Let AI develop on top of existing foundations rather than building infrastructure from scratch every time. This not only saves Tokens but, more importantly, improves the structural integrity and maintainability of the output. "Don't reinvent the wheel" takes on new meaning in the AI era — every time you ask AI to build infrastructure from scratch, you're consuming precious Token resources that could have been spent on core feature development.

Conclusion: The True Advantage of Programmers in the Vibe Coding Era

The creator made an observation in the video that many people overlook: Programmers are ordinary people too — they're just ordinary people whose specialty happens to be writing programs. But it's precisely this specialty that enables them to use AI tools more efficiently — not because they can write code, but because they understand how to decompose problems, prepare resources, and choose the right tools.

In the Vibe Coding era, the real competitive advantage isn't whether you can write code, but whether you can think like an engineer: breaking complex problems into modules that AI can process efficiently and achieving the best results with the fewest resources. These three experiments illustrate the point clearly. With the same AI model and the same development task, merely varying preparation strategy and tool selection cut Token consumption from 18% to 8% while output quality leaped from "web interactive page" to "near-production game quality." That is the value of engineering thinking in the AI era.