Building a Match-3 Game with AI and Letting the Agent Play It: A Complete Hands-On Walkthrough
AI builds a Match-3 game and plays it autonomously, showcasing Agent self-iteration and evaluation.
A front-end developer used Godot's MCP plugin to have AI build a Match-3 game from scratch, then designed a decoupled architecture where AI makes decisions through data interfaces while humans watch via visual playback. After failing the first round, the Agent automatically summarized its strategy and cleared the level on the second attempt, demonstrating self-iteration. The project's deeper goal is building an Agent evaluation platform to compare different models' decision-making, while validating prompt engineering's significant impact on AI performance.
Introduction: When AI Is Both the Developer and the Player
A front-end developer with zero game development experience used the power of AI not only to have an Agent build a Match-3 game from scratch, but also designed an architecture specifically for the Agent to play it — AI develops the game, AI plays the game, humans spectate. This fascinating project comes from a Chinese content creator named Qiqi, and the entire process demonstrates AI's potential in both game development and intelligent agent evaluation.
In the demo, the Agent (Claude Harmus) scored 1,465 points in 20 moves over 39 seconds, but failed the challenge because the target score was 2,000. However, after summarizing its improvement strategy, it cleared the level on the second attempt. This process itself is quite inspiring — it reveals the importance of prompt engineering and an Agent's capacity for self-iteration.
Building a Match-3 Game with Godot + MCP
Environment Setup: Configuring the MCP Plugin for Godot
The game engine chosen for this project is Godot, primarily because an MCP (Model Context Protocol) plugin is available for it. The plugin exposes editor interfaces over MCP, allowing AI to write scripts that implement the game logic directly.
About the MCP Protocol: MCP is an open protocol released by Anthropic in late 2024, designed to establish a standardized communication bridge between AI models and external tools or data sources. Before MCP, integrating each AI tool with external systems required custom development, resulting in massive duplication of effort. MCP uses a client-server architecture where AI applications act as clients and external tools (such as the Godot engine, databases, file systems, etc.) act as servers exposing their capabilities. This design frees AI from pure text-based conversations, enabling it to directly operate real development environments: reading project files, executing code, calling APIs, and more. With an MCP plugin installed in Godot, AI can create scene nodes, write GDScript scripts, and modify project configurations, delivering a true "AI as developer" experience.
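To make the client-server exchange concrete, here is a minimal sketch of the kind of message an MCP client sends when it asks a server to run a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `create_node` and its arguments are hypothetical; a real Godot MCP plugin defines its own tool names and parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
message = make_tool_call(
    "create_node",
    {"parent": "/root/Main", "type": "Sprite2D", "name": "Gem"},
)
print(message)
```

In practice an AI coding tool builds and sends these messages itself; the sketch only shows what travels over the wire.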
Why Godot? Godot is a free, open-source game engine that has been gaining traction in recent years thanks to its lightweight architecture and zero-royalty policy. Compared to Unity and Unreal Engine, Godot's core advantages include: fully open source (MIT license), a small engine footprint (the editor download is on the order of tens of megabytes), and its own scripting language, GDScript, whose Python-like syntax is friendly to AI code generation. GDScript's syntactic structure resembles the Python code that is abundant in AI training data, which plausibly makes it easier for large language models to generate correct game logic. Additionally, Godot's scene tree architecture organizes game objects into hierarchical nodes, a structured representation that lends itself to AI comprehension and manipulation.
Here are the specific setup steps:
- Download the MCP plugin: Although it can be installed from Godot's built-in Asset Library, downloading it manually in advance is recommended to avoid network issues
- Place the plugin files: Put the extracted folder directly into the project directory (Godot editor plugins normally live under the project's addons/ folder)
- Enable the plugin: Find and enable the MCP plugin in Godot's settings (Project Settings, Plugins tab)
- Critical step, enable remote access: This step is easy to overlook, but you must check the "Remote Access" option before running

Once launched, provide the generated address to Claude, Cursor, or other AI coding tools, and the AI can connect to Godot via MCP to start coding.
Game Design Document: The Prerequisite for AI Development
Before letting AI write code, the most important step is preparing a game design document. The creator also had AI generate this document — since Match-3 is a classic genre, AI is very familiar with its mechanics, making the document generation process smooth.
Algorithmic Complexity of Match-3 Games: Match-3 games may seem simple, but they actually involve multiple layers of algorithm design: board state detection (identifying three or more adjacent same-colored elements), gravity-based drop simulation after elimination, new element filling, cascade processing, and special power-up trigger logic. From a decision theory perspective, Match-3 is a sequential decision problem with a finite state space, where each move changes the board state and affects subsequent available moves. For an AI Agent, a good strategy must balance the score from the current elimination against the board layout it leaves behind. This is essentially a trade-off between short-term gains and long-term planning, similar to weighing immediate against delayed rewards in reinforcement learning.
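The detection, elimination, and drop steps listed above can be illustrated with a minimal Python sketch. The integer color encoding, the use of `None` for empty cells, and the function names `find_matches` and `apply_gravity` are assumptions for illustration, not the project's actual code. Refilling and power-ups are left out.

```python
from typing import List, Optional, Set, Tuple

Board = List[List[Optional[int]]]  # board[row][col]; None marks an empty cell


def find_matches(board: Board) -> Set[Tuple[int, int]]:
    """Return the (row, col) of every cell in a horizontal or vertical run of 3+."""
    rows, cols = len(board), len(board[0])
    matched: Set[Tuple[int, int]] = set()
    for r in range(rows):  # scan each row for horizontal runs
        start = 0
        for c in range(1, cols + 1):
            if c == cols or board[r][c] != board[r][start]:
                if board[r][start] is not None and c - start >= 3:
                    matched.update((r, k) for k in range(start, c))
                start = c
    for c in range(cols):  # scan each column for vertical runs
        start = 0
        for r in range(1, rows + 1):
            if r == rows or board[r][c] != board[start][c]:
                if board[start][c] is not None and r - start >= 3:
                    matched.update((k, c) for k in range(start, r))
                start = r
    return matched


def apply_gravity(board: Board, matched: Set[Tuple[int, int]]) -> Board:
    """Remove matched cells and drop survivors to the bottom of each column."""
    rows, cols = len(board), len(board[0])
    result: Board = [[None] * cols for _ in range(rows)]
    for c in range(cols):
        survivors = [
            board[r][c]
            for r in range(rows)
            if (r, c) not in matched and board[r][c] is not None
        ]
        offset = rows - len(survivors)  # empty cells end up at the top
        for i, value in enumerate(survivors):
            result[offset + i][c] = value
    return result


board = [
    [1, 2, 3, 4],
    [2, 2, 2, 4],
    [1, 3, 1, 4],
]
matches = find_matches(board)
after_drop = apply_gravity(board, matches)
print(sorted(matches))
print(after_drop)
```

A cascade would repeat `find_matches` on the board after refilling until no new matches appear.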
But here's an important reminder: if you're building a more complex game, make sure to thoroughly define the game mechanics before letting AI code. Otherwise, AI will get stuck during development and may fail to produce anything at all. The quality of the design document directly determines the success or failure of AI-driven development.
After completing the design document, the initial version generated by AI was a human-playable version composed of basic grids and dots. The recommended development strategy is: don't worry about visual polish first — make sure the game logic works, then replace all images and art assets at once at the end.
A Game Architecture Designed Specifically for Agents
Why Not Simulate Clicks?
Humans play Match-3 games through clicking and dragging. If AI also plays by simulating clicks, two problems arise:
- You can't distinguish between human and AI operations, so the game isn't truly "designed for Agents"
- It's slow and unnatural — the AI is essentially adapting to a human interaction paradigm

Interface-Based Front-End/Back-End Separation
The creator designed an elegant architecture that completely decouples "AI gameplay" from "human spectating":
AI Side (Data Interface): A small server built with Python provides the following core interfaces:
- Create Game: Starts a new round and returns the initial board matrix
- Execute Move: AI selects which elements to eliminate based on the board layout and game rules
- Get State: Returns board changes, scores, and other data after each move
Human Side (Visual Playback): Godot is exported as an HTML version that displays each of the AI's moves through a playback interface, including board changes, movement paths, and scoring.

The Deeper Philosophy of This Architecture: This design of separating the AI decision layer from the visualization layer actually draws from modern microservice architecture and game server design principles. In traditional game AI research (such as DeepMind's AlphaGo and OpenAI Five), AI similarly doesn't directly manipulate graphical interfaces but makes decisions through abstract state representations (such as board matrices or game frame data). This design eliminates the additional complexity of visual perception, allowing AI to focus on pure strategic reasoning. Meanwhile, the playback system is similar to a game's "replay" feature — recording state changes at each step for post-hoc analysis at any speed, which is crucial for debugging AI behavior and understanding decision processes.
The essence of this design is: AI doesn't need a UI — it only needs data to reason and make decisions; human spectators need a visual interface to understand the AI's decision process. The two are perfectly isolated through the interface layer, each getting exactly what they need.
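The three interfaces and the replay log can be sketched as a small in-memory Python class. This is only an illustration of the separation, not the creator's server: the method names, the move shape, the 8x8 board with 5 colors, and the injected `resolve_move` rules function are all assumptions. The 20-move limit and 2,000-point target come from the demo described earlier.

```python
import copy
import random
from typing import Callable, Dict, List, Tuple

Board = List[List[int]]
Move = Dict[str, int]  # hypothetical shape, e.g. {"row": 0, "col": 1}
Resolver = Callable[[Board, Move], Tuple[Board, int]]  # -> (new board, points)


class Match3Api:
    """In-memory stand-in for the data interface the Agent talks to."""

    def __init__(self, resolve_move: Resolver, target: int = 2000, max_moves: int = 20):
        self._resolve = resolve_move  # the rules engine is injected, not defined here
        self._target = target
        self._max_moves = max_moves
        self._games: Dict[int, dict] = {}

    def create_game(self, seed: int, rows: int = 8, cols: int = 8, colors: int = 5) -> dict:
        rng = random.Random(seed)  # seeded so every Agent can get the same board
        board = [[rng.randrange(colors) for _ in range(cols)] for _ in range(rows)]
        game_id = len(self._games) + 1
        self._games[game_id] = {"board": board, "score": 0, "moves_used": 0, "history": []}
        return self.get_state(game_id)

    def execute_move(self, game_id: int, move: Move) -> dict:
        game = self._games[game_id]
        if game["moves_used"] >= self._max_moves:
            raise ValueError("no moves left")
        new_board, points = self._resolve(copy.deepcopy(game["board"]), move)
        game["board"] = new_board
        game["score"] += points
        game["moves_used"] += 1
        # Each entry is one replay frame for the visual playback side.
        game["history"].append(
            {"move": dict(move), "points": points, "board": copy.deepcopy(new_board)}
        )
        return self.get_state(game_id)

    def get_state(self, game_id: int) -> dict:
        game = self._games[game_id]
        return {
            "game_id": game_id,
            "board": copy.deepcopy(game["board"]),
            "score": game["score"],
            "moves_left": self._max_moves - game["moves_used"],
            "cleared": game["score"] >= self._target,
            "history": copy.deepcopy(game["history"]),
        }


def toy_resolver(board: Board, move: Move) -> Tuple[Board, int]:
    """Placeholder rules: blank one cell for a flat 10 points (not real Match-3)."""
    board[move["row"]][move["col"]] = -1
    return board, 10


api = Match3Api(toy_resolver)
state = api.create_game(seed=42)
state = api.execute_move(state["game_id"], {"row": 0, "col": 1})
print(state["score"], state["moves_left"], len(state["history"]))  # prints: 10 19 1
```

The Agent only ever sees the dictionaries returned by these methods, while a playback client would read `history` and animate one frame per entry.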
Agent Self-Evolution: From Failure to Victory
Round 1: Defeat and Reflection
In the first round, the Agent read the rules document and started playing directly, with no strategic guidance. It scored only 1,465 points in 20 moves — a clear gap from the 2,000-point target.
But interestingly, after the game ended, the Agent automatically performed a post-mortem, analyzing the reasons for failure and proposing improvement ideas.

Round 2: Successful Completion After Strategy Adjustment
Based on the lessons from Round 1, the Agent adjusted its strategy — focusing more on triggering multi-row or multi-column eliminations for higher scores. It successfully cleared the level in Round 2, validating the Agent's self-iteration capability.
Technical Principles Behind Agent Self-Iteration: The Agent's ability to automatically summarize and improve its strategy after failure demonstrates the "In-Context Learning" characteristic of large language models. Unlike traditional reinforcement learning, which requires thousands of trial-and-error iterations, LLM-based Agents can improve their strategy in very few attempts through natural language reflection, an approach closely related to what the research literature calls the "Reflexion" framework. No model weights are updated in the process. The core of this capability lies in the model's ability to convert failure experiences into lessons described in natural language, then reference these lessons as contextual information in subsequent decisions, thereby avoiding repeated mistakes.
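The reflect-then-retry loop described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `play` and `reflect` stand in for calls to a real model, and the stub score of 2100 for the second round is a placeholder, since the source only says the Agent cleared the 2,000-point target.

```python
from typing import Callable, List, Tuple

PlayFn = Callable[[str], Tuple[int, str]]   # prompt -> (score, game log)
ReflectFn = Callable[[str, int, int], str]  # (game log, score, target) -> lesson


def run_with_reflection(
    rules: str, play: PlayFn, reflect: ReflectFn, target: int, max_rounds: int = 3
) -> List[int]:
    """Play up to max_rounds, feeding lessons from failed rounds into the next prompt."""
    lessons: List[str] = []
    scores: List[int] = []
    for _ in range(max_rounds):
        prompt = rules
        if lessons:
            bullet_list = "\n".join(f"- {lesson}" for lesson in lessons)
            prompt += "\n\nLessons from earlier rounds:\n" + bullet_list
        score, log = play(prompt)
        scores.append(score)
        if score >= target:
            break  # level cleared, no further reflection needed
        lessons.append(reflect(log, score, target))
    return scores


def stub_play(prompt: str) -> Tuple[int, str]:
    # Stub: pretend the Agent does better once the prompt contains a lesson.
    score = 2100 if "Lessons from earlier rounds" in prompt else 1465
    return score, f"scored {score}"


def stub_reflect(log: str, score: int, target: int) -> str:
    # Stub: a real Agent would write this lesson itself from the game log.
    return "Prefer moves that clear multiple rows or columns at once."


print(run_with_reflection("Match-3 rules go here", stub_play, stub_reflect, target=2000))
```

The only state carried between rounds is the `lessons` list, which is what makes this in-context learning rather than training.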
This process reveals a key insight: if these experiential strategy insights were included in the initial prompt, the Agent might have succeeded on the first try. This is precisely the value of prompt engineering: encoding experiential knowledge into initial instructions is equivalent to giving the Agent a domain expert's prior knowledge, which can sharply reduce exploration costs. It also suggests why, in practical AI applications, prompt optimization can sometimes deliver larger performance gains than a model upgrade.
A Deeper Purpose: Agent Capability Evaluation
This project's significance goes far beyond just being "fun." The creator revealed two deeper objectives:
Horizontal Comparison of Decision-Making Capabilities Across Models
Using the same game scenario and identical prompts, you can objectively evaluate:
- The reasoning and decision-making capabilities of different large models (such as Claude, GPT, etc.)
- Performance differences of the same model under different frameworks
For comparing how Agents actually behave, this can be more intuitive and practical than reading scores from traditional benchmark tests.
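A horizontal comparison of this kind can be sketched as a small harness that runs every Agent on the same seeds and reports a mean score and clear rate. The `Agent` callable, the metric names, and the stub agents with their scores are hypothetical; a real harness would wrap model calls that play full games.

```python
from statistics import mean
from typing import Callable, Dict, List

Agent = Callable[[int], int]  # seed -> final score for one game on that seed


def compare_agents(agents: Dict[str, Agent], seeds: List[int], target: int = 2000) -> List[dict]:
    """Run every agent on the same seeds and rank by mean score."""
    rows = []
    for name, agent in agents.items():
        scores = [agent(seed) for seed in seeds]
        rows.append(
            {
                "agent": name,
                "mean_score": mean(scores),
                "clear_rate": sum(s >= target for s in scores) / len(scores),
            }
        )
    return sorted(rows, key=lambda row: row["mean_score"], reverse=True)


# Stub agents with made-up scores, only to exercise the harness.
agents = {
    "model_a": lambda seed: 1500 + 100 * (seed % 3),
    "model_b": lambda seed: 1900 + 100 * (seed % 3),
}
results = compare_agents(agents, seeds=[0, 1, 2])
for row in results:
    print(row)
```

Fixing the seeds matters because board layouts vary in difficulty. Without shared seeds, a score gap could reflect luck rather than decision quality.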
Industry Context: Current AI model evaluation primarily relies on standardized benchmarks (such as MMLU, HumanEval, GSM8K, etc.), but these tests face risks of data contamination and disconnection from real-world applications. Using games as evaluation scenarios is an emerging trend — games provide controlled environments with clear rules, quantifiable outcomes, and adjustable difficulty. Compared to static test questions, game-based evaluation can assess an Agent's multi-step reasoning, strategic planning, and environmental adaptation capabilities holistically. Similar approaches have already appeared in academia, such as Stanford's "Generative Agents" using a virtual town to evaluate social abilities, and Google DeepMind using various games to evaluate general intelligence. What makes this project unique is that it targets everyday developers, providing a low-barrier Agent evaluation framework.
Validating the Real-World Impact of Prompt Engineering
By comparing Agent performance with "no strategy prompts" versus "strategy-enriched prompts," you can very intuitively see the impact of prompt quality on model output. This is especially helpful for beginners to understand the importance of prompt engineering.
Conclusion and Future Outlook
This project demonstrates a complete AI-driven game development loop: AI designs the document → AI writes the code → AI plays the game → AI self-optimizes. For developers who want to try game development but lack experience, the combination of Godot + MCP + AI coding tools provides a viable path.
The creator plans to develop more complex games in the future to test the boundaries of Agent capabilities. This direction is worth following closely — when games are no longer designed solely for humans but become touchstones for AI capabilities, game development itself will enter a new paradigm.
Key Takeaways
- Using Godot engine's MCP plugin, AI can write game scripts directly through interfaces, enabling Match-3 game development with zero game dev experience
- A front-end/back-end separated architecture was designed: AI plays through data interfaces while humans spectate through visual playback, completely decoupled
- After failing the first round, the Agent automatically summarized improvement strategies and cleared the level on the second attempt, demonstrating AI's self-iteration capability
- The project's deeper purpose is building an Agent capability evaluation platform, using the same game scenario to horizontally compare different models and frameworks
- Validates the importance of prompt engineering: including strategy information in initial prompts can significantly improve the Agent's first-attempt success rate