Vibe Engineering in Practice: AI Evolves from Code Generator to Programming Teammate

From Vibe Coding to Vibe Engineering

Earlier this year, Andrej Karpathy, former Tesla AI Director and an OpenAI co-founder, introduced the concept of "Vibe Coding" in a social media post, sparking a wave of excitement across the tech community. He described a way of programming in which developers rely entirely on AI-generated code without reviewing it line by line. The description matched the real experience of countless developers using tools like GitHub Copilot and Cursor. The core idea can be summarized as three P's: Prompt, Paste, Pray. You send the AI a prompt based on intuition, skip understanding the code it writes, paste it directly into your project, and pray it works. If something breaks, you feed the error message back to the AI and repeat the cycle.

For prototyping, hackathons, or exploring new tech stacks, Vibe Coding is indeed blazingly fast. But it has a fatal weakness: Context Amnesia. The AI does not remember commitments made in previous conversations; it only sees what is in the current session. This follows from how large language models work. Current LLMs are built on the Transformer architecture, whose attention mechanism operates only over the tokens inside a bounded context window, and ordinary conversations do not update the model's weights. Even as windows grow to millions of tokens, nothing carries over from one session to the next: every new conversation starts from scratch. Within a single long session, older turns can also be truncated or summarized away once the window fills. As a result, the AI cannot recall architectural decisions, coding standards, or business-logic constraints agreed on earlier unless that information is explicitly re-injected into the current context. Without that context, it fills the gaps with assumptions it considers "reasonable" and tends to take the most direct path, which may also be the most dangerous one. For example, it might quietly introduce a security vulnerability while you are not paying attention.
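The toy Python model below makes the truncation behavior concrete. It is a sketch built on simplifying assumptions: a word count stands in for a real tokenizer, the newest turns are kept first, and "pinned" context is whatever the application chooses to re-inject on every request. `ContextWindow`, `pinned`, and `build` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """Toy fixed-size context window; word count approximates token count."""
    budget: int                                        # max "tokens" per request
    pinned: list[str] = field(default_factory=list)    # re-injected on every request
    history: list[str] = field(default_factory=list)   # ordinary conversation turns

    def add_turn(self, message: str) -> None:
        self.history.append(message)

    def build(self) -> list[str]:
        """Return what the model actually sees: pinned context, then the newest turns that fit."""
        used = sum(len(m.split()) for m in self.pinned)
        visible: list[str] = []
        for message in reversed(self.history):         # newest first
            cost = len(message.split())
            if used + cost > self.budget:
                break                                  # this turn and everything older is dropped
            visible.append(message)
            used += cost
        return self.pinned + visible[::-1]             # restore chronological order


DECISION = "Decision: all SQL must use parameterized queries"
TURNS = ["Please add a search endpoint", "Now add pagination to the search endpoint"]

# Without pinning, the early decision scrolls out of the window.
plain = ContextWindow(budget=14)
for turn in [DECISION, *TURNS]:
    plain.add_turn(turn)
print(plain.build())

# Pinning the decision keeps it visible, at the cost of room for older turns.
pinned = ContextWindow(budget=14, pinned=[DECISION])
for turn in TURNS:
    pinned.add_turn(turn)
print(pinned.build())
```

The trade-off is the point: pinned context is never forgotten, but it competes with conversation history for the same budget. This is why persisted context should be concise and relevant.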

This brings us to today's topic: Vibe Engineering, a responsible new model of AI collaboration. In this paradigm, AI is no longer a mere code-output tool but a trustworthy programming teammate. You are no longer just a prompt sender but the architect orchestrating the big picture.

Real-World Data from Inside OpenAI

According to figures shared by OpenAI developers, Codex is now part of the daily workflow of 92% of the company's developers. Codex is OpenAI's AI Agent product for software engineering. Unlike earlier code-completion tools, it can carry out complete development tasks on its own. It works in an isolated, sandboxed cloud environment with access to the full project repository and toolchain, where it reads the codebase, writes code, runs tests, and proposes pull requests. That goes far beyond simple code generation. A 92% internal adoption rate suggests it has moved from experimental tool to core infrastructure. According to the same developers, Codex reviews every pull request, which has led to fewer bugs and faster iteration. Even non-technical colleagues can contribute through it, improving collaboration across the whole company.

But what truly matters isn't "how many lines of code AI wrote"—it's how to ensure the quality of AI-produced code. This is precisely the core problem Vibe Engineering aims to solve.

Case Study: 12-Hour Kotlin-to-Rust Rewrite

OpenAI engineer Aaron Frio shared a highly compelling Vibe Engineering case study: having Codex rewrite a Kotlin project from scratch into Rust within 12 hours, with a requirement of 100% compatibility.

Why This Task Was So Difficult

Kotlin and Rust are built on fundamentally different design philosophies. Kotlin typically runs on the JVM, with garbage collection and a null-safe type system. Rust guarantees memory safety at compile time through its Ownership system and Borrow Checker, with no runtime garbage collector. A cross-language rewrite is therefore not a syntax translation: memory management, the concurrency model, and error-handling patterns all have to be redesigned. The project also involved Bazel, Google's open-source build system, which is known for complex dependency management and sandboxed builds. Bazel has far less documentation, community discussion, and public example code than Maven or Cargo, so AI models have had much less Bazel material to learn from, which raised the difficulty further. Frio's approach embodied the core principles of Vibe Engineering:

Planning First: Establishing a Clear Execution Blueprint

Frio's prompt required Codex to first create a planning document, enabling the Agent to execute tasks over extended periods while tracking objectives without drifting off course. This wasn't about telling AI to "start writing code"—it was about establishing a clear execution blueprint first.
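A planning document does not need to be elaborate. The Python sketch below assumes the agent persists its plan as a Markdown checklist and consults it before each unit of work. `Plan`, `Step`, and the sample steps are hypothetical and are not taken from Frio's actual prompt. The key design choice is that `complete` rejects steps that were never planned, which is one simple way to make drift visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Plan:
    """A planning document the agent re-reads before each unit of work."""
    objective: str
    steps: list[Step] = field(default_factory=list)

    def next_step(self) -> Step | None:
        """First unfinished step, or None when the plan is complete."""
        return next((s for s in self.steps if not s.done), None)

    def complete(self, description: str) -> None:
        """Mark a planned step done; work that is not in the plan is rejected."""
        for step in self.steps:
            if step.description == description:
                step.done = True
                return
        raise KeyError(f"not in plan: {description!r}")

    def to_markdown(self) -> str:
        """Render as a checklist that can be saved next to the code and reloaded later."""
        lines = ["# Objective", self.objective, "", "## Steps"]
        lines += [f"- [{'x' if s.done else ' '}] {s.description}" for s in self.steps]
        return "\n".join(lines)


# Hypothetical plan for illustration only.
plan = Plan(
    objective="Port the tool from Kotlin to Rust with identical observable behavior",
    steps=[Step("Catalogue existing behavior as tests"), Step("Port core modules"), Step("Reach test parity")],
)
plan.complete("Catalogue existing behavior as tests")
print(plan.next_step())
print(plan.to_markdown())
```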

Sub-Agent Collaboration: Multi-Agent Parallel Execution

Codex split the work across cooperating sub-agents:

A Watchdog was set up to monitor the main objective and keep the primary Agent from hallucinating (generating content that looks plausible but is actually incorrect) or cutting corners. The pattern resembles the watchdog and sentinel monitors used in systems engineering: a dedicated Agent that executes no task itself but continuously checks whether the main Agent's output is drifting from the objective.
Multiple Research Agents were dispatched in parallel to study the upstream project code that had to be replicated and to investigate differences between Bazel versions. Like researchers on a human team, they collect and organize external information for the rest of the group.

This division of labor decomposes a complex task into manageable subtasks, with each Agent focused on its own responsibility. As a result, the overall system can be considerably more reliable than a single Agent working alone.
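The division of labor can be sketched with Python's standard `concurrent.futures`. This is a simplification under stated assumptions: `fake_research` stands in for a real research agent, and the watchdog is reduced to deterministic rule checks. In Frio's setup, by contrast, the watchdog was itself an Agent. All function names and checks here are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Check = Callable[[str], bool]


def run_research(topics: list[str], research: Callable[[str], str], max_workers: int = 4) -> dict[str, str]:
    """Fan out one research agent per topic in parallel and collect their notes by topic."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(topics, pool.map(research, topics)))  # map preserves input order


def watchdog(checks: dict[str, Check], output: str) -> list[str]:
    """Return the names of the objective checks that the main agent's output violates."""
    return [name for name, check in checks.items() if not check(output)]


# Hypothetical stand-in for a real research agent (e.g. a model call with repository access).
def fake_research(topic: str) -> str:
    return f"notes on {topic}"


notes = run_research(["upstream project code", "Bazel version differences"], fake_research)

# Corner-cutting signals in Rust output: unimplemented stubs and disabled tests.
checks: dict[str, Check] = {
    "no unimplemented stubs": lambda out: "todo!()" not in out,
    "no ignored tests": lambda out: "#[ignore]" not in out,
}
print(watchdog(checks, "fn parse() { todo!() }"))
```

The watchdog only reports violations; deciding whether to reject the output or send it back for rework is left to the orchestrator.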

Autonomous Closed Loop: Writing Code, Running Tests, and Fixing Bugs as One

Over 12 hours, the AI autonomously completed the full loop of writing code, running tests, and fixing bugs. In traditional software engineering, a Test Suite's value lies mainly in regression protection and confidence when refactoring. In AI-assisted development that value is greatly amplified, because the test suite becomes the most reliable way for the AI to verify its own work. After each code change, the AI can immediately run the tests, read the failures, and fix them, forming an autonomous "write-test-fix" closed loop. When AI can check its own work, its output improves markedly. This is the core of what Agentic Coding adds.
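The loop itself is simple to express. The Python sketch below assumes the test runner and the fixer are plain functions. `run_tests` and `fake_fix` are hypothetical stand-ins for a sandboxed test harness and a model call. The run budget matters: an autonomous loop needs a limit so that it stops and escalates instead of retrying forever.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SuiteResult:
    passed: bool
    failures: list[str]


def write_test_fix(
    code: str,
    run_tests: Callable[[str], SuiteResult],
    fix: Callable[[str, list[str]], str],
    max_runs: int = 5,
) -> tuple[str, int]:
    """Test; on failure hand the concrete failures back for a fix. Returns (code, test runs used)."""
    for run in range(1, max_runs + 1):
        result = run_tests(code)
        if result.passed:
            return code, run
        if run < max_runs:
            code = fix(code, result.failures)  # specific failures, not just "it broke", drive the edit
    raise RuntimeError(f"no passing version within {max_runs} test runs")


# Toy test runner; exec() on generated code is only acceptable inside a sandbox.
def run_tests(src: str) -> SuiteResult:
    namespace: dict = {}
    exec(src, namespace)
    ok = namespace["add"](2, 3) == 5
    return SuiteResult(ok, [] if ok else ["add(2, 3) != 5"])


# Stand-in for the model: a real agent would read `failures` and edit the code accordingly.
def fake_fix(src: str, failures: list[str]) -> str:
    return src.replace("a - b", "a + b")


final_code, runs = write_test_fix("def add(a, b):\n    return a - b\n", run_tests, fake_fix)
print(runs)
```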

Ultimately, the AI delivered comprehensive documentation, CI pipelines, and nearly 600 test cases. This was a truly production-grade project ready to ship.

Core Principles of Vibe Engineering

Context Engineering

A key factor in whether AI can produce high-quality code is Context Engineering. You need to provide AI with high-quality, structured context, including:

Code Style guidelines
Team Rules (collaboration conventions)
PRD documents (Product Requirements Documents)
Architecture documentation

With these, AI can behave like a newly onboarded colleague, following team best practices by reading these documents.

A critical engineering technique here is called Context Engineering Primitives: decomposing team knowledge into small, basic, reusable context documents and persisting them in the project or in a team-shared context repository. In practice, teams keep a set of standardized Markdown or YAML files at well-known locations in the repository, such as the .cursor/rules directory, .github/copilot-instructions.md, or AGENTS.md in the project root. These files cover API design standards, error-handling strategies, database operation conventions, secure coding guidelines, and more. They serve both as reference documentation for human team members and as behavioral guides for AI Agents. When a new AI session starts, the tool loads the relevant context automatically, so AI output follows team standards and the context amnesia problem is largely mitigated.
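As a rough illustration of the loading step, the Python sketch below gathers whichever context files exist at the locations named above and concatenates them into one preamble for a new session. This is a simplification: real tools each decide for themselves which files they read, how they scope rules, and how they merge them. `load_context`, `CONTEXT_SOURCES`, and the sample rule text are hypothetical.

```python
from pathlib import Path
import tempfile

# Hypothetical search list; which files a given tool actually reads is tool-specific.
CONTEXT_SOURCES = ["AGENTS.md", ".github/copilot-instructions.md", ".cursor/rules"]


def load_context(root: Path) -> str:
    """Concatenate every context file found under `root` into one preamble for a new session."""
    sections: list[str] = []
    for rel in CONTEXT_SOURCES:
        path = root / rel
        files = sorted(p for p in path.rglob("*") if p.is_file()) if path.is_dir() else [path]
        for f in files:
            if f.is_file():  # skip sources this project does not use
                body = f.read_text(encoding="utf-8").strip()
                sections.append(f"## {f.relative_to(root).as_posix()}\n{body}")
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("Run the test suite before proposing a change.\n", encoding="utf-8")
    rules = root / ".cursor" / "rules"
    rules.mkdir(parents=True)
    (rules / "errors.mdc").write_text("Never swallow exceptions silently.\n", encoding="utf-8")
    preamble = load_context(root)

print(preamble)
```

Each section is labeled with its source path. That way, a human reviewing a session can trace which rule file shaped the AI's behavior.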

The Amplification Effect of Engineering Practices

AI dramatically amplifies the value of existing engineering practices. The more mature a project's engineering foundation, the more powerful AI becomes. Only with a robust, comprehensive Test Suite can you confidently let AI refactor and iterate while verifying correctness. In code areas without test coverage, AI can only rely on pattern matching from its training data to judge correctness, which easily introduces subtle logic errors. Therefore, investing in testing infrastructure is no longer merely an engineering best practice—it's a prerequisite for effectively using AI.
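One way to act on this is to let test coverage decide where an agent may work unsupervised. The Python sketch below is a hypothetical policy, not a standard tool. The 80% threshold is arbitrary, and `coverage_from_report` assumes the per-file `summary.percent_covered` field of coverage.py's JSON report (`coverage json`). The sample report is illustrative input.

```python
def coverage_from_report(report: dict) -> dict[str, float]:
    """Map file path -> covered fraction, assuming coverage.py's JSON layout (files -> summary -> percent_covered)."""
    return {path: data["summary"]["percent_covered"] / 100 for path, data in report["files"].items()}


def partition_for_ai(coverage: dict[str, float], threshold: float = 0.8) -> tuple[list[str], list[str]]:
    """Split files into those an agent may refactor autonomously and those that need tests first."""
    safe = sorted(f for f, c in coverage.items() if c >= threshold)
    needs_tests = sorted(f for f, c in coverage.items() if c < threshold)
    return safe, needs_tests


# Illustrative report with hypothetical file names.
report = {"files": {
    "billing.py": {"summary": {"percent_covered": 92.0}},
    "auth.py": {"summary": {"percent_covered": 81.5}},
    "legacy_export.py": {"summary": {"percent_covered": 15.0}},
}}
safe, needs_tests = partition_for_ai(coverage_from_report(report))
print(safe, needs_tests)
```

Line coverage is only a proxy for how well tests constrain behavior. Treat the split as a starting point for review, not a guarantee.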

In other words, if your project already lacks test coverage and documentation standards, AI will only amplify these problems rather than solve them.

Humans Remain the Ultimate Guarantors

A clear plan and design document makes AI execution more precise, but code review is more important than ever. You remain the ultimate guarantor of code quality. Vibe Engineering isn't about relinquishing control—it's about exercising control at a higher level.
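Keeping the human as the final guarantor can be written down as an explicit merge rule. The Python sketch below is a hypothetical policy function. In a real repository, the same rule would more likely be enforced through branch protection (required status checks and required reviews) than through application code. The field names and the single-approval default are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    tests_passed: bool
    ai_blocking_findings: int  # unresolved issues raised by an AI reviewer
    human_approvals: int


def merge_decision(pr: PullRequest, required_human_approvals: int = 1) -> tuple[bool, list[str]]:
    """Tests and AI review are necessary but never sufficient: a human must still approve."""
    blockers: list[str] = []
    if not pr.tests_passed:
        blockers.append("test suite failing")
    if pr.ai_blocking_findings:
        blockers.append(f"{pr.ai_blocking_findings} unresolved AI review finding(s)")
    if pr.human_approvals < required_human_approvals:
        blockers.append("missing human approval")
    return not blockers, blockers


# An agent-authored PR that passes every automated check is still blocked without a human.
agent_pr = PullRequest(tests_passed=True, ai_blocking_findings=0, human_approvals=0)
print(merge_decision(agent_pr))
```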

Sub-Agent Parallel Exploration: Data-Driven Architecture Decisions

The most transformative capability of Agentic Coding is Sub-Agent parallel exploration. Traditional architecture decisions rest on senior engineers' experience and a few proofs of concept (PoCs). Because people and time are limited, teams can usually explore only one or two options in depth. With sub-agents running in parallel, AI can produce complete prototypes of several candidate solutions at once. These come with performance benchmarks, memory usage analysis, and maintainability assessments. Architects can then compare options using measured runtime data and code review instead of committing to an irreversible technology choice on intuition. This moves architecture design from "experience-driven" to "evidence-driven" and can substantially reduce technical debt risk. It also replaces the traditional linear "design first, implement later" workflow with one in which several designs are implemented and compared before one is chosen.
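The decision step can be sketched as a small evaluation harness. Several prototypes of the same function, standing in for the output of parallel sub-agents, are first checked against a reference for identical behavior. Only the survivors are then benchmarked. The Python sketch uses order-preserving deduplication as a toy problem and treats the seen-set loop as the reference; both are hypothetical choices. Memory usage and maintainability, which the text also mentions, are out of scope here.

```python
import random
import timeit
from typing import Callable, Optional

Candidate = Callable[[list], list]


def dedupe_dict(items: list) -> list:
    return list(dict.fromkeys(items))          # dicts preserve insertion order


def dedupe_seen_set(items: list) -> list:
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_list_scan(items: list) -> list:
    out: list = []
    for x in items:
        if x not in out:                       # linear scan makes this O(n^2)
            out.append(x)
    return out


def dedupe_unordered(items: list) -> list:
    return list(set(items))                    # fast, but loses first-occurrence order


def evaluate(candidates: dict[str, Candidate], reference: Candidate, data: list,
             repeats: int = 5) -> dict[str, Optional[float]]:
    """Disqualify candidates whose output differs from the reference, then time the rest."""
    expected = reference(data)
    report: dict[str, Optional[float]] = {}
    for name, fn in candidates.items():
        if fn(data) != expected:
            report[name] = None                # behavior mismatch: excluded before timing
            continue
        report[name] = min(timeit.repeat(lambda: fn(data), number=1, repeat=repeats))
    return report


random.seed(0)
data = [random.randrange(500) for _ in range(5_000)]
candidates = {"dict.fromkeys": dedupe_dict, "seen-set loop": dedupe_seen_set,
              "list scan": dedupe_list_scan, "set()": dedupe_unordered}
report = evaluate(candidates, reference=dedupe_seen_set, data=data)
for name, seconds in report.items():
    label = "disqualified" if seconds is None else f"{seconds * 1e3:.3f} ms"
    print(f"{name:14} {label}")
best = min((s, n) for n, s in report.items() if s is not None)[1]
print("chosen:", best)
```

Correctness gates the benchmark, so a fast but wrong prototype such as the `set()` variant can never win on speed alone.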

From Tool to Teammate: The Paradigm Shift of Vibe Engineering

The essence of Vibe Engineering is an evolution in mindset. It doesn't wholesale reject Vibe Coding but adds engineering discipline and systematic thinking on top of it:

Dimension	Vibe Coding	Vibe Engineering
Human's Role	Prompt sender	Architect/Reviewer
AI's Role	Code generator	Programming teammate
Quality Assurance	Pray it works	Tests + Review + Documentation
Context	Current conversation	Structured knowledge base
Collaboration Model	Single-turn Q&A	Multi-Agent parallel

Following Vibe Engineering principles not only boosts individual productivity but is also key to keeping teams competitive in the AI era. Every technical expert has countless ideas shelved for lack of time and energy. Now is the time to turn AI from a simple code generator into your team's most professional and capable engineering teammate, and to build the great software that should have existed all along.