The Complete AI Agent Engineering Tech Stack: A Practical Guide to 100x Development Efficiency

From Chat Boxes to Agent Frameworks: A Paradigm Shift

Developers who ship 100x faster aren't typing prompts into chat boxes — they're running multiple agent frameworks in parallel. This is the core insight shared by Mickey, a senior developer whose code is 95% AI-generated, on the David Andre podcast.

As he puts it: "Models aren't perfect yet, but they've reached a stage where they deliver massive productivity gains. Especially when you understand your vertical — a little brainpower plus AI can take you very far."

This workflow doesn't rely on proprietary tools and is entirely replicable. The key lies in understanding the relationship between models, frameworks, and context engineering, and building effective feedback loops.

Frameworks and Models: Understanding the Foundation of AI Development

Models Don't Think — Frameworks Give Them Capabilities

A common misconception is equating the model with AI's full capabilities. In reality, a model is just a "next-token predictor" — you feed it English, it converts it to tokens, maps them onto a graph to predict the next token, then converts it back to English. The model itself can't do anything on its own.

It's worth understanding the nature of tokens here. Tokens are the fundamental units that large language models use to process text. Models don't directly understand human language — they split input text into tokens, which might be a complete word, a subword fragment, or even a single character. For example, "understanding" might be split into "under" and "standing" as two tokens. During training, models learn statistical relationships between tokens from massive text corpora. During inference, they predict the most likely next token based on the existing token sequence. This autoregressive mechanism means models are essentially extremely complex conditional probability calculators, not true "thinkers." Understanding this is crucial for using AI correctly — output quality is highly dependent on the quality and structure of the input.

The framework is what matters. A framework is the wrapper around the model, containing API tools, specific system prompts, Agents MD files, and more. These components guide the model to execute or focus on specific tasks. When you see "read this file" or "searched for this thing" in Cursor or any agent, those are tool-calling capabilities granted by the framework.

Tool Use (also known as Function Calling) is one of the core capabilities of modern AI agent architecture. Traditional LLMs can only generate text, but through the framework layer's tool-calling mechanism, models can "decide" to invoke external tools during inference — such as reading files, executing searches, calling APIs, or running code. Technically, the framework defines available tools' names, parameter formats, and functional descriptions in the system prompt. When the model determines it needs external information during response generation, it outputs a structured tool-call request. The framework captures this request, executes the corresponding operation, and injects the result back into the context for the model to continue reasoning. This "reason-act-observe" loop (the ReAct pattern) is the critical leap from simple chatbots to intelligent agents.

Model Selection Strategy

Mickey's model choices are very deliberate:

GPT 5.5 Extra High: Primary model, especially strong at understanding large, complex codebases
OPUS 4.7 Max version: Specifically for UI and frontend modifications
Gemini 3: Long-context, high-volume, large codebase scenarios

Agents MD is its own thing

He emphasizes: "If someone asks me whether they can use free models — no. You have to use the best models. The difference is night and day."

Why Cursor Is the Best Agent Framework

Benchmark data shows that Cursor outperforms Cloud Code and Codex in model performance. While Cursor doesn't subsidize users like some competitors, it allows flexible switching between models, and its new agent view is excellent.

Context Engineering: The Decisive Factor in Agent Performance

Context engineering is the core element that determines whether an AI agent performs well or poorly. When agents underperform, the issue is usually not insufficient model capability — it's insufficient context.

Open Source Tools: Giving Agents Complete Code Context

Open Source (a repository tool developed by Verso) fetches the source code of whatever packages or libraries you're using and dumps it into your codebase. This gives the agent complete context when executing tasks.

Key techniques:

Place downloaded source code in a folder like /open_source
Use .gitignore to exclude these files from your repository
Add an instruction file telling the agent: "When in doubt, reference the code in the open_source folder"

Repo Prompt: Building Context from GitHub Links

Repo Prompt takes GitHub links and helps you build structured prompts. It quickly transforms the code structure of open-source projects into context information that agents can understand.

Yak: Codebase Knowledge Graphs

Yak chunks your codebase into a knowledge graph, scanning the entire codebase's tree structure and providing file set summaries and key dependency relationships, helping the model make better decisions.

Core Principles of Context Engineering

Keep context highly relevant — only include what's needed. Many people think more context is better, but in practice, too much context is just noise. Give the model a brief description; you don't need to describe every variable. Good frameworks perform multiple searches on queries, rank and re-rank results — essentially a mini RAG pipeline.

RAG (Retrieval-Augmented Generation) is a technical architecture that combines external knowledge bases with large language models. A standard RAG pipeline works as follows: first, documents are chunked and converted into vectors via embedding models, stored in a vector database. When a user submits a query, the system converts the query into a vector as well, using similarity search to find the most relevant document chunks. Finally, these chunks are injected as context into the model's prompt, allowing the model to generate answers based on retrieved information. The "multiple searches, ranking, and re-ranking" mentioned here are typical features of advanced RAG pipelines — using multi-path retrieval and cross-encoder re-ranking to improve retrieval precision, ensuring the context injected into the model is highly relevant rather than redundant noise.

Agents MD: Onboarding Your Agent

Agents MD is a file placed in the root directory of your codebase that the agent reads every time it opens a new conversation. Mickey includes:

Key design patterns
Languages used
File structure and where to find things
Architectural decisions

"It's like onboarding a new junior developer — you're onboarding the agent."

He recommends splitting by topic: one for backend patterns, one for frontend, one for specific package dependencies. This makes maintenance easier, and the agent learns from them far more effectively.

Automated Review Loops: Building a Quality Assurance Feedback Loop

CI Agent Hermes

Mickey uses Hermes as a CI agent, automatically triggering reviews via GitHub Actions when a PR is created. Hermes doesn't merge PRs, but it sets labels and performs initial reviews — essentially a PR review assistant that doesn't require a human.

CI (Continuous Integration) is a core practice in software engineering where developers frequently merge code changes into the main branch, with each merge verified through automated builds and tests. GitHub Actions is GitHub's CI/CD automation platform, allowing developers to define workflows that execute automatically on specific events (such as creating a PR or pushing code). Traditional PR (Pull Request) reviews require team members to manually read code changes, check for logic errors and style issues — often a bottleneck in the development process. Introducing AI agents into the CI pipeline means every code submission receives instant automated review feedback, dramatically shortening the cycle from submission to merge while maintaining consistent code quality.

He's pushing Hermes to go beyond PR reviews — also checking whether tests need updating, whether there are broken references, security issues, and more.

GrapLoop: Automated Fix Cycles

GrapTile provides a confidence score for code reviews (on a 5-point scale). GrapLoop works as follows:

The agent reads the PR and review feedback
Fixes the issues
Waits for new review results
If it hasn't reached 5/5, continues fixing
Only stops when it hits a perfect score

Clean code structure

"I've had sessions lasting 20-30 minutes where the agent in Cursor tells me I made a mistake, GrapTile catches it, fixes it, pushes to GitHub, and automatically stops when the new review hits 5/5."

Code Structuring: Service Layers and Continuous Refactoring

Why Structure Matters

A common problem with agents is that they won't reuse existing functions — they'll write a new one instead. This causes the codebase to grow endlessly, with too many moving parts, making it difficult for even the agent itself to debug.

The solution is a service layer — creating functions that can be reused repeatedly. The Service Layer is a classic design pattern in software architecture, originating from Domain-Driven Design (DDD) principles. It establishes a clear boundary layer between business logic and the presentation layer, encapsulating reusable business operations as independent service functions. For example, operations like "create user," "send notification," and "process payment" are each encapsulated in corresponding service modules, and any part of the system that needs that functionality calls the same function rather than reimplementing it. This pattern is especially important for AI agent development because current LLMs tend to "solve problems in place" when generating code — reimplementing functionality in the current context rather than searching for and reusing existing implementations. A service layer reduces the probability of agents producing duplicate code by providing a clear function directory.

Workflow:

Build features with GPT 5.5 + Cursor
Test locally until passing
Run structuring passes to find duplicate code and refactor

"When code structure is cleaner, agents find it easier to continue working in new sessions. If a human finds it hard, the agent probably does too."

The Logic Behind Tech Stack Choices

Svelte over React for frontend: Svelte stays close to core HTML and TypeScript principles, which agents handle better. React is currently the most widely used UI framework in frontend development, using a virtual DOM and JSX syntax to build user interfaces through a declarative component model. Svelte takes a fundamentally different compile-time approach — it compiles components into efficient native JavaScript during the build phase, eliminating the need for virtual DOM diffing at runtime. Svelte's syntax is closer to native HTML, CSS, and JavaScript, with intuitive component file structures and state management triggered by simple variable assignment. This "what you see is what you get" quality makes it easier for AI agents to understand and generate Svelte code, since models don't need to handle React's hooks rules, closure pitfalls, useEffect dependency arrays, and other error-prone abstractions. For AI-assisted development, a framework's cognitive complexity directly impacts agent code generation quality.
Convex for backend: Everything is code, giving agents complete backend context without needing to screenshot dashboards. Convex is a full-stack backend platform whose core design philosophy is "everything as code." Unlike traditional backend services, Convex expresses database schema definitions, server functions, permission rules, and scheduled tasks entirely in TypeScript code rather than managing them through web consoles or configuration files. This means agents can fully understand the backend's data models, API endpoints, and business logic by reading code files, without relying on screenshots or documentation. Convex also provides real-time data synchronization, ACID transactions, and automatic caching. This "code as infrastructure" philosophy aligns perfectly with how AI agents work — agents excel at processing structured code text, not parsing unstructured documents or GUI screenshots.
Code is the best context: Documentation is the worst context source

Investment and Mindset: The Hidden Barriers of Agent Engineering

Don't skip that $200

Financial Investment Is Necessary

Mickey is blunt: "This is going to be a money game. We're being subsidized right now, but one day the subsidies will end." A $200/month subscription may seem expensive, but he used AI to review a contract and secured triple the originally offered compensation. He used Cloud Code to complete accounting work in two hours that would have otherwise cost five or six thousand dollars.

Hackathon Mentality

Every engineer should participate in hackathons because they:

Force you to deliver within 48 hours
Train your prioritization skills
Build AI usage intuition on low-risk projects

Shipping Beats Perfection

"Competitors will move fast, burn more tokens, and their app might not be as good as yours — but they'll win." San Francisco founders have an almost delusional belief — ship when the product barely works, build hype, raise funding, then iterate. Overthinking a simple feature without shipping is the biggest trap.

Security Awareness: A Required Course in the Agent Era

While heavily leveraging AI agents, security cannot be overlooked:

Two-factor authentication is a must — never use SMS verification
Tell agents not to install packages published less than 14 days ago
Set up a family code word to guard against AI voice-cloning scams
Use a password manager and distribute keys to trusted family members
When you discover a security vulnerability, paste the information into an agent and have it analyze whether you've been exposed

The Complete Agent Engineering Formula

Stringing all these elements together gives you a complete AI agent engineering workflow:

Cursor as the framework + the best models (GPT 5.5 / OPUS 4.7)
Open Source to download package/library source code for complete agent context
Structure after every feature to keep code clean
Run GrapLoop after creating PRs for automated fixes until perfect score
Give agents just enough tools and context to let them work in loops

The essence of agent engineering isn't letting agents think for you — it's giving them sufficient information, guardrails, and feedback loops. Human approval and human thinking remain critically important. As Mickey says: "Caring matters, being thoughtful matters, taking time matters."