Orchestrating AI Agents as State Machines: Stop Being a Human Confirmation Button

Starting from the "Human Confirmation Button"

Anyone who's been coding with AI recently probably shares the same feeling: after every step, the AI asks "Should I continue?" and your answer is always "Yes, yes, yes." On the surface it seems efficient, but in reality you've devolved into a human confirmation button on an assembly line.

In the current AI coding ecosystem, Skills handle stable behaviors, Subagents handle parallel execution, and MCP handles connections to external systems. Each capability is powerful on its own, but most people are still stuck at single-point invocation and never orchestrate these capabilities together.

It's worth noting that MCP (Model Context Protocol) is an open protocol released by Anthropic in late 2024, drawing inspiration from LSP (Language Server Protocol) in the editor domain—just as LSP unified communication between editors and language servers, MCP standardizes the connection interface between AI models and external tools, databases, and APIs, making "implement once, use everywhere" possible. The Subagent architecture, meanwhile, decomposes complex tasks to specialized sub-agents for parallel processing—frontend, backend, and QA each handle their own responsibilities without blocking each other, with each sub-agent's context window containing only information relevant to its role, improving both speed and focus.
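The "implement once, use everywhere" property is easiest to see at the wire level. MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method, passing the tool's `name` and an `arguments` object. The minimal Python sketch below builds such a request and leaves out transport, session initialization, and `tools/list` discovery. The tool name `search_issues` and its arguments are hypothetical.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The envelope is the same for every server and every tool; only "name" and
# "arguments" change. "search_issues" and its arguments are hypothetical.
payload = make_tool_call(1, "search_issues", {"query": "login bug", "limit": 5})
print(payload)
```

Whether the server fronts a database, an issue tracker, or a file system, the client code stays the same. That uniformity is what the LSP comparison is pointing at.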

The core problem isn't that the process is too slow—it's that the system hasn't learned to move forward on its own.

Rethinking AI Coding Through a Software Engineering Lens

The turning point came from an analogy. When building CI/CD pipelines, we design stages, set up Gates, model progress as a state machine whose state is persisted, and support parallelism and checkpoint-based resumption. Agents are execution units too, so why not manage them the same way?


CI/CD (Continuous Integration/Continuous Delivery) is a core practice of modern software engineering, having evolved over decades into a mature methodology. Its underlying model is essentially a finite state machine: the system is in one of a finite number of states at any given moment (building, testing, awaiting approval, deployed...), transitioning between states through clearly defined conditions, with Gates serving as triggers for state transitions. Tools like Jenkins, GitHub Actions, and GitLab CI have engineered these concepts and accumulated extensive practical experience with parallel execution, failure retries, and checkpoint resumption.
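The finite-state-machine view can be made concrete with a transition table. In this sketch, every legal move is an explicit `(state, event)` pair, and the approval Gate is a state that only an explicit human event can leave. The state and event names are illustrative and do not come from any particular CI tool.

```python
from enum import Enum


class State(Enum):
    BUILDING = "building"
    TESTING = "testing"
    AWAITING_APPROVAL = "awaiting_approval"
    DEPLOYED = "deployed"
    FAILED = "failed"


# Transition table: (current state, event) -> next state.
# Any pair not listed here is an illegal transition.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.BUILDING, "build_succeeded"): State.TESTING,
    (State.BUILDING, "build_failed"): State.FAILED,
    (State.TESTING, "tests_passed"): State.AWAITING_APPROVAL,
    (State.TESTING, "tests_failed"): State.FAILED,
    # The approval Gate: only an explicit human decision leaves this state.
    (State.AWAITING_APPROVAL, "approved"): State.DEPLOYED,
    (State.AWAITING_APPROVAL, "rejected"): State.FAILED,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.value!r}") from None


state = State.BUILDING
for event in ["build_succeeded", "tests_passed", "approved"]:
    state = step(state, event)
print(state)  # State.DEPLOYED
```

Because the system is always in exactly one named state, persisting that single value is enough to resume after a crash.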

Once this mental model clicks, the definition of an Orchestrator becomes crystal clear:

Doesn't create new Agents—only orchestrates existing ones
Doesn't write code itself, nor run tests directly
Only responsible for dispatching the right Agent and Skill at the right time
Waits for human decisions at critical nodes
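Those four rules fit in a very small class. The minimal sketch below assumes that agents already exist and are merely registered by stage name. The names `registry`, `gates`, and `ask_human` are hypothetical. For brevity, the human decision is reduced to a yes/no callback; the Gates described later offer three options.

```python
from typing import Callable

# An "agent" is modeled as a callable: task description in, result out.
Agent = Callable[[str], str]


class Orchestrator:
    """Dispatches existing agents by stage; never does the work itself."""

    def __init__(self, registry: dict[str, Agent], gates: set[str]):
        self.registry = registry  # stage name -> already-existing agent
        self.gates = gates        # stages whose output needs a human decision

    def dispatch(self, stage: str, task: str) -> str:
        agent = self.registry.get(stage)
        if agent is None:
            # A missing capability is an error; the orchestrator does not invent an agent.
            raise LookupError(f"no agent registered for stage {stage!r}")
        return agent(task)

    def run(self, stages: list[str], task: str,
            ask_human: Callable[[str, str], bool]) -> list[str]:
        results = []
        for stage in stages:
            result = self.dispatch(stage, task)
            results.append(result)
            # Only Gate stages pause for a human; all others advance automatically.
            if stage in self.gates and not ask_human(stage, result):
                break
        return results


registry = {
    "propose": lambda t: f"proposal for {t}",
    "implement": lambda t: f"code for {t}",
}
orch = Orchestrator(registry, gates={"propose"})
print(orch.run(["propose", "implement"], "login page", ask_human=lambda s, r: True))
```

The class contains no code-writing or test-running logic. Everything substantive lives in the registered agents, which is the separation the list above describes.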

The Four-Layer Architecture of an Orchestrator

The entire Orchestrator architecture is divided into four layers with clear separation of responsibilities:

YAML Templates: Define the workflow (Explore → Propose → Review → Implement → QA Test → Archive)
Orchestrator Skill: Enables the main Agent to execute orchestration logic
Pipeline Server: Manages state and APIs
Dashboard: Handles progress visualization, Gates, and log display

Orchestration logic, runtime state, and visualization are explicitly separated here. The benefit of this layered design is that each layer can iterate independently without coupling to the others.

The choice of YAML templates as the workflow definition language is particularly deliberate. YAML has become the de facto standard configuration language in the DevOps world, widely adopted by Kubernetes, Ansible, and GitHub Actions. Compared to code, YAML lowers the barrier for non-technical people to understand and modify workflows; compared to graphical interfaces, YAML naturally supports version control and can be diffed, reviewed, and rolled back just like code. This realizes the concept of "Pipeline as Code"—team best practices are no longer passed down by word of mouth but are captured in a structured, auditable, and reusable form, following the same evolutionary path as Infrastructure as Code.
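One practical payoff of Pipeline as Code is that templates can be checked automatically before anyone runs them. The sketch below shows the Explore → Propose → Review → Implement → QA Test → Archive workflow as the Python structure a YAML file might load into, along with a small validator. The field names (`stages`, `agents`, `gate`, `parallel`) and the agent names are hypothetical, not a published schema.

```python
# Hypothetical template: the Python equivalent of what a YAML workflow file
# might contain after loading. Field and agent names are illustrative only.
TEMPLATE = {
    "name": "feature",
    "stages": [
        {"id": "explore", "agents": ["explorer"]},
        {"id": "propose", "agents": ["architect"]},
        {"id": "review", "agents": ["reviewer"], "gate": True},
        {"id": "implement", "agents": ["backend", "frontend"], "parallel": True},
        {"id": "qa_test", "agents": ["qa_tester"], "gate": True},
        {"id": "archive", "agents": ["archivist"]},
    ],
}


def validate(template: dict, known_agents: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the template is usable."""
    problems = []
    seen = set()
    for stage in template.get("stages", []):
        sid = stage.get("id")
        if not sid:
            problems.append("stage without an id")
            continue
        if sid in seen:
            problems.append(f"duplicate stage id {sid!r}")
        seen.add(sid)
        agents = stage.get("agents", [])
        if not agents:
            problems.append(f"stage {sid!r} has no agents")
        for agent in agents:
            # Templates may only reference agents that already exist.
            if agent not in known_agents:
                problems.append(f"stage {sid!r} uses unknown agent {agent!r}")
        if len(agents) > 1 and not stage.get("parallel", False):
            problems.append(f"stage {sid!r} lists several agents but is not marked parallel")
    return problems


KNOWN = {"explorer", "architect", "reviewer", "backend", "frontend", "qa_tester", "archivist"}
print(validate(TEMPLATE, KNOWN))  # []
```

A check like this could run in CI on every change to a template, so a broken workflow is caught in review rather than halfway through a pipeline.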

Gate Mechanism: From Mechanical Confirmation to Quality Judgment

Gates are the soul of the entire orchestration system. Instead of asking "Is this okay?" at every step, they let the AI advance autonomously and only pause at critical nodes like proposal confirmation, quality review, and test acceptance, presenting three clear options: pass, fix, or abort.
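The three Gate options map naturally onto state transitions. In the sketch below, "pass" advances to the next stage, "fix" re-runs the gated stage (typically with the reviewer's feedback attached), and "abort" stops the pipeline. The stage names and the `(status, next_stage)` return shape are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    PASS = "pass"    # accept the output and move on
    FIX = "fix"      # send the same stage back for another round
    ABORT = "abort"  # stop the pipeline entirely


def resolve_gate(stages: list[str], current: str,
                 decision: Decision) -> Tuple[str, Optional[str]]:
    """Turn a human decision at a Gate into (pipeline_status, next_stage)."""
    if decision is Decision.ABORT:
        return "aborted", None
    if decision is Decision.FIX:
        return "running", current  # re-run the gated stage with feedback
    idx = stages.index(current)
    if idx + 1 == len(stages):
        return "done", None
    return "running", stages[idx + 1]


STAGES = ["explore", "propose", "implement", "qa_test", "archive"]
print(resolve_gate(STAGES, "propose", Decision.PASS))   # ('running', 'implement')
print(resolve_gate(STAGES, "qa_test", Decision.FIX))    # ('running', 'qa_test')
print(resolve_gate(STAGES, "qa_test", Decision.ABORT))  # ('aborted', None)
```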


The most illustrative real-world scenario: once Gate 1 (proposal review) passes, the system automatically marks the Implement stage as Active and spins up backend and frontend SubAgents in parallel. After both implementations complete, it automatically proceeds to the next stage. The entire process no longer requires someone watching and pushing things forward step by step.
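The fan-out after Gate 1 resembles concurrent task execution. The sketch below uses `asyncio.gather` to start the backend and frontend SubAgents together and to continue only after both have finished. `run_subagent` is a hypothetical stand-in that merely simulates work; a real implementation would launch the actual agents.

```python
import asyncio


async def run_subagent(role: str, spec: str) -> str:
    # Hypothetical stand-in for launching a real SubAgent; here we only simulate work.
    await asyncio.sleep(0.01)
    return f"{role} done: {spec}"


async def implement_stage(spec: str) -> list[str]:
    # Both SubAgents start together; gather waits until every one of them has finished
    # and returns their results in the order the coroutines were passed.
    return await asyncio.gather(
        run_subagent("backend", spec),
        run_subagent("frontend", spec),
    )


async def pipeline_after_gate1(spec: str) -> dict:
    status = {"implement": "active"}
    results = await implement_stage(spec)
    status["implement"] = "done"
    status["next"] = "qa_test"  # advance automatically; no human push needed here
    return {"status": status, "results": results}


print(asyncio.run(pipeline_after_gate1("login page")))
```

By default, `gather` propagates the first exception raised by either SubAgent. Passing `return_exceptions=True` collects failures as results instead, which suits a pipeline that should report a partial failure at the next Gate rather than crash.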

The Real Value of QA Gates

After the QA Tester runs, it doesn't simply output "pass/fail"—it provides a structured report: which items are already fine, which items need fixing, and then leaves the decision to the human.
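A structured report of this kind can be modeled as a list of checks that deliberately carries no overall verdict. The class names and the sample checks in the sketch below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    item: str
    ok: bool
    note: str = ""  # what is wrong, if anything


@dataclass
class QAReport:
    checks: list[CheckResult] = field(default_factory=list)

    def passed(self) -> list[str]:
        return [c.item for c in self.checks if c.ok]

    def needs_fixing(self) -> list[CheckResult]:
        return [c for c in self.checks if not c.ok]

    def summary(self) -> str:
        # Deliberately no pass/fail verdict: the report informs, the human decides.
        lines = ["OK: " + (", ".join(self.passed()) or "(none)")]
        lines += [f"FIX: {c.item} ({c.note})" for c in self.needs_fixing()]
        lines.append("Decision: pass / fix / abort?")
        return "\n".join(lines)


report = QAReport([
    CheckResult("login form renders", True),
    CheckResult("API returns 401 on bad password", True),
    CheckResult("lockout after 5 attempts", False, "no lockout implemented"),
])
print(report.summary())
```

Leaving the verdict out of the data structure is the design choice that matters here. The tester can list facts, but only the person at the Gate can decide whether a missing lockout blocks the release.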


This way, humans focus on quality judgment rather than mechanical confirmation. This is a fundamental role shift—from operator to decision-maker.

Parallel Requirements and Team Collaboration

When multiple requirements are in progress at once, the Dashboard shows each pipeline's stages, Gate status, active SubAgents, and audit logs on a single board. It does more than visualize. It remembers where each pipeline stands, so switching between requirements doesn't mean losing context, and it gives the whole team a shared view to coordinate around.
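The "state memory" behind such a board is essentially the checkpointing described earlier. The sketch below shows how the Pipeline Server might store every pipeline's stage, Gate status, active SubAgents, and audit log in one JSON file and reload it after a restart. The field names and the file-based storage are assumptions chosen to keep the example self-contained; a real server would more likely use a database.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class PipelineState:
    pipeline_id: str
    stage: str
    gate: str = "none"  # e.g. "none", "waiting", "passed"
    active_subagents: list[str] = field(default_factory=list)
    audit_log: list[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.audit_log.append(f"{time.strftime('%H:%M:%S')} {message}")


def save_board(path: Path, board: dict[str, PipelineState]) -> None:
    # Write to a temp file, then rename over the target, so a crash mid-write never
    # leaves a half-written checkpoint (rename is atomic on POSIX within one filesystem).
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({k: asdict(v) for k, v in board.items()}, indent=2))
    tmp.replace(path)


def load_board(path: Path) -> dict[str, PipelineState]:
    raw = json.loads(path.read_text())
    return {k: PipelineState(**v) for k, v in raw.items()}


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "board.json"
    board = {
        "login": PipelineState("login", "implement", active_subagents=["backend", "frontend"]),
        "export": PipelineState("export", "qa_test", gate="waiting"),
    }
    board["login"].record("Gate 1 passed; implement started")
    save_board(path, board)
    restored = load_board(path)  # e.g. after a restart
    print(restored["export"].gate, restored["login"].active_subagents)
```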


The Amplification Effect in Team Collaboration

In team collaboration scenarios, the Orchestrator's value is further amplified:

Experience capture: Expert knowledge no longer lives only in people's heads—it's captured in reusable YAML templates
Lower barriers: New team members follow the pipeline and make decisions at Gates
Template reuse: The same set of Agents can be reused across backend, full-stack, and hotfix scenarios

This means a team's AI coding capability no longer depends on individual skill levels but on the quality of workflow templates.

Three Stages of AI Coding Evolution

Summarizing the evolution of AI coding into three stages makes the Orchestrator's position clearer:

Stage	Characteristics	Human Role
Tool invocation	Single-point use of AI capabilities	Operator
Process codification	Fixed steps, step-by-step confirmation	Confirmer
Orchestrator orchestration	State machine-driven, autonomous progression	Decision-maker

The real change isn't about having humans keep pressing confirm—it's about setting the flight path, letting AI cruise automatically, and having humans take over only at critical waypoints.

When Should You Use an Orchestrator?

The core insight of this approach is simple: AI Agents, like microservices, are execution units that need orchestration. In distributed systems, orchestration and choreography are the two classic approaches to service coordination. Orchestration uses a central controller to explicitly direct the order in which services run, and Apache Airflow and Netflix Conductor are typical implementations. Choreography instead lets services coordinate on their own by reacting to events.

Orchestration's core advantage is observability and controllability. Every execution path goes through the central orchestrator, state changes are traceable, and exception handling follows clear rules. These are exactly the properties you want when Agents run with little supervision. As a result, the practices the CI/CD world has accumulated over decades (state machines, Gates, parallelism, checkpoint resumption) carry over to AI coding with little modification.

However, it's important to note that the Orchestrator itself adds system complexity. For simple tasks, direct conversation may be more efficient; only when workflows are sufficiently complex, require multi-Agent collaboration, and have team reuse needs does this architecture truly deliver value. The key is finding that "worth orchestrating" tipping point.

Key Takeaways

The Orchestrator doesn't create new Agents—it only orchestrates existing ones, dispatching the right capability units at the right time
The Gate mechanism lets AI advance autonomously, pausing only at critical nodes (proposals, quality, testing) to await human decisions, transforming humans from mechanical confirmers to quality decision-makers
The four-layer architecture (YAML templates, Orchestrator Skill, Pipeline Server, Dashboard) achieves clear separation of orchestration logic, runtime state, and visualization
Team experience is captured and reused through YAML templates, lowering barriers for newcomers, with the same Agent set adaptable to different development scenarios
Three stages of AI coding evolution: tool invocation → process codification → Orchestrator orchestration—the core idea is letting AI cruise automatically while humans take over only at critical waypoints