TraeHarness Open-Sourced: 18 AI Agents Form a Virtual Development Team

TraeHarness open-sources 18 AI Agents that form a virtual dev team covering the full software lifecycle.
TraeHarness is an open-source multi-agent collaboration framework that deploys 18 specialized AI Agents — including Product Manager, Architect, Frontend, Backend, and Testing roles — orchestrated by a Master Controller. It automates the full software development lifecycle through a four-stage pipeline with six-dimensional quality inspection, targeting indie developers and small teams seeking to multiply their productivity.
How Much Can AI Help When One Person Does the Work of an Entire Team?
The pain points of indie developers and small teams are well known: one person has to juggle requirements analysis, architecture design, front-end and back-end development, testing, and deployment. Traditional AI tools can help with writing code, but they offer far less support for design, testing, and team collaboration. TraeHarness, a recently open-sourced project that has been gaining attention on Bilibili, takes a different approach to this problem: instead of giving you a smarter AI assistant, it gives you a virtual expert team of 18 specialists.
The project is open-sourced on the IMA platform, and its core philosophy is to map the multi-role collaboration workflows in software engineering into an automated pipeline of cooperating AI Agents.
Multi-Agent Systems (MAS) are a core research direction in distributed artificial intelligence, with theoretical foundations tracing back to distributed problem-solving research in the 1980s. In the era of large language models, multi-agent architectures have grown rapidly. Since 2023, projects such as Stanford's "Generative Agents" experiment, MetaGPT, AutoGen, and CrewAI have emerged one after another, suggesting that multiple AI Agents, through role specialization and protocol-based communication, can complete complex tasks that are hard for a single model to handle in one pass. The core principle is to decompose a complex problem into subtasks, have each Agent focus on reasoning and decision-making within a specific domain, and coordinate the Agents through structured message passing. This is intended to mitigate the attention degradation and role confusion that single models can suffer in long-context scenarios. TraeHarness is one practitioner of this approach.

TraeHarness Architecture: Master Controller + 18 Specialized Agents
Role Division Mirrors a Real Software Team
TraeHarness has an ambitious architectural design. Rather than simply having one large model play multiple roles, it establishes 18 independent specialized Agents, each with clearly defined responsibilities:
- Product Manager Agent: Handles requirements gathering, module decomposition, and requirements document generation
- Architect Agent: Plans database structures and designs API interfaces
- Frontend Agent: Generates page components and interaction logic
- Backend Agent: Implements business logic and API development
- Testing Agent: Executes automated testing and quality acceptance
- Data Analyst Agent: Data cleaning and visualization
Above all Agents sits a Master Controller responsible for global orchestration, ensuring orderly collaboration between all roles. The pattern has deep engineering roots in distributed systems and resembles the Orchestrator pattern in microservices architecture. In Kubernetes, the control plane (historically called the master) schedules Pods onto nodes and manages their lifecycle; in workflow engines like Apache Airflow, the scheduler determines task execution order from declared dependencies. TraeHarness transplants this concept to the AI Agent domain: the Master doesn't directly execute specific development tasks but instead maintains global state, manages the task dependency graph, assigns work to downstream Agents, and arbitrates when conflicts or exceptions arise. The advantage of centralized orchestration is global controllability, but the Master is also a single point of failure and a potential bottleneck: if its understanding drifts, errors propagate to all downstream Agents.
This design philosophy draws from the organizational structure of real software teams, making AI's division of labor and collaboration more closely resemble how human teams operate.
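The orchestration idea described above can be sketched in a few lines. This is a minimal illustration, not TraeHarness's actual code: the task names, the `OWNERS` table, and the `run_agent` stand-in are all assumptions made for the example. It uses the standard library's `graphlib.TopologicalSorter` to release tasks only once their dependencies are finished.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "requirements": set(),
    "architecture": {"requirements"},
    "frontend": {"architecture"},
    "backend": {"architecture"},
    "testing": {"frontend", "backend"},
}

# Hypothetical mapping from task to the Agent role that owns it.
OWNERS = {
    "requirements": "Product Manager Agent",
    "architecture": "Architect Agent",
    "frontend": "Frontend Agent",
    "backend": "Backend Agent",
    "testing": "Testing Agent",
}


def run_agent(role, task, state):
    """Stand-in for a real Agent call; returns a placeholder artifact."""
    return f"{task} artifact by {role}"


def orchestrate(tasks, owners):
    """Dispatch tasks in dependency order and record results in shared state."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    state = {}    # global state maintained by the Master
    batches = []  # each batch holds tasks that could run in parallel
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        batches.append(ready)
        for task in ready:
            state[task] = run_agent(owners[task], task, state)
            sorter.done(task)
    return batches, state


batches, state = orchestrate(TASKS, OWNERS)
for batch in batches:
    print(batch)
```

In this sketch the frontend and backend tasks land in the same batch, which mirrors the parallel development the article describes; a real Master would also need to handle failures and arbitration, which are omitted here.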
Four-Stage Development Pipeline: From Requirements to Delivery
The entire workflow is divided into four stages, each seamlessly connected to form a complete delivery pipeline:

- Requirements Analysis Stage: The Product Manager Agent automatically organizes requirements, breaking down vague ideas into concrete functional modules and outputting structured requirements documents.
- Solution Design Stage: The Architect Agent plans database table structures and designs interface specifications based on the requirements document, ensuring subsequent development has a solid foundation.
- Execution & Scheduling Stage: Frontend and Backend Agents develop in parallel with real-time code generation, while the Master coordinates dependencies and development cadence.
- Quality Acceptance Stage: This is TraeHarness's most noteworthy phase — it employs a six-dimensional quality inspection mechanism that gates deliverables across dimensions including format compliance, logical correctness, and functional completeness. Outputs that don't pass are sent back for rework.
Traditional software quality assurance (QA) typically spans several activities, including code review, unit testing, integration testing, end-to-end testing, performance testing, and security auditing. TraeHarness's six-dimensional quality inspection mechanism can be read as an attempt to automate part of this manual QA work. "AI evaluating AI output" is not without precedent: in machine learning, the discriminator in a GAN (Generative Adversarial Network) is trained to tell the generator's output apart from real samples, and in LLM applications, "LLM-as-Judge" (using a large model to evaluate another model's outputs) has become a widely used evaluation method, including at labs such as OpenAI and Anthropic. Its limitations are also clear: the evaluation model and the generation model may share the same knowledge blind spots, which makes systematic biases difficult to detect.
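A quality gate with a rework loop, as described for the acceptance stage, might look roughly like the sketch below. It is an assumption-laden illustration rather than the project's implementation: `Deliverable`, `run_stage`, and the toy checks are hypothetical, and only the three dimensions the article names are modeled, since the remaining three are not specified.

```python
from dataclasses import dataclass, field


@dataclass
class Deliverable:
    stage: str
    content: str
    attempts: int = 1
    failed: list = field(default_factory=list)


def inspect(deliverable, checks):
    """Run every check and return the names of the dimensions that failed."""
    return [name for name, check in checks.items() if not check(deliverable.content)]


def run_stage(stage, produce, checks, max_attempts=3):
    """Produce a deliverable, sending it back for rework until it passes the gate."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        deliverable = Deliverable(stage, produce(feedback), attempts=attempt)
        deliverable.failed = inspect(deliverable, checks)
        if not deliverable.failed:
            return deliverable
        feedback = deliverable.failed  # handed back to the producing Agent
    raise RuntimeError(f"{stage} failed the quality gate: {deliverable.failed}")


# Toy checks standing in for real inspections (hypothetical rules).
checks = {
    "format_compliance": lambda text: text.endswith("\n"),
    "logical_correctness": lambda text: "TODO" not in text,
    "functional_completeness": lambda text: "login" in text and "checkout" in text,
}


def toy_backend(feedback):
    """Stand-in Agent: the first draft is flawed, the rework addresses feedback."""
    if not feedback:
        return "def login(): ...  # TODO checkout"
    return "def login(): ...\ndef checkout(): ...\n"


result = run_stage("execution", toy_backend, checks)
print(result.attempts, result.failed)
```

The `max_attempts` cap is a design choice added for the sketch: without a bound, a deliverable that can never pass would loop forever between the producing Agent and the gate.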

Multi-Agent Safety Mechanisms: Six Rules Define Behavioral Boundaries
The biggest risk in multi-agent systems is loss of control. TraeHarness addresses this with explicit constraint design, establishing six prohibitive rules to define clear behavioral boundaries for AI. The core principles can be summarized as three "don'ts":
- Don't overstep: Each Agent can only act within its own scope of responsibility and cannot cross boundaries to operate on other modules
- Don't skip steps: Stages must be executed strictly according to the pipeline sequence — you can't skip requirements analysis and jump straight to coding
- Don't fabricate: Fabricating test results or inventing data is prohibited — all outputs must be verifiable
This approach to safety is worth adopting in other multi-agent projects. Many current AI Agent frameworks focus heavily on capability expansion while neglecting behavioral constraints. In real-world engineering scenarios, an AI that "knows what it shouldn't do" is often more reliable than one that "tries to do everything." The idea is close in spirit to "Alignment" in AI safety, where frontier labs like OpenAI and Anthropic invest heavily in researching how to make AI systems follow human intent and rule boundaries. TraeHarness offers a concrete implementation at the engineering level: behavioral boundaries are written into each Agent's system instructions through prompt engineering, combined with a Stage Gate mechanism intended to keep Agents from bypassing established processes.
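Two of the three "don'ts" lend themselves to enforcement in code as well as in prompts. The sketch below is one possible way to do that, under stated assumptions: the `Guard` class, the `SCOPES` path table, and the stage names are hypothetical and are not taken from TraeHarness, which according to the article expresses its rules through system instructions. "Don't fabricate" is not covered here, since it requires verifying outputs rather than checking permissions.

```python
STAGES = ["requirements", "design", "execution", "acceptance"]

# Hypothetical scope table: path prefixes each Agent is allowed to write to.
SCOPES = {
    "Frontend Agent": ("frontend/",),
    "Backend Agent": ("backend/",),
}


class RuleViolation(Exception):
    """Raised when an Agent action breaks one of the behavioral rules."""


class Guard:
    def __init__(self):
        self.completed = []  # stages finished so far, in order

    def check_write(self, agent, path):
        """Don't overstep: reject writes outside the Agent's own scope."""
        allowed = SCOPES.get(agent, ())
        if not path.startswith(allowed):
            raise RuleViolation(f"{agent} may not write to {path}")

    def enter_stage(self, stage):
        """Don't skip steps: a stage may start only after all earlier ones finished."""
        required = STAGES[: STAGES.index(stage)]
        if self.completed != required:
            raise RuleViolation(
                f"cannot enter {stage}; completed so far: {self.completed}"
            )

    def complete_stage(self, stage):
        """Mark a stage as finished, checking the gate first."""
        self.enter_stage(stage)
        self.completed.append(stage)


guard = Guard()
guard.complete_stage("requirements")
guard.check_write("Backend Agent", "backend/orders.py")  # within scope
try:
    guard.enter_stage("execution")  # design has not been completed
except RuleViolation as err:
    print(err)
```

Checks like these complement prompt-level rules: a system instruction asks the model to stay in bounds, while a guard outside the model can refuse the action even if the model ignores the instruction.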
Real-World Scenario: One-Click E-Commerce Backend Generation
The project showcases several typical application scenarios, with the most compelling being e-commerce backend development:

Requirements are automatically decomposed into four major modules, refined down to individual feature points. Frontend and Backend Agents collaborate on development, generating page components, API interfaces, and database table structures all at once. After automated tests pass and the interface preview is confirmed, one-click deployment goes live.
Beyond this, TraeHarness also demonstrates two supplementary scenarios:
- Excel Data Analysis: The Data Analyst Agent automatically cleans data and outputs visualized charts, suitable for data-driven business decisions
- Knowledge Base RAG Retrieval: Precisely locates document fragments and generates verifiable answers, addressing enterprise internal knowledge management challenges. RAG (Retrieval-Augmented Generation) is a technical paradigm proposed in 2020 by researchers at Facebook AI Research (now Meta AI) and collaborators, designed to address the knowledge timeliness and hallucination problems of large language models. It works by first converting enterprise documents, knowledge bases, and other unstructured data into vectors with an embedding model and storing them in a vector database (such as Pinecone, Milvus, or Chroma). When a user asks a question, the system retrieves the most relevant document fragments through semantic search, then injects these fragments as context into the large model's prompt so that the model can ground its answer in real data. RAG's core value lies in making AI answers traceable to their sources, which reduces (but does not eliminate) the likelihood of fabrication. This makes it well suited to scenarios that require high accuracy, such as enterprise knowledge management and customer service Q&A.
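The retrieve-then-inject flow of RAG described in the last item above can be illustrated with a toy example. Everything here is a simplifying assumption: `embed` is a bag-of-words counter standing in for a real embedding model, the in-memory list `DOCS` stands in for a vector database, and the prompt wording is invented for the sketch. No model is called; the example stops at building the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (real systems use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, fragments, k=2):
    """Return the k fragments most similar to the question."""
    q = embed(question)
    ranked = sorted(fragments, key=lambda f: cosine(q, embed(f)), reverse=True)
    return ranked[:k]


def build_prompt(question, fragments):
    """Inject retrieved fragments as numbered sources so answers can cite them."""
    context = "\n".join(f"[{i}] {f}" for i, f in enumerate(fragments, start=1))
    return (
        "Answer using only the numbered sources below and cite them.\n"
        f"{context}\nQuestion: {question}"
    )


# Hypothetical knowledge-base fragments.
DOCS = [
    "Refunds are processed within 7 business days of approval.",
    "The VPN client must be updated every quarter.",
    "Expense reports require a manager signature.",
]

question = "How long do refunds take?"
top = retrieve(question, DOCS, k=1)
print(build_prompt(question, top))
```

Numbering the sources in the prompt is what makes the answer checkable: a reader can follow a citation such as `[1]` back to the exact fragment it came from.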
The project team claims: 300% efficiency improvement, 80% reduction in bug rate, and a qualitative leap in delivery speed. Of course, these numbers need validation across more real-world projects, but the direction of efficiency gains from multi-agent collaboration is clear.
A Sober Perspective: Opportunities and Challenges of Multi-Agent Collaboration
From a technology trend perspective, TraeHarness represents an important direction in AI tool evolution: from single-point assistance to full-process collaboration. Traditional AI coding tools (like Copilot and Cursor) primarily address efficiency in the single step of "writing code," while multi-agent frameworks attempt to cover the complete chain from requirements to delivery.
Looking back at the evolution of AI-assisted programming tools, three distinct phases emerge. The first phase was code completion, represented by GitHub Copilot (released in 2021), which used OpenAI's Codex model to provide line-level and function-level code suggestions. The second phase was interactive programming, represented by AI-native IDEs like Cursor and Windsurf, where developers could use natural language conversations to have AI understand project context and make cross-file modifications. The third phase is the full-process Agent-ification that TraeHarness represents — AI is no longer just a developer's "copilot" but attempts to take over the complete software engineering workflow from requirements analysis to deployment. This evolutionary direction aligns with the vision behind products like Cognition AI's Devin (dubbed "the first AI software engineer") and OpenAI's Codex Agent, reflecting the industry's paradigm shift from "AI assisting humans" to "AI replacing processes."
However, several challenges deserve sober consideration:
- Information loss in inter-Agent communication: Will understanding drift as information passes among 18 Agents? In real teams, communication overhead is itself a major efficiency killer. The classic software engineering book The Mythical Man-Month pointed out as early as 1975 that the number of pairwise communication paths in a team of n people grows as n(n-1)/2, that is, quadratically with team size. Communication between AI Agents is structured text transfer, which is in theory more precise than human verbal communication, but cumulative error in long chains of LLM reasoning remains a real concern. Each handoff between Agents can introduce semantic drift.
- Adaptability to complex business needs: E-commerce backends are relatively standardized scenarios. How flexible is the automated pipeline when facing highly customized business requirements?
- Reliability of quality acceptance: Six-dimensional quality inspection sounds comprehensive, but when AI evaluates AI's output, does it face the limitation of "grading your own homework"?
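To put a number on the communication-overhead concern in the list above: Brooks's count of pairwise communication paths among n participants is n(n-1)/2. The short calculation below applies it to 18 Agents and compares it with a hub-and-spoke layout in which every Agent talks only to the Master. Whether TraeHarness routes all messages through the Master is an assumption made for this comparison, not something the article states.

```python
def pairwise_channels(n):
    """Number of distinct pairs among n participants: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_channels(n):
    """Channels when each of n participants talks only to one central coordinator."""
    return n


agents = 18
print(pairwise_channels(agents))  # 153 possible Agent-to-Agent paths
print(hub_channels(agents))       # 18 Agent-to-Master paths
```

Central orchestration keeps the number of channels linear in the number of Agents, at the cost of concentrating every handoff, and every chance of drift, in the Master.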
Regardless, as an open-source project, TraeHarness provides a practical reference framework for implementing multi-agent collaboration in software engineering. For indie developers and small teams, the maturation of such tools could bring the vision of "one person equals an entire team" closer to reality.
Interested developers can find the project's open-source code on the IMA platform and experience this virtual team's collaborative capabilities firsthand.