Hermes Orchestrating DeepSeek + MiniMax Dual-AI Collaborative Coding: A From-Scratch Project Test

Experiment Background: Making Multiple AI Models Collaborate on Development

When we talk about AI programming, it's usually one model handling one task. But what if multiple AI models could collaborate like a team, coordinated by a "project manager"? A Bilibili creator conducted a bold experiment: using the Hermes agent as a coordinator to have DeepSeek V4 and MiniMax 2.7 collaboratively complete real software development tasks.

The Hermes agent belongs to the emerging "AI Agent" paradigm. Unlike traditional single-turn Q&A AI, Agents possess capabilities for autonomous planning, tool invocation, memory management, and task decomposition. They can break down a complex goal into multiple subtasks, execute them sequentially, and adjust strategies based on intermediate results. In multi-Agent systems, different Agents take on different roles (such as planner, executor, reviewer) and communicate through message-passing protocols—conceptually similar to service orchestration in microservices architecture. Similar frameworks in the industry include AutoGen, CrewAI, and LangGraph, all exploring how multiple AI entities can collaboratively complete complex tasks.
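To make the role split concrete, here is a minimal sketch of a planner, an executor, and a reviewer exchanging messages. It is not how Hermes is implemented: every name in it (`Message`, `planner`, `executor`, `reviewer`, `run`) is hypothetical, and the "models" are plain functions standing in for real LLM calls.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    content: str


def planner(goal: str) -> list[str]:
    # Toy decomposition: split the goal on " and ". A real planner would ask an LLM.
    return [part.strip() for part in goal.split(" and ") if part.strip()]


def executor(subtask: str) -> str:
    # Stand-in for a code-writing model call.
    return f"done: {subtask}"


def reviewer(result: str) -> bool:
    # Stand-in for verification (compiling, running tests, reading the diff).
    return result.startswith("done:")


def run(goal: str) -> list[Message]:
    """Run each subtask through executor and reviewer, logging every message."""
    log: list[Message] = []
    for subtask in planner(goal):
        log.append(Message("planner", "executor", subtask))
        result = executor(subtask)
        log.append(Message("executor", "reviewer", result))
        verdict = "approved" if reviewer(result) else "rejected"
        log.append(Message("reviewer", "planner", verdict))
    return log


log = run("add PDF export and add HTML export")
```

Each subtask produces three messages, so the planner always hears back from the reviewer before moving on. That feedback edge is what lets a coordinator adjust its strategy based on intermediate results.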

The core question of this experiment: Can multi-model collaboration independently complete programming projects with real practical value, with nearly zero human intervention?

Experiment Setup: Hardware and Model Configuration

Hardware and Architecture Design

The experiment employed an interesting architecture: a ZimaBoard 2 single-board computer serves as the runtime host, running the Hermes agent 24/7. ZimaBoard is an x86-based single-board server. Unlike ARM-based boards such as the Raspberry Pi, it runs standard x86-64 Linux distributions and prebuilt x86 binaries directly, with no need for ARM-specific builds or cross-compilation. The ZimaBoard 2 features an Intel processor, onboard eMMC storage, and SATA interfaces, and its power draw is low (reportedly around 6-10W), which suits a lightweight always-on server. In this experiment it hosts the agent rather than performing inference: LLM inference still happens in the cloud via API calls, so the ZimaBoard only needs to run the agent's scheduling logic and network communication.

Two key reasons for choosing a single-board computer over a laptop:

Work continuity: Tasks won't be interrupted by closing a laptop lid
Environment isolation: The agent cannot access anything on the personal computer, and restarting or shutting down the personal computer won't affect the agent

Figure: Connecting to the ZimaBoard via SSH

Model Role Assignment

Hermes Agent: Runs on the ZimaBoard, responsible for task planning, assignment, and verification
DeepSeek V4: Serves as Hermes's primary reasoning model
MiniMax 2.7: Connected via OpenCode, responsible for actual code writing

DeepSeek V4 is a large language model from DeepSeek, a model family known for strong reasoning and code generation. DeepSeek's recent models use a Mixture of Experts (MoE) architecture, which activates only part of the network for each token and so keeps inference costs down relative to model size. MiniMax 2.7 is a model released by MiniMax, positioned around long-context processing and code generation. The pairing strategy is to use DeepSeek V4's reasoning for task analysis and planning and MiniMax 2.7's code generation for the actual coding, so that each model works on the part of the job it is better suited to.

The appeal of this architecture is its division of labor: Hermes handles "thinking" and "managing," while MiniMax handles "executing," and together they form a complete AI development team.

Task One: Adding PDF Export to a Markdown Editor

Task Description and Execution Process

The first task was adding PDF and HTML export functionality to an existing Markdown editor project. The editor's backend is written in Rust, with the frontend rendering Markdown files.

The experimenter provided only a simple requirement description, then instructed Hermes to:

First create an implementation plan
Assign tasks to the MiniMax model one by one
Verify the completion of each task

Figure: Hermes creating the implementation plan and assigning tasks

Throughout the process, Hermes demonstrated notable autonomy—it even discovered during project analysis that the ZimaBoard didn't have Rust installed and proactively reported the issue. This reflects an important Agent capability: environment awareness and exception handling. When an Agent encounters unexpected obstacles during task execution, it can identify the problem, assess the impact, and decide whether to resolve it independently or escalate to the human operator.
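The missing-Rust discovery is essentially a preflight check. The sketch below shows one way such a check could look. It assumes the toolchain is detected by looking for executables on `PATH`, and the function names and the "escalate" message format are made up for illustration. How Hermes actually detected the problem is not described in the video.

```python
import shutil


def missing_tools(required: list[str]) -> list[str]:
    """Return the tools from `required` that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def preflight(required: list[str]) -> str:
    """Decide whether to proceed or escalate to the human operator."""
    missing = missing_tools(required)
    if missing:
        # The agent does not try to install anything itself; it reports upward.
        return "escalate: missing " + ", ".join(missing)
    return "ok"
```

For a Rust project, a coordinator could call `preflight(["cargo", "rustc"])` before assigning any coding subtask and only continue when the result is `"ok"`.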

Execution Results

The entire task took approximately 9 minutes from start to finish. Hermes coordinated MiniMax to complete the subtasks one by one, followed by a final overall verification.

Figure: Task completion report

Actual test results:

✅ Application compiled without errors
✅ Markdown rendering works correctly
✅ Export menu correctly displays three format options
✅ PDF export successful, with text, formatting, and colors perfectly preserved
✅ HTML export output is completely identical to the in-editor display
⚠️ Images in PDF were not exported correctly (minor flaw)

The HTML export was evaluated as "absolutely perfect," completely matching the in-editor display. This result is quite impressive.

Task Two: Building an RSS Aggregation Service from Scratch in Nim

Why Nim as a Stress Test?

The second task significantly increased in difficulty: building an RSS article aggregation web service from scratch using the Nim programming language.

Nim is a statically typed systems programming language, developed by Andreas Rumpf starting in 2008. Its design philosophy combines Python's readability, C's performance, and Lisp's metaprogramming capabilities. Nim compiles to C, C++, or JavaScript; the C and C++ backends produce lean native binaries, which helps explain why the complete web service in this experiment compiled to less than 700KB. However, Nim's community is far smaller than those of mainstream languages. Public Nim repositories on GitHub amount to only a small fraction of Python's, so LLMs have likely seen very little Nim code during training.

Choosing Nim served two purposes:

Nim is a relatively niche language with far less training data than Python or JavaScript, making it a rigorous stress test for the models
The experimenter personally likes Nim and wanted to see if AI could handle it

Configuration Adjustment: The Delegate Task Mechanism

For the second task, the experimenter adjusted Hermes's configuration. Instead of indirectly calling MiniMax through OpenCode, Hermes used its built-in delegate task mechanism to directly invoke external models.

Delegate Task is a common task delegation pattern in Agent frameworks. Under this mechanism, the primary Agent (Hermes) doesn't generate the final code itself. Instead, it packages each coding task as a structured instruction and passes it to a specialized execution model via an API call. The advantage of this design is that the primary Agent can keep its own context short and focused on high-level planning, while the execution model receives the full task context it needs for code generation. Compared with indirect invocation through an intermediate tool like OpenCode, direct delegation removes a layer of abstraction, which can reduce information loss in transit as well as extra token consumption and latency.
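As a rough illustration of "packaging a coding task as a structured instruction," the sketch below serializes a task into a JSON request body. The schema is entirely hypothetical: the field names, the `DelegateTask` class, and the `"coding-model"` identifier are assumptions for this example and do not reflect Hermes's real delegate task format or any provider's API.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DelegateTask:
    # All field names are illustrative, not Hermes's real schema.
    task_id: str
    instruction: str
    context_files: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)


def to_request_body(task: DelegateTask, model: str) -> str:
    """Serialize one task into a JSON body for a hypothetical model endpoint."""
    return json.dumps({"model": model, "task": asdict(task)}, sort_keys=True)


task = DelegateTask(
    task_id="rss-01",
    instruction="Implement an RSS feed parser module.",
    context_files=["plan.md"],
    acceptance_criteria=["compiles without errors"],
)
body = to_request_body(task, model="coding-model")
```

The point of the structure is that the coordinator sends only what the coding model needs for this one subtask (instruction, relevant files, acceptance criteria), rather than its whole planning conversation.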

Figure: Hermes directly invoking external models via the delegate task mechanism

Configuration steps:

Add additional provider descriptions in the Delegation section
Update Hermes's system prompt to specify using the Delegate Task tool for code writing

Execution Process and Final Results

The experimenter provided a detailed requirements prompt, from which Hermes generated a "quite large and detailed" implementation plan, then executed autonomously.

The core components of an RSS aggregation service include: an RSS/Atom format parser, a scheduled fetch scheduler, an article deduplication and storage engine, and a web frontend display layer. In this experiment, the AI needed to understand multiple technical domains—HTTP server setup, XML parsing, database operations, frontend template rendering—and integrate them into a complete application. This was a comprehensive test of AI's system design capabilities.
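Two of those components, feed parsing and deduplication, fit in a few lines. The sketch below is written in Python purely for illustration (the experiment's service was written in Nim, and its code is not shown in the video). It assumes a plain RSS 2.0 feed and deduplicates by article link. The sample feed and function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; the third item repeats the first item's link.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First</title><link>https://example.com/1</link></item>
    <item><title>Second</title><link>https://example.com/2</link></item>
    <item><title>First again</title><link>https://example.com/1</link></item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list[dict[str, str]]:
    """Extract title and link from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def deduplicate(items: list[dict[str, str]], seen: set[str]) -> list[dict[str, str]]:
    """Return items whose link is not in `seen`, recording new links as it goes."""
    fresh = []
    for item in items:
        if item["link"] not in seen:
            seen.add(item["link"])
            fresh.append(item)
    return fresh
```

A real aggregator would persist `seen` in a database and handle Atom feeds as well, but the same two steps sit at the center of each sync: parse what the feed currently offers, then store only what is new.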

Interestingly, after implementation was complete, Hermes proactively conducted a technical audit and discovered 4 bugs. This "self-review" behavior is a noteworthy feature of Agent systems: it demonstrates that the coordinating Agent can not only assign tasks but also evaluate output quality, forming a built-in quality assurance loop.

Final test results:

✅ Project starts without errors, web service running on port 5000
✅ Pages display correctly with all major functional areas present
✅ RSS feed addition works successfully
✅ Articles sync automatically on startup
⚠️ Article title clicks are unresponsive (needs subsequent fix)
⚠️ Interface design is rather basic (UI requirements weren't specified in the prompt)

The most impressive data point: the entire fully functional web service compiled to a single binary under 700KB. This reflects both Nim's advantage of compiling through C to lean native binaries and the fact that the AI-generated code did not pull in bloated dependencies. For comparison, the node_modules directory of a comparable Node.js project can run to tens or even hundreds of megabytes, depending on its dependencies.

Key Findings: Advantages and Limitations of Multi-Model Collaboration

Core Advantages of Multi-Model Collaboration

Clear role division: Hermes handles planning and verification, MiniMax handles code implementation—each fulfilling their role. This division borrows from the classic "architect + developer" team structure in software engineering
High autonomy: Apart from installing the Rust environment, which required human intervention, the coding process needed almost no human participation
Quality assurance mechanism: Hermes verifies after each subtask completion and performs an overall audit at the end, forming a quality gate process similar to Code Review

Current Limitations

Minor bugs still require manual fixes: Such as PDF image export failure and unclickable article links
Audit-discovered bugs aren't auto-fixed: After Hermes found 4 bugs in the second task, it waited for human confirmation rather than auto-fixing. This may be a deliberate safety design—without explicit human authorization, the Agent shouldn't modify completed code on its own
Limited UI design capability: Without explicit design requirements, generated interfaces are rather rough. This reflects current language models' shortcomings in visual design—they excel at logical implementation but lack aesthetic judgment
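The "wait for confirmation" behavior can be modeled as an approval gate between the audit and any fix. The sketch below is one possible shape for such a gate, not Hermes's actual logic. The `Finding` class, the `approved` flag, and the bug descriptions are placeholders, since the video does not say what the four bugs were.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    bug_id: int
    description: str
    approved: bool = False  # set to True only by a human operator


def fixes_to_apply(findings: list[Finding]) -> list[int]:
    """Only findings a human has explicitly approved are eligible for a fix."""
    return [f.bug_id for f in findings if f.approved]


# Placeholder findings; the real audit results are not described in the source.
findings = [Finding(1, "hypothetical bug A"), Finding(2, "hypothetical bug B")]
pending = fixes_to_apply(findings)  # nothing approved yet, so nothing to fix
findings[0].approved = True         # the operator signs off on bug 1 only
```

Keeping the approval flag on the finding itself means the agent can report everything it found while still being unable to touch code the operator has not signed off on.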

Practicality Assessment

Based on the experimental results, multi-AI collaboration can already handle tasks ranging from "adding features to existing projects" to "building new projects from scratch." Completing the PDF export task in about 9 minutes is competitive with human developers: the same feature, including reading documentation, writing code, debugging, and testing, would plausibly take a skilled developer an hour or more, although the experiment did not measure a human baseline. The output code isn't production-ready yet, but as a rapid prototyping tool this approach has already shown clear practical value.

Conclusion: AI Programming Moving from Single-Model to Multi-Model Collaboration

This experiment demonstrates an important trend: AI programming is evolving from "single-model conversation" to "multi-model collaboration." Through coordinating agents like Hermes, different models can leverage their respective strengths, forming a workflow similar to human development teams. This evolutionary path mirrors software engineering's own development history—from solo development to team collaboration, from waterfall processes to agile iteration.

While it cannot yet fully replace human developers, in scenarios like rapid prototyping and feature iteration, this model already possesses considerable real-world capability. As Agent frameworks mature and model capabilities continue to improve, we may see more complex multi-Agent development teams emerge—including specialized testing Agents, security audit Agents, documentation Agents, and more—forming a complete AI software factory.

Key Takeaways

The Hermes agent can coordinate DeepSeek and MiniMax models for division of labor, achieving a complete development workflow of planning-execution-verification
The first task (Markdown editor PDF export) was completed in just 9 minutes with perfect HTML export, though PDF export had a minor image rendering flaw
The second task used the niche Nim language to build an RSS aggregation service from scratch, with the final compiled binary under 700KB, demonstrating AI's ability to handle uncommon languages
Human intervention in the multi-model collaboration mode was minimal, mainly limited to environment configuration and final bug confirmation
Current limitations include detail bugs still requiring manual fixes, and limited UI output quality when explicit design requirements aren't provided
