Hermes Overhaul in Practice: Building a Multi-Model Orchestration Master Agent System

Transforming Hermes into a multi-model orchestration master Agent with intelligent routing and task monitoring.
This article details how a developer overhauled the Hermes AI Agent framework into a unified master entry point that orchestrates Sub-Agents alongside Claude, Gemini, and Codex. It covers the multi-model CLI integration approach, inter-Agent communication via Gateway APIs and plugin registration, task status monitoring with counters and Events panels, a lightweight web site for Job result display, and a three-tier graceful degradation strategy for search and scraping.
Why Overhaul Hermes?
Hermes, as an AI Agent framework, can significantly boost productivity in daily use. AI Agent frameworks are a class of software architectures that allow large language models (LLMs) to autonomously execute multi-step tasks — unlike traditional single-turn Q&A, Agents can perceive their environment, formulate plans, invoke tools, and iterate based on feedback. Hermes provides a TUI (Terminal User Interface) interaction mode, enabling developers to drive AI through complex workflows like code generation, file operations, and information retrieval right from the command line. However, during intensive use and in complex scenarios, several pain points emerge: frequent window switching, inconvenient multi-model invocation, and difficulty tracking task status.
This article shares how one developer transformed Hermes's default Agent into a "master entry point" that seamlessly orchestrates Sub-Agents, Claude, Gemini, and Codex — building a true multi-Agent collaborative workflow.
Core Overhaul: Unified Master Agent Orchestration Architecture
Design Philosophy
The core principle behind the overhaul is reducing context switching and improving focus. The Hermes default Agent is repositioned as the master entry point, from which you can directly communicate with all other Agents — including Hermes's own Sub-Agents, as well as three external models: Claude, Codex, and Gemini.
Here, Sub-Agents are child task execution units spawned by the master Agent, each typically specializing in one domain such as code review, documentation generation, or data analysis. Multi-Agent Orchestration is an increasingly common paradigm in AI engineering: complex tasks are decomposed and dispatched to the most capable Agent, and the master Agent then aggregates the results. This architecture resembles the Orchestration Pattern in microservices, where the master Agent acts as a conductor instead of a single model handling everything.
The master Agent has the following capabilities:
- Recognizes all child Agents and understands their respective specialties
- Automatically dispatches relevant subtasks to the most suitable Sub-Agent when facing complex tasks
- Understands the strengths of Claude, Codex, and Gemini, routing tasks precisely via Delegate Task
Responses from different Agents are labeled with their names, and the interaction logic resembles a typical chat application, making the experience very intuitive.
Technical Implementation of Multi-Model Integration
Many people use Claude, ChatGPT, and Gemini through subscription plans, but subscription plans typically don't come with API Keys and can't be directly integrated into Hermes.

Although Hermes supports logging into Google AI Studio to use Gemini and can use Codex through the Codex App Server, quotas are quickly exhausted under heavy use. Therefore, the developer chose a CLI-based integration approach for each model.
The specific implementation takes a form similar to IDE plugins:
- Claude: Communicates via ACP mode using Stream JSON. ACP (Agent Communication Protocol) is a protocol specification designed for inter-Agent communication, aiming to standardize message passing, task delegation, and result reporting between Agents. Stream JSON is a streaming JSON transmission method that allows data to be sent incrementally in chunks rather than waiting for the complete response before returning everything at once — this is particularly important for LLM inference scenarios, where the model generates tokens far slower than network transmission speed. Streaming lets users see the generation process in real time, dramatically improving the interaction experience.
- Codex: Reuses the official Codex App Server
- Gemini: Since the CLI doesn't yet support ACP mode, it still uses a cold-start approach, but maintains conversation continuity by passing a Conversation ID. The Conversation ID is a key identifier for maintaining multi-turn dialogue context — LLMs themselves are stateless, with each call being an independent inference process. Conversation continuity depends on passing the message history as context. The Conversation ID's role is to associate all message records of the same conversation on the client or server side, ensuring subsequent requests carry the complete conversation history.
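The role of the Conversation ID described above can be sketched as a small in-memory store. This is an illustration only: the class name `ConversationStore`, its methods, and the message shape are assumptions, not part of Hermes or the Gemini CLI.

```python
from collections import defaultdict


class ConversationStore:
    """Keeps message history per conversation ID (in memory only)."""

    def __init__(self):
        self._history = defaultdict(list)

    def append(self, conversation_id, role, content):
        """Record one message under the given conversation."""
        self._history[conversation_id].append({"role": role, "content": content})

    def build_request(self, conversation_id, user_message):
        """Record the new user turn and return the full history to send."""
        self.append(conversation_id, "user", user_message)
        return list(self._history[conversation_id])


store = ConversationStore()
store.build_request("conv-1", "Summarize this repo.")
store.append("conv-1", "assistant", "It is a CLI tool.")
messages = store.build_request("conv-1", "Which language is it written in?")
```

Because every request is rebuilt from the stored history, a cold-started process can still continue an earlier conversation as long as it receives the same ID.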
The communication protocol is unified as JSON-RPC over TCP. JSON-RPC is a lightweight remote procedure call protocol that uses JSON as its data encoding format and defines standard structures for requests (a method name and parameters) and responses (a result or an error), which makes it a good fit for inter-Agent tool invocation. Running it directly over a long-lived TCP connection avoids per-request HTTP overhead such as headers and repeated connection setup. While Hermes is running, only one CLI instance is kept alive whenever possible, which preserves session continuity and avoids a cold start for every conversation.
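A minimal sketch of how such messages might be framed is shown below. JSON-RPC 2.0 fixes the message fields but not the framing on a raw TCP stream, so newline-delimited frames are an assumption here, as are the method name `delegate_task` and its parameters.

```python
import itertools
import json

_ids = itertools.count(1)  # monotonically increasing request IDs


def encode_request(method, params):
    """Serialize a JSON-RPC 2.0 request as one newline-terminated frame."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_response(frame):
    """Parse one response frame; return the result or raise on an error object."""
    message = json.loads(frame.decode("utf-8"))
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return message["result"]


# "delegate_task" and its params are hypothetical, for illustration only.
frame = encode_request("delegate_task", {"agent": "claude", "prompt": "Review this diff"})
reply = b'{"jsonrpc": "2.0", "id": 1, "result": {"text": "Looks fine."}}\n'
result = decode_response(reply)
```

In a real client the frame would be written to a socket and the reply read back line by line; the `id` field lets the caller match each response to its request when several are in flight.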
Inter-Agent Communication Mechanism in Detail
Gateway API Approach
Hermes Agents are isolated from each other by default and cannot communicate directly. However, each Agent has its own Gateway with an internal API Server. A Gateway is a common design pattern in microservice architecture, serving as the unified entry point for all external requests and handling routing, authentication, and load balancing. The Gateway API Server built into each Hermes Agent follows a similar philosophy — it provides an HTTP interface layer for each Agent, allowing external parties (including other Agents) to interact with it via standard HTTP requests. The default isolation between Agents is a security design that prevents state contamination and privilege escalation. The master Agent sends HTTP requests to pass Prompts to other Agents, preserving the security benefits of isolation while enabling controlled cross-Agent collaboration.
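As a rough illustration of passing a Prompt over HTTP, the sketch below builds such a request with the Python standard library. The `/prompt` path, the JSON body shape, the port, and the bearer token are all assumptions; the article does not document the actual Gateway API.

```python
import json
import urllib.request


def build_prompt_request(base_url, prompt, token=None):
    """Build (but do not send) a POST carrying a prompt to an Agent's Gateway."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/prompt",  # hypothetical endpoint path
        data=body,
        headers=headers,
        method="POST",
    )


req = build_prompt_request("http://127.0.0.1:8800", "Review the changes in src/")
# To actually send it: urllib.request.urlopen(req, timeout=60)
```

Keeping request construction separate from sending makes the call easy to inspect and test without a running Gateway.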
Plugin Registration Mechanism
To let the master Agent know which Agents are available for communication and how to communicate with them, the developer wrote a Hermes Plugin.

During plugin initialization, it retrieves local configuration and registers information about Sub-Agents and CLIs along with their usage methods. The invocation layer reads configuration through utility classes, encapsulates the information, then calls the Gateway interface or the corresponding CLI to complete communication.
After accumulating conversations over time, the master Agent gradually "remembers" each Agent's specialties and can autonomously determine optimal routing for complex multimodal tasks. This capability relies on the LLM's In-Context Learning characteristic — as interaction history accumulates, the model can summarize from past task dispatch results which Agent performs better on which type of task, making more precise routing decisions in subsequent calls.
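The registration step might look roughly like the sketch below. It does not use the real Hermes plugin API, which the article does not show; `AgentEntry`, `AgentRegistry`, the config layout, and the listed specialties are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEntry:
    name: str
    transport: str       # "gateway" for Sub-Agents, "cli" for external models
    target: str          # Gateway URL or CLI command (illustrative values)
    specialties: tuple


class AgentRegistry:
    """Holds the agents the master Agent is allowed to talk to."""

    def __init__(self):
        self._entries = {}

    def load(self, config):
        """Register every agent described in a config dict."""
        for name, spec in config.items():
            self._entries[name] = AgentEntry(
                name, spec["transport"], spec["target"], tuple(spec.get("specialties", ()))
            )

    def get(self, name):
        return self._entries[name]

    def describe(self):
        """Render a text summary the master Agent could be given as context."""
        return "\n".join(
            f"{e.name} ({e.transport}): {', '.join(e.specialties)}"
            for e in self._entries.values()
        )


registry = AgentRegistry()
registry.load({
    "code-reviewer": {"transport": "gateway", "target": "http://127.0.0.1:8801",
                      "specialties": ["code review"]},
    "claude": {"transport": "cli", "target": "claude",
               "specialties": ["refactoring", "long documents"]},
})
summary = registry.describe()
```

The text returned by `describe()` is the kind of information that would need to reach the master Agent's context for it to make routing decisions.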
Task Monitoring and Status Tracking
Status Line Task Counter
A kanban task counter was added to the TUI's Status Line, tracking the number of all currently running tasks on the board. The counter resets to zero when all tasks are complete — clear at a glance.
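The counter logic can be sketched in a few lines. The task shape and the state names (`running`, `blocked`, `done`) are assumptions for illustration; the article does not specify how Hermes stores kanban tasks.

```python
from collections import Counter

ACTIVE_STATES = {"running"}  # assumed name of the state that counts as "running"


def running_count(tasks):
    """Count tasks whose state is considered active."""
    states = Counter(task["state"] for task in tasks)
    return sum(states[state] for state in ACTIVE_STATES)


def status_line(tasks):
    """Format the counter as it might appear in a TUI status line."""
    return f"kanban: {running_count(tasks)} running"


board = [
    {"id": 1, "state": "running"},
    {"id": 2, "state": "blocked"},
    {"id": 3, "state": "done"},
    {"id": 4, "state": "running"},
]
line = status_line(board)
```

Note that the blocked task is simply absent from the count, so a counter of this kind cannot by itself tell a blocked task apart from a finished one.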
Events Panel
Since the counter can't identify when tasks enter a Blocked state, an additional Events panel was added. Entering the Events command displays:
- Final status of kanban tasks
- Cron Job execution results
Cron Job originates from the Unix cron daemon, used to automatically execute scripts or commands on a preset schedule. In the AI Agent context, Cron Jobs take on new meaning — they can periodically trigger Agents to perform specific tasks, such as daily news summary generation, scheduled code repository checks, periodic data reports, and more. This mechanism transforms AI Agents from passive responders to proactive executors, serving as foundational infrastructure for building automated workflows.
The Cron Job monitoring implementation is also straightforward: it periodically reads the last run status recorded in the .hermes/jobs.json configuration file and writes failure records to a local database.
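One way to sketch this monitor is shown below, using SQLite as the local database. The layout of `jobs.json` (a `jobs` list with `id`, `last_run`, `last_status`, and `error` fields) and the table schema are assumptions; only the file name comes from the article.

```python
import json
import sqlite3


def record_failures(jobs_json_text, db):
    """Store failed last-run records; return how many failures were seen."""
    jobs = json.loads(jobs_json_text)["jobs"]
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_failures ("
        "job_id TEXT, last_run TEXT, error TEXT, PRIMARY KEY (job_id, last_run))"
    )
    failed = [job for job in jobs if job.get("last_status") == "failed"]
    # INSERT OR IGNORE keeps repeated polls from duplicating the same failure.
    db.executemany(
        "INSERT OR IGNORE INTO job_failures VALUES (?, ?, ?)",
        [(job["id"], job["last_run"], job.get("error", "")) for job in failed],
    )
    db.commit()
    return len(failed)


sample = json.dumps({"jobs": [
    {"id": "daily-news", "last_run": "2025-01-01T08:00:00", "last_status": "ok"},
    {"id": "repo-check", "last_run": "2025-01-01T09:00:00", "last_status": "failed",
     "error": "timeout"},
]})
db = sqlite3.connect(":memory:")
record_failures(sample, db)
record_failures(sample, db)  # a second poll of the same file adds nothing
rows = db.execute("SELECT job_id, error FROM job_failures").fetchall()
```

In practice the text would be read from the jobs file on a timer, and the Events panel would query the failures table.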
Job Result Display: Building a Lightweight Web Site
Jobs executed via the terminal CLI or TUI in Hermes have no server-side component, so results can't be directly delivered externally. While they can be sent to chat channels, the rendering quality on mobile devices is poor.

The solution is to build a lightweight web site:
- The server automatically scans and serializes Job persistent output on a regular basis
- The frontend selects appropriate rendering components based on data type (Markdown, charts, etc.)
- Combined with intranet tunneling or cloud deployment, results can be viewed anytime, anywhere
Intranet tunneling (using tools like frp, ngrok, Cloudflare Tunnel, etc.) is a technique for exposing local services to the public internet. Since Hermes runs in a local terminal, its Job results are only accessible on the local machine by default. By establishing a secure tunnel from the public internet to the local service through intranet tunneling, users can access Job result pages via a browser on their phone or other devices, achieving true cross-device accessibility.
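The scan-and-serialize step on the server side might be sketched as follows. The mapping from file extension to rendering component is an assumption made for illustration, as is the idea that each Job writes its output as a file in a single directory.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping from file type to the frontend component that renders it.
RENDERERS = {".md": "markdown", ".json": "chart", ".csv": "table"}


def scan_job_outputs(output_dir):
    """Return one serializable entry per output file, tagged with a renderer."""
    entries = []
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file():
            entries.append({
                "name": path.name,
                "renderer": RENDERERS.get(path.suffix.lower(), "plain"),
                "content": path.read_text(encoding="utf-8"),
            })
    return entries


# Demo against a temporary directory standing in for the Job output folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "daily-news.md").write_text("# Headlines\n", encoding="utf-8")
    Path(tmp, "metrics.json").write_text('{"runs": 3}', encoding="utf-8")
    index = scan_job_outputs(tmp)
    payload = json.dumps(index)  # what the server would hand to the frontend
```

Tagging each entry with a renderer name on the server keeps the frontend logic to a simple lookup from that name to a component.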
Search and Scraping: Three-Tier Fallback Strategy
In productivity scenarios, an LLM's web search and scraping capabilities directly impact the quality of AI work output. The developer configured a dedicated search and scraping Agent for Hermes, employing a three-tier priority strategy:
- Tier 1: Cloud API or the LLM's built-in Web Search (highest efficiency)
- Tier 2: Dynamic headless browser (using Camoufox with built-in anti-detection)
- Tier 3: CDP-managed daily browser (highest success rate)

Regarding the Tier 2 approach, a headless browser is a browser instance without a graphical interface, commonly used for automated testing and web scraping. However, many websites deploy anti-automation detection mechanisms (such as checking WebDriver properties, Canvas fingerprint anomalies, missing browser plugin signatures, etc.) that can identify and block headless browser access. Camoufox is a Firefox-based anti-detection headless browser that simulates a real user environment by modifying underlying browser fingerprint characteristics (including User-Agent, screen resolution, WebGL renderer, timezone, and dozens of other dimensions), making automated access difficult for websites to distinguish.
Regarding the Tier 3 approach, CDP (Chrome DevTools Protocol) is a debugging protocol exposed by the Chrome browser that allows external programs to control nearly all browser behavior via WebSocket connections — including page navigation, DOM manipulation, network interception, JavaScript execution, and more. Unlike traditional automation tools such as Selenium, CDP communicates directly with the browser engine without injecting an additional WebDriver, making it harder to detect. Using CDP to control a daily-use browser can reuse existing Sessions, Cookies, and fingerprint characteristics, effectively bypassing most detection mechanisms. The developer recommended the open-source tool Web Access, which dynamically selects scraping strategies based on the scenario, solidifies operational experience per domain, reuses across Sessions, and scrapes via background tabs without affecting current usage.
This three-tier fallback strategy follows the engineering principle of "Graceful Degradation": prioritize the lowest-cost, fastest solution, and automatically switch to the next tier when a higher-priority approach fails, ensuring the task ultimately gets completed.
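The fallback chain itself is simple to express. In the sketch below the three fetchers are stubs standing in for the search API, the Camoufox headless browser, and the CDP-controlled browser; their names and behavior are invented for the demonstration.

```python
def fetch_with_fallback(url, tiers):
    """Try each (name, fetcher) in order; return (name, content) from the first success."""
    errors = []
    for name, fetcher in tiers:
        try:
            content = fetcher(url)
        except Exception as exc:  # any failure means "move on to the next tier"
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


def search_api(url):
    raise TimeoutError("quota exceeded")  # stub: tier 1 fails


def headless_browser(url):
    return ""  # stub: tier 2 is blocked and returns nothing


def cdp_browser(url):
    return f"<html>content of {url}</html>"  # stub: tier 3 succeeds


tiers = [("api", search_api), ("headless", headless_browser), ("cdp", cdp_browser)]
used_tier, page = fetch_with_fallback("https://example.com", tiers)
```

Collecting the per-tier errors before raising means a total failure still reports why each tier was skipped.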
Summary and Outlook
The core value of this overhaul lies in:
- Unified entry point: Reduces window switching, improves focus
- Intelligent routing: Master Agent autonomously determines task dispatch
- Observable status: Counter + Events panel for real-time oversight
- Accessible results: Web site for cross-device viewing
Limited by the TUI and terminal itself, there's only so much that can be modified. For an even more refined multi-Agent collaboration experience, the next step might require moving beyond the CLI to develop a standalone local GUI client. Since some modifications involve Hermes source code and the official team will gradually optimize these experiences as well, this article primarily shares the concepts and implementation approaches for reference.