Hermes Agent Deployment Tutorial: An AI Assistant That Uses Fewer Tokens Than CrawlAI

Introduction: Why Choose Hermes Agent?

In the AI Agent space, "CrawlAI" (and similar projects) has long been a popular choice, performing quite well across various aspects. However, for average users, CrawlAI has one critical drawback — it consumes way too many Tokens. For individual users without a large API budget, this is undoubtedly a significant barrier.

What is an AI Agent? An AI Agent (intelligent agent) refers to an AI system capable of autonomously perceiving its environment, making decisions, and executing actions. Unlike traditional chatbots, Agents possess capabilities such as tool calling, multi-step reasoning, memory management, and task planning. They can decompose complex tasks into multiple sub-steps and invoke external tools like search engines, code executors, and file systems to accomplish goals. Since 2024, AI Agents have become a core direction for LLM applications, with representative projects including AutoGPT, CrewAI, MetaGPT, and more.

Why is Token consumption a key issue? Tokens are the basic units that large language models use to process text — in Chinese, one character typically corresponds to 1-2 tokens. When an AI Agent executes tasks, it needs to perform multiple rounds of reasoning, tool calling, and context maintenance, each step consuming tokens. A complex task might require tens of thousands or even hundreds of thousands of tokens. At commercial API pricing (e.g., GPT-4 at roughly $30 per million tokens), costs accumulate rapidly. Therefore, token efficiency directly determines the daily operating cost of an Agent and is a core consideration for average users when choosing an Agent framework.

Today I'm introducing another AI Agent project — Hermes Agent. According to feedback from multiple content creators, it's no less intelligent than CrawlAI, performs even better in certain scenarios, and is much more token-friendly.

Hermes Agent Deployment Tutorial

However, it's important to note upfront that this project was natively developed for Linux and isn't very Windows-friendly. The author of this article spent 12 hours trying to deploy it on a non-C drive path (to make it easier to package and share) but was never successful. The final conclusion: for Windows deployment, it's best to install on the C drive. Below is the complete deployment process.

Preparing the Deployment Environment

Cloning the Project Source Code

First, open a command line window and follow these steps:

Create a new project folder and navigate into it
Use the git clone command to clone the Hermes Agent source code
Wait for the download to complete (speed depends on your network)

Ensure your network connection is stable — download speeds are usually fairly fast.

Creating a Virtual Environment

After entering the project directory, create a Python virtual environment:

Key point: You must create the virtual environment in the project's root directory
Once created, you'll see the corresponding environment directory in the project folder
Activate the virtual environment before proceeding to the next step

What is a Python virtual environment? A Python virtual environment is an isolation mechanism that creates an independent Python runtime and dependency package space for each project. This prevents dependency version conflicts between different projects. Common tools include venv (built into Python), conda, and virtualenv. Virtual environments are especially important in AI project deployment because different frameworks often have different version requirements for libraries like PyTorch and transformers. Without a virtual environment, version conflicts from globally installed packages can prevent projects from running.

Installing Dependencies

After activating the virtual environment, install project dependencies:

Installation takes approximately 1-2 minutes, depending on network speed
If errors occur during installation, you can re-run the install command to patch missing packages

Model Configuration and Channel Selection

Choosing an AI Model

After installing dependencies, you'll enter the configuration wizard. The project supports multiple model interfaces:

OpenAI series
SiliconFlow
Other compatible interfaces

What is the SiliconFlow platform? SiliconFlow is a leading domestic AI model inference service platform in China that provides a unified API interface to access various open-source and commercial large models. Its core advantage is compatibility with the OpenAI API format — users only need to change the API address and key to seamlessly switch models. The platform offers over 110 models including DeepSeek, Qwen, GLM, and more, with some models providing free quotas that significantly lower the barrier for individual developers. For users in China, SiliconFlow also solves the network issues of directly accessing overseas services like OpenAI.

Since the demo uses the SiliconFlow platform, select the "More Models" option from the model list, then choose the custom interface (option 13) and enter SiliconFlow's API address.

API key entry tip: When pasting a key in the command line, first select the text, then right-click to paste (the content won't be displayed), and press Enter to confirm. This is a Windows command line security feature — password-type inputs don't echo back to prevent others from seeing them.

The SiliconFlow platform offers over 110 models to choose from. In the demo, DeepSeek V3 (number 19) was selected, with the context length kept at default.

Technical characteristics of DeepSeek V3: DeepSeek V3 is a large language model released by DeepSeek, using a Mixture of Experts (MoE) architecture with a total of 671B parameters but only activating 37B parameters per inference, balancing performance and efficiency. The model excels at code generation, mathematical reasoning, and Chinese comprehension, with inference costs far lower than dense models of equivalent performance. Its open-source nature allows third-party platforms like SiliconFlow to deploy and provide low-cost API services, which is why it's particularly suitable as an Agent's underlying model — both smart and cost-effective.

Configuring the WeChat Communication Channel

During the communication channel selection phase:

Select WeChat as the communication method (option 14)
After confirmation, the system will generate a QR code
Scan the code with your phone's WeChat to log in
After successful connection, select "Anyone" can trigger conversations (option 2) to avoid the hassle of pairing friends one by one
Set this WeChat assistant as the primary target and press Y to confirm

Important note: On Windows 10, do not select test mode — it's prone to errors. The test function only runs stably on Linux; Windows users should select N to skip.

Permission Settings and Launch

Configuring Permissions

After exiting the configuration wizard, you need to manually set several key permissions:

Enter the API key into the configuration file
Set relevant permission parameters (check/enable)
Ensure all four configuration items are correctly filled in

Launch Command (Critical)

When launching, you must include the --allow parameter (specifically -u or a similar permission parameter) — this step is crucial:

With the permission parameter, the Agent can control the computer to perform more operations
Without this parameter, the Agent will refuse to execute more important tasks
After launching, keep the command line window open to chat with the assistant from your phone at any time

Why are permission parameters needed? Permission control for AI Agents is an important security topic. When an Agent gains computer control permissions, it can perform sensitive operations like file manipulation, running programs, and accessing the network. This brings convenience but also introduces risk — if the Agent's reasoning goes awry or it's subjected to a Prompt Injection attack, it might execute unintended dangerous operations. Therefore, the permission parameter design follows the Principle of Least Privilege, requiring users to explicitly authorize before unlocking advanced features. This is similar to how phone apps request permissions, ensuring users have clear awareness and control over the Agent's behavioral boundaries.

Common Issues and Notes

How to Resume After Restart

When restarting after closing the window, you need to:

Navigate to the project directory
Activate the virtual environment (be careful not to misspell the folder name)
Execute the launch command

Common mistake: An extra space or symbol in the folder name or environment path will cause failure — double-check carefully. This is because Windows file path parsing is sensitive to special characters (such as spaces, Chinese characters, parentheses, etc.), while Linux-native projects typically don't account for these Windows-specific issues in path handling.

Model Compatibility Issues

Not all models work properly. Testing revealed:

DeepSeek V3: Works normally ✓
Qwen series: Generally works ✓
Some models with "MH" suffix: May return 400 errors, indicating that certain capabilities of these models are not supported on the SiliconFlow platform

Why do 400 errors occur? HTTP 400 errors mean "Bad Request" and in AI API calls typically indicate that the request format is incompatible with the model. Different models have varying levels of support for Function Calling, System Prompt formats, tool definition schemas, etc. AI Agent frameworks typically rely on Function Calling capabilities to invoke external tools — if a model doesn't support this feature or implements it differently, it will return a 400 error. This isn't a deployment issue but rather a model capability limitation.

If you encounter a 400 error, it's most likely a model incompatibility issue, and you'll need to re-enter the model installation command to switch models.

How to Switch Models

Ctrl+C to terminate the current process
Execute the model installation command within the virtual environment
Re-select a model (e.g., switch from DeepSeek to Qwen 3)
Restart and you're good to go

Usage Tips After Successful Deployment

After deployment, Hermes Agent is like a "newborn child" that needs gradual training:

Writing articles: Have it generate various types of text content
Scraping information: Retrieve data you need from the web
Controlling your computer: Execute operations like shutdown, opening programs, etc.
Continuous learning: Content you teach it gets stored in its knowledge base, making it smarter over time

The "continuous learning" mentioned here is actually the Agent's memory system at work. Modern AI Agents are typically equipped with two systems: short-term memory (conversation context) and long-term memory (vector database storage). When you correct the Agent's mistakes or tell it your preferences, this information gets encoded and stored in long-term memory, to be retrieved and referenced in future interactions, achieving the "gets to know you better over time" effect.

As long as you keep the command line window open (you can minimize it), you can chat with the AI assistant on WeChat from your phone at any time. After long-term training, it will become a capable assistant that truly understands your needs.

Conclusion

Hermes Agent's core advantage over CrawlAI lies in lower token consumption, and its WeChat direct messaging interaction is very convenient. Although the Windows deployment process does have some pitfalls (must install on C drive, path sensitivity, model compatibility, etc.), as long as you follow the process step by step, average users can successfully deploy it. Users with a Linux environment are recommended to deploy on Linux first for a smoother experience.

For users looking to further optimize their experience, consider these advanced directions: use WSL2 (Windows Subsystem for Linux) to get a native Linux environment on Windows; try different model combinations, using cheaper small models for simple tasks and powerful models for complex ones; and regularly back up the Agent's memory database to prevent accidental loss of accumulated personalized knowledge.