The Complete Guide to Hermes Agent: Building a Self-Evolving AI Assistant from Scratch

What is Hermes Agent?

Hermes Agent is an open-source AI agent project (MIT license) that has already garnered over 140,000 stars on GitHub, making it one of the fastest-growing open-source projects. Its most defining feature is self-evolution — by writing and updating its own Skills, it becomes increasingly powerful over time.

This self-evolution capability rests on a form of "meta-programming": the agent edits the instructions that govern its own behavior. Traditional AI assistants have their capabilities fixed at deployment, whereas Hermes expands its capabilities at runtime by externalizing operational workflows into Markdown files it can both read and write. The design echoes the cognitive science concept of "procedural memory": just as humans turn complex operations into muscle memory through repeated practice, Hermes converts repetitive workflows into structured skill files. The approach sits within the broader "tool-use agent" paradigm, one of the mainstream directions in current AI Agent research.

Hermes Agent Overview

It can run on your own infrastructure — whether it's a Mac Mini, laptop, VPS, Docker container, or even on Android via Termux. It supports Telegram, Discord, Slack, WhatsApp, and even iMessage as interaction interfaces.

Out of the box, Hermes comes with 91 built-in skills, and the community skill library offers over 520 installable skills. It can do far more than a typical chatbot: generate Excalidraw diagrams, hold voice conversations, produce videos, monitor YouTube comments, compile daily news briefings, run server security checks, and much more.

The Five Core Pillars of Hermes

Pillar 1: Memory

Memory is the persistent context that Hermes carries across sessions. There are two core files:

user.md: Records who you are, your style preferences, and what you dislike
memory.md: Records environment information, ongoing projects, and business context

These two files are loaded at the start of every session, ensuring the Agent doesn't start from scratch like an amnesiac each time. Crucially, Hermes automatically extracts information from conversations and updates these files, but you should still proactively tell it things like "remember this" or "never do that again."

Hermes' memory system is fundamentally different from the currently popular RAG (Retrieval-Augmented Generation) technology. RAG typically chunks large documents and stores them in a vector database, retrieving relevant fragments to inject into context during queries. Hermes instead adopts a lighter "file-as-memory" approach — directly condensing key information into the user.md and memory.md files, loading them in full at the start of each session. The advantages of this approach are: high information density, no retrieval latency, and no semantic drift issues from vector similarity matching. The downside is that it's limited by context window size and isn't suitable for storing massive amounts of information. This is why Hermes emphasizes "extraction" rather than "storage" — it actively compresses and updates memory rather than accumulating indefinitely.
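
The "file-as-memory" idea can be sketched in a few lines. This is a minimal illustration, not Hermes' actual loader: the function names and the character budget (a rough stand-in for a token limit) are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical budget: a rough character cap standing in for a token limit.
MAX_MEMORY_CHARS = 8_000


def load_memory(memory_dir: Path, names=("user.md", "memory.md")) -> str:
    """Concatenate the memory files in full, skipping any that do not exist."""
    sections = []
    for name in names:
        path = memory_dir / name
        if path.exists():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)


def fits_budget(memory_text: str, max_chars: int = MAX_MEMORY_CHARS) -> bool:
    """True while memory is small enough to load whole; False means it needs condensing."""
    return len(memory_text) <= max_chars
```

There is no retrieval step here: everything is loaded, which is why the files have to stay small and be condensed when the budget check fails.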

Pillar 2: Skills

Skills are reusable operation manuals, like cooking recipes that ensure consistency in every execution. Each skill file (skill.md) contains YAML front matter that tells the Agent what the skill is for, enabling progressive disclosure — only loading the full skill content into context when needed.

YAML Front Matter is a metadata format originating from static site generators (like Jekyll and Hugo), wrapped between triple dashes (---) at the top of a file. In Hermes' skill system, YAML front matter serves as an "index card," containing the skill name, trigger conditions, applicable scenarios, and other summary information. Progressive Disclosure is a classic principle in user interface design, meaning complex information is only presented when the user needs it. Applied to AI Agents, this means the Agent doesn't load the full content of all 520+ skills into the limited context window at once. Instead, it first reads summaries to assess relevance, only loading the complete skill document when confirmed necessary. This dramatically saves token consumption and improves response accuracy.
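
As a rough sketch of progressive disclosure, the code below reads only the front matter of each skill and picks the skills whose summary overlaps with the request. The field name "description", the word-overlap matching, and both function names are assumptions for illustration; the parser handles flat "key: value" pairs only and is not a full YAML parser.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (metadata, body). Handles flat 'key: value' pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def select_skills(skills: dict[str, str], query: str) -> list[str]:
    """Return names of skills whose summary shares a word with the query.

    Only these skills would have their full body loaded into context.
    """
    query_words = set(query.lower().split())
    chosen = []
    for name, text in skills.items():
        meta, _body = split_front_matter(text)
        summary_words = set(meta.get("description", "").lower().split())
        if query_words & summary_words:
            chosen.append(name)
    return chosen
```

A real agent would likely let the model judge relevance from the summaries rather than match words, but the shape is the same: summaries first, full text only on demand.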

Hermes analyzes your workflows, automatically converts repetitive operations into skills, and continuously optimizes them based on your feedback. The community skill library provides over 520 ready-made skills covering design, programming, automation, and many other categories.

Pillar 3: Soul

The Soul.md file defines the Agent's personality. If you have multiple Hermes instances, each can have a different "character" — concise, humorous, or serious. This file also gradually evolves with your feedback.

Pillar 4: Crons

Crons transform Hermes from a passive responder into a proactive automation engine. You can say in natural language "execute XYZ every day at 6 AM," and it will create a scheduled task. Each trigger starts a completely new isolated session, sending results back to the original chat upon completion.

Cron is a time-honored task scheduler in Unix/Linux systems, with its name derived from the Greek word "Chronos" (time). Traditional Cron uses a five-field expression (minute hour day month weekday) to define execution times — for example, "0 6 * * *" means every day at 6 AM. Hermes provides a natural language abstraction over this — users simply say "every day at 6 AM," and the Agent automatically converts it to the underlying Cron expression. The more critical design choice is the "isolated session" mechanism: each Cron trigger creates an independent execution environment, preventing scheduled task output from polluting the main conversation's context and preventing long-running tasks from failing due to context overflow. This design is similar to the process isolation concept in operating systems.
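
To make the five-field expression concrete, here is a small sketch that builds the expression for "every day at HH:MM" and checks whether a given moment matches an expression. It supports only "*", single numbers, and comma lists, and the function names are hypothetical; it says nothing about how Hermes performs the conversion internally.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Supports '*', single numbers, and comma lists only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field expression (minute hour day month weekday) against a time.

    Simplification: real cron treats day-of-month and weekday as alternatives
    when both are restricted; this sketch requires every field to match.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


def daily_at(hour: int, minute: int = 0) -> str:
    """Build the expression for 'every day at HH:MM'."""
    return f"{minute} {hour} * * *"
```

For example, daily_at(6) yields "0 6 * * *", the expression quoted above.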

Pillar 5: The Self-Evolution Loop

The complete loop is: do work → Agent learns → save to memory → convert repetitive steps into skills → search historical sessions for old context → repeat. The more you use it, the better Hermes understands you and the more powerful it becomes.
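
One step of this loop, spotting repetition that deserves a skill, can be approximated very simply. The sketch below assumes a plain list of past instructions and treats exact repeats (ignoring case and spacing) as candidates; the threshold and function names are illustrative assumptions, and a real agent would presumably compare instructions by meaning rather than by text.

```python
from collections import Counter


def normalize(instruction: str) -> str:
    """Lowercase and collapse whitespace so trivial variations count as the same text."""
    return " ".join(instruction.lower().split())


def skill_candidates(history: list[str], threshold: int = 2) -> list[str]:
    """Return instructions seen at least `threshold` times, in first-seen order."""
    counts = Counter(normalize(item) for item in history)
    return [text for text, seen in counts.items() if seen >= threshold]
```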

Hermes vs Claude Code vs OpenClaw

The author provides a clear use-case breakdown:

Claude Code: Daily workhorse, 90% of knowledge work happens here, ideal for deep coding sessions at your computer
Hermes/OpenClaw: Mobile use, managing tasks via Telegram anytime anywhere, setting up Crons, quick interactions

The author's main reason for switching from OpenClaw to Hermes was that OpenClaw's frequent updates caused crashes, while Hermes is lighter, more stable, and more focused on self-evolution. But these tools aren't mutually exclusive — through GitHub repository syncing, you can have all Agents share the same knowledge base.

Hands-On: Building Hermes Agent from Scratch

VPS Deployment

A VPS (Virtual Private Server) is recommended for deployment. Choose Ubuntu 24.04 LTS, then either deploy with Docker (one-click install, simpler) or install directly on the VPS host itself.

Docker is an OS-level virtualization technology that packages an application and its dependencies into a standardized "container," so it runs the same way in any environment. For AI Agent deployment this brings three benefits. First, environment isolation: each Agent container has its own filesystem, network stack, and process space, so one Agent's crash won't affect other instances. Second, reproducibility: the build steps defined in a Dockerfile recreate the same environment on any machine. Third, resource control: you can set CPU and memory limits per container, preventing any single Agent from consuming excessive resources. Compared with installing directly on the VPS, Docker adds some performance overhead (estimated here at roughly 5-10%, though the actual figure depends on the workload), but the operational convenience in multi-Agent setups outweighs this cost.

The main advantage of the Docker approach is that each Agent gets its own environment, API keys, and memory, so instances don't interfere with each other and multiple Agents are easy to manage.
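
The per-Agent isolation and resource limits described above map onto standard docker run flags. The sketch below only assembles the command line for one Agent and does not execute it. The image name, the directory layout under base_dir, the "hermes-" container prefix, and the /data mount point are all hypothetical placeholders, not values documented by Hermes.

```python
def docker_run_args(
    agent: str,
    image: str,
    base_dir: str,
    cpus: float = 1.0,
    memory: str = "1g",
) -> list[str]:
    """Build a `docker run` command giving one agent its own limits, keys, and data."""
    agent_dir = f"{base_dir}/{agent}"
    return [
        "docker", "run", "--detach",
        "--name", f"hermes-{agent}",              # one container per agent
        "--cpus", str(cpus),                      # CPU limit
        "--memory", memory,                       # memory limit
        "--env-file", f"{agent_dir}/.env",        # this agent's own API keys
        "--volume", f"{agent_dir}/data:/data",    # this agent's own memory and skills
        image,
    ]
```

Passing the resulting list to subprocess.run would start the container; keeping the builder separate makes it easy to review the command before anything is launched.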

Configuration Process

Choose an inference provider: OpenAI Codex is recommended (can directly use a ChatGPT subscription, most economical)
Set up messaging channel: Configure a Telegram Bot (created via BotFather)
Set user permissions: Restrict interaction to only your Telegram account

An inference provider is a platform that serves large language models through an API. As a model-agnostic Agent framework, Hermes supports multiple backends: OpenAI, Anthropic, Google Gemini, locally deployed open-source models (via Ollama or vLLM), and more. The author recommends OpenAI Codex for economic reasons: if you already subscribe to ChatGPT Plus/Pro, you can reuse that subscription directly instead of paying separately for metered API usage. Factors to weigh when choosing an inference provider include model capability (reasoning depth), response speed (latency), cost (price per million tokens), privacy compliance (whether data leaves the country), and availability (SLA guarantees).

API Key Security Management

Never paste API keys directly in chat. The correct approach is through terminal commands:

hermes config set GITHUB_TOKEN <your-token>

This stores the key in a .env file rather than in conversation history. Keep this habit even if you run a private, self-hosted open-source model.
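
For readers curious what "storing the key in a .env file" amounts to, here is a minimal sketch under stated assumptions. It is not the implementation behind hermes config set; the function names are hypothetical, and the file format is the common KEY=value convention.

```python
import os
from pathlib import Path


def set_env_value(env_path: Path, key: str, value: str) -> None:
    """Insert or replace KEY=value in a .env file and restrict it to the owner."""
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    kept = [line for line in lines if not line.startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    env_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    os.chmod(env_path, 0o600)  # owner read/write only (effective on POSIX systems)


def masked(value: str) -> str:
    """Hide all but the last four characters, for logs or chat replies."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]
```

The masking helper reflects the same principle as the advice above: the full secret should never need to appear in a conversation or a log.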

GitHub Backup

The first thing to do after setup is connect a GitHub repository. If your VPS has issues, you won't lose any data. Create a Cron task to auto-sync every night:

"Every day at midnight Central Time, push all changes to the GitHub repository"

Hermes automatically handles the details, such as converting the timezone and creating a .gitignore that excludes sensitive files.
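
What such a nightly backup involves can be sketched as two steps: make sure sensitive files are ignored, then commit and push. The exclusion list, the remote name "origin", and the function names below are assumptions for illustration, and the commands are only built, not run.

```python
from pathlib import Path

# Hypothetical exclusion list; adjust to whatever secrets the agent directory holds.
SENSITIVE_PATTERNS = [".env", "*.pem", "*.key"]


def ensure_gitignore(repo: Path, patterns=SENSITIVE_PATTERNS) -> list[str]:
    """Append any missing patterns to .gitignore and return the ones that were added."""
    path = repo / ".gitignore"
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    missing = [pattern for pattern in patterns if pattern not in existing]
    if missing:
        path.write_text("\n".join(existing + missing) + "\n", encoding="utf-8")
    return missing


def backup_commands(message: str) -> list[list[str]]:
    """Git commands for one backup run, in order.

    Note: `git commit` exits non-zero when there is nothing to commit,
    so a caller should treat that case as a no-op rather than a failure.
    """
    return [
        ["git", "add", "--all"],
        ["git", "commit", "--message", message],
        ["git", "push", "origin", "HEAD"],
    ]
```

Running ensure_gitignore before the first git add matters: once a secret has been committed and pushed, adding it to .gitignore afterwards does not remove it from history.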

Management Best Practices

Use Claude Code to Manage All Agents

An extremely practical suggestion: create a Claude Code project specifically to manage all your VPSes and Agents. Record each Agent's IP address, passwords, environment variables, and Docker container information. When an Agent has issues, you can use Claude Code to help troubleshoot and fix it.

Security Principles

Give each Agent independent accounts (email, API keys)
Name API keys for different Agents to track spending
Follow the principle of least privilege: treat Agents like new interns
Set up VPS firewalls and conduct regular security audits

The Principle of Least Privilege (PoLP) is a cornerstone concept in information security, articulated by Jerome Saltzer in the mid-1970s and popularized by his 1975 paper with Michael Schroeder, "The Protection of Information in Computer Systems." Its core idea is that any entity (user, program, process) should be granted only the minimum set of permissions needed to complete its task. This matters especially for AI Agents: because they can autonomously execute code and call APIs, excessive permissions could lead to catastrophic consequences. For example, an Agent responsible for sending daily news briefings shouldn't have permission to delete databases, and an Agent monitoring YouTube comments doesn't need access to your bank API keys. The author's comparison of Agents to "new interns" is apt: you wouldn't give an intern admin access to all company systems on day one, but would open access gradually as trust is established.
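
In code, least privilege usually takes the form of an explicit allowlist that denies by default. The sketch below is illustrative only: the agent names, tool names, and the authorize function are hypothetical and do not describe a Hermes API.

```python
# Hypothetical allowlist: each agent may call only the tools listed for it.
AGENT_PERMISSIONS: dict[str, set[str]] = {
    "news-briefing": {"fetch_url", "send_message"},
    "youtube-monitor": {"read_comments", "send_message"},
}


class PermissionDenied(Exception):
    """Raised when an agent tries to use a tool outside its allowlist."""


def authorize(agent: str, tool: str, permissions=AGENT_PERMISSIONS) -> None:
    """Raise PermissionDenied unless `tool` is explicitly allowed for `agent`."""
    allowed = permissions.get(agent, set())  # unknown agents get no permissions
    if tool not in allowed:
        raise PermissionDenied(f"{agent} may not call {tool}")
```

The important design choice is the default: anything not listed is refused, so granting a new capability is always a deliberate act.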

When to Create a New Agent?

Follow this decision tree:

Need different permissions/keys/tools? → New Agent
Need independent long-term memory? → New Agent
Is it continuously repetitive work? → New Agent
Just a one-time task? → Keep it in the main Agent

Don't split too early. Get one Agent working well first, and only separate when you naturally feel the need.
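
The decision tree above is small enough to write down directly. The sketch below uses hypothetical field names to encode the three questions; anything that answers "no" to all of them stays in the main Agent.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the three questions in the decision tree (field names are illustrative)."""
    needs_own_permissions: bool = False  # different permissions, keys, or tools
    needs_own_memory: bool = False       # independent long-term memory
    is_recurring: bool = False           # continuously repetitive work


def needs_new_agent(workload: Workload) -> bool:
    """True if any branch of the decision tree points to a separate agent."""
    return (
        workload.needs_own_permissions
        or workload.needs_own_memory
        or workload.is_recurring
    )
```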

Maintenance Mindset

Agent makes the same mistake twice → Correct it on the spot and update skills/memory
You've given the same instruction twice → Have it write a skill
Responses are too verbose or the tone is off → Edit the Soul file
Behavior is abnormal → Check memory.md; outdated memory is the #1 cause of strange behavior

Hermes isn't a set-it-and-forget-it tool — it's a teammate that requires continuous training. The more time you invest interacting with it and providing feedback, the more powerful and attuned to you it becomes.

Key Takeaways

Hermes Agent is a 140K-star open-source AI agent whose core feature is self-evolution through automatically writing and updating skills
Five core pillars: Memory, Skills, Soul, Crons, and the Self-Evolution Loop
Complementary to Claude Code: Claude Code is ideal for deep coding, while Hermes excels at mobile task management and automation
Docker containerized deployment is recommended, paired with automatic GitHub backup and a Claude Code project to manage all Agents centrally
Security best practices: independent accounts, least privilege, API keys configured via terminal rather than chat input