One Person, Three Machines: Local Agent Deployment and Multi-Machine Collaborative Operations in Practice

Overview: Redefining Personal Operations with AI Agents

When one person has to manage several servers, traditional operations practices quickly become overwhelming. Bilibili creator "雲姥工" (Cloud Grandpa) shared his hands-on experience: by deploying multiple AI Agents, he runs three physical hosts on his own. The core idea is to assign tasks of different difficulty levels to different AI tools, getting the most work done at the lowest cost.

Source (Bilibili): 雲姥工, "本地agent部署及探索上一" (Local Agent Deployment and Exploration, Part 1)

Multi-Agent Collaboration: Division of Labor Between Claude Code and Hermes

Claude Code Handles the "Hard Work"

In actual operations, the creator divides tasks by difficulty and risk level. Claude Code (Claude's command-line tool) handles the infrastructure-level "hard work": system configuration, critical service deployment, low-level operations tasks, and more.

Claude Code is Anthropic's command-line tool for Claude. It lets users converse with the Claude model directly in a terminal and authorize it to execute shell commands, edit files, manage system services, and more. Unlike a long-running web conversation, Claude Code by default starts each launch with a fresh session context (a clean session) and does not carry over earlier conversation history unless a previous session is explicitly resumed. This means the task background has to be described again each time, which consumes more Tokens, but it also avoids the hallucinations and accumulated errors that a polluted context can cause.

Although some point out that Claude Code's clean-session startup burns more Tokens, the creator considers this precisely its advantage: high stability and strong determinism, so that critical tasks are completed correctly. For operations work, it is better to spend a few extra Tokens than to make a mistake on a critical operation.

Hermes Handles the "Soft Work"

By comparison, Hermes is assigned software-layer tasks: batch operations, daily briefing generation, automated script execution, and so on. Hermes here refers to a class of AI Agent frameworks that support persistent memory and tool calling. Unlike Claude Code's stateless mode, it has conversational memory, task planning, and autonomous tool-calling capabilities, so it accumulates experience and refines its execution strategy across multiple interactions.

Hermes "gets smarter the more you use it": through its memory mechanism it learns user preferences and the characteristics of the environment, gradually cutting down on unnecessary exploratory operations and thereby lowering Token consumption. This makes it well suited to repetitive, lower-risk work.

However, the creator also admitted he is "very uneasy" about letting Hermes perform infrastructure operations. This distrust is not unfounded, because persistent memory carries its own risks: incorrect experiences may become entrenched, and the Agent might accidentally delete files or rename them wrongly, causing irreversible damage.
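
The division of labor described in this section can be sketched as a small routing rule. This is only an illustration under stated assumptions: the `Task` fields, the category names, and the labels `cli-agent` (standing for the stateless command-line agent) and `hermes` are hypothetical names chosen here, not part of either tool's interface or of the creator's actual setup.

```python
from dataclasses import dataclass

# Hypothetical task categories; the creator's real split is informal.
INFRASTRUCTURE = {"system-config", "service-deploy", "low-level-ops"}
ROUTINE = {"batch-op", "daily-briefing", "script-run"}

CLI_AGENT = "cli-agent"   # stateless agent used for the "hard work"
MEMORY_AGENT = "hermes"   # memory-backed agent used for the "soft work"


@dataclass(frozen=True)
class Task:
    name: str
    kind: str          # one of the category strings above
    destructive: bool  # True if the task can delete or overwrite data


def route(task: Task) -> str:
    """Return the label of the agent that should receive the task."""
    # Infrastructure work and anything destructive goes to the stateless
    # agent, so that stale or wrong memories cannot influence the outcome.
    if task.kind in INFRASTRUCTURE or task.destructive:
        return CLI_AGENT
    # Repetitive, low-risk work benefits from the memory-backed agent.
    if task.kind in ROUTINE:
        return MEMORY_AGENT
    # Unknown task types fall back to the more conservative choice.
    return CLI_AGENT


if __name__ == "__main__":
    for t in (
        Task("reconfigure firewall", "system-config", destructive=False),
        Task("generate morning report", "daily-briefing", destructive=False),
        Task("bulk rename logs", "batch-op", destructive=True),
    ):
        print(f"{t.name} -> {route(t)}")
```

Sending unknown task types to the stateless agent by default mirrors the creator's preference for spending extra Tokens over risking a mistake.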

Risk Control Strategies

To prevent Agent misoperations, the creator summarized several practical principles:

Before critical operations, require the Agent to upload files to the local NAS for backup
Don't operate directly on original files—have the Agent rename them and test on copies
Configure 1-2 Telegram Sessions for each Agent for status notifications and manual confirmation

These strategies essentially seek a balance between AI autonomy and human control—giving Agents enough freedom to improve efficiency while retaining manual approval at critical decision points, avoiding catastrophic consequences from automation.
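
The three principles above can be combined into one guard around any file change. The following is a minimal sketch, not the creator's implementation: `backup_dir` stands in for a mounted NAS share, and the `confirm` callback stands in for a manual approval step such as a Telegram prompt. The function name and the `.candidate` suffix are assumptions made for this example.

```python
import shutil
from pathlib import Path
from typing import Callable


def safe_edit(
    original: Path,
    backup_dir: Path,
    transform: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> bool:
    """Back up `original`, edit a copy, and replace it only after approval.

    Returns True if the edited copy replaced the original.
    """
    # 1. Back up first (backup_dir is assumed to live on the NAS).
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(original, backup_dir / original.name)

    # 2. Work on a copy next to the original, never on the original itself.
    candidate = original.with_name(original.name + ".candidate")
    candidate.write_text(transform(original.read_text()))

    # 3. Ask a human before the change becomes real.
    if confirm(f"Replace {original.name} with the edited copy?"):
        candidate.replace(original)  # rename over the original
        return True

    # Approval denied: discard the copy and leave the original untouched.
    candidate.unlink()
    return False
```

Because the approval step is just a callable, it can be swapped for a console prompt during testing and for a messaging-based confirmation in production without touching the backup logic.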

Single-File Deployment: The Ultimate Ventoy + RAW Image Solution

Why Choose Ventoy

The creator repeatedly emphasized that Ventoy is a "killer weapon" whose power even many veteran operations engineers have yet to appreciate.

Ventoy is an open-source tool for creating bootable USB drives. The traditional approach "burns" a single ISO image onto the drive, which limits it to one system at a time and leaves the remaining space unusable. With Ventoy, you install Ventoy on the drive once and then simply copy ISO, IMG, VHD, and other image files onto it; a menu at boot time lets you choose which one to start. Ventoy can also boot a Linux system directly from a RAW disk image through its Linux vDisk boot feature (image files carrying the .vtoi suffix, which the creator refers to as VTOI), so a fully configured Linux system can be packaged as a single file for true "plug-and-play" deployment.

Ventoy's core advantage here is single-file booting: an entire configured Linux system is packaged into one image file that boots directly on a machine when the USB stick or portable drive holding it is plugged in.

What does this mean? When deploying an Agent environment for someone else, you only need to:

Configure the system and all Agent services locally
Compress and package the image (a configured system is 60 GB or more, roughly 30 GB after compression)
Upload to cloud storage for the other party to download
The other party plugs in the storage device and boots up

All the other party needs to do is "copy, paste, plug into the new machine"—the Agent goes online immediately after connecting to the network.
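
The compress, upload, and download steps above could be scripted roughly as follows. This is a sketch under assumptions: the source does not say which compressor the creator uses, so xz (via Python's standard `lzma` module) is chosen here for illustration, and the SHA-256 check on the receiving side is an addition of this example rather than something described in the video.

```python
import hashlib
import lzma
from pathlib import Path

CHUNK = 1024 * 1024  # stream 1 MiB at a time so a 60 GB image never sits in RAM


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def compress_image(image: Path, archive: Path) -> str:
    """Compress `image` into `archive` (xz) and return the archive's SHA-256."""
    with image.open("rb") as src, lzma.open(archive, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
    return sha256_of(archive)


def restore_image(archive: Path, image: Path, expected_sha256: str) -> None:
    """Verify the downloaded archive, then decompress it back to a raw image."""
    if sha256_of(archive) != expected_sha256:
        raise ValueError("archive checksum mismatch; download it again")
    with lzma.open(archive, "rb") as src, image.open("wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
```

The sender would publish the digest returned by `compress_image` alongside the download link, and the recipient would pass it to `restore_image` before copying the image to the boot drive.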

Technical Stack Details

The creator's recommended technology stack combination:

File System: BTRFS (not EXT4 or XFS)

BTRFS (B-tree File System) is a next-generation Copy-on-Write file system for Linux. Compared to traditional EXT4, BTRFS natively supports snapshots, subvolumes, transparent compression, data checksumming, and online resize. In this operations scenario, BTRFS has two key advantages: first, Windows has open-source drivers like WinBtrfs that can directly read and write BTRFS partitions, facilitating cross-platform image manipulation; second, its snapshot feature enables rapid system snapshots before an Agent executes dangerous operations, with instant rollback if something goes wrong—an important safety net for potential AI Agent misoperations.
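
Taking a snapshot before an Agent runs a risky task might look like the sketch below. It only builds and optionally runs the standard `btrfs subvolume snapshot -r` command; the subvolume path `/`, the snapshot directory `/.snapshots`, and the naming scheme are assumptions for illustration and depend on how the system's subvolumes are actually laid out.

```python
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_command(
    subvolume: str,
    snapshot_dir: str,
    label: str,
    now: Optional[datetime] = None,
) -> List[str]:
    """Build the command for a read-only, timestamped BTRFS snapshot."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    dest = f"{snapshot_dir.rstrip('/')}/{label}-{stamp}"
    # -r makes the snapshot read-only, so the Agent cannot modify it later.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


def snapshot_before(
    task_label: str,
    subvolume: str = "/",              # assumed layout
    snapshot_dir: str = "/.snapshots",  # assumed layout
    dry_run: bool = True,
) -> List[str]:
    """Create (or, in dry-run mode, only describe) a pre-task snapshot."""
    cmd = snapshot_command(subvolume, snapshot_dir, task_label)
    if not dry_run:
        # Requires root privileges and a BTRFS subvolume at `subvolume`.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(snapshot_before("pre-agent-upgrade")))
```

Rolling back is not shown because it depends on the subvolume layout; it generally involves putting a writable copy of the snapshot in place of the damaged subvolume.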

Image Format: RAW Image rather than VHD

A RAW image is the simplest disk image format: a byte-for-byte copy of a disk with no additional metadata or encapsulation layer. Compared with VHD (Virtual Hard Disk), QCOW2, and other virtualization-specific formats, its advantage is that it can be mounted directly through the Linux kernel's native loop device mechanism, using only the losetup and mount commands, without installing VirtualBox, QEMU, or any other third-party tool. A loop device is a pseudo-device provided by the Linux kernel that exposes a regular file as a block device, so the operating system can access the partitions and file systems inside an image file as if they were on a physical drive.
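
The losetup and mount sequence can be sketched as a few command builders. The image path, the partition number, and the mount point are placeholders, and the partition device name follows the usual `/dev/loopNpM` pattern that the kernel creates when a loop device is attached with partition scanning. Running the commands for real requires root.

```python
import subprocess
from typing import List


def attach_command(image: str, read_only: bool = True) -> List[str]:
    """Attach `image` to the first free loop device and scan its partitions."""
    cmd = ["losetup", "--find", "--show", "--partscan"]
    if read_only:
        cmd.append("--read-only")  # inspect the image without risking writes
    cmd.append(image)
    return cmd


def mount_command(
    loop_device: str, partition: int, mount_point: str, read_only: bool = True
) -> List[str]:
    """Mount one partition of an attached loop device."""
    cmd = ["mount"]
    if read_only:
        cmd += ["-o", "ro"]
    cmd += [f"{loop_device}p{partition}", mount_point]
    return cmd


def detach_command(loop_device: str) -> List[str]:
    """Release the loop device once the partition has been unmounted."""
    return ["losetup", "--detach", loop_device]


def mount_image(image: str, partition: int, mount_point: str) -> str:
    """Run the attach and mount steps; returns the loop device path."""
    result = subprocess.run(
        attach_command(image), check=True, capture_output=True, text=True
    )
    loop_device = result.stdout.strip()  # --show prints e.g. /dev/loop0
    subprocess.run(mount_command(loop_device, partition, mount_point), check=True)
    return loop_device
```

Defaulting to read-only access is a choice made for this sketch: it suits inspection and file recovery, while editing the image would need `read_only=False` on both steps.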

Boot Method: Dual boot (BIOS + EFI) + VTOI for maximum compatibility
Init System: Runit instead of systemd, in pursuit of maximum performance

The init system is the first user-space process (PID 1) that runs after Linux boots, and it is responsible for starting and managing all system services. systemd is the default init system for the vast majority of mainstream Linux distributions; it is extremely feature-rich but correspondingly large and complex, loading numerous services and dependencies at startup. Runit is a minimalist init system that supervises services through a directory convention (one directory per service, each containing a run script), offering very fast startup, minimal resource usage, and simple troubleshooting. For servers dedicated to running AI Agents, the complex features systemd provides (desktop integration, log management, etc.) are not needed, and Runit's light footprint leaves more system resources for the actual workload.
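
The "one directory per service containing a run script" convention is simple enough to generate programmatically. The sketch below writes such a directory; the service name `agent`, the command line, and the target root are hypothetical, and where the directory must live (and how it is enabled) differs between distributions.

```python
import shlex
import stat
from pathlib import Path
from typing import Sequence


def write_runit_service(sv_root: Path, name: str, command: Sequence[str]) -> Path:
    """Create `<sv_root>/<name>/run` for a runit-supervised service."""
    service_dir = sv_root / name
    service_dir.mkdir(parents=True, exist_ok=True)

    run_script = service_dir / "run"
    run_script.write_text(
        "#!/bin/sh\n"
        "exec 2>&1\n"  # merge stderr into stdout for logging
        # exec replaces the shell, so the supervisor tracks the real process.
        f"exec {shlex.join(command)}\n"
    )
    # The run script must be executable or the supervisor cannot start it.
    mode = run_script.stat().st_mode
    run_script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return run_script


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        script = write_runit_service(
            Path(tmp), "agent", ["/usr/local/bin/agent", "--config", "/etc/agent.toml"]
        )
        print(script.read_text())
```

The whole service definition is a three-line shell script, which is the "simple troubleshooting" property the text attributes to Runit: there is no unit syntax to learn and nothing to inspect beyond the script itself.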

Distribution: Artix Linux or CachyOS (both based on Arch Linux; Artix offers Runit, whereas CachyOS ships with systemd)

Artix Linux and CachyOS are both derivative distributions based on Arch Linux. Arch Linux is known for rolling updates, bleeding-edge packages, and high customizability, but defaults to systemd. Artix Linux's core difference is the removal of systemd: it offers OpenRC, Runit, s6, and other alternative init systems for users to choose from, while retaining Arch Linux's pacman package manager and AUR (Arch User Repository) ecosystem. CachyOS, which keeps systemd, focuses on performance optimization, defaulting to packages compiled for modern CPU instruction sets (such as x86-64-v3/v4) and providing tuned kernel configurations. Both combine the software richness of the Arch ecosystem with their respective optimizations, which makes them a good fit for servers where performance is the priority.

Kernel: Staying on the latest version (currently 7.0.3, preparing to upgrade to 7.1)

With the RAW image plus losetup approach, you can mount the disk image directly in any Linux environment (even a live CD) without installing additional software, which greatly speeds up operations work.

The Bargain Hunter Philosophy: Maximum Output at Minimum Cost

Returns on Hardware Tinkering

The creator admitted that many people question whether his "bargain hunting" approach of tinkering with various hardware and systems is a waste of time. But now that the AI Agent era has arrived, all that accumulated knowledge has transformed into productivity:

Familiarity with various hardware enables quick selection of the right machine for different scenarios
Linux system expertise enables creation of one-click deployment images
Understanding of underlying principles enables rapid diagnosis and repair when Agents encounter problems

The essence of this "bargain hunter" spirit is full-stack control over the technology chain. When AI Agents need to run on physical hardware, understanding the complete chain from BIOS/UEFI firmware, disk controllers, and file systems to user-space services means being able to quickly intervene at any point of failure, rather than being helpless as if facing a black box.

Commercial Potential

This skill set has already begun generating commercial value. The creator mentioned that people have found him through QQ groups to produce AI videos (he already has a complete AI video workflow), and partners pay him to deploy Agent environments—they cover the Token subscription costs while he handles system configuration and Agent tuning.

This business model is essentially "Agent Operations as a Service," converting technical barriers into service premiums. As AI Agents become more widespread, technical personnel who can efficiently deploy and maintain Agent runtime environments will become a new scarce resource.

Current State and Outlook for Local Models

Regarding local large model deployment, the creator stated he had conducted experiments earlier, but limited by his hardware performance, his primary workload still runs in the cloud. He has two small models running locally for testing, with detailed content to be covered in future videos.

The main bottlenecks for running large models locally are VRAM capacity and memory bandwidth. For a typical 7B-parameter model, the weights alone take approximately 14GB of VRAM at FP16 precision (2 bytes per parameter), while a 70B model needs approximately 140GB. Quantization (such as 4-bit quantization in GGUF format) can dramatically reduce VRAM requirements at the cost of some precision loss. For personal operations scenarios, small models (such as 3B-7B) fine-tuned for specific tasks can often provide sufficient capability at very low hardware cost.
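
The VRAM figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight. The helper below reproduces them. It is a rough estimate that covers the weights only: it ignores the KV cache, activations, and the per-block scaling data that real quantized formats store, so actual usage is somewhat higher. The function name is chosen for this example.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate memory for model weights in GB (1 GB = 10**9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    print(weight_memory_gb(7, 16))   # 14.0  (7B at FP16)
    print(weight_memory_gb(70, 16))  # 140.0 (70B at FP16)
    print(weight_memory_gb(7, 4))    # 3.5   (7B at nominal 4-bit)
```

At a nominal 4 bits per weight, a 7B model drops from about 14GB to about 3.5GB of weights, which is why quantized small models fit on consumer hardware.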

His pragmatic attitude is worth emulating: don't chase running the largest model—chase getting things done. The focus isn't on cutting-edge research like big tech companies pursue, but on how to get models running and producing real value at minimum cost.

Summary

The core logic of this approach is clear: control the underlying architecture (bootloader, disk format) yourself, hand network and system operations to Claude Code, delegate batch soft operations to Hermes, and use local models as a supplement. Operations work "becomes extraordinarily easy," which makes it possible to take on more work, manage more machines, and cover more scenarios. This is a textbook case of AI Agents amplifying individual productivity.

From a broader perspective, this approach represents an emerging "personal infrastructure" paradigm—individuals are no longer just consumers of cloud services but become operators of small-scale infrastructure through AI Agents. When AI dramatically reduces the marginal cost of deployment and operations, the scale of resources an individual can manage will grow by orders of magnitude, potentially spawning a large number of new personalized technical service models.