Claude Code Firmware-Level Ops in Practice: Virtual Disk Expansion & Local Agent Deployment

Overview

In the practice of AI-assisted operations, Claude Code has demonstrated impressive low-level system architecture comprehension. A seasoned ops engineer (Bilibili creator "雲姥工") shared his hands-on experience using Claude Code for hardware firmware-level maintenance, covering hardcore operations like Ventoy Linux management, virtual disk expansion, and filesystem conversion, while also exploring feasible local Agent deployment solutions.

bilibili source

Claude Code's Real-World Performance in Firmware-Level Operations

Low-Level Architecture Understanding Beyond Expectations

Based on the creator's hands-on testing, Claude Code demonstrates outstanding capability in grasping low-level architecture (OS layer/firmware layer). Compared to other AI coding solutions, it exhibits the following characteristics:

Reliable solutions: Proposed operational suggestions are rigorous and standardized, with no "messy operations"
Deep understanding: Capable of understanding Ventoy's working principles and boot mechanisms
Suited for hardcore tasks: Particularly suitable for handling critical issues like whether a system can boot normally

Technical Background of Ventoy

Ventoy is an open-source USB bootable drive creation tool. Its core innovation is that users don't need to repeatedly format USB drives — they can simply copy ISO/WIM/IMG image files directly to the USB drive to boot. Ventoy achieves direct booting of multiple image formats through a custom bootloader (supporting both Legacy BIOS and UEFI modes). Its working principle involves MBR/GPT partition table parsing, GRUB boot chain loading, and locating boot files through UUID-identified partitions. UUID (Universally Unique Identifier) is crucial in this context — Ventoy's bootloader relies on partition UUIDs to locate data partitions. Once a UUID changes due to operations like filesystem conversion without synchronizing the configuration update, the entire boot chain breaks. It's precisely because of this complex dependency relationship that Claude Code's ability to proactively identify and correct UUID issues is particularly remarkable.

Practical Task: Virtual Disk Expansion & Filesystem Conversion

The creator assigned Claude Code a complex multi-step task involving a complete overhaul of a RAW format virtual disk:

Low-level disk expansion: Expand the virtual disk from 60GB to 100GB
Partition-level expansion: Synchronously expand space in the upper partition
Filesystem conversion: Convert ext4 to btrfs
UUID correction: Adjust UUIDs to ensure Ventoy can boot normally

RAW format virtual disk is an unencapsulated bare disk image format that maps each sector of a physical disk 1:1, without the metadata headers found in formats like QCOW2 or VMDK. The advantage of this format is optimal I/O performance and maximum compatibility, but the downside is no support for snapshots or dynamic expansion. The ext4-to-btrfs filesystem conversion is a high-risk operation — btrfs provides a native ext4 conversion tool (btrfs-convert) that achieves in-place conversion by saving ext4 metadata as a subvolume, but this process requires sufficient free space in the partition, and the UUID will change after conversion. Compared to ext4, btrfs offers modern filesystem features like Copy-on-Write (CoW), built-in snapshots, transparent compression, and RAID support, making it particularly suitable for ops scenarios requiring frequent backups and rollbacks.

This task chain is tightly interconnected — any single misstep could render the system unbootable. Claude Code not only completed all operations but also demonstrated deep understanding of Ventoy's boot mechanism — it knew that an incorrect UUID would cause Ventoy boot failure, which the creator described as "terrifying."

Comparison Test Results Across Different Model Backends

Performance Differences Across Three Tiers of Models

The creator tested different model backends within Claude Code's Agent framework and reached clear tiered conclusions:

Tier	Model	Performance
First Tier	DeepSeek Professional	Strongest performance
Second Tier	MiniMax	Mid-level, stable and usable
Third Tier	DeepSeek Flash	Bottom tier but still better than local small models

Agent Framework (Harness) Matters More Than the Model Backend

A key insight is: The Agent layer's Harness design matters more than the model backend. The creator noted:

"If the model backend is relatively weak, you still need to look at whether your architecture is good. If your architecture is stable, things can still hold; if your architecture is unstable, it will propose very strange suggestions and execute them, and you'll crash."

In AI Agent architecture, Harness (constraint framework) refers to the complete set of workflow control mechanisms built around the large language model, including system prompt design, tool invocation permission management, output validation, error recovery strategies, and safety boundary settings. The reason Claude Code's Agent framework performs excellently in ops scenarios is that its Harness designs strict operation confirmation mechanisms and step-by-step execution strategies — it won't execute all commands at once, but verifies results after each critical step before continuing. This design philosophy is similar to the "change management" process in operations: every change requires verification, rollback points, and confirmation. In contrast, even if the underlying model is highly capable, a loosely designed Harness may lead to dangerous operations due to lack of constraints.

In other words, a rigorously designed Agent framework can compensate for model capability shortcomings to a certain extent. This has significant reference value for scenarios with limited budgets but requiring reliable operations.

Local Agent Deployment Solution in Detail

Hardware Configuration & Cooling Optimization

The creator is building a local Agent runtime environment using an extreme cost-performance approach:

Current configuration: Old server motherboard + GTX 1070 GPU (repurposed idle hardware)
Cooling solution: DIY water cooling with reapplied thermal paste, followed by stress testing
Stress test results: CPU temperature controlled below 55-70°C under full load, stable performance
Memory: DDR4 2666, performance exceeding expectations

Local Agent Architecture Design Philosophy

The creator planned a distributed local AI architecture:

Desktop machine (laptop with 3070): Running CacheOS, responsible for local model inference, while also handling daily use and gaming
Agent machine (old server + 1070): Dedicated to running Agent tasks, connected to the model inference machine via network ports
Future upgrades: Plans to leverage dual PCIe 3.0 x16 slots for dual GPUs

This distributed architecture that separates "model inference" and "Agent execution" onto different physical machines is the engineering-optimal solution in resource-constrained environments. Model inference is a compute-intensive task requiring GPU parallel computing to accelerate matrix operations; Agent execution mainly involves system calls, file operations, and network communication, with higher demands on CPU and I/O performance. Connecting the two via network ports (typically using OpenAI-compatible API interfaces, such as llama.cpp's server mode or Ollama's REST API) allows each machine to focus on what it does best. CacheOS is an operating system designed for NAS and server management that supports Docker containerized deployment, making it ideal as a host system for local AI inference services.

Local Model Application Scenarios

Local model applications currently focus on lightweight tasks:

Voice announcements: Text-to-speech for real-time broadcasting of Telegram messages and task completion notifications
Background monitoring: Automatic voice notification when Agent completes tasks
Future expansion: Potentially expanding to local video production (video production currently still relies on cloud-based MiniMax)

The creator believes that 2B-3B parameter models are sufficient for these scenarios — CPUs alone can handle them, and adding a GPU makes things faster and smoother. Models with 2B-3B parameters (such as Qwen2.5-3B, Phi-3-mini, etc.) require only 2-3GB of VRAM after INT4 quantization, well within the GTX 1070's 8GB VRAM capacity, and can even load multiple small models simultaneously for different tasks.

Unique Advantages of Ops Engineers in Local AI Deployment

As a tech enthusiast with an operations background, the creator believes that compared to players who specifically buy second-hand mining GPUs to run AI, ops engineers have unique advantages:

Understanding hardware limits: Knowing where the optimal cost-performance sweet spot lies
System stability experience: Capable of handling engineering challenges like cooling, power supply, and long-term operation
Incremental deployment: Testing stability on small machines first, then migrating to production environments

This pragmatic engineering mindset is precisely the capability most needed for AI deployment in actual production environments. Ops engineers have long worked with hardware and deeply understand the engineering philosophy of "24/7 stable operation" — they know the real-world performance of a cheap server motherboard under sustained high loads, understand lifespan differences between capacitor brands, and are aware of how data center ambient temperature affects hardware longevity. These experiences translate into enormous advantages in local AI deployment: they won't blindly chase computing power specs, but instead find the optimal balance between performance, stability, power consumption, and cost.

Conclusion

The rigor and reliability Claude Code demonstrates in firmware-level operations proves that AI-assisted ops has entered a practical stage. The discovery that Agent framework design quality matters more than the model itself points the way for individual developers and small teams with limited resources — rather than chasing the strongest model, it's better to refine the Agent's workflow and constraint mechanisms. The local Agent deployment solution demonstrates a low-cost, high-availability path for AI operations implementation.

Key Takeaways

Claude Code excels at firmware/OS-level ops tasks, understanding Ventoy boot mechanisms and completing complex operation chains like virtual disk expansion and filesystem conversion
Agent framework (Harness) design quality matters more than the model backend; rigorous architecture can compensate for model capability shortcomings
Different model backends show three tiers of performance under the Agent framework: DeepSeek Professional strongest, MiniMax mid-tier, DeepSeek Flash bottom tier
Local Agent deployment uses an extreme cost-performance approach with old servers + idle GPUs, with distributed architecture separating inference and Agent tasks
Operations experience provides unique advantages in local AI deployment, including hardware limit understanding, system stability assurance, and incremental deployment strategies