Claude Code Firmware-Level Ops in Practice: Virtual Disk Expansion & Local Agent Deployment

Claude Code excels at firmware-level ops; Agent framework design matters more than the model itself
A seasoned ops engineer tested Claude Code on low-level system operations (Ventoy management, virtual disk expansion, filesystem conversion) and found it excels at understanding complex boot mechanisms and proactively fixing UUID issues. The key insight is that Agent framework (Harness) design quality matters more than the model backend — rigorous architecture can compensate for model shortcomings. The article also shares a low-cost local Agent distributed deployment solution using old servers and idle GPUs.
Overview
In the practice of AI-assisted operations, Claude Code has demonstrated impressive low-level system architecture comprehension. A seasoned ops engineer (Bilibili creator "雲姥工") shared his hands-on experience using Claude Code for hardware firmware-level maintenance, covering hardcore operations like Ventoy Linux management, virtual disk expansion, and filesystem conversion, while also exploring feasible local Agent deployment solutions.

Claude Code's Real-World Performance in Firmware-Level Operations
Low-Level Architecture Understanding Beyond Expectations
Based on the creator's hands-on testing, Claude Code demonstrates outstanding capability in grasping low-level architecture (OS layer/firmware layer). Compared to other AI coding solutions, it exhibits the following characteristics:
- Reliable solutions: Proposed operational suggestions are rigorous and standardized, with no "messy operations"
- Deep understanding: Capable of understanding Ventoy's working principles and boot mechanisms
- Suited for hardcore tasks: Particularly suitable for handling critical issues like whether a system can boot normally
Technical Background of Ventoy
Ventoy is an open-source USB bootable drive creation tool. Its core innovation is that users don't need to repeatedly format USB drives — they can simply copy ISO/WIM/IMG image files directly to the USB drive to boot. Ventoy achieves direct booting of multiple image formats through a custom bootloader (supporting both Legacy BIOS and UEFI modes). Its working principle involves MBR/GPT partition table parsing, GRUB boot chain loading, and locating boot files through UUID-identified partitions. UUID (Universally Unique Identifier) is crucial in this context — Ventoy's bootloader relies on partition UUIDs to locate data partitions. Once a UUID changes due to operations like filesystem conversion without synchronizing the configuration update, the entire boot chain breaks. It's precisely because of this complex dependency relationship that Claude Code's ability to proactively identify and correct UUID issues is particularly remarkable.
Practical Task: Virtual Disk Expansion & Filesystem Conversion
The creator assigned Claude Code a complex multi-step task involving a complete overhaul of a RAW format virtual disk:
- Low-level disk expansion: Expand the virtual disk from 60GB to 100GB
- Partition-level expansion: Synchronously expand space in the upper partition
- Filesystem conversion: Convert ext4 to btrfs
- UUID correction: Adjust UUIDs to ensure Ventoy can boot normally
RAW format virtual disk is an unencapsulated bare disk image format that maps each sector of a physical disk 1:1, without the metadata headers found in formats like QCOW2 or VMDK. The advantage of this format is optimal I/O performance and maximum compatibility, but the downside is no support for snapshots or dynamic expansion. The ext4-to-btrfs filesystem conversion is a high-risk operation — btrfs provides a native ext4 conversion tool (btrfs-convert) that achieves in-place conversion by saving ext4 metadata as a subvolume, but this process requires sufficient free space in the partition, and the UUID will change after conversion. Compared to ext4, btrfs offers modern filesystem features like Copy-on-Write (CoW), built-in snapshots, transparent compression, and RAID support, making it particularly suitable for ops scenarios requiring frequent backups and rollbacks.
This task chain is tightly interconnected — any single misstep could render the system unbootable. Claude Code not only completed all operations but also demonstrated deep understanding of Ventoy's boot mechanism — it knew that an incorrect UUID would cause Ventoy boot failure, which the creator described as "terrifying."
Comparison Test Results Across Different Model Backends
Performance Differences Across Three Tiers of Models
The creator tested different model backends within Claude Code's Agent framework and reached clear tiered conclusions:
| Tier | Model | Performance |
|---|---|---|
| First Tier | DeepSeek Professional | Strongest performance |
| Second Tier | MiniMax | Mid-level, stable and usable |
| Third Tier | DeepSeek Flash | Bottom tier but still better than local small models |
Agent Framework (Harness) Matters More Than the Model Backend
A key insight is: The Agent layer's Harness design matters more than the model backend. The creator noted:
"If the model backend is relatively weak, you still need to look at whether your architecture is good. If your architecture is stable, things can still hold; if your architecture is unstable, it will propose very strange suggestions and execute them, and you'll crash."
In AI Agent architecture, Harness (constraint framework) refers to the complete set of workflow control mechanisms built around the large language model, including system prompt design, tool invocation permission management, output validation, error recovery strategies, and safety boundary settings. The reason Claude Code's Agent framework performs excellently in ops scenarios is that its Harness designs strict operation confirmation mechanisms and step-by-step execution strategies — it won't execute all commands at once, but verifies results after each critical step before continuing. This design philosophy is similar to the "change management" process in operations: every change requires verification, rollback points, and confirmation. In contrast, even if the underlying model is highly capable, a loosely designed Harness may lead to dangerous operations due to lack of constraints.
In other words, a rigorously designed Agent framework can compensate for model capability shortcomings to a certain extent. This has significant reference value for scenarios with limited budgets but requiring reliable operations.
Local Agent Deployment Solution in Detail
Hardware Configuration & Cooling Optimization
The creator is building a local Agent runtime environment using an extreme cost-performance approach:
- Current configuration: Old server motherboard + GTX 1070 GPU (repurposed idle hardware)
- Cooling solution: DIY water cooling with reapplied thermal paste, followed by stress testing
- Stress test results: CPU temperature controlled below 55-70°C under full load, stable performance
- Memory: DDR4 2666, performance exceeding expectations
Local Agent Architecture Design Philosophy
The creator planned a distributed local AI architecture:
- Desktop machine (laptop with 3070): Running CacheOS, responsible for local model inference, while also handling daily use and gaming
- Agent machine (old server + 1070): Dedicated to running Agent tasks, connected to the model inference machine via network ports
- Future upgrades: Plans to leverage dual PCIe 3.0 x16 slots for dual GPUs
This distributed architecture that separates "model inference" and "Agent execution" onto different physical machines is the engineering-optimal solution in resource-constrained environments. Model inference is a compute-intensive task requiring GPU parallel computing to accelerate matrix operations; Agent execution mainly involves system calls, file operations, and network communication, with higher demands on CPU and I/O performance. Connecting the two via network ports (typically using OpenAI-compatible API interfaces, such as llama.cpp's server mode or Ollama's REST API) allows each machine to focus on what it does best. CacheOS is an operating system designed for NAS and server management that supports Docker containerized deployment, making it ideal as a host system for local AI inference services.
Local Model Application Scenarios
Local model applications currently focus on lightweight tasks:
- Voice announcements: Text-to-speech for real-time broadcasting of Telegram messages and task completion notifications
- Background monitoring: Automatic voice notification when Agent completes tasks
- Future expansion: Potentially expanding to local video production (video production currently still relies on cloud-based MiniMax)
The creator believes that 2B-3B parameter models are sufficient for these scenarios — CPUs alone can handle them, and adding a GPU makes things faster and smoother. Models with 2B-3B parameters (such as Qwen2.5-3B, Phi-3-mini, etc.) require only 2-3GB of VRAM after INT4 quantization, well within the GTX 1070's 8GB VRAM capacity, and can even load multiple small models simultaneously for different tasks.
Unique Advantages of Ops Engineers in Local AI Deployment
As a tech enthusiast with an operations background, the creator believes that compared to players who specifically buy second-hand mining GPUs to run AI, ops engineers have unique advantages:
- Understanding hardware limits: Knowing where the optimal cost-performance sweet spot lies
- System stability experience: Capable of handling engineering challenges like cooling, power supply, and long-term operation
- Incremental deployment: Testing stability on small machines first, then migrating to production environments
This pragmatic engineering mindset is precisely the capability most needed for AI deployment in actual production environments. Ops engineers have long worked with hardware and deeply understand the engineering philosophy of "24/7 stable operation" — they know the real-world performance of a cheap server motherboard under sustained high loads, understand lifespan differences between capacitor brands, and are aware of how data center ambient temperature affects hardware longevity. These experiences translate into enormous advantages in local AI deployment: they won't blindly chase computing power specs, but instead find the optimal balance between performance, stability, power consumption, and cost.
Conclusion
The rigor and reliability Claude Code demonstrates in firmware-level operations proves that AI-assisted ops has entered a practical stage. The discovery that Agent framework design quality matters more than the model itself points the way for individual developers and small teams with limited resources — rather than chasing the strongest model, it's better to refine the Agent's workflow and constraint mechanisms. The local Agent deployment solution demonstrates a low-cost, high-availability path for AI operations implementation.
Key Takeaways
- Claude Code excels at firmware/OS-level ops tasks, understanding Ventoy boot mechanisms and completing complex operation chains like virtual disk expansion and filesystem conversion
- Agent framework (Harness) design quality matters more than the model backend; rigorous architecture can compensate for model capability shortcomings
- Different model backends show three tiers of performance under the Agent framework: DeepSeek Professional strongest, MiniMax mid-tier, DeepSeek Flash bottom tier
- Local Agent deployment uses an extreme cost-performance approach with old servers + idle GPUs, with distributed architecture separating inference and Agent tasks
- Operations experience provides unique advantages in local AI deployment, including hardware limit understanding, system stability assurance, and incremental deployment strategies
Related articles
Product ReviewsQoder vs Cursor Real-World Comparison: Which $20/Month AI IDE Is Better?
Hands-on comparison of Qoder vs Cursor AI IDEs: Agent autonomy, human interaction count, and architecture decisions. Qoder needed only 2 interactions vs Cursor's 8.
Product ReviewsCursor Cloud Agent Demo: Eliminating Bottlenecks Across the Entire Software Development Lifecycle
Deep analysis of Cursor's Cloud Agent demo showing how cloud VMs, automated test artifacts, and a full-chain control plane systematically eliminate human bottlenecks across the software development lifecycle.
Product ReviewsCursor 3.0 Deep Dive: Multi-Agent Parallelism, Design Mode, and Best-of-N Model Comparison
Cursor 3.0 evolves from an AI coding assistant into an Agent fleet command center. Explore multi-agent parallelism, Design Mode, and Best-of-N model comparison.