npcpy: An Open-Source Framework That Rethinks AI Agent Development with Software Engineering Principles

npcpy replaces prompt engineering mysticism with software engineering principles to build controllable AI Agents.
npcpy is a lightweight AI Agent framework whose core philosophy is constraining agents with software engineering logic rather than relying on prompts. It features a four-layer decoupled architecture (NPC/Context/Agent/Tool), supports vectorized parallel multi-agent collaboration and adversarial cooperation, and innovatively introduces sleep pruning, dream-based cross-domain discovery, and population evolution mechanisms for dynamic self-iterating knowledge graphs. From local inference to cloud deployment, the toolchain is complete and open source.
Why Do We Need npcpy?
Anyone who's worked on AI Agent development has probably experienced this: prompt chains are fragile and expensive, logic drifts unpredictably, and getting things to run reliably in production feels like a gamble. Traditional Agent frameworks put too much pressure on prompt engineering, and prompts themselves carry inherent uncertainty—making large-scale deployment a risky bet.
npcpy's core philosophy is clear—constrain Agents with software engineering principles, rather than leaving everything to prompts. Its core codebase is only around 1,400 lines, yet it covers the full pipeline from local inference to cloud models, and from single agents to multi-agent collaboration. Version 1.11.0 is already available, with the goal of letting developers build agents the same way they write traditional software—with controllable logic and stable execution.

Four-Layer Architecture: NPC, Context, Agent, Tool
npcpy's architecture is remarkably intuitive, broken into four distinct layers that each have clear responsibilities while working closely together.
NPC Layer: Decoupling Persona from Capabilities
The top-level NPC layer goes far beyond writing a System Prompt. It turns persona definitions, behavioral instructions, and capability interfaces into pluggable modules. The core design principle is decoupling—completely separating "who you are" from "what you can do." The benefit is that adjusting an agent's identity characteristics won't affect its capability performance, and vice versa.
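The separation described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not npcpy's actual classes: `Persona`, `CapabilitySet`, and `PluggableAgent` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Persona:
    """'Who you are': identity and behavioral instructions only."""
    name: str
    directive: str


@dataclass
class CapabilitySet:
    """'What you can do': a registry of plain callables keyed by name."""
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def call(self, name: str, *args, **kwargs):
        return self.tools[name](*args, **kwargs)


@dataclass
class PluggableAgent:
    """Hypothetical composition of the two halves; either can be swapped alone."""
    persona: Persona
    capabilities: CapabilitySet

    def system_prompt(self) -> str:
        return f"You are {self.persona.name}. {self.persona.directive}"


caps = CapabilitySet()
caps.register("add", lambda a, b: a + b)

agent = PluggableAgent(Persona("Ada", "Answer tersely."), caps)
# Swap the persona; the capability set is reused untouched.
renamed = PluggableAgent(replace(agent.persona, name="Grace"), agent.capabilities)
```

Because the persona is an immutable value and the capabilities live in their own registry, changing one never requires editing the other.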
Context Layer: A Dynamic Memory System Based on Knowledge Graphs
The Context layer is where the multimodal knowledge graph lives. It's not simple key-value storage—it's a dynamic memory system. Agent reasoning and orchestration rely entirely on it, with multimodal data flowing directly into the knowledge graph to provide a structured foundation for retrieval and inference.
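To make the contrast with key-value storage concrete, here is a minimal sketch of a graph-shaped memory that returns facts reachable within a few hops of a topic. `TinyGraph` and its methods are hypothetical and far simpler than a multimodal knowledge graph; they only show why linked facts retrieve differently from isolated keys.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TinyGraph:
    """Toy triple store: subject -> list of (relation, object)."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def context_for(self, subject: str, hops: int = 1) -> List[str]:
        """Collect facts reachable within `hops` steps, breadth first."""
        lines: List[str] = []
        frontier, seen = [subject], {subject}
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for relation, obj in self.edges.get(node, []):
                    lines.append(f"{node} {relation} {obj}")
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return lines


graph = TinyGraph()
graph.add_fact("moon", "raises", "tides")
graph.add_fact("tides", "slow", "earth rotation")
```

A lookup for "moon" with two hops also surfaces the fact about Earth's rotation, which a flat key-value lookup on "moon" would miss.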
Agent Layer and Tool Layer
The Agent layer handles decision-making, while the Tool layer uses MCP (Model Context Protocol) to interface with external services. ToolAgent connects to services like Stable Diffusion and Hugging Face through Connectors. CodingAgent executes code directly in a sandbox, handles errors automatically, and feeds the results back in a closed loop. The Skill system supports plugin-style definitions, so expert skills can be written in Markdown-like syntax, for example forcing the model to use the SymPy library for math problems.
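The closed-loop feedback idea (run the code, read the error, repair, run again) can be sketched with the standard library alone. Everything here is an assumption made for illustration: `run_snippet`, `closed_loop`, and `toy_repair` are hypothetical names, the "sandbox" is just a child interpreter process with a timeout rather than a real isolation layer, and the repair step stands in for a model call.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_snippet(code: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run code in a separate interpreter; return (ok, stdout or stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def closed_loop(code: str,
                repair: Callable[[str, str], str],
                max_rounds: int = 3) -> Optional[str]:
    """Execute; on failure pass the code and its error text to `repair`."""
    for _ in range(max_rounds):
        ok, output = run_snippet(code)
        if ok:
            return output
        code = repair(code, output)  # a real agent would ask the model here
    return None


def toy_repair(code: str, error: str) -> str:
    # Stand-in for the model: fixes one known typo when the traceback names it.
    if "NameError" in error:
        return code.replace("prnt", "print")
    return code
```

Calling `closed_loop("prnt(1 + 1)", toy_repair)` fails once with a NameError, gets repaired, and succeeds on the second round. The bounded `max_rounds` keeps a bad repair from looping forever.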
On the model adaptation front, everything from local Ollama to cloud-based OpenAI, Anthropic, and DeepSeek integrates seamlessly, running across various compute environments.
Multi-Agent Collaboration: Vectorized Parallelism and Debate Mechanisms
Once individual agents become powerful, the next step is multi-agent collaboration. npcpy introduces two key designs in this area.
NPC Array: Vectorized Parallel Inference
Running agents one after another means each model call waits on the previous one, so IO latency adds up quickly. npcpy borrows from SIMD (Single Instruction, Multiple Data) thinking and packs hundreds of agents' inference tasks into a single vector. One predict call processes all observations in parallel, then reduce aggregates the results, which removes most of the sequential IO wait at the engineering level.
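The predict-then-reduce pattern can be approximated with a thread pool, which overlaps the IO waits of many calls. This is a sketch of the general pattern only, not npcpy's array API: `predict_all`, `reduce_majority`, and `make_agent` are hypothetical names, and the agents are stand-in functions rather than model-backed ones.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Agent = Callable[[str], str]


def predict_all(agents: Sequence[Agent], observation: str,
                max_workers: int = 32) -> List[str]:
    """Fan one observation out to every agent concurrently.

    Results come back in the same order as `agents`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda agent: agent(observation), agents))


def reduce_majority(answers: Sequence[str]) -> str:
    """One possible reduce step: return the most common answer."""
    return Counter(answers).most_common(1)[0][0]


def make_agent(bias: int) -> Agent:
    # Stand-in for a model-backed agent; a real one would block on network IO.
    return lambda obs: str(len(obs) + bias)


agents = [make_agent(0), make_agent(0), make_agent(1)]
answers = predict_all(agents, "hello")
consensus = reduce_majority(answers)
```

Threads suit this case because model calls mostly wait on the network; the analogy to SIMD is loose, since each agent still runs its own request.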
Iterative Game System: Adversarial Collaboration
Even more interesting is the iterative game mechanism. Instead of simple voting, npcpy assigns agents distinct roles: MathSolver proposes solutions, Analyst weighs the arguments, and Verifier and Steema force convergence. This adversarial collaboration reportedly improves the solve rate on complex problems by 42%.
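A propose-and-verify loop is the simplest form of this kind of role-based iteration. The sketch below is an assumption-laden toy: `debate`, `BisectingSolver`, and `square_verifier` are hypothetical names, there are only two roles, and the "problem" is finding an integer square root so the example stays deterministic.

```python
from typing import Callable, Optional, Tuple


def debate(propose: Callable[[Optional[str]], int],
           verify: Callable[[int], Optional[str]],
           max_rounds: int = 20) -> Tuple[Optional[int], int]:
    """Alternate proposal and verification until the verifier has no objection."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


class BisectingSolver:
    """Toy proposer that narrows its range using the verifier's objections."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high, self.last = low, high, 0

    def __call__(self, feedback: Optional[str]) -> int:
        if feedback == "too low":
            self.low = self.last + 1
        elif feedback == "too high":
            self.high = self.last - 1
        self.last = (self.low + self.high) // 2
        return self.last


def square_verifier(target: int) -> Callable[[int], Optional[str]]:
    """Toy verifier: objects until the candidate squared equals the target."""
    def verify(x: int) -> Optional[str]:
        if x * x == target:
            return None
        return "too low" if x * x < target else "too high"
    return verify


answer, rounds = debate(BisectingSolver(0, 100), square_verifier(49))
```

The point of the structure is that the verifier's objection is fed back into the next proposal, which is what distinguishes it from a one-shot vote.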

Knowledge Graph Lifecycle: Sleep, Dreams, and Evolution
npcpy's most hardcore feature is designing knowledge management mechanisms as if agents were living organisms.
Cold Start and Dynamic Updates
Graph cold-start doesn't rely on expert-written definitions—instead, the engine automatically extracts ontologies from massive text corpora. For real-time updates, the system continuously absorbs streaming data, with entity alignment and conflict resolution all handled within evolution functions. Hybrid search combines structured facts, semantic vectors, and heuristic reasoning, allowing agents to retrieve concrete facts while also making speculative cross-concept connections.
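One common way to blend structured facts with semantic vectors is a weighted score. The sketch below assumes that approach for illustration; the source does not say how npcpy combines them, and the heuristic-reasoning part is left out. `hybrid_search`, `cosine`, the `alpha` weight, and the toy `nodes` data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_terms: Set[str],
                  query_vec: Sequence[float],
                  nodes: Dict[str, Tuple[Set[str], Sequence[float]]],
                  alpha: float = 0.5,
                  top_k: int = 3) -> List[str]:
    """Score = alpha * fact overlap + (1 - alpha) * cosine similarity."""
    scored = []
    for name, (facts, vec) in nodes.items():
        overlap = len(query_terms & facts) / len(query_terms) if query_terms else 0.0
        scored.append((alpha * overlap + (1 - alpha) * cosine(query_vec, vec), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


nodes = {
    "tides": ({"moon", "ocean"}, [1.0, 0.0]),
    "orbits": ({"moon", "gravity"}, [0.0, 1.0]),
    "basalt": ({"rock"}, [0.7, 0.7]),
}
ranked = hybrid_search({"moon"}, [1.0, 0.0], nodes)
```

Here "orbits" ranks above "basalt" on the strength of the shared fact alone, while "basalt" still appears because its vector is partly aligned with the query.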
Sleep Mechanism: Knowledge Pruning and Deduplication
Multimodal graphs inevitably accumulate redundancy and conflicts during dynamic growth. The sleep phase is essentially large-scale pruning—periodically merging semantically overlapping nodes and automatically cleaning up low-confidence or outdated logical edges. Like the brain filtering out noise, it solidifies core knowledge by reinforcing high-frequency pathways. The current redundancy pruning rate has reached 78%.
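A pruning pass of this kind can be sketched as two steps: drop edges below a confidence floor, then merge edges whose endpoints refer to the same node. In this illustration the "semantic overlap" test is replaced by a crude label normalization, which is an assumption made to keep the example self-contained; `sleep_prune`, `canonical`, and the 0.3 threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, confidence)


def canonical(label: str) -> str:
    """Crude stand-in for semantic matching: ignore case and extra spaces."""
    return " ".join(label.lower().split())


def sleep_prune(edges: List[Edge], min_confidence: float = 0.3) -> List[Edge]:
    """Drop weak edges, then merge duplicates and keep the highest confidence."""
    merged: Dict[Tuple[str, str], float] = {}
    for source, target, confidence in edges:
        if confidence < min_confidence:
            continue  # low-confidence edge is discarded
        key = (canonical(source), canonical(target))
        merged[key] = max(confidence, merged.get(key, 0.0))
    return [(s, t, c) for (s, t), c in sorted(merged.items())]


raw_edges = [
    ("Moon", "Tides", 0.9),
    ("moon ", "tides", 0.6),   # same edge under a different spelling
    ("Moon", "Cheese", 0.1),   # below the confidence floor
]
pruned = sleep_prune(raw_edges)
```

A production version would compare embeddings rather than normalized strings, and would likely also age out edges by timestamp.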
Dream Mechanism: Cross-Domain Knowledge Discovery
The dream phase goes further, leveraging LLM generative capabilities for speculative logical leaps, attempting to establish connections between completely different domains. For example, through dream mode, the system created non-intuitive loose couplings between tidal acceleration rates, geological evolution, and orbital mechanics models. Speculative dream connection points have broken through 182%, significantly enhancing cognitive depth.
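The dream step can be pictured as proposing edges between nodes from different domains that are not yet linked, then keeping only those a scorer finds plausible. In the sketch below the scorer is a simple word-overlap function standing in for the LLM, which is an assumption for the sake of a runnable example; `dream`, `word_overlap`, and the sample domains are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def word_overlap(a: str, b: str) -> float:
    """Stand-in plausibility score: Jaccard overlap of the words in two labels."""
    left, right = set(a.split()), set(b.split())
    return len(left & right) / len(left | right) if left | right else 0.0


def dream(domains: Dict[str, List[str]],
          known: Set[FrozenSet[str]],
          plausibility: Callable[[str, str], float],
          threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Propose cross-domain links that are not already known edges."""
    proposals = []
    for (_, left), (_, right) in combinations(sorted(domains.items()), 2):
        for a in left:
            for b in right:
                if frozenset((a, b)) in known:
                    continue  # already in the graph, nothing to dream up
                score = plausibility(a, b)
                if score >= threshold:
                    proposals.append((a, b, score))
    return sorted(proposals, key=lambda p: -p[2])


domains = {
    "geology": ["plate motion"],
    "orbits": ["orbital motion", "tidal locking"],
}
speculative = dream(domains, known=set(), plausibility=word_overlap)
```

Proposals produced this way are speculative by construction, so it makes sense to store them with their score and let a later pruning pass remove the ones that never prove useful.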
C-Memolution: Population Evolution Mechanism
Cross-domain associations have no practical value if they can't be stably inherited. The C-Memolution mechanism borrows from population genetic selection logic: from a pool of 100 knowledge variants, 10 are randomly selected each round for real-world testing—whoever solves problems more accurately becomes a parent entering the evolution cycle. The selection phase looks only at rankings; reproduction involves structural crossover and mutation, retaining efficient pathways and pruning redundancy. The current best individual fitness reaches 91.5%, and a diversity index of 0.62 indicates the population is still actively exploring.
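The selection loop described above (sample 10 of 100, rank them, breed from the best) resembles tournament selection in a genetic algorithm. The sketch below follows that reading under several assumptions: variants are bit lists, fitness is the number of ones, the two top-ranked samples are the parents, and the child replaces the lowest-ranked sample. None of these details come from the source, and `evolve_round`, `crossover`, and `mutate` are hypothetical names.

```python
import random
from typing import Callable, List

Variant = List[int]


def crossover(a: Variant, b: Variant, rng: random.Random) -> Variant:
    """Single-point structural crossover."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(v: Variant, rng: random.Random, rate: float = 0.05) -> Variant:
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in v]


def evolve_round(pool: List[Variant],
                 fitness: Callable[[Variant], float],
                 rng: random.Random,
                 sample_size: int = 10) -> None:
    """One round, in place: sample, rank, breed the top two, replace the worst."""
    picked = rng.sample(range(len(pool)), sample_size)
    ranked = sorted(picked, key=lambda i: fitness(pool[i]), reverse=True)
    parent_a, parent_b = pool[ranked[0]], pool[ranked[1]]
    pool[ranked[-1]] = mutate(crossover(parent_a, parent_b, rng), rng)


rng = random.Random(0)
pool = [[rng.randint(0, 1) for _ in range(16)] for _ in range(100)]
best_before = max(map(sum, pool))
for _ in range(200):
    evolve_round(pool, sum, rng)
best_after = max(map(sum, pool))
```

Selection uses only the ranking within the sample, never the raw scores, which matches the "looks only at rankings" description. Replacing only the worst sampled variant means the best variant in the pool is never lost.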

Engineering in Practice: Fine-Tuning, Deployment, and Multimodal Extensions
Model Fine-Tuning and Hardware Adaptation
For alignment, DPO (Direct Preference Optimization) with beta set to 0.1 was chosen to unlock complex reasoning potential while maintaining stability. For scientific writing scenarios, SFT (supervised fine-tuning) was used to transfer Llama 3's academic style. On the hardware side, deep adaptation for Apple Silicon runs LoRA under the MLX framework with rank pushed to 128, combined with Flash Attention 2.0, significantly boosting both training speed and knowledge absorption rate.
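To show what beta controls, here is the standard per-pair DPO loss written out in plain Python, together with the usual count of trainable parameters that LoRA adds to one weight matrix. The function names and the 4096 x 4096 matrix size are assumptions chosen for illustration; the source gives only beta = 0.1 and rank = 128.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from the summed log-probabilities of each response.

    loss = -log(sigmoid(beta * margin)), where the margin measures how much
    more the policy prefers the chosen response than the reference does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) computed as a numerically stable softplus(-x)
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two factors, (d_out x rank) and (rank x d_in), per matrix."""
    return rank * (d_in + d_out)


# With no preference shift the loss equals log(2).
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
# Policy favors the chosen answer more than the reference does: lower loss.
improved = dpo_loss(-5.0, -9.0, -6.0, -6.0)
# Hypothetical 4096 x 4096 projection adapted at rank 128.
extra_params = lora_trainable_params(4096, 4096, 128)
```

A small beta such as 0.1 scales the margin down, so the policy is pushed away from the reference model more gently than it would be with a larger value.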

MCP Protocol Integration and One-Click Deployment
npcpy natively supports the MCP protocol—local file systems, Postgres, Slack, and other interfaces can be mounted directly. The CLI toolchain handles everything from project initialization to debugging and syncing smoothly. Configuration uses declarative TeamCTX, decoupling collaboration architecture from specific logic so that adjusting team structure doesn't require touching underlying business code. After local testing, running team serve packages the entire team into an authenticated REST API, ready for production.
Installation and Multi-Platform Compatibility
npcpy offers three installation versions for different development scenarios:
- Lite version: API calls for basic development
- Local version: Local inference and vector databases
- Full version: Multimodal full-stack development
Compatible with Ubuntu production environments, Mac Apple Silicon, and Windows WSL—essentially plug and play.
Summary: npcpy's Core Value
npcpy's core value comes down to three points. First, the NPC/Context/Agent/Tool four-layer decoupling transforms chaotic prompt engineering into standard software architecture. Second, Sleep/Dream combined with C-Memolution turns static knowledge graphs into self-iterating dynamic systems. Third, from local fine-tuning to enterprise-grade clusters, the toolchain handles it all.
Whether you're an individual developer tinkering with local models or an enterprise deploying large-scale Agents, npcpy provides a clear path forward. The code is open source—just pip install npcpy --upgrade to get started. For developers tired of prompt engineering mysticism who want to build reliable AI Agents the engineering way, this framework is worth a serious look.