Two MCP Plugins Combined: Solving Claude Code's Memory Loss Problem in Large Projects

The Problem: Why Claude Code "Loses Memory" on Large Projects

When developing large projects with Claude Code, many developers encounter this frustrating situation: after about 20 minutes of work, the AI starts forgetting which files it modified, loses track of your requirements, and even rewrites code that already exists.

This isn't Claude getting dumber; the context window is simply full. The context window is the maximum amount of text a large language model can process at once, measured in tokens. Claude's context window is 200K tokens (approximately 150,000 words). That sounds generous, but in real coding sessions every MCP tool call that reads code or runs a command dumps its raw output directly into the context. A single Playwright snapshot takes up 56KB, and after half an hour roughly 40% of the context can be occupied by raw tool output that is no longer needed. When the context approaches its limit, the model either refuses to continue or is forced to compress earlier conversation, losing critical information. That is the technical essence of the so-called "memory loss" phenomenon.
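To get a feel for these numbers, here is a rough back-of-the-envelope calculation. The 200K window and the 56KB snapshot come from the text above; the figure of 4 bytes per token is only an assumed average for illustration, since real tokenizers vary with the content.

```python
# Rough context-budget arithmetic.
# BYTES_PER_TOKEN is an assumed average, not a measured value.
CONTEXT_WINDOW_TOKENS = 200_000   # from the article
SNAPSHOT_BYTES = 56 * 1024        # one 56KB snapshot, from the article
BYTES_PER_TOKEN = 4               # assumption for illustration only


def tokens_for(num_bytes: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Estimate how many tokens a raw tool result of num_bytes occupies."""
    return num_bytes // bytes_per_token


def share_of_window(tokens: int, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Fraction of the context window consumed by the given token count."""
    return tokens / window


snapshot_tokens = tokens_for(SNAPSHOT_BYTES)

# How many such snapshots fit into 40% of the window (integer arithmetic).
budget_40_percent = CONTEXT_WINDOW_TOKENS * 40 // 100
snapshots_to_fill_40_percent = budget_40_percent // snapshot_tokens

print(f"one snapshot is about {snapshot_tokens} tokens "
      f"({share_of_window(snapshot_tokens):.1%} of the window)")
print(f"snapshots that fit in 40% of the window: {snapshots_to_fill_40_percent}")
```

Under that assumption, a handful of large tool results is enough to consume a substantial share of the window, which is consistent with the half-hour figure quoted above.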

MCP (Model Context Protocol) is an open protocol released by Anthropic in late 2024 that gives AI models a standardized interface for calling external tools. Through MCP, Claude Code can read and write files, execute terminal commands, access databases, and more. However, the result of each tool call is written verbatim into the context window, which is a limitation of the current architecture.
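As a minimal sketch of what such a tool call looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The tool name `read_file`, its `path` argument, the simulated response, and the `context` list standing in for the conversation are all hypothetical and only serve the illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_from_result(response: dict) -> str:
    """Concatenate the text items of a tools/call result."""
    items = response["result"]["content"]
    return "".join(item["text"] for item in items if item.get("type") == "text")


# Hypothetical tool and arguments, for illustration only.
request = build_tool_call(1, "read_file", {"path": "src/app.py"})

# Simulated server response; a real one would come from the MCP server.
fake_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "print('hello')\n" * 3}]},
}

# Stand-in for the conversation context: the raw text is appended unchanged.
context = []
context.append(text_from_result(fake_response))

print(json.dumps(request, indent=2))
print(f"context grew by {len(context[-1])} characters")
```

The point of the sketch is the last step: whatever text the tool returns lands in the context in full, regardless of how much of it the model actually needs.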

What makes things worse is that Claude needs to read files one by one to understand your project. Facing a million-line codebase, it simply can't read everything. This creates a dilemma: read too little and understanding is insufficient; read too much and the context explodes.


Cloud Context: On-Demand Retrieval Instead of Reading Everything

The first MCP plugin comes from the Zelis team and has already earned 1.2K Stars. Its core approach is elegantly simple: index the codebase using a vector database, and retrieve what's needed on demand, rather than stuffing all the code into the context.

This essentially applies RAG (Retrieval-Augmented Generation) to code comprehension. The core idea of RAG is that the model does not need to hold all information at once: relevant content is retrieved precisely when it is needed and handed to the model. This saves context space and covers a body of knowledge far larger than the context window itself.

Technical Highlights

AST-Based Smart Chunking: An AST (Abstract Syntax Tree) is the tree-structured representation a compiler or parser builds from source code, precisely capturing the code's syntactic hierarchy. Traditional text chunking splits by fixed character counts or line numbers, which often cuts a function in half and leaves chunks with incomplete semantics. Cloud Context splits code at syntax tree node boundaries, so each chunk is a complete semantic unit (such as a function definition, class declaration, or module). Because no chunk starts or ends mid-definition, the resulting embeddings and retrieval results are of higher quality.
BM25 + Vector Hybrid Search: This is the mainstream architecture choice for current RAG systems. BM25 is a classic keyword-ranking function in information retrieval that scores relevance from term frequency, inverse document frequency, and document length; it is excellent at precisely matching specific function names, variable names, and other identifiers. Vector retrieval converts text into high-dimensional vectors through an embedding model and finds semantically similar content by calculating cosine similarity between vectors; it excels at understanding "intent" rather than literal matches. For example, a search for "user authentication logic" can find a function named validateToken. Combining the two (Hybrid Search) covers both exact matching and semantic understanding, making code discovery both fast and accurate.
Significantly Reduced Token Consumption: The project's published figures report a 40% reduction in token consumption at equivalent retrieval quality. Within the same context budget, Claude can therefore process more useful information.
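The chunking idea can be sketched with Python's built-in `ast` module. This is only a stand-in for illustration: the article does not say which parser the plugin uses, and the sample source and function names below are made up. The sketch compares splitting at top-level function and class boundaries with naive fixed-size splitting.

```python
import ast

# Made-up sample source for the demonstration.
SOURCE = '''\
import os

def load(path):
    with open(path) as f:
        return f.read()

class Parser:
    def parse(self, text):
        return text.split()
'''


def chunk_by_ast(source: str) -> list[str]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


def chunk_by_chars(source: str, size: int) -> list[str]:
    """Naive baseline: split every `size` characters, ignoring syntax."""
    return [source[i:i + size] for i in range(0, len(source), size)]


def parses(code: str) -> bool:
    """True if the chunk is syntactically complete Python on its own."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


ast_chunks = chunk_by_ast(SOURCE)
char_chunks = chunk_by_chars(SOURCE, 40)

print("AST chunks complete:", all(parses(c) for c in ast_chunks))
print("fixed-size chunks complete:", all(parses(c) for c in char_chunks))
```

Every AST chunk is a self-contained definition, while at least one fixed-size chunk is cut in the middle of a statement. A real indexer would also need to handle nested definitions, very large classes, and languages other than Python.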

This plugin solves the "understanding the project" problem. Claude no longer needs to read the entire codebase file by file—instead, it works like an experienced developer who knows exactly where to find the information needed.
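The hybrid search described in the highlights can be sketched in a few dozen lines. Everything here is illustrative: the three documents, the hand-made three-dimensional "embeddings" (a real system would obtain them from an embedding model), and the use of reciprocal rank fusion to merge the two rankings are assumptions, since the article does not say how the plugin combines its scores.

```python
import math
import re
from collections import Counter

# Made-up corpus: identifier -> chunk text.
DOCS = {
    "validateToken": "def validateToken(token): check signature and expiry of the session token",
    "renderChart": "def renderChart(data): draw bars and axes for the dashboard",
    "parseConfig": "def parseConfig(path): read yaml settings from disk",
}
# Hand-made toy embeddings with axes (auth, ui, io); assumption for illustration.
EMBEDDINGS = {
    "validateToken": [0.9, 0.1, 0.2],
    "renderChart": [0.1, 0.9, 0.1],
    "parseConfig": [0.1, 0.1, 0.9],
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> dict:
    """Okapi BM25 with a smoothed IDF, computed over the whole corpus."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    avg_len = sum(len(t) for t in tokenized.values()) / len(tokenized)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (len(tokenized) - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            fused[name] += 1 / (k + rank)
    return [name for name, _ in fused.most_common()]


def hybrid_search(query: str, query_vec: list[float]) -> list[str]:
    kw = bm25_scores(query, DOCS)
    sem = {name: cosine(query_vec, vec) for name, vec in EMBEDDINGS.items()}
    # Only documents with a keyword hit take part in the keyword ranking.
    kw_rank = [n for n in sorted(kw, key=kw.get, reverse=True) if kw[n] > 0]
    sem_rank = sorted(sem, key=sem.get, reverse=True)
    return rrf([kw_rank, sem_rank])


# Semantic query: no keyword overlap, the toy vector points at "auth".
print(hybrid_search("user authentication logic", [1.0, 0.0, 0.1]))
# Exact identifier query: the keyword side dominates even with a vague vector.
print(hybrid_search("parseConfig", [0.4, 0.4, 0.4]))
```

The two queries show the division of labor: the first has no literal overlap with any chunk and is answered by the vector side, while the second is an exact identifier and is answered by BM25.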

Context Mode: Context Management with 98% Compression Rate

Context Mode is the second critical plugin, currently with 14.8K Stars and ranked #1 on Hacker News. It tackles the context overflow problem with a more aggressive approach.

Core Mechanism

Raw return data from MCP tool calls doesn't enter the context directly. Instead, it's stored in a local SQLite database, and only a condensed summary is kept in the context. Context Mode chose SQLite over Redis or PostgreSQL primarily for zero-configuration, local operation: developers don't need to set up additional database services, and the plugin works immediately after installation. Local storage also means sensitive code data never leaves the developer's machine, which is an inherent privacy and security advantage. In reported measurements, 315KB of raw data was reduced to 5.4KB, a reduction of about 98%.
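The mechanism can be sketched with Python's built-in `sqlite3` module. The table name, the function names, and especially the placeholder `summarize` function are assumptions for illustration; the article does not describe the plugin's actual schema or how it produces its summaries.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local store for raw tool output."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_outputs ("
        "id INTEGER PRIMARY KEY, tool TEXT NOT NULL, raw TEXT NOT NULL)"
    )
    return conn


def summarize(raw: str, max_chars: int = 200) -> str:
    """Placeholder summary: first line plus the stored size (assumption)."""
    first_line = raw.splitlines()[0] if raw else ""
    return f"{first_line[:max_chars]} ... [{len(raw)} chars stored]"


def record_tool_output(conn: sqlite3.Connection, tool: str, raw: str) -> str:
    """Store the raw output locally and return the short entry for the context."""
    cur = conn.execute(
        "INSERT INTO tool_outputs (tool, raw) VALUES (?, ?)", (tool, raw)
    )
    conn.commit()
    return f"[output #{cur.lastrowid}] {summarize(raw)}"


def fetch_raw(conn: sqlite3.Connection, output_id: int):
    """Retrieve the full output later, if the details are needed after all."""
    row = conn.execute(
        "SELECT raw FROM tool_outputs WHERE id = ?", (output_id,)
    ).fetchone()
    return row[0] if row else None


def reduction(raw_size: float, kept_size: float) -> float:
    """Fraction of the raw data that no longer occupies the context."""
    return 1 - kept_size / raw_size


conn = open_store()
raw = "npm test\n" + "PASS src/module.test.js\n" * 5000  # made-up tool output
context_entry = record_tool_output(conn, "run_tests", raw)

print(context_entry)
print(f"this example: {reduction(len(raw), len(context_entry)):.1%} reduction")
print(f"article's figures: {reduction(315, 5.4):.1%} reduction")
```

Because the raw output stays addressable by its id, nothing is thrown away: the context holds a pointer and a summary, and the full text can be fetched again on demand.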

Planning Continuity—The Standout Feature

This feature addresses the pain point of Claude losing state when conversations are compressed. When Claude Code's conversation length approaches the context window limit, the system triggers auto-compact, condensing earlier conversation content into summaries to free up space. In the process, structured information such as task plans, completed steps, and to-do items can easily be lost or blurred.

Context Mode's snapshot mechanism serializes and saves the current task state (including the planning tree, file modification records, and decision context) before compression occurs, then injects that structured state into the new context after compression. The conversation itself is compressed lossily, but the task state is recovered losslessly. As a result, the AI keeps the complete plan and progress of the current task instead of "losing memory mid-conversation."
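A minimal sketch of the snapshot-and-restore cycle, under stated assumptions: the `TaskState` fields, the flat list used in place of a planning tree, and the `compact` function standing in for auto-compact are all hypothetical simplifications, not the plugin's actual data model.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskState:
    """Hypothetical task state; a flat list stands in for the planning tree."""
    plan: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    modified_files: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)


def take_snapshot(state: TaskState) -> str:
    """Serialize the structured state before compaction."""
    return json.dumps(asdict(state))


def restore_snapshot(blob: str) -> TaskState:
    """Rebuild the state exactly as it was saved."""
    return TaskState(**json.loads(blob))


def compact(messages: list[str], keep_last: int = 1) -> list[str]:
    """Stand-in for auto-compact: older messages become one lossy summary line."""
    dropped = messages[:-keep_last]
    return [f"[summary of {len(dropped)} earlier messages]"] + messages[-keep_last:]


def compact_with_snapshot(messages: list[str], state: TaskState):
    """Snapshot first, compact second, then inject the restored state."""
    blob = take_snapshot(state)
    compacted = compact(messages)
    restored = restore_snapshot(blob)
    injected = "[restored task state] " + json.dumps(asdict(restored))
    return compacted + [injected], restored


state = TaskState(
    plan=["add login endpoint", "write tests", "update docs"],
    completed=["add login endpoint"],
    modified_files=["api/auth.py"],
    decisions=["use JWT instead of server-side sessions"],
)
messages = ["user: build login", "assistant: plan...", "tool: diff...", "user: continue"]

new_messages, restored = compact_with_snapshot(messages, state)
for line in new_messages:
    print(line)
```

The summary line loses the detail of the earlier messages, but the restored state is field-for-field identical to what was saved, which is the property the paragraph above describes.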

Combined Usage: A 1+1>2 Effect

The two plugins have complementary positioning, forming a complete Claude Code context management solution when used together:

Problem	Solution
Project too large to read completely	Cloud Context vector indexing + on-demand retrieval
Context overflow	Context Mode compressed storage + summary retention
Memory loss after compression	Context Mode snapshot recovery mechanism

Put simply: Cloud Context lets Claude understand the entire project, while Context Mode ensures it doesn't forget things from reading too much.

From a technical architecture perspective, this combination actually builds a layered memory system: Cloud Context serves as "long-term memory" (the index of the entire codebase), Context Mode's SQLite storage serves as "medium-term memory" (complete data from the current session), and Claude's context window is the "working memory" (refined information currently being processed). This layered design closely mirrors the memory hierarchy of human cognitive systems and represents the mainstream approach in current AI Agent architecture design.

Installation and Configuration Guide

Both plugins have low installation barriers:

Cloud Context: Requires an OpenAI API Key and a Zelis Cloud account for vectorizing and indexing the codebase. The OpenAI API Key is used to generate embedding vectors for code snippets (converting code text into numerical vector representations), while Zelis Cloud provides the hosted vector database that stores and retrieves these vectors. Initial indexing of a large codebase can take anywhere from a few minutes to more than ten, but incremental updates afterward are very fast.
Context Mode: A single command gets it done—ready to use out of the box. Since it uses local SQLite storage, there are no external service dependencies and no network requirements.

Summary and Usage Recommendations

If you frequently use Claude Code for large project development (especially codebases exceeding 10,000 lines), this combination of two MCP plugins can significantly improve the development experience. They optimize the core bottlenecks of AI programming assistants from two dimensions: information retrieval efficiency and context management capability.

However, note that Cloud Context relies on external API services, incurring additional API call costs (primarily embedding generation fees, typically a few cents per million tokens, with monthly costs in the single-digit dollar range for medium-sized projects). While Context Mode's compression is highly efficient, in extreme cases summaries may lose certain details—particularly subtle differences between highly similar code snippets that may get merged in summaries. It's recommended to introduce these gradually in actual projects and observe the results before full adoption.

From a broader perspective, these two plugins represent an important direction in AI programming tool development: breaking through the model's inherent context limitations through external memory systems. As codebases continue to grow and AI programming scenarios become increasingly complex, similar "memory augmentation" solutions will become standard capabilities in AI development tools.