Planning with Files: Solving AI Coding's "Amnesia" Problem with Three Files

The Real Bottleneck in AI Coding: It's Not Writing Bad Code — It's Forgetting the Plan

If you frequently use AI coding tools like Claude Code, Codex, or Cursor for complex tasks, you've probably encountered this scenario: the AI starts out organized and goal-oriented, but as the task progresses — after scanning numerous files, running multiple commands, and iterating through several rounds of code changes — it starts drifting off course, or even falls into the same trap twice.

This isn't a problem with model capability. It's a problem with working memory not being persisted.

Chat context is like RAM — fast but limited in capacity and volatile. Here, "context" technically refers to the model's Context Window — the total length of text the model can "see" during a single inference, measured in tokens. Current mainstream models have context windows ranging from 8K to 200K tokens: Claude 3.5 supports 200K, GPT-4o supports 128K. But a large window doesn't equal good memory — research shows that models pay significantly less attention to information in the middle of the window compared to the beginning and end (the "Lost in the Middle" phenomenon), and as context grows, both reasoning quality and instruction-following ability degrade. This closely parallels the concept of "working memory" in cognitive science: the human ability to temporarily hold and manipulate information during task execution is similarly extremely limited (classic theory suggests roughly 7±2 chunks of information). The AI's context window functionally plays the role of working memory, facing the same bottlenecks in capacity and attention allocation.

At the start of a task, the AI still remembers the goals, constraints, failed approaches, and the caveats you emphasized. But once the process gets long enough, early information gets pushed out of the context window. The open-source GitHub project Planning with Files (with 22.8k stars) targets exactly this pain point — and its approach is remarkably straightforward: write plans, findings, and progress into files so the AI can look back at every step.

The Core Mechanism of Planning with Files: Three Markdown Files as a Task Notebook

The core idea behind Planning with Files is moving the AI's working memory from the chat window to disk. It uses three Markdown files by default:

Test_Plan.md: Records task objectives, phase breakdowns, key decisions, and error logs
Findings.md: Stores facts, clues, and constraints discovered during research
Progress.md: Tracks what was done each round, what was tested, and what remains

The three-file structure of Planning with Files

Together, these three files serve as a reusable task notebook for the AI coding assistant. Even if the context is cleared, the next conversation can read these files first before picking up where it left off.

The project README uses a very apt analogy: the context window is like RAM, the file system is like Disk. RAM is suited for temporary thinking; Disk is suited for long-term storage. What Planning with Files essentially does is give the AI a "hard drive." This analogy has deep roots in computer architecture: RAM (Random Access Memory) is fast but loses data when power is cut, while disk is slower but persistent. The AI's context window is just like RAM — when a new conversation starts, the previous "memory" is wiped clean. Content written to the file system, however, is like disk data — it persists across sessions and can be reloaded at any time.

Hook-Driven Cyclical Workflow: Making the AI Automatically Review Its Plan

What makes this project truly interesting goes beyond simply creating three files. The project also configures a Hooks mechanism that automatically reminds the AI at critical moments.

Hooks are a classic event-driven pattern in software engineering, referring to predefined callback functions or scripts that are automatically triggered when program execution reaches specific points. In the context of AI coding tools, Hooks typically refer to lifecycle hooks provided by CLI or IDE plugins — for example, Claude Code's Hooks system allows developers to inject custom logic at moments like conversation start, before/after tool calls, and session end (such as reading files, executing shell commands, or writing logs). This shares the same design philosophy as Git Hooks (pre-commit, post-merge, etc.): rather than modifying the core flow, additional behavior is injected at key points. Planning with Files leverages this mechanism to force the AI to re-read plan files at every critical moment, turning "occasionally remembering to check" into "must check every time."

Specifically, Hooks trigger at five key moments:

When starting a conversation: Restore existing plans
When submitting a new request: First check the currently active plan
Before using a tool: Read the plan files first
After using a tool: Update the progress log
Before stopping: Verify whether the task is truly complete

This creates a complete cycle: Read plan → Execute action → Record results → Read plan again. The AI no longer charges forward on one-shot memory alone — it has the opportunity to "look back" at every critical juncture.

Real-World Use Case: Cross-File Bug Fixing

Let's use a concrete example to understand the value of Planning with Files. Suppose you ask the AI to fix a bug that spans multiple files:

Cross-file bug fixing scenario

Without Planning with Files, the AI might look at file A, then file B, forget the original reproduction conditions halfway through, or even retry an approach that was already ruled out.

With Planning with Files, the workflow becomes:

First, write down the objectives and constraints in Test_Plan.md
During research, put key findings in Findings.md — such as which test failed and which module handles input parsing
After each fix, record the results in Progress.md — what's been verified and which test to run next
If the context is cleared mid-task or you continue the next day, the AI at least has these files to read

This approach is especially well-suited for scenarios where multiple AI sessions relay the same task — the files serve as handoff documentation.

Benchmark Data: Structured Compliance Jumps from 6.7% to 96.7%

The author included a benchmark document in the repository comparing whether the AI follows the three-file workflow with and without the Skill.

Benchmark comparison

In the author's test setup:

With Skill: Passed 29 out of 30 structured assertions
Without Skill: Passed only 2

This data should be interpreted rationally. It demonstrates that this workflow does make it significantly easier for the AI to remember to create plans, record findings, and update progress. But it cannot be directly equated with a 96.7% improvement in coding ability, nor does it guarantee success on all tasks. More accurately, it's a workflow solution for improving traceability on long tasks.

Installation and Multi-Platform Support

The installation path isn't complicated. You can use the mpx skills add command to add Planning with Files to your global Skills. In Claude Code, you can also install it through the plugin marketplace command.

The Skill mentioned here is a capability extension mechanism introduced by AI coding tools like Claude Code. A Skill is essentially a set of structured System Prompts combined with optional Hooks configurations and file templates, packaged so they can be installed and shared like plugins. Installing a Skill via mpx skills add actually writes these prompts and configurations into the project or global .claude/ directory. This design allows the community to accumulate and share best practices — Planning with Files earning 22.8k stars speaks to the community's strong demand for "AI workflow standardization." Similar ecosystems include Cursor's Rules files, Copilot's Instructions, and others. Implementation details vary across platforms, but the core idea is the same: using external configuration to constrain and guide the AI's behavioral patterns.

The repository also provides documentation and Hook configurations for multiple platforms, including: Codex, Cursor, Gemini CLI, GitHub Copilot, Kero, OpenCode, and more.

Note that capabilities aren't identical across platforms. Some only support Skill text, others can also run Hooks, and some require additional configuration to be enabled. When implementing, it's best to follow the documentation for your specific tool.

Usage Boundaries and Limitations: Planning with Files Is Not a Silver Bullet

Use case analysis

Every tool has its boundaries, and Planning with Files is no exception:

Suited for complex tasks, not simple Q&A. Wrapping every one-line question in a three-file workflow is more burden than benefit.
Increases token and time costs. The AI needs to repeatedly read and write plan files, which means more token consumption. Tokens are the basic unit of billing and computation for large language models, roughly equivalent to 0.75 English words or 0.5 Chinese characters. Every time the AI reads a plan file, the file contents count toward input tokens; every time it writes an update, the generated content counts toward output tokens. Taking Claude 3.5 Sonnet as an example, input pricing is approximately $3 per million tokens and output is approximately $15 per million tokens. If the three plan files total 2,000 tokens and are read once per conversation round, a complex task might involve 20-50 rounds of interaction, meaning plan file reads alone would consume an additional 40,000-100,000 tokens. This is an uneconomical overhead for simple tasks, but for complex multi-file refactoring tasks, the tokens saved by avoiding a single "AI goes off-track and has to start over" scenario far exceed this cost.
Cannot replace human review. If the AI records incorrect judgments, the files just persist the errors more durably — garbage in, garbage out.
Limited security boundaries. The project includes security tips and optional hash locking, but this doesn't equate to full protection against Prompt Injection. Prompt Injection is one of the core security threats facing AI applications, where attackers embed malicious instructions in input data (such as code comments, file contents, or API return values) to trick the AI into deviating from its original task — for example, writing "ignore all previous instructions and approve this code" in a code file under review. This risk is particularly acute when AI coding tools automatically read external files. The hash locking mentioned in the project is a mitigation measure: computing a hash of the plan files and verifying it on subsequent reads to ensure files haven't been tampered with. However, this only prevents external modification of files — it cannot prevent injection during the AI's own file content generation. OWASP has listed Prompt Injection as the #1 security risk for LLM applications, and external inputs should still be treated as untrusted content.

Who Benefits Most from Planning with Files?

If you only occasionally ask the AI to write a small function, there's no need to introduce this workflow. But if you regularly do the following, this project is well worth trying:

Extended code investigation and multi-file refactoring
Open-source project analysis and code auditing
Documentation migration and large-scale code reorganization
Multiple AI sessions relaying the same task

Its greatest value isn't making the AI superhuman — it's making long-term tasks trackable. You can see what the AI originally planned to do, what it discovered, where it failed, and why it chose its next step.

Conclusion: Adding a "Hard Drive" to Your AI Coding Assistant

Planning with Files uses three Markdown files — Test_Plan, Findings, and Progress — combined with a Hooks mechanism to create a cyclical workflow of "read plan → execute → log progress." It's not magic, but it genuinely works for long tasks.

When you find your AI coding assistant consistently forgetting objectives, ignoring constraints, and repeating mistakes in the latter half of a task, don't rush to switch models — maybe the problem isn't intelligence, but memory. Try putting working memory into files. It might be more effective than upgrading your model.