Gemini CLI Complete Guide: MCP Extensions & Memory Files in Practice

Google launches Gemini CLI with 1M token context and MCP extensions to rival Claude Code and Codex CLI.
Google's Gemini CLI, powered by Gemini 2.5 Pro, offers three core features: a 1-million-token context window, MCP Server extensions, and memory files. The ultra-long context enables holistic project analysis and cross-file refactoring; MCP integration connects external tools like Context7 and Taskmaster for real-time documentation retrieval and task breakdown; memory files ensure the AI consistently follows development standards. Hands-on testing successfully built a multi-agent AutoGen workflow, demonstrating a complete loop from requirements to implementation.
Google's newly released Gemini CLI has officially entered the AI coding tool arena, going head-to-head with Anthropic's Claude Code and OpenAI's Codex CLI. With a 1-million-token ultra-long context window, MCP Server extension support, and project memory file functionality, Gemini CLI offers developers a full-featured command-line AI coding solution. This article covers everything from installation and configuration to hands-on demonstrations, providing a comprehensive breakdown of this tool's core capabilities.

1 Million Token Context: Why It Matters
Gemini CLI is powered by the Gemini 2.5 Pro model and inherits its context window of 1 million tokens. What does this number actually mean? It's roughly equivalent to 2–3 complete Flask-scale projects, or the entire codebase of a dozen common Python packages.
To understand the significance of this number, you first need to grasp the concepts of tokens and context windows. A token is the basic unit that large language models use to process text: one token corresponds to roughly 3/4 of an English word, or 1–2 Chinese characters. The context window is the maximum number of tokens a model can "see" and process at once in a single conversation. Early GPT-3.5 supported only a 4K-token context; GPT-4 launched with 8K and 32K variants, and GPT-4 Turbo later expanded this to 128K. Gemini 2.5 Pro's 1-million-token context means the model can process approximately 750,000 English words or tens of thousands of lines of code in a single interaction.
This capability depends on optimizations to the attention mechanism. In traditional Transformer architectures, the computational cost of attention scales quadratically with sequence length, so doubling the input roughly quadruples the work. Google has not published the full details of its approach; techniques such as Ring Attention and sparse attention are among the known ways to reduce the overhead of long-sequence processing.
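To make the arithmetic above concrete, here is a back-of-the-envelope sketch. The 0.75 words-per-token ratio is the rule of thumb quoted above, the function names are illustrative, and real tokenizers vary by language and content, so treat the output as an order-of-magnitude estimate only.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers vary


def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)


def attention_pairs(seq_len: int) -> int:
    """Token-to-token scores computed by naive full self-attention (n * n)."""
    return seq_len * seq_len


# Compare the context sizes mentioned in the text.
for window in (4_000, 128_000, 1_000_000):
    print(
        f"{window:>9,} tokens ~ {tokens_to_words(window):>7,} words, "
        f"{attention_pairs(window):.1e} attention scores"
    )
```

The last column shows why quadratic scaling matters: a context 250 times longer needs 62,500 times as many attention scores when nothing is optimized.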
For developers, the practical value of ultra-long context manifests in three key areas:
- Holistic architecture analysis — You can feed an entire project to the model at once for global understanding
- Cross-file code refactoring — No more explaining context relationships file by file
- Complex dependency mapping — The model can see all inter-module call chains simultaneously
In actual testing, after importing the complete codebase of the open-source AI agent framework SmallAgents into Gemini CLI, it accurately analyzed the project's main module responsibilities, data flow patterns, and design pattern usage. It even identified potential architectural issues and provided refactoring suggestions, including function complexity optimization and dependency relationship improvements.
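Before feeding a whole project to the model, it can help to estimate whether it fits in the window at all. The sketch below assumes a heuristic of about 4 characters per token, which is not Gemini's actual tokenizer, and the list of file suffixes is an arbitrary illustrative choice.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # rough heuristic, not Gemini's real tokenizer
CONTEXT_WINDOW = 1_000_000   # token budget described in the article
SOURCE_SUFFIXES = {".py", ".md", ".toml", ".json"}  # illustrative choice


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_project_tokens(root: str) -> int:
    """Sum the estimated tokens of all matching source files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """Check whether an estimated token count fits in the context window."""
    return tokens <= window
```

For example, fits_in_context(estimate_project_tokens("path/to/project")) gives a quick yes or no. Leave headroom for the prompt and the model's reply, since both count against the same budget.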
Installation & Basic Configuration
Environment Setup
Before installing Gemini CLI, make sure Node.js is installed on your system (v20 recommended). If it is not, download the installer for your operating system from the Node.js website.
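The Node.js requirement can be checked with a short script before you install anything. This is a minimal sketch: it relies only on the standard node --version command, and the helper names are illustrative.

```python
import re
import subprocess
from typing import Optional, Tuple

RECOMMENDED_MAJOR = 20  # version recommended in the text above


def parse_node_version(output: str) -> Optional[Tuple[int, ...]]:
    """Parse the output of `node --version`, e.g. 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", output.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def installed_node_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Node.js version, or None if node is unavailable."""
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_node_version(result.stdout)


def meets_recommendation(version: Optional[Tuple[int, ...]]) -> bool:
    """True when a version was found and its major number is high enough."""
    return version is not None and version[0] >= RECOMMENDED_MAJOR


version = installed_node_version()
status = "OK" if meets_recommendation(version) else "install or upgrade Node.js"
print("Node.js:", version, "->", status)
```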
The installation command is straightforward: copy the official command from the Gemini CLI README (at the time of writing, npm install -g @google/gemini-cli) and run it in your terminal:
- Mac/Linux: Open Terminal and execute directly
- Windows: Open CMD and execute
During installation, you'll be prompted to choose a theme (the default dark theme works fine), then you'll need to log in with your Google account to complete authentication. After successful login, the terminal will display that it's using the Gemini 2.5 Pro model by default.
Essential Commands at a Glance
You can view all available operations with the help command. Here are several key commands worth remembering:
- MCP-related commands (/mcp): List configured MCP Servers and the tools they expose
- Memory commands (/memory): Set up and manage memory files
- Tool list command (/tools): View all available tools
- ! prefix commands: Execute shell commands, e.g., !pwd to display the current path
In practice, it's recommended to launch Gemini CLI directly within the built-in terminal of VSCode or PyCharm, allowing you to seamlessly combine the IDE's file management and code editing capabilities.
MCP Server Extensions: Supercharging Gemini CLI
MCP Protocol: The "USB Port" for AI Tools
MCP (Model Context Protocol) is a standardized protocol open-sourced by Anthropic in late 2024, designed to give large language models a unified interface for calling external tools. MCP uses a client-server architecture: AI tools (such as Gemini CLI and Claude Code) act as MCP clients, while external services (such as document retrieval, database queries, and project management tools) run as MCP Servers. The two communicate via JSON-RPC 2.0. An MCP Server exposes a list of callable tools and their parameter definitions to the client; the model then selects the appropriate tool based on user intent, and the client issues the call. This design is analogous to what USB is to hardware devices: once the standard is established, any developer can write an MCP Server that conforms to the protocol, giving AI tools new capabilities without modifying the AI tool's own code.
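To illustrate the exchange described above, the following sketch builds the two JSON-RPC 2.0 requests a client sends most often. The method names tools/list and tools/call come from the MCP specification; the tool name search_docs and its arguments are hypothetical, and the initialization handshake that precedes these calls is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# 1. The client asks the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# 2. The client calls one of them. The tool name and arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_docs", "arguments": {"library": "autogen", "topic": "agents"}},
)

for message in (list_tools, call_tool):
    print(json.dumps(message, indent=2))
```

Because every server speaks this same message shape, the client needs no server-specific code, which is the point of the USB analogy.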
How to Configure MCP Servers
MCP Server configuration is one of Gemini CLI's most differentiating features. By editing the configuration file, you can connect various external tools to Gemini CLI.
Configuration steps:
- Navigate to the Gemini CLI configuration path in your terminal (typically the .gemini folder in your home directory)
- Open the configuration file using the nano command
- Add the MCP Server JSON configuration to the file
In our hands-on testing, we configured two commonly used MCP Servers:
- Context7: Capable of fetching the latest documentation for the vast majority of open-source projects and libraries, effectively solving the problem of LLM training data lag. Since large language models have a knowledge cutoff date, they may still reference deprecated APIs for rapidly evolving open-source projects. Context7 ensures generated code is based on current API versions by retrieving the latest documentation in real time.
- Taskmaster: Capable of generating Product Requirements Documents (PRDs) and breaking them down into actionable subtasks, helping developers transform vague product ideas into structured development plans.
After configuration, type /mcp in Gemini CLI to view all configured MCP Servers and their supported tools.
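The configuration steps in this section come down to adding an mcpServers entry to the settings file. The sketch below prints such a configuration rather than writing it to disk. The file location, the npx launch commands, and the environment variable name are assumptions based on each project's public README at the time of writing; verify them before use.

```python
import json
from pathlib import Path

# Assumed location of the user-level settings file; verify it for your install.
SETTINGS_PATH = Path.home() / ".gemini" / "settings.json"

# Assumed launch commands for the two servers; check each project's README.
MCP_SERVERS = {
    "context7": {"command": "npx", "args": ["-y", "@upstash/context7-mcp"]},
    "taskmaster-ai": {
        "command": "npx",
        "args": ["-y", "--package=task-master-ai", "task-master-ai"],
        # Taskmaster needs a model provider key; this variable name is an assumption.
        "env": {"GOOGLE_API_KEY": "YOUR_KEY_HERE"},
    },
}


def with_mcp_servers(settings: dict, servers: dict) -> dict:
    """Return a copy of settings with servers merged under 'mcpServers'."""
    merged = dict(settings)
    merged["mcpServers"] = {**settings.get("mcpServers", {}), **servers}
    return merged


existing = {"theme": "Default"}  # stand-in for the current file contents
print(f"Paste into {SETTINGS_PATH}:")
print(json.dumps(with_mcp_servers(existing, MCP_SERVERS), indent=2))
```

Merging instead of overwriting keeps any servers and unrelated settings that are already in the file.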
Hands-On: Building an AI Agent Workflow with AutoGen
To validate the practical effectiveness of MCP Servers, we tested a complete development scenario — building an AI agent workflow using Microsoft's AutoGen framework.
AutoGen is a multi-agent conversation framework open-sourced by Microsoft Research. Its core philosophy is to accomplish complex tasks through collaborative dialogue between multiple AI agents. Unlike single-agent approaches, AutoGen allows developers to define multiple agents with different roles and capabilities. These agents can send messages to each other, review each other's outputs, and iteratively optimize results. AutoGen version 0.4 underwent a major architectural overhaul, introducing event-driven asynchronous communication mechanisms and more flexible agent orchestration patterns. This multi-agent collaboration model simulates the Code Review process in software engineering, improving code quality by introducing a "second pair of eyes."
After entering the prompt, Gemini CLI first used Context7 to search for AutoGen's latest documentation and new features, then wrote a workflow containing three agents based on the latest API:
- Code Generation Agent: Writes initial code based on requirements
- Code Review Agent: Reviews the generated code and provides improvement suggestions
- Code Integration Agent: Synthesizes the outputs of the first two agents to produce the final optimized code
During the test run, the three agents worked collaboratively: the first agent generated a Python function to find the Nth prime number, the second agent reviewed the code and proposed optimizations, and the third agent integrated all the information to output a more complete final version. The entire process required no manual intervention and ran successfully on the first attempt.
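The message flow between the three roles can be sketched in plain Python. This is not AutoGen's API and not the code Gemini CLI generated: each "agent" is a stub function standing in for an LLM call, and all names are hypothetical. It only shows how a generate, review, integrate pipeline passes a growing transcript along.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    content: str


# An "agent" here is a plain function from the transcript so far to a reply.
# Real agents would call an LLM; these stubs only show the message flow.
Agent = Callable[[List[Message]], str]


def code_generator(transcript: List[Message]) -> str:
    return f"Draft code for: {transcript[0].content}"


def code_reviewer(transcript: List[Message]) -> str:
    return f"Review of '{transcript[-1].content}': add input validation."


def code_integrator(transcript: List[Message]) -> str:
    draft, review = transcript[1].content, transcript[2].content
    return f"Final version combining [{draft}] and [{review}]"


def run_pipeline(agents: Dict[str, Agent], task: str) -> List[Message]:
    """Run each agent once, in order, passing the growing transcript along."""
    transcript = [Message("user", task)]
    for name, agent in agents.items():
        transcript.append(Message(name, agent(transcript)))
    return transcript


agents = {
    "generator": code_generator,
    "reviewer": code_reviewer,
    "integrator": code_integrator,
}
for message in run_pipeline(agents, "a function that finds the Nth prime"):
    print(f"{message.sender}: {message.content}")
```

The integrator sees both the draft and the review, which is what lets it produce a version that neither earlier agent wrote alone.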
Memory Files: Making AI Follow Your Development Standards
Creating Project-Level Memory Files
Memory Files are another core feature of Gemini CLI, allowing developers to set persistent rules for a project that the AI follows across all subsequent interactions.
From a technical perspective, memory files are an implementation of System Prompt Engineering. In traditional LLM interactions, developers need to repeat their tech stack preferences, coding conventions, and other constraints at the beginning of every conversation. This not only wastes token quota but also leads to inconsistent outputs due to omissions. Memory files persist these constraints as project-level configuration, similar to what .editorconfig or .eslintrc is to code editors — they define the project's "meta-rules" that are automatically injected into the model's context with each interaction. Notably, this pattern is becoming the standard configuration paradigm for AI coding tools: the equivalent feature in Claude Code is the CLAUDE.md file, while Cursor uses .cursorrules files.
To create one, simply add a GEMINI.md file in your project root directory and define your development standards. A complete memory file typically includes:
- Tech stack constraints: e.g., Python 3.11, AutoGen 0.4, using venv virtual environments
- Environment configuration notes: Virtual environment creation, activation methods, dependency installation commands
- Coding standards & style: Naming conventions, commenting requirements, etc.
- Project structure definition: Directory organization
- Tool usage strategies: e.g., "Always use Context7 to search for the latest documentation," "All code examples should use Chinese comments"
Once set up, use /memory refresh to reload the memory file and /memory show to confirm it loaded successfully.
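As an illustration of what such a file might contain, the sketch below renders the rule categories listed above into Markdown text that could be saved as GEMINI.md. The specific rules, directory names, and the render_memory_file helper are examples rather than a required format; the file itself is free-form Markdown.

```python
from typing import Dict, List

# Example rules drawn from the categories listed above; adapt to your project.
RULES: Dict[str, List[str]] = {
    "Tech stack": ["Python 3.11", "AutoGen 0.4", "Use a venv virtual environment"],
    "Environment": [
        "Create the environment with: python -m venv .venv",
        "Install dependencies with: pip install -r requirements.txt",
    ],
    "Coding standards": [
        "snake_case for functions and variables",
        "Every public function has a docstring",
    ],
    "Project structure": ["src/ for application code", "tests/ for tests"],
    "Tool usage": ["Always use Context7 to search for the latest documentation"],
}


def render_memory_file(rules: Dict[str, List[str]]) -> str:
    """Render rule categories as Markdown headings with bullet lists."""
    lines = ["# Project rules", ""]
    for section, items in rules.items():
        lines.append(f"## {section}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


content = render_memory_file(RULES)
print(content)  # save this text as GEMINI.md in the project root
```

Short, imperative rules tend to work best here, because the whole file is added to the model's context on every interaction.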
Validating the Results in Practice
With the memory file configured, entering a simple prompt like "Build me an AI agent that can create travel itineraries" caused Gemini CLI to automatically follow all the rules defined in the memory file: building with Python 3.11, adhering to project conventions, and organizing code according to the specified directory structure.
It first output a step-by-step development plan. After confirmation, it began creating project files and writing code. When runtime errors occurred, simply pasting the error messages back to Gemini CLI allowed it to quickly locate and fix the issues. After successful execution, entering "Create a 3-day travel plan for Nepal" produced a comprehensive travel itinerary from the agent, complete with daily schedules, budget estimates, transportation options, and attraction recommendations.
Taskmaster Integration: From Requirements to Task Breakdown
Beyond code development, Gemini CLI paired with the Taskmaster MCP Server can also handle project management tasks. In testing, entering "Develop a TodoList App for iOS, generate a PRD and break it down into 10 subtasks" prompted Gemini CLI to invoke Taskmaster and automatically:
- Generate a complete Product Requirements Document and save it to a file
- Break the PRD down into 10 specific development subtasks and save them
Developers can then use these broken-down subtasks to continue building the entire project step by step within Gemini CLI, creating a complete loop from requirements analysis to code implementation.
Summary & Outlook
The release of Gemini CLI marks a new phase for AI coding tools. The 1-million-token context window addresses the pain point of large-scale project analysis, the MCP Server extension mechanism lets developers add new capabilities without changing the tool itself, and the memory file feature moves AI coding from "random responses" to "standardized development."
The AI command-line coding tool market has now formed a three-way competition: Anthropic's Claude Code is renowned for its depth of code understanding and agentic coding capabilities, excelling in complex refactoring tasks; OpenAI's Codex CLI leverages the broad user base of the GPT model family, emphasizing seamless integration with the ChatGPT ecosystem; and Google's Gemini CLI differentiates itself with ultra-long context and free usage quotas. Beyond command-line tools, IDE-integrated tools like GitHub Copilot, Cursor, and Windsurf are also competing for the developer market, pushing the entire AI coding tool space into fierce competition in 2025.
For developers already using Claude Code or Codex CLI, Gemini CLI's biggest differentiator lies in its ultra-long context and flexible MCP ecosystem. For developers new to AI coding tools, Gemini CLI's free tier and relatively simple configuration process also lower the barrier to entry. As the MCP ecosystem continues to grow, Gemini CLI's practical development capabilities will only continue to strengthen.