Google Hides Gemini's Chain of Thought: Why This AI Transparency Rollback Is Sparking Controversy

Event Overview

Google recently made a controversial interface change to the web version of Gemini: the model's thinking traces are now hidden by default. Users must click a three-dot menu to view a thinking summary, which critics describe as so condensed that it is virtually useless.

This change has triggered strong backlash from AI practitioners and power users. Some users bluntly stated that this makes Gemini "unsuitable for any serious work requiring accuracy."

Why Chain of Thought Is Critical to AI Trustworthiness

Verifiability Is the Foundation of User Trust in AI

Displaying the chain of thought (CoT) has become a key basis on which users judge the quality of model output. CoT prompting was formally proposed by Jason Wei and colleagues at Google Research in 2022. The core idea is to have a large language model output intermediate reasoning steps before giving its final answer, much like a person working on scratch paper. Subsequent research found that CoT improves accuracy on tasks such as math and logical reasoning, and it also gives users a window for auditing the answer. Since 2024, with the rise of "reasoning models" such as OpenAI's o1/o3 series and DeepSeek-R1, chain of thought has evolved from a prompting technique into a capability trained into the models themselves, and its visibility has become a critical product design decision.

When an AI model displays its reasoning process, users can:

Verify reasoning logic: Determine whether the model is analyzing the problem along the correct line of thinking
Identify hallucination risks: Detect whether the model is "fabricating" information rather than basing answers on actual retrieval
Trace information sources: Confirm whether the model performed web searches and whether search results were correctly cited

Hallucination is one of the core challenges of large language models. It refers to a model generating content that appears plausible but is incorrect or entirely fabricated: inventing citations to academic papers that do not exist, fabricating details of historical events, or producing incorrect code logic. The problem stems from how the model generates text: it performs probabilistic next-token prediction rather than retrieving facts from a reliable database. As of 2025, various technical approaches have significantly reduced hallucination rates, but the problem remains fundamentally unsolved. This is precisely why chain-of-thought visibility matters: by inspecting the reasoning process, users can tell whether the model is answering from retrieved information or "confidently fabricating."

Hiding this information essentially asks users to "blindly trust" model output, which runs counter to the industry trend toward AI safety and responsible use.

Gemini vs. ChatGPT and Claude on Transparency

Meanwhile, OpenAI's ChatGPT and Anthropic's Claude have both been strengthening the transparency of their thinking processes in recent updates. OpenAI's o1 model, released in September 2024, pioneered the "reasoning model" category, characterized by extended internal thinking before answering. For safety and commercial reasons, OpenAI does not display the raw, complete chain of thought and instead provides a processed "summary of thinking." This design sparked ongoing community discussion about transparency, but by 2025 OpenAI had gradually increased the level of detail in its summaries and made reasoning summaries available to developers through its API. Anthropic's Claude adopted a relatively more open strategy, with its extended thinking feature displaying fairly detailed reasoning to users by default.

Google's Gemini previously displayed its thinking process, so hiding it is seen as a clear regression in transparency that runs against the direction the rest of the industry is taking.

Impact of Hiding Chain of Thought on Real Workflows

Core Pain Points for Professional Users

For professional users who rely on AI as a productivity tool, this change has substantive impact:

Unable to determine whether web searches were performed: Most mainstream AI products now build on retrieval-augmented generation (RAG), in which the model first retrieves relevant information from external sources (such as web search or document databases) and then generates its answer from the retrieved results. When the chain of thought is visible, users can see whether the model triggered a search, which keywords it used, and which sources it retrieved. Once the chain of thought is hidden, users cannot distinguish "answers based on training data" from "answers based on real-time search," a serious reliability problem in fast-changing fields such as tech news, financial data, and policy regulations.
Unable to verify result reliability: When the model claims to give an answer "based on search results," users cannot confirm whether these searches actually occurred
Debugging difficulty: When output results are incorrect, users cannot trace back which step of the reasoning went wrong
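
To make the first two pain points concrete, here is a minimal Python sketch (3.9+) of the kind of retrieval record a visible thinking trace effectively gives users. Every name in it (SearchCall, AnswerTrace, grounding_label, cited_sources) is hypothetical and invented for illustration; it does not describe Gemini's internals or any real API.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCall:
    """One retrieval step the model performed (hypothetical structure)."""
    query: str
    source_urls: list[str]


@dataclass
class AnswerTrace:
    """An answer together with the retrieval steps that preceded it."""
    answer: str
    searches: list[SearchCall] = field(default_factory=list)


def grounding_label(trace: AnswerTrace) -> str:
    """Classify an answer by whether any search returned at least one source."""
    if any(call.source_urls for call in trace.searches):
        return "search-grounded"
    return "training-data-only"


def cited_sources(trace: AnswerTrace) -> list[str]:
    """Return the distinct source URLs in first-seen order."""
    seen: dict[str, None] = {}
    for call in trace.searches:
        for url in call.source_urls:
            seen.setdefault(url, None)
    return list(seen)


# Illustrative usage with placeholder data.
trace = AnswerTrace(
    answer="placeholder answer text",
    searches=[SearchCall("example query", ["https://example.com/a"])],
)
print(grounding_label(trace), cited_sources(trace))
```

With the trace visible, a user can perform this check by eye. With it hidden, the searches field is in effect unavailable, and both kinds of answer look the same.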

These problems are particularly acute in work scenarios requiring high accuracy, such as research, writing, and data analysis. An AI output that cannot be verified has significantly diminished practical value.

Thinking Summary Feature Is Essentially Useless

Even when users find the thinking summary feature hidden in the three-dot menu, the information provided is described as "so extremely condensed as to be unusable." This means Google not only increased the access barrier but also drastically reduced the information volume, delivering a double blow to user experience.

Google's Possible Motivations for Hiding Chain of Thought

Why did Google make this decision? Several considerations may be at play:

Interface simplicity: For mass-market users, lengthy thinking processes may be seen as "noise" that affects the user experience
Computational cost: Tokens are the basic unit of cost in large language model operations; one token corresponds to roughly 4 English characters or 1-2 Chinese characters. When a model generates a chain of thought, the number of output tokens can be 3-10 times that of the final answer. For GPT-4-level models, a million output tokens costs from several to tens of dollars, and reasoning models cost more. For a platform like Google's, which may process hundreds of millions of queries a day, even a few hundred extra tokens per request adds up to enormous compute consumption. Note, however, that the chain of thought is still generated inside the model (it is what ensures reasoning quality); only the portion displayed to users is hidden. The actual savings from hiding it are therefore likely limited to frontend rendering and network transmission.
Competitive strategy: Avoiding exposure of weaknesses or inconsistencies in the model's reasoning
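
The cost argument in the list above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name and all input values (query volume, answer length, reasoning multiplier, price per million tokens) are assumptions chosen from the ranges mentioned in this article, not published Google figures.

```python
def daily_output_cost(
    queries_per_day: int,
    answer_tokens: int,
    thinking_multiplier: float,
    usd_per_million_output_tokens: float,
) -> dict[str, float]:
    """Estimate daily output-token cost with and without generated reasoning.

    thinking_multiplier is the ratio of reasoning tokens to answer tokens.
    """
    answer_total = queries_per_day * answer_tokens
    thinking_total = answer_total * thinking_multiplier
    return {
        "answer_only_usd": answer_total * usd_per_million_output_tokens / 1_000_000,
        "with_thinking_usd": (answer_total + thinking_total)
        * usd_per_million_output_tokens
        / 1_000_000,
    }


# Assumed inputs: 100M queries/day, 300-token answers,
# reasoning 3x the answer length, $10 per million output tokens.
estimate = daily_output_cost(100_000_000, 300, 3.0, 10.0)
for label, usd in estimate.items():
    print(f"{label}: ${usd:,.0f}")
```

The important detail is that the with_thinking_usd figure applies whether or not the reasoning is shown: the tokens are generated either way, so hiding the display does not move the estimate toward answer_only_usd.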

However, regardless of the reason, the correct approach should be to provide adjustable levels of detail rather than a blanket hide. A simple "Show detailed reasoning process" toggle could satisfy both casual and professional users simultaneously.
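
As a rough sketch of what adjustable levels of detail could look like, the Python snippet below (3.9+) models three display settings. The names ReasoningDetail and render_reasoning are hypothetical, and the "summary" here is a crude stand-in that simply shows the first few steps; a real product would presumably generate a proper summary.

```python
from enum import Enum


class ReasoningDetail(Enum):
    """User-selectable display level for the model's reasoning (hypothetical)."""
    HIDDEN = "hidden"
    SUMMARY = "summary"
    FULL = "full"


def render_reasoning(
    steps: list[str],
    detail: ReasoningDetail,
    summary_steps: int = 2,
) -> list[str]:
    """Return the reasoning lines to display for the chosen detail level."""
    if detail is ReasoningDetail.HIDDEN:
        return []
    if detail is ReasoningDetail.SUMMARY:
        shown = steps[:summary_steps]
        omitted = len(steps) - len(shown)
        # Tell the user that something was left out instead of silently truncating.
        return shown + ([f"({omitted} more steps hidden)"] if omitted > 0 else [])
    return list(steps)


# Illustrative usage with placeholder steps.
steps = ["parse question", "search the web", "compare sources", "draft answer"]
for level in ReasoningDetail:
    print(level.value, render_reasoning(steps, level))
```

The design point is that the same underlying trace serves both audiences: casual users can stay on the hidden or summary setting, while professional users can switch to the full view.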

Conclusion: AI Transparency Should Not Regress

AI model transparency is not an optional feature—it is foundational infrastructure for building user trust. At a time when the large model "hallucination" problem remains unsolved, chain-of-thought display is an important tool for user self-protection. Google's choice to regress in this direction may accelerate professional users' migration to competing products.

Users who rely on AI for serious work should treat Gemini's output with extra caution for now, or consider alternatives that expose the complete reasoning process.