Intelligence Is Getting More Expensive: A Deep Dive into Google I/O 2026

Google I/O 2026 suggests that intelligence is getting more expensive, not cheaper.
Key takeaways: Gemini 3.5 Flash is priced roughly 3x higher than its predecessor, RL training environments have become a hidden battleground among labs, managed agents and sandboxes are the new infrastructure focus, and the "intelligence is too cheap to meter" narrative is collapsing. Open-source models are splitting into three tiers, and frontier labs are diverging in strategy.
Right after Google I/O 2026 wrapped up, two veteran AI content creators, Mohammad and Sam Witteveen, sat down on-site for an in-depth conversation. They covered model pricing, the open-source ecosystem, product trends, and RL training environments, and what these say about where the AI industry is heading.
The Product Era Has Arrived: Models Are No Longer the Sole Star
Sam opened with a key assertion: products are becoming more important than models. Google did unveil new models such as Gemini 3.5 Flash at this year's I/O, but the product ecosystem built around them was more striking. The rapid-fire launch of Antigravity 2.0, the Gemini Spark personal agent, Ask YouTube, and more signals that the industry has entered an era where applications matter most.
Gemini Spark, Google's take on a personal agent, is designed as a long-running assistant, similar to a system with cron jobs that automatically retrieves web information for the user every day. Google's core advantage is its access to massive contextual data: users' emails, calendars, YouTube watch history, and more. With support for MCP (Model Context Protocol), Spark can also tap into third-party ecosystems rather than being limited to Google's own services.
Sam also made a forward-looking prediction: MCP installation data will become a key signal for hyperscale cloud providers deciding which startups to acquire. If large numbers of users install a particular app's MCP server, that company is likely to become an acquisition target.
The "Intelligence Is Too Cheap" Narrative Is Falling Apart
"Intelligence is too cheap to meter" was once a popular industry narrative, but the numbers no longer support it. Gemini 3.5 Flash is priced roughly 3x higher than its predecessor (for both input and output), while consuming about 5x as many tokens on the same benchmarks. The two effects multiply, so the cost per task rises far more than the price list alone suggests.
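To see how the two multipliers compound, here is a rough worked example in Python. The baseline price and token count are placeholder values chosen only for illustration, not published pricing; only the 3x and 5x ratios come from the conversation.

```python
def cost_per_task(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task given a per-million-token price."""
    return price_per_million_tokens * tokens_used / 1_000_000


# Hypothetical baseline (placeholder numbers, not real pricing).
old_price = 1.00      # dollars per million tokens
old_tokens = 10_000   # tokens spent on one benchmark task

# Ratios quoted in the conversation: about 3x the price, about 5x the tokens.
new_price = old_price * 3
new_tokens = old_tokens * 5

old_cost = cost_per_task(old_price, old_tokens)
new_cost = cost_per_task(new_price, new_tokens)
cost_ratio = new_cost / old_cost

print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  ratio: {cost_ratio:.0f}x")
```

Under these assumptions the per-task cost grows by about 15x (3 × 5), whatever baseline is plugged in, which is why per-token price alone is a poor proxy for what a task costs.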

Sam highlighted a crucial distinction: the cost of tokens and the cost of intelligence are two different things. When GPT-5.0 first launched, OpenAI heavily promoted its lower prices, but the model used 3x the token volume, so it was far from cheap in aggregate. Across the iterations from 5.1 to 5.5, OpenAI has worked to shorten chain-of-thought length while maintaining answer quality.
In retrospect, Anthropic's pricing strategy may have been the soundest: pricing high from the start avoided the user backlash that comes with later price increases. And if the only way to boost model intelligence is to keep extending the chain of thought, the entire industry faces a serious cost-control challenge.
RL Environments: The "Hidden Battleground" of AI Training
One of the most revealing parts of the conversation concerned RL (reinforcement learning) training environments. Sam described RL environments as the "big secret in the room" at every AI lab: labs are privately negotiating deals to acquire them, and startups dedicated to selling RL environments have already emerged.

What does this mean in practice? Take Excel or Google Sheets as an example: if you can build a perfect spreadsheet RL environment, generate 10,000 perfect Excel files, and then derive all the steps needed to reach those results, that's incredibly valuable training data. If you can also put it into a continuous improvement loop, the results are even better.
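As a minimal sketch of that idea, the toy environment below treats a spreadsheet as a dictionary of cells and rewards the agent only when the sheet matches a known-good target. `SheetEnv`, `record_trajectory`, and the cell values are hypothetical names and data invented for this illustration; real lab environments are not described in the conversation and are certainly far richer.

```python
from dataclasses import dataclass, field

Cell = str  # spreadsheet address such as "A1"


@dataclass
class SheetEnv:
    """Toy spreadsheet environment: fill cells until they match a target sheet."""
    target: dict[Cell, float]
    max_steps: int = 10
    sheet: dict[Cell, float] = field(default_factory=dict)
    steps: int = 0

    def reset(self) -> dict[Cell, float]:
        self.sheet = {}
        self.steps = 0
        return dict(self.sheet)

    def step(self, cell: Cell, value: float):
        """Apply one 'set cell' action; return (observation, reward, done)."""
        self.sheet[cell] = value
        self.steps += 1
        solved = self.sheet == self.target
        done = solved or self.steps >= self.max_steps
        reward = 1.0 if solved else 0.0  # verifiable, outcome-based reward
        return dict(self.sheet), reward, done


def record_trajectory(env: SheetEnv, actions):
    """Replay (cell, value) actions and log every step with its reward."""
    env.reset()
    trajectory = []
    for cell, value in actions:
        _, reward, done = env.step(cell, value)
        trajectory.append({"action": (cell, value), "reward": reward})
        if done:
            break
    return trajectory


env = SheetEnv(target={"A1": 2.0, "A2": 3.0, "A3": 5.0})
good = record_trajectory(env, [("A1", 2.0), ("A2", 3.0), ("A3", 5.0)])
bad = record_trajectory(env, [("A1", 2.0), ("A2", 4.0), ("A3", 5.0)])
print(good[-1]["reward"], bad[-1]["reward"])
```

The point of the sketch is the pairing of a checkable end state with the logged steps that led to it: trajectories that end in reward can be kept as training data, and the same loop can be rerun as the policy improves.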
A striking data point underscores the power of RL: Gemini 3.5 Flash scored only about 8% on the ARC-AGI benchmark with a minimal thinking budget, but with a high reasoning-token budget the score jumped to roughly 90%. The same model reaches dramatically different performance levels just by changing the reasoning-token budget.
Managed Agents and Sandboxes: The New Infrastructure Focus
Both guests agreed that sandboxes will be a key focus going forward. With Anthropic and Google both having launched Managed Agent services, the industry is splitting into two camps:
- Managed Agent model: Similar to Google Cloud Functions, the platform provides sandboxing, observability, and model access as an all-in-one service
- Agent SDK model: Enterprises retain control over sandboxing and observability, using only the SDK for development

A noteworthy insider detail: ADK (Agent Development Kit) was originally a Google Cloud project, not a DeepMind one. The newly released agent-related products are clearly more influenced by DeepMind, and this internal power shift is worth watching.
Jerry Liu of LlamaIndex even stated in a VentureBeat interview that "the era of agent frameworks is over": today, any coding agent can write a decent framework. The real value lies in hosting, sandboxing, observability, and fault tolerance.
The Three-Tier Landscape of Open-Source Models
Mohammad proposed a clear tiered framework for open-source models:
- Small models: Targeting individual users, capable of running locally on phones and similar devices
- Mid-size models (e.g., 32B): Suited for small businesses and specialized workflows
- Large models (trillion-parameter scale): Nominally "open source" but requiring enterprise-grade hardware — effectively built for large organizations
Sam added that Gemma 4's 2B model is already beating the previous generation's 27B model, thanks to clever tricks for small models such as "per-layer embeddings." These techniques allow models to run smoothly on Android phones by distributing different embeddings across different regions of RAM.
For institutions like banks and hospitals that require on-premises deployment, running an 8×H200 node to serve 30–100 concurrent users is entirely feasible, making open-source models irreplaceable in privacy-sensitive scenarios.
The Competitive Landscape Among Frontier Labs

When asked "who's winning the AI race," Sam gave a thought-provoking answer: humans are winning. He recalled the days of 2017–2018, building language models with LSTMs, when models could only generate about 50 steps of coherent text. Today's capabilities far exceed what was imaginable back then.
But at the lab level, the landscape is diverging:
- Google: Large enough to consistently push the Pareto frontier on multiple fronts simultaneously
- Anthropic: More focused on breakthroughs at the top end of intelligence
- OpenAI: Strategy has been inconsistent, though GPT-5.5 did show genuine progress on intelligence
One consensus emerged: the era of simply "making models bigger" is over, but the trend of "training with more tokens" continues. The more critical shift is a fundamental change in the ratio of pre-training to post-training tokens — post-training (including RL, alignment training, etc.) now accounts for a significantly larger share.
Sam closed by describing what we are dealing with as a kind of "jagged intelligence": a model might be a superhuman genius in one domain but worse than a five-year-old in another. Rather than chasing benchmark scores, it is better to evaluate on your actual use cases and find the model and configuration that truly fits.
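A minimal sketch of what evaluating on your own use cases can look like is shown below. The two "models" are stub functions so the example runs offline, and the prompts, checkers, and names (`pass_rate`, `stub_model_a`, `stub_model_b`) are all hypothetical; in practice each stub would be replaced by a call to a real model or configuration.

```python
from typing import Callable

# Each case pairs a prompt with a checker that decides if an answer is acceptable.
Case = tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases whose model output passes its checker."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


# Stand-in "models": plain functions, not real API calls.
def stub_model_a(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unknown"


def stub_model_b(prompt: str) -> str:
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


# Replace these with prompts and checks drawn from your real workload.
cases: list[Case] = [
    ("What is 2 + 2?", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "paris" in out.lower()),
]

candidates = {"stub_model_a": stub_model_a, "stub_model_b": stub_model_b}
scores = {name: pass_rate(fn, cases) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the cases come from your own workload, a model that is jagged in exactly the wrong place shows up immediately, even if its public benchmark scores look strong.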