Deep Dive into Google I/O 2025's Three Major Android Developer Productivity Updates

Google I/O 2025 unveils Android CLI, Skills, and Bench updates to power AI-driven Android development.
Google I/O 2025 introduced three major Android developer productivity updates: Android CLI reaching stable with IDE-level capabilities for AI Agents, Android Skills expanding to cover Adaptive UI, XR, and Perfetto SQL domains, and Android Bench adding open-source and latest commercial model evaluations. Together, these form an open, model-agnostic ecosystem for Agentic Development on Android.
Google I/O 2025 has wrapped up, but the major updates around Android developer productivity deserve a thorough analysis. As the era of AI-driven development (Agentic Development) accelerates, Google released key updates around three pillars—Android CLI, Android Skills, and Android Bench—with a clear objective: enabling developers to efficiently build Android apps using any AI tool, Agent, or LLM.
Agentic Development is one of the most significant paradigm shifts in software engineering during 2024-2025. Unlike traditional AI code completion (such as GitHub Copilot's single/multi-line suggestions), Agentic Development emphasizes AI Agents that can autonomously plan and execute multi-step development tasks—from understanding requirements, designing solutions, and writing code to debugging and testing, with the entire workflow semi-autonomously completed by the Agent. Representative products of this paradigm include Devin, Cursor Agent Mode, and Claude Code. Google's moves here are specifically designed to ensure the Android ecosystem isn't marginalized by this wave.
Android CLI Stable Release: Giving AI Agents Access to IDE Capabilities
The Android CLI (command-line tool) has reached stable status, one of the most practically impactful announcements at this year's I/O. Traditionally, the code analysis capabilities of Android Studio, an IntelliJ IDEA-based IDE (semantic understanding, reference finding, type inference, and so on), have been encapsulated within the GUI and inaccessible to external tools programmatically. The new Android CLI exposes capabilities at a level similar to the Language Server Protocol (LSP), allowing AI Agents to obtain a project's semantic information as if calling an API. This is a significant architectural change.
The new version brings several key capabilities:
- Programmatic Version Lookup: Developers can automatically query SDK, toolchain, and other version information through scripts, greatly simplifying version management in CI/CD pipelines. Version management has always been a headache in Android continuous integration: Android projects depend on numerous component versions, including the Gradle version, Android Gradle Plugin version, Kotlin version, Compose Compiler version, and various AndroidX library versions, all with complex compatibility matrices. Programmatic version lookup allows CI scripts to automatically detect the current environment's SDK versions, confirm compatibility, and avoid build failures due to version mismatches, which is especially important in large teams and monorepo architectures.
- Journeys Support: The CLI now natively supports the Journeys feature, making automated testing and development workflows smoother. Journeys is a natural language-based end-to-end UI testing approach introduced in Android Studio: developers describe user action paths in natural language (e.g., "Open the app, tap the login button, enter username and password, verify navigation to the home page"), and the system automatically converts them into executable UI tests. Compared to traditional Espresso or UI Automator tests, Journeys significantly lowers the barrier to writing tests and offers greater robustness against UI changes. CLI support for Journeys means these tests can run in headless environments, which makes them a natural fit for CI pipelines.
- Deep Integration with Android Studio: This is the most noteworthy highlight. Any AI Agent can access Android Studio's powerful capabilities through the Android CLI, including IDE-level functions like analyzing files, finding references, and locating declarations.
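To make the version-lookup point concrete, here is a minimal sketch of the kind of compatibility gate a CI script could run once it has the detected versions. It does not call the Android CLI: the source does not describe the CLI's commands or output format, so the detected versions are passed in as a plain dictionary. The `MINIMUMS` table and the helper names `parse_version` and `find_mismatches` are hypothetical, and the version numbers are placeholders rather than real compatibility data.

```python
import re

def parse_version(text: str) -> tuple[int, ...]:
    """Turn '8.7.3' or '2.0.21-RC1' into a comparable tuple of ints."""
    numeric = re.match(r"\d+(?:\.\d+)*", text.strip())
    if numeric is None:
        raise ValueError(f"not a version string: {text!r}")
    parts = [int(part) for part in numeric.group().split(".")]
    # Drop trailing zeros so that '8', '8.0' and '8.0.0' compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)

# Hypothetical minimum versions, for illustration only. A real table would be
# copied from the official compatibility documentation.
MINIMUMS = {
    "gradle": "8.0",
    "agp": "8.0.0",
    "kotlin": "1.9.0",
}

def find_mismatches(detected: dict[str, str],
                    minimums: dict[str, str] = MINIMUMS) -> list[str]:
    """Return one human-readable problem per component that fails the check."""
    problems = []
    for component, minimum in minimums.items():
        found = detected.get(component)
        if found is None:
            problems.append(f"{component}: not detected")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{component}: {found} is older than required {minimum}")
    return problems
```

A CI step would fail the build when `find_mismatches(...)` returns a non-empty list, so a version mismatch surfaces before Gradle starts rather than halfway through a build.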

What does this mean? Simply put, AI Agents are no longer "blind" code generators: they can now understand project structure, trace code reference relationships, and analyze file dependencies like a real developer. Opening up these IDE capabilities is a critical step for Google toward Agentic Development.
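For readers unfamiliar with what "finding references as if calling an API" looks like, the sketch below builds a find-references request in the Language Server Protocol's wire format, the protocol used above as a point of comparison. This is LSP itself, not the Android CLI's interface, which the source does not document. The helper names and the file URI in the usage note are hypothetical.

```python
import json

def frame_lsp_request(request_id: int, method: str, params: dict) -> bytes:
    """Serialize a JSON-RPC 2.0 request with the LSP Content-Length header."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body

def find_references_request(request_id: int, file_uri: str,
                            line: int, character: int) -> bytes:
    """Ask a language server for every reference to the symbol at a position."""
    params = {
        "textDocument": {"uri": file_uri},
        # LSP positions are zero-based for both line and character.
        "position": {"line": line, "character": character},
        "context": {"includeDeclaration": True},
    }
    return frame_lsp_request(request_id, "textDocument/references", params)
```

Calling `find_references_request(1, "file:///project/app/src/main/java/com/example/MainActivity.kt", 41, 8)` yields the bytes a client would write to a language server's input. An agent with this kind of access asks a structured question about a symbol instead of searching source text for its name.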
Additionally, Google Antigravity (Google's AI development platform) now officially supports Android development, providing complete integration of Android CLI and Skills through the Android Resources Bundle. This gives developers a unified entry point for accessing Android AI development resources.
Android Skills Continues to Expand: Bridging the Knowledge Gap Between LLMs and Android Development
Android Skills are "specialized knowledge packages" custom-built by Google for LLMs, injecting specialized workflows and domain knowledge into large language models to help AI better understand and handle common yet complex Android development scenarios.

In this update, Android Skills coverage has expanded significantly with the following new areas:
- Adaptive UI: Helps LLMs understand how to build UIs that adapt to different screen sizes and device form factors. With the diversification of Android device form factors (phones, tablets, foldables, car displays, TVs, and Wear OS watches), developers need to build interfaces that gracefully adapt to different screen sizes, orientations, and input methods. Google recommends using Window Size Classes to categorize screens into Compact, Medium, and Expanded tiers, combined with Jetpack Compose's adaptive layout components. The complexity here lies in simultaneously considering changes in layout, navigation patterns, and interaction paradigms. This is a high-difficulty scenario where LLMs easily make mistakes, making dedicated Skills support particularly necessary.
- Glimmer for XR (Display Glasses Development): Development skills for XR devices, reflecting Google's investment in spatial computing. Glimmer is Google's development framework for the Android XR platform (including the Project Moohan headset developed with Samsung and future AR glasses). XR development fundamentally differs from traditional mobile development: it requires handling 3D spatial layouts, gesture tracking, eye-tracking interaction, spatial audio, and other entirely new dimensions. Glimmer extends spatial UI capabilities on top of Jetpack Compose, allowing Android developers to leverage existing skills to enter the XR space. The addition of this Skill indicates Google is preparing at the developer tools level for the coming spatial computing wave.
- Perfetto SQL: Specialized skills in performance analysis, enabling AI to assist developers with deep performance tuning. Perfetto is Google's open-source system-level tracing tool that captures low-level system events like CPU scheduling, memory allocation, GPU rendering, and Binder calls. Perfetto SQL is its query language, allowing developers to perform structured queries and analysis on massive trace data using SQL-like syntax, for example querying "frame rendering events exceeding 16ms on the main thread" to identify jank causes. This domain is highly specialized, and general-purpose LLMs tend to have little relevant knowledge of it. The addition of Android Skills enables AI to assist in writing Perfetto SQL queries and interpreting performance data, significantly lowering the barrier to performance optimization.
- App Functions: Coverage of more general app development scenarios.
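The Window Size Class tiers mentioned under Adaptive UI come down to two width breakpoints, 600dp and 840dp. The sketch below shows that classification in Python for brevity (production code would use the Jetpack window size class APIs in Kotlin). The `navigation_for` mapping is one common pairing of tier to navigation component, included as an illustration rather than a rule, and both function names are hypothetical.

```python
from enum import Enum

class WindowWidthSizeClass(Enum):
    COMPACT = "Compact"
    MEDIUM = "Medium"
    EXPANDED = "Expanded"

def classify_width(width_dp: float) -> WindowWidthSizeClass:
    """Map a window width in dp to its size class (breakpoints: 600dp, 840dp)."""
    if width_dp < 0:
        raise ValueError("width must be non-negative")
    if width_dp < 600:
        return WindowWidthSizeClass.COMPACT
    if width_dp < 840:
        return WindowWidthSizeClass.MEDIUM
    return WindowWidthSizeClass.EXPANDED

def navigation_for(size_class: WindowWidthSizeClass) -> str:
    """Illustrative choice of navigation component for each tier."""
    return {
        WindowWidthSizeClass.COMPACT: "bottom navigation bar",
        WindowWidthSizeClass.MEDIUM: "navigation rail",
        WindowWidthSizeClass.EXPANDED: "navigation drawer",
    }[size_class]
```

The classification takes the window width, not the device type. A foldable moves between tiers when it is unfolded, and a tablet in split-screen can land in Compact. Branching on the device instead of the window is a typical mistake of the kind described above.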

The value of these Skills lies in bridging the knowledge gap between general-purpose LLMs and specialized Android development. While general-purpose large models have powerful coding abilities, they often have blind spots regarding Android-specific API usage, best practices, and architectural patterns. Android Skills inject structured domain knowledge, enabling any LLM to become a more competent Android development assistant. Developers can directly invoke and experience these Skills through the Android CLI.
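As a concrete instance of this specialized knowledge, the jank query mentioned under Perfetto SQL ("frame rendering events exceeding 16ms on the main thread") can be sketched as follows. The example runs against an in-memory SQLite database whose three tables are a simplified stand-in loosely modeled on Perfetto's `slice`, `thread_track`, and `thread` tables. It is not the real trace processor schema, and the trace rows are invented sample data.

```python
import sqlite3

FRAME_BUDGET_NS = 16_000_000  # 16 ms, with durations stored in nanoseconds

JANK_QUERY = """
SELECT s.name, s.dur / 1e6 AS dur_ms
FROM slice AS s
JOIN thread_track AS tt ON s.track_id = tt.id
JOIN thread AS t ON tt.utid = t.utid
WHERE t.is_main_thread = 1 AND s.dur > ?
ORDER BY s.dur DESC
"""

def build_toy_trace() -> sqlite3.Connection:
    """Create a tiny, made-up trace: three main-thread slices, one off-thread."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE thread (utid INTEGER PRIMARY KEY, name TEXT, is_main_thread INTEGER);
        CREATE TABLE thread_track (id INTEGER PRIMARY KEY, utid INTEGER);
        CREATE TABLE slice (ts INTEGER, dur INTEGER, name TEXT, track_id INTEGER);
        INSERT INTO thread VALUES (1, 'main', 1), (2, 'RenderThread', 0);
        INSERT INTO thread_track VALUES (10, 1), (20, 2);
        INSERT INTO slice VALUES
            (0,        8000000,  'Choreographer#doFrame', 10),
            (16000000, 24500000, 'Choreographer#doFrame', 10),
            (48000000, 31000000, 'inflate',               10),
            (50000000, 40000000, 'DrawFrame',             20);
    """)
    return db

def slow_main_thread_slices(db: sqlite3.Connection) -> list[tuple[str, float]]:
    """Return (slice name, duration in ms) for main-thread slices over budget."""
    return db.execute(JANK_QUERY, (FRAME_BUDGET_NS,)).fetchall()
```

On the toy data, `slow_main_thread_slices(build_toy_trace())` returns the 31 ms `inflate` slice and the 24.5 ms frame, and skips both the 8 ms frame and the slow slice on the render thread. The two joins from slice to track to thread are the step that a model without domain knowledge tends to get wrong.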
Android Bench Adds New Model Evaluations: Establishing AI Capability Standards
Android Bench is Google's LLM evaluation leaderboard specifically designed to test various models against real Android development challenges, with the core goal of driving continuous improvement of models in Android development scenarios.
AI model evaluation benchmarks play a critical role in driving technological progress. Similar to how HumanEval evaluates general programming ability and SWE-bench evaluates real software engineering tasks, Android Bench focuses on Android-specific scenario evaluation. Its test cases come from real Android development challenges, including but not limited to: correctly using Android lifecycle APIs, handling configuration changes, implementing Material Design specifications, writing Compose UI, and handling permission requests. Vertical domain benchmarks like this expose the shortcomings of general models in specific domains and provide clear optimization directions for model training.
In this update, Android Bench responded to community requests with two important changes:
Addition of Open-Source Model Evaluations
The developer community has long requested that Google evaluate open-source model performance. This update officially includes more commonly used open-source models, including Google's own Gemma 4. Gemma is Google's open-source large language model series, built on the same research and technology as Gemini but released with smaller parameter sizes and open weights. Gemma 4, as the latest version, shows significant improvements in code generation and understanding. For enterprise developers, the value of open-source models lies in three things: local deployment for code privacy protection, fine-tuning on internal codebases, and independence from external API availability and pricing. This is an important reference for developers focused on the open-source ecosystem who want to deploy AI development assistants locally.
Inclusion of Latest Commercial Models
The leaderboard has been simultaneously updated with evaluation results for the latest commercial models, including Gemini 3.5 Flash and others. Developers can intuitively compare different models' actual performance on Android development tasks through the leaderboard, enabling more informed tool selection.
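To illustrate how a leaderboard of this kind can be derived from per-task outcomes, the sketch below ranks models by pass rate. The source does not describe how Android Bench actually scores models, so treating the score as a simple pass rate is an assumption, and the function name `rank_models`, the model names, and the task names in the usage note are all hypothetical.

```python
from collections import defaultdict

def rank_models(results):
    """Rank models by pass rate.

    `results` is an iterable of (model, task_id, passed) tuples.
    Returns a list of (model, pass_rate), best first; ties sort by name.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, _task_id, ok in results:
        total[model] += 1
        if ok:
            passed[model] += 1
    rates = {model: passed[model] / total[model] for model in total}
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

For example, given `[("model-a", "lifecycle", True), ("model-a", "permissions", False), ("model-b", "lifecycle", True), ("model-b", "permissions", True)]`, the function places `model-b` first with a pass rate of 1.0 and `model-a` second with 0.5.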

The significance of Android Bench goes beyond providing a ranking—it establishes an AI capability standard for the Android development domain. As more models are included in the evaluation, model providers will have stronger motivation to optimize for Android scenarios, ultimately benefiting the entire developer community.
Summary: The Android Development Ecosystem in the Agentic Development Era
Looking at these three announcements together, Google's strategic intent is crystal clear: building an open, model-agnostic Android AI development ecosystem.
- Android CLI provides the infrastructure layer, allowing any Agent to tap into Android Studio's capabilities
- Android Skills provides the knowledge layer, enabling any LLM to possess Android expertise
- Android Bench provides the evaluation layer, driving continuous evolution of the entire ecosystem
Google explicitly stated in the announcement: "In the Agentic Development era, we will continue to help you build Android apps using any AI tool, Agent, and LLM." This open stance means that whether you're using Gemini, Claude, GPT, or open-source models, Google is working to ensure a consistent, high-quality Android development experience.
This strategy stands in stark contrast to Apple's. Apple tends to bind AI capabilities tightly to its own Xcode and Swift ecosystem, while Google has chosen a platform-neutral approach, opening Android development capabilities as infrastructure to all AI tools. This differentiated strategy may attract more third-party AI tool vendors to prioritize Android development scenarios, creating a positive feedback loop.
For Android developers, now is the best time to embrace AI-assisted development. We recommend that developers familiarize themselves with the Android CLI's new capabilities as soon as possible, try integrating it into existing development workflows, and follow the Android Bench leaderboard to select the AI model best suited to their scenarios.