AI Daily: IBM Open-Sources Granite 4.0 Hybrid Architecture, Google Launches Jules Command-Line Tool

On October 3, 2025, the AI field saw a wave of major updates released simultaneously. From IBM's new hybrid architecture open-source model to Google's expanding coding toolchain, from Ant Group's vision-language model breakthrough to OpenAI's valuation soaring to $500 billion, this article covers the most noteworthy technical developments of the day.

IBM Releases Granite 4.0: Hybrid Architecture Dramatically Reduces Inference Costs

IBM officially released the Granite 4.0 series of open-source models, marking a major architectural upgrade for the series. The new version adopts a novel hybrid architecture design, with the core goal of dramatically reducing memory usage and inference costs while maintaining model performance—precisely the most critical pain point in enterprise-grade deployment.

In the large language model domain, hybrid architecture typically refers to a design paradigm that combines the Transformer's standard attention mechanism with other sequence modeling methods, such as State Space Models (SSMs) or linear attention. In a pure Transformer, self-attention compute grows quadratically with sequence length and the key-value cache grows linearly with it, creating enormous memory pressure in long-context and high-concurrency scenarios. Meanwhile, linear-complexity models like Mamba and RWKV are inference-efficient but still tend to lag behind on complex reasoning tasks. Hybrid architectures attempt to combine the best of both worlds: using standard attention in layers that require global attention, and linear or recurrent structures in the remaining layers, thereby achieving a balance between performance and efficiency. IBM's previous Granite series had already established a reputation in enterprise compliance and code generation, and this architectural upgrade continues its product positioning toward real-world enterprise deployment needs.
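The scaling argument above can be made concrete with a toy operation count. This is a rough sketch rather than a description of Granite 4.0's actual design: the layer counts, the 9:1 layer pattern, `d_model`, and `d_state` are all illustrative assumptions, and the cost formulas are simplified to the dominant terms.

```python
"""Toy cost model: attention layers vs. linear-time layers in a hybrid stack."""


def attention_ops(seq_len: int, d_model: int) -> int:
    # Score computation (QK^T) and the weighted sum over V each cost
    # roughly seq_len^2 * d_model multiply-adds, hence the factor of 2.
    return 2 * seq_len * seq_len * d_model


def linear_layer_ops(seq_len: int, d_model: int, d_state: int) -> int:
    # A recurrent / SSM-style layer updates a fixed-size state once per token,
    # so its cost grows linearly with sequence length.
    return seq_len * d_model * d_state


def stack_ops(layer_kinds, seq_len, d_model=4096, d_state=128):
    """Sum the per-layer cost for a list of 'attn' / 'ssm' layer labels."""
    total = 0
    for kind in layer_kinds:
        if kind == "attn":
            total += attention_ops(seq_len, d_model)
        else:
            total += linear_layer_ops(seq_len, d_model, d_state)
    return total


# Hypothetical layouts: 40 attention layers vs. a 9:1 mix of linear and attention layers.
pure_stack = ["attn"] * 40
hybrid_stack = (["ssm"] * 9 + ["attn"]) * 4

for n in (4_096, 32_768, 131_072):
    ratio = stack_ops(pure_stack, n) / stack_ops(hybrid_stack, n)
    print(f"seq_len={n:>7}: pure stack costs {ratio:.1f}x the hybrid stack")
```

Doubling the sequence length quadruples the attention term but only doubles the linear term, which is why the gap widens as contexts get longer.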

The Granite 4.0 series includes multiple parameter sizes, with the enterprise-grade flagship model performing impressively across multiple benchmarks. The model now also supports Chinese and is already available on multiple platforms including IBM watsonx.ai, Dunk AI, and Hugging Face, where developers can access and use it directly.

For enterprise users, the cost advantages brought by the hybrid architecture mean serving more concurrent requests on the same hardware—delivering significant economic value in large-scale deployment scenarios.
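To see how lower per-request memory turns into more concurrent requests, here is a back-of-the-envelope estimate. All numbers (layer counts, head sizes, context length, the 40 GiB budget) are assumptions chosen for illustration and are not published Granite 4.0 figures.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_attn_layers, seq_len, n_kv_heads, head_dim, bytes_per_value=2):
    # Each attention layer stores one key tensor and one value tensor per token.
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_value


def recurrent_state_bytes(n_linear_layers, d_model, d_state, bytes_per_value=2):
    # A linear/recurrent layer keeps a fixed-size state, independent of seq_len.
    return n_linear_layers * d_model * d_state * bytes_per_value


def max_concurrent(memory_budget_bytes, per_request_bytes):
    # Whole requests that fit into the memory left over after loading weights.
    return memory_budget_bytes // per_request_bytes


seq_len, n_kv_heads, head_dim = 32_768, 8, 128
budget = 40 * GIB  # hypothetical memory available for per-request state

pure_request = kv_cache_bytes(40, seq_len, n_kv_heads, head_dim)
hybrid_request = (
    kv_cache_bytes(4, seq_len, n_kv_heads, head_dim)
    + recurrent_state_bytes(36, d_model=4096, d_state=128)
)

print("pure attention:", max_concurrent(budget, pure_request), "concurrent requests")
print("hybrid stack:  ", max_concurrent(budget, hybrid_request), "concurrent requests")
```

Under these assumptions the cache shrinks because only a few layers still store per-token keys and values. Real savings depend on the actual model configuration and serving stack.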

Ant Group Open-Sources Ming UniVision: Unifying Visual Understanding and Generation

Ant Group's Inclusion AI team open-sourced the vision-language model Ming UniVision. According to the team, it is the first model to unify image understanding, generation, and editing within a single continuous latent space through an autoregressive approach.

[Figure: Ming UniVision model technical architecture diagram]

To understand the significance of this breakthrough, one needs to appreciate the long-standing split between understanding and generation in the vision-language model field. Models like CLIP and LLaVA excel at image understanding and Q&A, while diffusion models like Stable Diffusion and DALL-E focus on image generation, and the two types of tasks typically require independent models and training pipelines. In recent years, works like Meta's Chameleon and ByteDance's SEED-X have begun exploring unified frameworks, but most adopt discrete tokenization schemes. Ming UniVision's "continuous latent space autoregressive" approach encodes images as continuous vector sequences rather than discrete tokens, then processes them uniformly with an autoregressive model. This avoids information loss from discretization and naturally integrates editing tasks (which require local modifications to existing images) into the same framework.
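The information loss from discretization can be illustrated with a toy vector quantizer. This is a generic sketch, not Ming UniVision's tokenizer: the random "patch" vectors, the random codebook, and all sizes are made up for the demonstration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (the discrete token)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vec, codebook[i]))


def mean_squared_error(originals, reconstructions):
    total = sum(squared_distance(o, r) for o, r in zip(originals, reconstructions))
    return total / (len(originals) * len(originals[0]))


rng = random.Random(0)
dim, n_patches, codebook_size = 8, 64, 16

# Stand-ins for continuous image-patch embeddings and a learned codebook.
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
codebook = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(codebook_size)]

# Discrete route: every patch is replaced by the id of its nearest codebook entry.
token_ids = [nearest_code(p, codebook) for p in patches]
discrete_error = mean_squared_error(patches, [codebook[i] for i in token_ids])

# Continuous route: the vectors are passed on unchanged, so nothing is lost here.
continuous_error = mean_squared_error(patches, patches)

print(f"discrete tokens:    reconstruction MSE = {discrete_error:.3f}")
print(f"continuous vectors: reconstruction MSE = {continuous_error:.3f}")
```

The discrete route can only represent as many distinct patches as the codebook has entries, while the continuous route keeps the full vector for the autoregressive model to consume.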

Traditional approaches typically require designing independent modules for different visual tasks. Ming UniVision's unified architecture simplifies model design and, according to the team, speeds up training convergence by 3.5x, an acceleration attributed to the unified representation reducing gradient conflicts between tasks. The model code and weights are openly available on platforms like Hugging Face for researchers and developers to use freely.

This advancement demonstrates that Chinese AI teams are making substantive breakthroughs in the direction of multimodal unified modeling.

Google Advances on Multiple Fronts: Jules Tools and Gemini 2.5 Flash Update

Jules Tools: AI Coding Assistant Goes Command-Line

Google launched the official command-line tool Jules Tools for its AI coding assistant Jules. Developers can now perform the following operations directly from the terminal:

Create coding tasks
Pull code patches
Embed Jules into CI/CD and other automation workflows

CI/CD (Continuous Integration/Continuous Delivery) is a core practice in modern software engineering, referring to pipelines that automatically trigger builds, tests, and deployments after code commits. GitHub Actions, Jenkins, and GitLab CI are mainstream implementation tools. Embedding an AI coding assistant into CI/CD workflows means AI can play a role in code review, automated fixes, test generation, and other stages, not just provide completion suggestions in a developer's local IDE. The command-line interface (CLI) that Jules Tools provides is a key step toward this integration, because a CLI tool can be called by any script or automation system, which a browser interface cannot offer. Similar approaches have appeared in products like GitHub Copilot CLI and Cursor, but Google's positioning of Jules as a programmable pipeline component demonstrates its clear focus on enterprise DevOps scenarios.
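The point that a CLI can be driven by any script can be sketched as follows. The actual Jules Tools command syntax is not given here, so this example deliberately does not guess at it: the Python interpreter stands in for the assistant CLI, and `run_cli_step` is a hypothetical helper name.

```python
import subprocess
import sys


def run_cli_step(argv, timeout_s=60):
    """Run one CLI command as a pipeline step and return (succeeded, stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    # A zero exit code is the usual convention a pipeline uses to decide pass/fail.
    return result.returncode == 0, result.stdout.strip()


# Stand-in commands: replace the argv list with the real CLI invocation in practice.
ok, output = run_cli_step([sys.executable, "-c", "print('patch ready')"])
failed_ok, _ = run_cli_step([sys.executable, "-c", "import sys; sys.exit(3)"])

print("first step succeeded:", ok, "| output:", output)
print("second step succeeded:", failed_ok)
```

A CI job can branch on the returned status, for example by failing the build or opening a follow-up task when a step does not succeed.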

This means developers no longer need to rely on a browser interface to use Jules. The introduction of the command line truly integrates the AI coding assistant into developers' daily workflows. For engineers accustomed to terminal operations, this is an extremely practical update.

Gemini 2.5 Flash Image Officially Released

Google simultaneously announced that Gemini 2.5 Flash Image has exited preview and is now officially available. The new version brings several key improvements:

Ten new selectable output aspect ratios
Support for image-only response mode
Pricing remains unchanged

[Figure: Gemini 2.5 Flash Image new aspect ratio options]

Developers can try it immediately for free in Google AI Studio. Support for multiple output ratios is highly practical for application developers who need to adapt to different devices and scenarios.
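For application developers, a typical use of selectable ratios is picking the one closest to a target screen or layout. The sketch below only does that selection and does not call any API. The ratio list is an assumption for illustration and should be checked against Google's current documentation, and `closest_ratio` is a hypothetical helper.

```python
import math

# Assumed set of ratio labels; verify against the official documentation before use.
ASSUMED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def ratio_value(label: str) -> float:
    width, height = label.split(":")
    return int(width) / int(height)


def closest_ratio(width_px: int, height_px: int, options=ASSUMED_RATIOS) -> str:
    """Pick the option whose width/height ratio is nearest to the target.

    Distances are compared in log space so that landscape and portrait
    mismatches of the same proportion are treated symmetrically.
    """
    target = math.log(width_px / height_px)
    return min(options, key=lambda label: abs(math.log(ratio_value(label)) - target))


for size in [(1920, 1080), (1080, 1920), (1000, 1000), (3440, 1440)]:
    print(size, "->", closest_ratio(*size))
```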

Google Finance Mobile Launches AI Insights Feature

Google Finance rolled out a new mobile version for Labs users, allowing them to track market dynamics in real time on mobile and receive AI-driven market insights.

[Figure: Google Finance mobile AI insights interface]

This is another example of Google pushing AI capabilities into vertical application scenarios. AI-powered interpretation of financial information has the potential to help ordinary investors understand market changes more quickly.

OpenAI Updates: $500 Billion Valuation and Japanese Government Partnership

OpenAI has been making frequent moves recently. On the business front, the company completed a $6.6 billion secondary share sale that pushed its valuation to $500 billion, making it the world's most highly valued startup.

A secondary share sale is a transaction in which existing shareholders (employees, early investors) sell their unlisted equity to new investors. The company itself receives no funding, but the transaction price sets a new reference for the company's market valuation. The $500 billion figure surpasses SpaceX and sends a clear signal to the market: investors are willing to price AI infrastructure companies at multiples approaching those of large publicly traded tech companies, and capital enthusiasm for the AI sector remains at historic highs.
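A quick calculation shows how small a slice of the company needs to change hands to set such a reference price. It uses only the two figures reported above; `implied_stake` is a hypothetical helper name.

```python
def implied_stake(sale_amount_usd: float, valuation_usd: float) -> float:
    """Fraction of the company's equity represented by the shares sold."""
    return sale_amount_usd / valuation_usd


stake = implied_stake(6.6e9, 500e9)
print(f"Shares sold represent about {stake:.2%} of the company at the implied valuation")
```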

OpenAI Reaches Strategic Partnership with Japan's Digital Agency

On the government partnership front, OpenAI reached a strategic collaboration with Japan's Digital Agency, providing AI tools to Japanese government employees. This continues OpenAI's "government as customer" strategic approach—similar partnerships with the US, UK, and other governments have preceded this. Government contracts typically mean stable long-term revenue and higher data security compliance requirements, marking an acceleration in AI tool adoption within government administration.

Additionally, OpenAI published a post on its website denouncing Elon Musk's continued use of lawsuits and other means to obstruct its development, with the public confrontation between the two parties ongoing.

Other Notable Developments

Perplexity Comet Browser Opens Globally: AI search company Perplexity announced that its AI browser Comet is now free to download for users worldwide, with no invitation code required. AI-native browsers are a competitive space that emerged in the second half of 2024, built on the idea of integrating large language models into the browser itself rather than attaching them as plugins or sidebars. Comet extends Perplexity's AI search capabilities across the entire browsing experience: users can access AI summaries, source citations, and conversational Q&A on any webpage. Competition in this space is fundamentally a battle for the entry point to users' information consumption. Traditional search engines control traffic through the search box, while AI browsers try to step into the information-gathering process at an even earlier stage. Dropping invitation codes and going free worldwide is a classic growth strategy: trade free access for scale, then use the resulting usage data to improve the models.

NotebookLM Launches Personalization Features: Google's NotebookLM added personalization settings to chat conversations, allowing users to customize conversation style and adjust response length. It currently offers three modes: default, study guide, and custom. This update makes the AI note-taking tool better suited to individual usage habits.

OpenRouter Free Inference Ending Soon: OpenRouter announced that free inference for the Grok 4 Fast model will end at 9:30 AM Pacific Time on October 3. Developers who want to use it should do so before the cutoff.

Summary

From today's developments, several clear trends emerge: open-source models continue to push architectural innovation (IBM Granite 4.0, Ming UniVision), AI tools are deeply integrating into developer workflows (Jules Tools), and AI commercialization and government adoption are accelerating simultaneously (OpenAI's Japan partnership, Google Finance AI integration). For developers and practitioners, following these changes helps grasp the direction of technological evolution.

Key Takeaways

IBM released the Granite 4.0 series of open-source models with hybrid architecture that dramatically reduces memory usage and inference costs, now supporting Chinese
Ant Group open-sourced Ming UniVision, the first vision-language model to unify image understanding, generation, and editing in a single continuous latent space, with 3.5x faster training convergence
Google released Jules Tools CLI, integrating AI coding assistants into terminal workflows; Gemini 2.5 Flash Image officially launched with 10 new output aspect ratios
OpenAI completed a $6.6 billion secondary share sale at a $500 billion valuation and partnered with Japan's Digital Agency to provide AI tools to government employees
Perplexity's AI browser Comet is now freely available globally; NotebookLM adds personalized conversation features