Google's most cost-effective AI model—Gemini 2.5 Flash—now available in preview
What's the story
Google recently introduced its new artificial intelligence (AI) model, Gemini 2.5 Flash. The launch came just weeks after the debut of Gemini 2.5 Pro.
The new model is now available in preview mode through the Gemini API, AI Studio, and Vertex AI platforms.
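For developers who want to try it, a minimal call might look like the sketch below, using Google's google-genai Python SDK. The preview model identifier is an assumption based on Google's preview naming and may differ by release channel.

```python
# Minimal sketch of calling the preview model through the Gemini API
# with the google-genai Python SDK.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # assumes an API key from AI Studio

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model ID
    contents="Summarize the trade-offs between model speed and reasoning depth.",
)
print(response.text)
```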
According to Google, this version is a major upgrade over its predecessor, Gemini 2.0 Flash, offering improved reasoning capabilities without compromising on speed.
Model advancements
Enhanced reasoning capabilities
The Gemini 2.5 Flash model has a knowledge cutoff of January 2025 and can handle text, images, video, and audio prompts. It also comes with a one-million-token context window.
Google notes that reasoning models like this one take more time to work through a query before responding, which leads to more thorough outputs that match user needs better than earlier speed-focused models.
Developer flexibility
Cost-effective and customizable reasoning
Google has dubbed Gemini 2.5 Flash its most cost-efficient thinking model.
Developers are charged $0.15 per million input tokens. Output pricing varies significantly: $0.60 per million tokens with reasoning disabled, and up to $3.50 per million tokens when reasoning is enabled.
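As a rough illustration of what those rates mean in practice, the snippet below prices a hypothetical workload with reasoning off and on; the token counts are made up for the example.

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
INPUT_PRICE = 0.15         # $ per 1M input tokens
OUTPUT_PRICE_FAST = 0.60   # $ per 1M output tokens, reasoning disabled
OUTPUT_PRICE_THINK = 3.50  # $ per 1M output tokens, reasoning enabled

input_tokens, output_tokens = 200_000, 50_000  # hypothetical usage

cost_fast = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE_FAST) / 1_000_000
cost_think = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE_THINK) / 1_000_000
print(f"reasoning off: ${cost_fast:.3f}, reasoning on: ${cost_think:.3f}")
# reasoning off: $0.060, reasoning on: $0.205
```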
The company says the model lets developers configure how much reasoning it performs, effectively balancing cost against query complexity.
If developers don't set a thinking budget, the model decides on its own how much reasoning each query needs.
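A hedged sketch of that configuration is below, based on the thinking_config option exposed in Google's google-genai SDK; the parameter names and model ID are assumptions that may change during the preview.

```python
# Sketch of capping the model's reasoning with an explicit thinking budget.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model ID
    contents="Plan a three-step migration from REST to gRPC.",
    config=types.GenerateContentConfig(
        # A budget of 0 disables reasoning; omitting thinking_config lets the
        # model decide how much to think based on query complexity.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```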
Test results
Performance on industry benchmarks
On a recent Humanity's Last Exam (HLE) test, Gemini 2.5 Flash scored 12%, beating rival models such as Claude 3.7 Sonnet and DeepSeek R1 but trailing OpenAI's newly launched o4-mini, which scored 14%.
The HLE test was introduced as an alternative benchmark to industry tests that have become too easy for rapidly evolving models like Gemini 2.5 Flash.