Google's AI Mode can now 'see' images and answer questions about them

By Mudit Dube

Apr 08, 2025

11:05 am

What's the story

Google has upgraded its search-focused AI Mode chatbot with multimodal capabilities, allowing it to "see" and answer questions about images.

The company also announced plans to expand access to this advanced feature for millions more users.

The new capability combines a custom version of Gemini AI with Google's Lens image recognition technology, letting you upload or take a picture and get detailed information about its contents.

Feature details

AI Mode's multimodal capabilities explained

Robby Stein, VP of product for Google Search, elaborated on the new feature.

He said that "AI Mode builds on our years of work on visual search and takes it a step further."

With Gemini's multimodal capabilities, Stein said that AI Mode can understand the whole scene in an image.

This includes understanding how objects relate to each other as well as their unique materials, colors, shapes, and arrangements.

Technical approach

Google's AI Mode uses fan-out technique

Google's latest update uses a "fan-out technique" to issue multiple queries about the image it sees and any objects in it.

This lets AI Mode offer responses that are more nuanced and contextually relevant than a single query would produce.

For example, if an image shows books, the AI can recognize them and recommend similar titles with good ratings.
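
Google has not published how its fan-out works, so the following is only a minimal Python sketch of the general idea: one query for the whole scene plus one per recognized object, run in parallel and collected together. The names build_queries, fake_search, and fan_out are hypothetical, and fake_search is a stand-in for a real search backend.

```python
from concurrent.futures import ThreadPoolExecutor


def build_queries(scene, objects):
    """Build one query about the whole scene plus one per detected object."""
    return [f"about: {scene}"] + [f"similar to: {obj}" for obj in objects]


def fake_search(query):
    """Hypothetical stand-in for a search backend; returns a canned result."""
    return [f"result for '{query}'"]


def fan_out(scene, objects, search=fake_search):
    """Issue all queries concurrently and map each query to its results."""
    queries = build_queries(scene, objects)
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        # pool.map preserves input order, so results line up with queries.
        result_lists = list(pool.map(search, queries))
    return dict(zip(queries, result_lists))


if __name__ == "__main__":
    answers = fan_out("a shelf of books", ["Dune", "Neuromancer"])
    for query, results in answers.items():
        print(query, "->", results)
```

In the books example, each recognized title gets its own query, and the combined results could then be summarized into a single answer.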

Competition

AI Mode: Google's response to Perplexity and ChatGPT

AI Mode for Search is Google's response to Perplexity and ChatGPT Search. It offers a chatbot-like experience, answering queries with AI-generated summaries from Google's search index.

The feature first launched for Google One AI Premium subscribers last month and is now rolling out to millions more Labs users in the US, including those without a paid subscription.