
Google's AI Mode can now 'see' images and answer questions
What's the story
Google has upgraded its search-focused AI Mode chatbot with multimodal capabilities, allowing it to "see" and answer questions about images.
The company also announced plans to expand access to this advanced feature for millions more users.
The new capability combines a custom version of Gemini AI with Google's Lens image recognition tech, letting you upload or take a picture and get detailed information about its contents.
Feature details
AI Mode's multimodal capabilities explained
Robby Stein, VP of product for Google Search, elaborated on the new feature.
He said that "AI Mode builds on our years of work on visual search and takes it a step further."
With Gemini's multimodal capabilities, Stein said that AI Mode can understand the whole scene in an image.
This includes understanding how objects relate to each other as well as their unique materials, colors, shapes, and arrangements.
Technical approach
Google's AI Mode uses fan-out technique
Google's latest update uses a "fan-out technique" to issue multiple queries about the image it sees and any objects in it.
Because each query targets a different part of the picture, AI Mode can combine the results into a response that is more nuanced and contextually relevant than a single search would return.
For example, if an image shows books, the AI can recognize them and recommend similar titles with good ratings.
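Google has not published how the fan-out works internally, so the following is only a rough sketch of the general idea, using the books example above. Everything in it is an assumption made for illustration: the names `build_queries`, `fake_search`, and `fan_out` are hypothetical, and the search step is a stub rather than a real Google API. It shows one query for the whole scene plus one per recognized object, issued in parallel and then collected.

```python
from concurrent.futures import ThreadPoolExecutor


def build_queries(scene, objects):
    """Hypothetical helper: one query for the scene, one per detected object."""
    return [f"about: {scene}"] + [f"similar to: {obj}" for obj in objects]


def fake_search(query):
    """Stand-in for a search backend (an assumption, not a real API)."""
    return f"result for [{query}]"


def fan_out(scene, objects, search=fake_search):
    """Issue all queries concurrently and map each query to its result."""
    queries = build_queries(scene, objects)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the input queries.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


# Example: a photo of a bookshelf in which two titles were recognized.
answers = fan_out("bookshelf", ["Dune", "Neuromancer"])
for query, result in answers.items():
    print(query, "->", result)
```

In a real system, the stub would be replaced by actual search calls, and a final step would merge the individual results into one answer.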
Competition
AI Mode: Google's response to Perplexity and ChatGPT
AI Mode for Search is Google's response to Perplexity and ChatGPT Search. It offers a chatbot-like experience, answering queries with AI-generated summaries drawn from Google's search index.
The feature first launched for Google One AI Premium subscribers last month and is now rolling out to millions more Labs users in the US, including those who are not paying subscribers.