Google Gemini AI's staged demo raises questions about its capabilities

By Rishabh Raj

Dec 08, 2023

02:54 pm

What's the story

Google's latest and most advanced AI model, Gemini, is facing criticism for allegedly misrepresenting its real-time capabilities in a video demo. The six-minute "what the quack" video, aired during Google's announcement, showcased Gemini's multimodal abilities. It appeared to show the model recognizing images quickly and responding to spoken prompts in real time, which impressed many viewers. However, Bloomberg's Parmy Olson claims in her op-ed that the demo video wasn't entirely authentic.

Details

What is Gemini capable of?

The demo shows Gemini quickly recognizing pictures, explaining what they depict, and answering related questions. It can track a paper ball in a cup-and-ball game. It can even guess whether a jumping cat will make its landing just by looking at a video paused midway through the jump. Freakish, right? The video also showcases Gemini's critical thinking in deciding which way a duck should go: toward a friendly duck or an angry grizzly bear.

Insights

Discrepancies in Gemini's demo

However, a disclaimer buried in the video's description on YouTube reads, "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity." Olson took issue with this. According to her piece in Bloomberg, Google admitted that the demo did not happen live or with spoken prompts. Instead, it used still image frames from the raw footage, along with written text prompts for Gemini to respond to. "That's quite different from what Google seemed to be suggesting," Olson writes.

Insights

Google has a history of questionable video demos

Olson highlights Google's track record with questionable demo videos, citing earlier doubts about the credibility of its Duplex AI demo. Duplex helps you book reservations or set up appointments without actually talking to anyone. Similar concerns have arisen over other AI models presented in prerecorded videos, such as Baidu's Ernie Bot launch, which faced market backlash over its edited footage. Olson suggests that Google is "showboating" to mislead people about how Gemini's abilities compare with OpenAI's GPT.

Insights

Google defends demo, shares details on video creation

Oriol Vinyals, VP of Research at Google DeepMind and co-lead for Gemini, explained how the team made the video. "All the user prompts and outputs in the video are real, shortened for brevity," Vinyals said in his post. "The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers." Vinyals added that the team prompted Gemini with images and text, asking it to predict the next steps.

What Next?

Inspiring developers via edited videos? Not Google's best move

But why so much fuss over an edited video? Critics say that if Google really wanted to "inspire developers," carefully edited videos that might misrepresent the AI's abilities are not the way to do it. The Verge, in one of its articles, suggests Google should instead let journalists and developers get hands-on with Gemini and see for themselves how good it really is.