
Elon Musk's xAI is enhancing Grok chatbot with multimodal inputs

By Mudit Dube

May 22, 2024

12:02 pm

What's the story

Elon Musk's AI company, xAI, is making progress on upgrading its Grok chatbot to accept multimodal inputs.

Recent public developer documents reveal that users could soon upload photos to Grok and receive text-based responses.

The company first hinted at this feature last month, when it announced Grok-1.5V and described it as competitive with existing frontier "multimodal models in a number of domains."

The new documentation suggests that xAI is moving closer to deploying the model.

Tech update

Developer documents reveal Grok's enhanced capabilities

The updated developer documents include a sample Python script demonstrating how to use xAI's software development kit (SDK).

This script allows developers to generate responses based on both text and images. It reads an image file, sets up a text prompt, and uses the xAI SDK to create a response.
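The article does not reproduce xAI's script, so the following is only a minimal sketch of the same three steps: read an image file, set up a text prompt, and package both into a single request. It deliberately does not call the xAI SDK. The function name `build_multimodal_request` and the payload field names are hypothetical placeholders rather than xAI's actual API, and the image is base64-encoded on the assumption that binary data must travel inside a text payload such as JSON.

```python
import base64
import mimetypes
from pathlib import Path


def build_multimodal_request(image_path: str, prompt: str) -> dict:
    """Package an image and a text prompt into one request payload.

    Hypothetical: the field names below are placeholders, not the real
    xAI SDK schema.
    """
    # Step 1: read the image file as raw bytes.
    image_bytes = Path(image_path).read_bytes()

    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"

    # Base64-encode the bytes so they can be embedded in a text-based payload.
    encoded = base64.b64encode(image_bytes).decode("ascii")

    # Steps 2 and 3: combine the text prompt and the image into one request.
    return {
        "prompt": prompt,
        "image": {"mime_type": mime_type, "data": encoded},
    }


if __name__ == "__main__":
    import tempfile

    # Write a few placeholder bytes to a temporary .png file for demonstration.
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
        f.write(b"\x89PNG placeholder bytes")
        path = f.name

    request = build_multimodal_request(path, "What is shown in this photo?")
    print(request["prompt"])
    print(request["image"]["mime_type"])

    # Clean up the temporary file.
    Path(path).unlink()
```

Actually sending such a payload to Grok and reading back the text response would be the SDK's job; the real client calls and request format are defined in xAI's own documentation, not here.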

This advancement is part of Musk's ongoing efforts to improve Grok's capabilities and user experience.

AI evolution

Grok's journey and training process detailed

Grok was first launched by xAI in November 2023 and was initially available only to subscribers of X's Premium Plus package.

The most recent update before this was Grok-1.5, released in March with improved reasoning capabilities.

According to an xAI blog post, the model was trained "on a variety of text data from publicly available sources from the Internet up to Q3 2023 and data sets reviewed and curated by ... human reviewers."

AI competition

xAI's Grok closing gap with competitors

Founded by Musk in March 2023, xAI is still relatively new in the AI industry.

Despite initially lagging behind competitors such as OpenAI's ChatGPT, the Grok-1.5 model is reportedly closing the gap with GPT-4 on math benchmarks ranging from grade-school word problems to high-school competition problems.

However, benchmarks for large language models are often criticized because a model can score well on a benchmark simply because that benchmark's questions were included in its training data.
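One common heuristic for spotting this kind of contamination is to check whether long word sequences (n-grams) from a benchmark question also appear verbatim in the training text. The sketch below is a simplified illustration of that idea, not the method used by xAI or by any particular benchmark; the 8-word window and the function names `ngrams` and `is_contaminated` are arbitrary choices made for this example.

```python
from __future__ import annotations


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(benchmark_item: str, training_docs: list[str], n: int = 8) -> bool:
    """Flag a benchmark item if any of its n-grams appears in a training document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # The item is shorter than n words, so this check cannot say anything about it.
        return False
    for doc in training_docs:
        # Any shared n-gram means part of the question was seen verbatim.
        if item_grams & ngrams(doc, n):
            return True
    return False


if __name__ == "__main__":
    question = "A farmer has 17 sheep and all but 9 run away. How many are left?"
    corpus = [
        "Riddle of the day: a farmer has 17 sheep and all but 9 run away. How many are left?",
        "An unrelated article about chatbots and image uploads.",
    ]
    print(is_contaminated(question, corpus))      # True: the question was copied into the corpus
    print(is_contaminated(question, corpus[1:]))  # False: no shared 8-word sequence
```

Real contamination audits typically normalize punctuation and run over far larger corpora, so this should be read only as an illustration of why overlap between training data and test questions can inflate benchmark scores.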