Elon Musk's xAI launches Grok-1.5V with advanced image processing capabilities
Elon Musk's xAI, a competitor to OpenAI, has unveiled Grok-1.5V, the first version of Grok capable of handling visual data. As the company's first multimodal model, it can process not only text but also visual content such as "documents, diagrams, charts, screenshots, and photographs." xAI showcased several practical applications of the new features, including converting flowcharts into Python code and interpreting internet memes.
A leap forward from Grok-1.5
Grok-1.5V is the successor to Grok-1.5, a model designed to excel at coding and math tasks and to handle longer contexts, letting it draw on more of the supplied material when answering a query. The new version retains these capabilities while adding image processing. xAI has not given a specific release date, but it has indicated that Grok-1.5V will soon be available to both early adopters and current users.
xAI also introduces the RealWorldQA benchmark dataset
Alongside the launch of Grok-1.5V, xAI has introduced a benchmark dataset called RealWorldQA. The dataset includes more than 700 images, each paired with a question and answer, and is intended to test multimodal models like Grok. According to xAI, Grok-1.5V outperformed competitors such as OpenAI's GPT-4V and Google's Gemini 1.5 Pro when evaluated on this dataset.