Meta's first open-source multimodal AI model, Llama 3.2, goes official
Meta has introduced its first open-source multimodal artificial intelligence (AI) model, Llama 3.2, capable of processing both text and images. The release comes just two months after the company's previous major AI model launch. Llama 3.2's new capabilities are expected to let developers build more sophisticated AI applications, including augmented reality apps with real-time video understanding, visual search engines that categorize images based on content, and document analysis tools that condense lengthy text passages.
Llama 3.2's user-friendly interface and multimodal capabilities
Meta has designed Llama 3.2 to be easy for developers to adopt, requiring minimal setup. Ahmad Al-Dahle, Vice President of Generative AI at Meta, told The Verge that developers only need to incorporate this "new multimodality and be able to show Llama images and have it communicate." The feature puts Meta on par with other AI developers such as OpenAI and Google, which launched multimodal models last year.
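In practice, "showing Llama images and having it communicate" means pairing an image with a text question in a single chat turn. The sketch below is illustrative only: the field names follow the common chat-message convention used by open-source inference libraries such as Hugging Face transformers, not an API confirmed in this article.

```python
# Minimal sketch of a multimodal chat prompt for a Llama 3.2 vision model.
# The structure (role/content, typed content parts) mirrors the convention
# used by common inference libraries; exact requirements vary by tool.

def build_vision_prompt(question: str) -> list[dict]:
    """Build a single-turn chat message pairing an image with a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # the image itself is passed separately to the processor
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vision_prompt("What landmark is shown in this photo?")
```

A library-specific chat template would then turn this message list into the token sequence the model actually consumes.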
Vision support and hardware compatibility
The inclusion of vision support in Llama 3.2 is a strategic move by Meta as it continues to expand AI capabilities on devices like its Ray-Ban Meta glasses. The release comprises two vision models, with 11 billion and 90 billion parameters respectively, along with two lightweight text-only models with one billion and three billion parameters. The smaller models are engineered to run on Qualcomm, MediaTek, and other Arm hardware, signaling Meta's intention that they be used on mobile devices.
Llama 3.1's text generation capabilities
Despite the introduction of Llama 3.2, there remains a role for its predecessor, Llama 3.1, which launched in July. The older model includes a version with 405 billion parameters, theoretically giving it superior text generation capabilities compared to the new release. In other words, while Llama 3.2 adds image understanding alongside text processing, Llama 3.1 may still hold an edge in pure text generation tasks.