Microsoft's Phi-3-Vision compact AI model can analyze images and graphs
Microsoft has unveiled Phi-3-Vision, a multimodal version of its small language model, Phi-3. The model can process both text and images, and its compact size makes it a candidate for use on mobile devices. Unlike image-focused AI models such as OpenAI's DALL-E or Stability AI's Stable Diffusion, Phi-3-Vision does not generate images; it analyzes and describes them.
Phi-3-Vision: A compact powerhouse for visual reasoning
Phi-3-Vision is a 4.2 billion parameter model designed for general visual reasoning tasks such as interpreting charts and images. Although it is smaller than many other AI models, it is claimed to offer efficient image understanding and analysis. It belongs to the Phi-3 family, which also includes Phi-3-mini (3.8 billion parameters), Phi-3-small (7 billion), and Phi-3-medium (14 billion).
The rise of small, efficient AI models
The development of smaller, lightweight AI models like Phi-3 is driven by growing demand for cost-effective, less compute-intensive AI services. These compact models can power AI features on devices such as smartphones and laptops without exhausting the device's memory. Microsoft has previously released other small models, including Phi-2 and Orca-Math, a math problem-solving model that reportedly outperforms larger counterparts such as Google's Gemini Pro.
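To give a rough sense of why parameter count matters for on-device use, the sketch below estimates the memory needed just to hold each Phi-3 model's weights. The parameter counts come from this article; the bytes-per-parameter figures (16-bit, 8-bit, and 4-bit storage) are illustrative assumptions, not published specifications, and the helper name weight_memory_gb is hypothetical. Real memory use is higher, since activations and runtime overhead are not counted.

```python
# Parameter counts in billions, as reported in the article.
PHI3_PARAMS_BILLIONS = {
    "Phi-3-mini": 3.8,
    "Phi-3-Vision": 4.2,
    "Phi-3-small": 7.0,
    "Phi-3-medium": 14.0,
}

# Assumed storage cost per parameter for common precisions (illustrative only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params in PHI3_PARAMS_BILLIONS.items():
        estimates = ", ".join(
            f"{precision}: {weight_memory_gb(params, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name}: {estimates}")
```

Under these assumptions, Phi-3-Vision's weights would occupy roughly 8.4 GB at 16-bit precision and about 2.1 GB at 4-bit, which is the kind of difference that determines whether a model fits alongside other apps on a phone or laptop.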