Apple working on contextual AI language model: How it works
What's the story
Apple's research team is developing a new artificial intelligence (AI) model named ReALM (Reference Resolution As Language Modeling).
The model is designed to understand language in context and function entirely on-device, without needing significant computational power.
A recent research paper suggests this new AI model could enhance the capabilities of Siri, Apple's built-in voice assistant.
Pre-print phase
Research paper shared
The research paper detailing how ReALM works is currently a preprint and has been shared on an open-access online repository for academic papers.
ReALM's primary function is reference resolution: working out what an ambiguous, human-sounding phrase in a command actually refers to.
For example, it could interpret commands like "Take me to the one that's second from the bottom."
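The paper's central idea, reflected in the model's name, is to recast reference resolution as a language-modeling problem: candidate entities are rendered as plain text so a language model can pick out the referent. As a rough, hypothetical Swift sketch of that idea (the types, labels, and prompt format below are illustrative, not Apple's code):

```swift
// Hypothetical sketch: candidate entities rendered as tagged lines of text
// so a language model can pick the one a request refers to.
struct Entity {
    let tag: Int
    let label: String
}

// Entities as they might appear on screen, top to bottom (made-up examples).
let onScreen = [
    Entity(tag: 1, label: "Pharmacy - 555-0132"),
    Entity(tag: 2, label: "Bakery - 555-0178"),
    Entity(tag: 3, label: "Hardware Store - 555-0199"),
    Entity(tag: 4, label: "Coffee Shop - 555-0143"),
]

/// Builds a text prompt listing the entities followed by the user's request,
/// so an LLM can answer with the tag of the referenced entity.
func buildPrompt(entities: [Entity], request: String) -> String {
    let lines = entities
        .map { "[\($0.tag)] \($0.label)" }
        .joined(separator: "\n")
    return """
    Entities on screen (top to bottom):
    \(lines)
    Request: \(request)
    Which entity does the request refer to?
    """
}

print(buildPrompt(entities: onScreen,
                  request: "Take me to the one that's second from the bottom."))
// A model that resolves the reference correctly would answer [3].
```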
Task division
Task categories on smart devices
ReALM is designed to resolve references to entities on a smart device, which fall into three categories: on-screen entities, conversational entities, and background entities.
On-screen entities are items currently visible on the device's screen. Conversational entities are items relevant to the ongoing conversation, such as something mentioned in an earlier request.
Finally, background entities are processes running in the background, like a song playing in an app.
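As a rough illustration (the types and example entities below are hypothetical, not Apple's code), the three categories could be modeled in Swift like this:

```swift
// Hypothetical Swift model of ReALM's three entity categories.
enum EntityCategory: String {
    case onScreen = "on-screen"            // visible on the device's display
    case conversational = "conversational" // arose from the dialogue, e.g. an earlier request
    case background = "background"         // running behind the scenes, e.g. a playing song
}

struct ResolvableEntity {
    let label: String
    let category: EntityCategory
}

// Example candidate pool an assistant might assemble before resolving a reference.
let candidates = [
    ResolvableEntity(label: "Hardware Store - 555-0199", category: .onScreen),
    ResolvableEntity(label: "Directions requested in a previous turn", category: .conversational),
    ResolvableEntity(label: "Now playing: Blackbird", category: .background),
]

for entity in candidates {
    print("\(entity.category.rawValue): \(entity.label)")
}
```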
Computational power
ReALM's efficiency and performance
Despite the complexity of understanding and acting on contextual prompts, ReALM does not require high computational power.
The research paper states that this makes "ReaLM an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance."
This efficiency is achieved by using far fewer parameters than major large language models (LLMs) such as GPT-3.5 and GPT-4.
Benchmark results
ReALM outperforms OpenAI's models in tests
The research paper asserts that ReALM outperformed OpenAI's GPT-3.5 and GPT-4 models in a controlled environment.
Specifically, it scored higher than GPT-3.5 on text-only benchmarks and surpassed GPT-4 on domain-specific user utterances.
However, these findings are preliminary: the paper has not yet undergone peer review, so its validity remains to be confirmed.