Apple working on contextual AI language model: How it works
What's the story
Apple's research team is developing a new artificial intelligence (AI) model named ReALM (Reference Resolution As Language Modeling).
The model is designed to understand language in context and function entirely on-device, without needing significant computational power.
A recent research paper suggests this new AI model could enhance the capabilities of Siri, Apple's built-in voice assistant.
Pre-print phase
Research paper shared
The research paper detailing how ReALM works is currently a preprint and has been shared on an open-access online repository for academic papers.
ReALM's primary function is reference resolution: working out what an ambiguous, human-sounding phrase in a command actually refers to.
For example, it could interpret commands like "Take me to the one that's second from the bottom."
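The paper's central idea, reflected in the model's name, is to recast reference resolution as a language-modeling problem: candidate entities are rendered as plain text so a language model can pick out the referent. As a rough, hypothetical Swift sketch of that idea (the types, labels, and prompt format below are illustrative, not Apple's code):

```swift
// Hypothetical sketch: candidate entities rendered as tagged lines of text
// so a language model can pick the one a request refers to.
struct Entity {
    let tag: Int
    let label: String
}

// Entities as they might appear on screen, top to bottom (made-up examples).
let onScreen = [
    Entity(tag: 1, label: "Pharmacy - 555-0132"),
    Entity(tag: 2, label: "Bakery - 555-0178"),
    Entity(tag: 3, label: "Hardware Store - 555-0199"),
    Entity(tag: 4, label: "Coffee Shop - 555-0143"),
]

/// Builds a text prompt listing the entities followed by the user's request,
/// so an LLM can answer with the tag of the referenced entity.
func buildPrompt(entities: [Entity], request: String) -> String {
    let lines = entities
        .map { "[\($0.tag)] \($0.label)" }
        .joined(separator: "\n")
    return """
    Entities on screen (top to bottom):
    \(lines)
    Request: \(request)
    Which entity does the request refer to?
    """
}

print(buildPrompt(entities: onScreen,
                  request: "Take me to the one that's second from the bottom."))
// A model that resolves the reference correctly would answer [3].
```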
Task division
Task categories on smart devices
ReALM is designed to resolve references to entities on a smart device, which fall into three categories: on-screen entities, conversational entities, and background entities.
On-screen entities are items currently visible on the device's screen. Conversational entities are items relevant to the ongoing conversation, such as something mentioned in an earlier request.
Finally, background entities are processes running in the background, like a song playing in an app.
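As a rough illustration (the types and example entities below are hypothetical, not Apple's code), the three categories could be modeled in Swift like this:

```swift
// Hypothetical Swift model of ReALM's three entity categories.
enum EntityCategory: String {
    case onScreen = "on-screen"            // visible on the device's display
    case conversational = "conversational" // arose from the dialogue, e.g. an earlier request
    case background = "background"         // running behind the scenes, e.g. a playing song
}

struct ResolvableEntity {
    let label: String
    let category: EntityCategory
}

// Example candidate pool an assistant might assemble before resolving a reference.
let candidates = [
    ResolvableEntity(label: "Hardware Store - 555-0199", category: .onScreen),
    ResolvableEntity(label: "Directions requested in a previous turn", category: .conversational),
    ResolvableEntity(label: "Now playing: Blackbird", category: .background),
]

for entity in candidates {
    print("\(entity.category.rawValue): \(entity.label)")
}
```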
Computational power
ReALM's efficiency and performance
Despite the complexity of understanding and acting on contextual prompts, ReALM does not require high computational power.
The research paper states that this makes "ReaLM an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance."
This efficiency is achieved by using far fewer parameters than major large language models (LLMs) such as GPT-3.5 and GPT-4.
Benchmark results
ReALM outperforms OpenAI's models in tests
The research paper asserts that ReALM outperformed OpenAI's GPT-3.5 and GPT-4 models in a controlled environment.
Specifically, it scored higher than GPT-3.5 on text-only benchmarks and surpassed GPT-4 on domain-specific user utterances.
However, these findings are preliminary: the paper has not yet undergone peer review, so its validity remains to be confirmed.