OpenAI releases new model to advance AI agent development
What's the story
OpenAI has unveiled a research preview of Operator, an AI agent that can perform web-based tasks.
The technology behind Operator is Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning.
CUA is designed to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields we see on a screen—in a human-like manner.
This allows it to perform digital tasks without relying on operating system (OS) or web-specific APIs.
Technological breakthrough
CUA's advanced capabilities and performance
CUA represents a notable advance for AI agents: it can interpret and interact with GUIs much as a person would.
It can decompose tasks into multi-step plans and adaptively self-correct when it runs into difficulties.
The model has set new records on several benchmarks, achieving 38.1% success on OSWorld for full computer-use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks.
Functionality and security
CUA's operational process and safety measures
CUA works by analyzing raw pixel data to understand what is happening on the screen, then uses a virtual mouse and keyboard to perform actions.
It can follow multi-step tasks, deal with errors, and adjust to unforeseen changes.
The model works in an iterative loop, combining perception, reasoning, and action.
OpenAI has also emphasized safety in CUA's development, aiming to mitigate the risks that arise when an AI agent can take actions in digital environments.
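To make the perception, reasoning, and action loop described above more concrete, here is a minimal Python sketch. It is not OpenAI's implementation: the names ToyScreen, Action, decide, and run_agent are hypothetical, the "screen" is a small dictionary rather than raw pixels, and the "reasoning" step is a hand-written rule standing in for the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """A hypothetical GUI action: 'type', 'click', or 'done'."""
    kind: str
    target: str = ""
    text: str = ""


@dataclass
class ToyScreen:
    """Hypothetical stand-in for a GUI with a search box and a search button."""
    query: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive raw pixels; this toy returns plain state.
        return {"query": self.query, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Simulate the virtual keyboard and mouse acting on the screen.
        if action.kind == "type" and action.target == "search_box":
            self.query = action.text
        elif action.kind == "click" and action.target == "search_button":
            # Clicking with an empty search box has no effect.
            self.submitted = bool(self.query)


def decide(observation: dict, goal: str) -> Action:
    """Reasoning step: choose the next action from the current observation."""
    if observation["submitted"]:
        return Action("done")
    if observation["query"] != goal:
        # The box is empty or holds the wrong text, so (re)type the goal.
        return Action("type", "search_box", goal)
    return Action("click", "search_button")


def run_agent(screen: ToyScreen, goal: str, max_steps: int = 10) -> List[Action]:
    """Iterate perceive -> reason -> act until done or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()   # perception
        action = decide(observation, goal)  # reasoning
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)                # action
    return history


if __name__ == "__main__":
    screen = ToyScreen()
    for step in run_agent(screen, "cheap flights"):
        print(step)
```

Because the agent re-reads the screen on every pass, a wrong or missing entry in the search box is noticed and corrected on the next iteration, which is a simplified version of the self-correction the article describes.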
Benchmark results
CUA's performance evaluation and future improvements
CUA has set new highs on both computer-use and browser-use benchmarks, using the same universal interface of screen, mouse, and keyboard for each.
It registered a 58.1% success rate on WebArena and an 87% success rate on WebVoyager for web-based tasks.
However, despite its high success rate on simpler tasks such as those in WebVoyager, CUA still needs further improvement to match human performance on more complex benchmarks such as WebArena.