OpenAI's Sora model can generate 60-second videos from text prompts
OpenAI has unveiled Sora, a video-generation model that creates realistic and imaginative scenes from written text. It can generate photorealistic video clips up to a minute long based on a user's prompt. Sora is capable of crafting "complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," according to OpenAI's blog post.
Sora's capabilities and limitations
Sora can also generate videos from still images, fill in missing frames in existing videos, and extend them. According to OpenAI, the model understands how objects "exist in the physical world" and can "accurately interpret prompts and generate compelling characters that express vibrant emotions." However, OpenAI acknowledges that Sora may struggle with "accurately simulating the physics of a complex scene" and may not always properly interpret cause and effect.
How does Sora differ from its rivals?
Instead of assembling videos frame by frame like other models, Sora can generate entire clips at once. This helps keep the subjects in a video consistent, even when they temporarily go out of view. OpenAI has said it used both "publicly available videos" and licensed clips from copyright holders to train Sora.
Availability and competition
At the moment, Sora is available only to "red teamers," who are evaluating the model for potential risks and harms, and to a number of designers, visual artists, and filmmakers, whose feedback OpenAI is seeking. In the competitive text-to-video market, Sora is up against firms like Runway and Pika, as well as Google's Lumiere. Like Sora, Lumiere offers users text-to-video tools and the ability to create videos from still images.
Watermarking a possibility?
Earlier in February, OpenAI announced it had started adding watermarks to pictures generated via its text-to-image tool DALL-E 3. However, the watermarks can "easily be removed." As with its other AI products, the firm will have to contend with the consequences of photorealistic yet fake AI videos being mistaken for real ones.