OpenAI's Sora model can generate 60-second videos from text prompts

By Dwaipayan Roy

Feb 16, 2024

09:35 am

What's the story

OpenAI has unveiled Sora, a video-generation model that turns written prompts into realistic and imaginative scenes. Users can generate photorealistic video clips up to a minute long from their text prompts. Sora is capable of crafting "complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," according to OpenAI's blog post.

Specs

Sora's capabilities and limitations

Sora can also generate videos from still images, fill in missing frames in existing videos, and extend those videos. The model has a solid grasp of how objects "exist in the physical world" and can "accurately interpret props and generate compelling characters that express vibrant emotions." However, OpenAI acknowledges that Sora might struggle with "accurately simulating the physics of a complex scene" and may not always interpret cause and effect correctly.

Twitter Post

Take a look at a demonstration

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024

Differences

How does Sora differ from its rivals?

Instead of assembling videos frame by frame, as other models do, Sora can generate entire clips at once. This keeps the subjects in a video consistent, even when they temporarily go out of view. OpenAI says it used both "publicly available videos" and clips licensed from copyright holders to train Sora.

Rollout

Availability and competition

At the moment, Sora is available only to "red teamers" who are evaluating the model for potential risks and harms. OpenAI is also granting access to designers, visual artists, and filmmakers to gather their feedback. In the competitive text-to-video market, Sora is up against firms like Runway and Pika, as well as Google's Lumiere. Like Sora, Lumiere offers text-to-video tools and can create videos from still images.

Steps

Watermarking a possibility?

Earlier in February, OpenAI announced it had started adding watermarks to pictures generated via its text-to-image tool DALL-E 3. However, the watermarks can "easily be removed." As with its other AI products, OpenAI will have to deal with the consequences of photorealistic yet fake AI videos being mistaken for real ones.