Microsoft's new AI model creates hyper-realistic video using static image
Microsoft has unveiled VASA-1, an advanced artificial intelligence (AI) model capable of generating hyper-realistic videos of talking human faces from just a single photo and an audio clip. The resulting videos feature lip movements synchronized with the audio, complemented by natural-looking facial expressions and head movements. Despite its potential applications, Microsoft clarified that it does not plan to release a product or API built on VASA-1, and will instead use the model for creating virtual interactive characters.
VASA-1's capabilities and features explored
Microsoft's VASA-1, still under development, can generate 512x512-pixel videos at up to 40fps with minimal starting latency, according to the company's research announcement page. A video demonstrating the model was shared by X user Kaio Ken. From a single static image, the image-to-video model can produce high-quality clips up to one minute long.
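To put those numbers in perspective, the quick calculation below works only from the figures Microsoft has published (512x512 resolution, 40fps, clips up to one minute) to estimate how many frames a maximum-length clip contains and how much uncompressed pixel data that represents. It is a back-of-the-envelope sketch and does not call the model, which has no public API.

```python
# Back-of-the-envelope arithmetic from VASA-1's published specs.
# The constants are the figures Microsoft reported: 512x512 frames,
# up to 40fps, and clips up to one minute long.

WIDTH, HEIGHT = 512, 512   # output resolution in pixels
FPS = 40                   # maximum frame rate
DURATION_S = 60            # maximum clip length in seconds

frames = FPS * DURATION_S                 # frames in a one-minute clip
raw_bytes = frames * WIDTH * HEIGHT * 3   # uncompressed 24-bit RGB

print(f"{frames} frames")                          # 2400 frames
print(f"~{raw_bytes / 1e9:.1f} GB uncompressed")   # ~1.9 GB
```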
User control and self-learning capabilities
VASA-1 offers users granular control over various aspects of the generated video, including main eye gaze direction, emotion offsets, head distance, and more, allowing them to tailor the output to their needs. Notably, the model can also generate videos from singing audio, artistic photos, and non-English speech. Microsoft's researchers noted that such inputs were absent from the model's training data, suggesting it can generalize beyond what it was trained on.
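Microsoft has not published an API for these controls, so their exact interface is unknown. The sketch below merely models the controls the researchers describe (gaze direction, emotion offsets, head distance) as a simple Python structure; every name, type, and value range is an illustrative assumption, not Microsoft's actual interface.

```python
# Hypothetical sketch only: VASA-1 has no public API. This dataclass
# models the controls described in the announcement; all names, types,
# and value ranges are assumptions.

from dataclasses import dataclass

@dataclass
class VasaControls:
    gaze_yaw: float = 0.0        # main eye gaze, left/right (assumed units)
    gaze_pitch: float = 0.0      # main eye gaze, up/down (assumed units)
    emotion_offset: float = 0.0  # shifts expression, e.g. -1.0 (sadder) to 1.0 (happier)
    head_distance: float = 1.0   # apparent distance of the head from the camera

# Example: a slightly off-camera gaze with a mildly happier expression.
controls = VasaControls(gaze_yaw=0.2, emotion_offset=0.4)
print(controls)
```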
Addressing concerns and potential applications
Despite VASA-1's impressive capabilities, concerns have been raised about potential misuse, such as the creation of deepfakes. Microsoft has reiterated that it does not intend to release the model to the public and will instead apply it to virtual interactive characters. The company also highlighted the technique's potential to advance forgery detection.