OpenAI's Sora Generates Photorealistic Videos

On Feb. 15, OpenAI released Sora, an impressive new text-to-video model that can create photorealistic or cartoony moving images from natural language text prompts. Sora isn’t available to the public yet; instead, OpenAI released it to red teamers (security researchers who mimic the techniques used by threat actors) to assess possible harms or risks. OpenAI also offered Sora to selected designers and audio and visual artists to get feedback on how it can best be optimized for creative work.

OpenAI’s emphasis on safety around Sora is standard for generative AI nowadays, but it also underscores the need for precautions around AI that can create convincing fake images and videos, which could, for instance, damage an organization’s reputation.

What is Sora?

Sora is a generative AI diffusion model. It can generate multiple characters, complex backgrounds and realistic-looking movements in videos up to a minute long. It can also create multiple shots within one video while keeping the characters and visual style consistent, which makes Sora an effective storytelling tool.

In the future, Sora could be used to generate videos to accompany content, to promote content or products on social media, or to illustrate points in business presentations. While it shouldn’t replace the creative minds of professional video makers, Sora could make some content quicker and easier to produce. There’s no information on pricing yet, but OpenAI may eventually offer an option to incorporate Sora into its ChatGPT Enterprise subscription.

“Media and entertainment will be the vertical industry that may be early adopters of models like these,” Gartner Analyst and Distinguished VP Arun Chandrasekaran told TechRepublic in an email. “Business functions such as marketing and design within technology companies and enterprises could also be early adopters.”

How do I access Sora?

Unless you have already received access from OpenAI as part of its red teaming or creative work beta testing, you can’t access Sora now. OpenAI released Sora to selected visual artists, designers and filmmakers to learn how to optimize it specifically for creative uses. In addition, OpenAI has given access to red team researchers specializing in misinformation, hateful content and bias. Chandrasekaran said OpenAI’s initial release of Sora is “a good approach and consistent with OpenAI’s practices on safe release of models.”

“Of course, this alone won’t be sufficient, and they need to put in practices to weed out bad actors getting access to these models or nefarious uses of it,” Chandrasekaran said.

How does Sora work?

Sora is a diffusion model, meaning it starts from what amounts to random noise and gradually refines it into a coherent video that matches the prompt; it also uses a transformer architecture. The research OpenAI performed to create its DALL-E and GPT models, particularly the recaptioning technique from DALL-E 3, served as stepping stones toward Sora’s creation.
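To make the diffusion idea more concrete, here is a toy sketch in Python. It is not Sora’s code, which OpenAI has not published, and it works on a short list of numbers rather than video frames. The noise schedule and all function names are assumptions, and oracle_noise_predictor is purely hypothetical: it cheats by already knowing the target, standing in for the trained, prompt-conditioned network that does this job in a real model. What the sketch does show is the core loop: start from pure random noise, predict the noise at each step, and take a deterministic (DDIM-style) step toward a slightly cleaner version.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for a linear noise schedule (an assumed, common choice)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta  # fraction of the original signal left after step t
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a trained, prompt-conditioned network.

    It recovers the noise exactly because it already knows the target;
    a real model has to learn this prediction from data.
    """
    return [(x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for x, x0 in zip(x_t, target)]


def denoise(target, num_steps=1000, seed=0):
    """Refine pure noise toward `target`; return the sample and its distance to the target after each step."""
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # the "nonsense image": pure Gaussian noise
    distances = [math.dist(x, target)]
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        eps_hat = oracle_noise_predictor(x, a_t, target)
        # Estimate the clean signal implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for xi, e in zip(x, eps_hat)]
        # Deterministic DDIM-style update to the next, less noisy level.
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e
             for c, e in zip(x0_hat, eps_hat)]
        distances.append(math.dist(x, target))
    return x, distances


if __name__ == "__main__":
    tiny_image = [0.5, -1.0, 0.25, 0.8, -0.3, 0.0, 1.2, -0.7]  # stand-in for pixel values
    sample, distances = denoise(tiny_image)
    for step in (0, 250, 500, 750, 1000):
        print(f"after {step:4d} steps: distance to target = {distances[step]:.4f}")
```

Running it prints the distance between the current sample and the target, which ends at essentially zero after the final step. In a real system the hard part is the noise predictor itself, which must learn from training data and the text prompt what a plausible video looks like.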


Sora videos don’t always look completely realistic

Sora still has trouble telling left from right and following complex descriptions of events that unfold over time, such as prompts describing a specific camera movement. According to OpenAI, Sora videos are likely to give themselves away through errors in cause and effect, such as a person taking a bite out of a cookie that is then left without a bite mark.

For instance, interactions between characters may show blurring (especially around limbs) or inconsistent numbers of subjects (e.g., in one sample video, it’s hard to tell how many wolves are on screen at any given time).

What are OpenAI’s safety precautions around Sora?

With the right prompts and tweaking, the videos Sora makes can easily be mistaken for live-action videos. OpenAI is aware of possible defamation or misinformation problems arising from this technology. OpenAI plans to apply to Sora the same content filters it uses for DALL-E 3, which block “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others,” according to OpenAI.

If Sora is released to the public, OpenAI plans to watermark content created with Sora with C2PA metadata, which can be viewed by selecting the file and choosing the File Info or Properties menu option. However, people who create AI-generated images can remove the metadata, either on purpose or accidentally, and OpenAI does not currently have anything in place to prevent users of its image generator, DALL-E 3, from removing it.

“It is already [difficult] and increasingly will become impossible to detect AI-generated content by human beings,” Chandrasekaran said. “VCs are making investments in startups building deepfake detection tools, and they [deepfake detection tools] can be part of an enterprise’s armor. However, in the future, there is a need for public-private partnerships to identify, often at the point of creation, machine-generated content.”

What are competitors to Sora?

Sora’s photorealistic videos are quite distinct, but there are similar services. Runway provides enterprise-ready text-to-video AI generation. Fliki can create limited videos with voice syncing for social media narration. Generative AI can now also reliably add content to, or edit, conventionally shot videos.

On Feb. 8, Apple researchers revealed a paper about Keyframer, a proposed tool that uses a large language model to create stylized, animated images.

TechRepublic has reached out to OpenAI for more information about Sora.