OpenAI, the company behind ChatGPT, has announced a new video-generation model called Sora. In a statement released on Thursday, the company said Sora can create realistic and imaginative scenes from text instructions.
Basically, Sora is a text-to-video model that lets users create photorealistic videos up to a minute long, all based on prompts given to the tool.
Although companies like Runway and Pika have made remarkable progress with their text-to-video models, OpenAI’s main competitors in AI video generation include Meta and Google, whose Lumiere model gives users text-to-video tools and, much like Sora, also lets them create videos from a still image.
Similar AI tools are available from other startups, such as Stability AI, which has a product called Stable Video Diffusion. Amazon has also released Create with Alexa, a model that specialises in generating prompt-based short-form animated children’s content.
More details on Sora
Per the statement released on Thursday, the new generative AI model, Sora, works similarly to OpenAI’s image-generation tool, DALL·E.
Here is how it works: a user types out a desired scene, and Sora returns a high-definition video clip. Sora can also generate video clips inspired by still images, extend existing videos, or fill in missing frames. According to OpenAI, the model understands how objects “exist in the physical world” and can “accurately interpret props and generate compelling characters that express vibrant emotions.”
Additionally, it is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” per the OpenAI blog post.
The Sora-generated demos attached to the announcement include an aerial scene of California during the gold rush, a video that looks like it was shot from inside a Tokyo train, and others.
Some of the demos show telltale signs of AI (like a suspiciously shifting floor in a video of a museum), and OpenAI acknowledges that the model “may struggle with accurately simulating the physics of a complex scene and may not properly interpret certain instances of cause and effect.”
At the moment, Sora is only available to “red teamers” who are assessing the model for potential harms and risks. OpenAI is also offering access to a number of visual artists, designers, and filmmakers to gather feedback.
Addressing safety issues
OpenAI says it will be taking several important safety steps before making Sora available in its products.
“We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” the company said.
The company also says it is building tools to help detect misleading content, including a detection classifier that can tell when a video was generated by Sora, and plans to leverage the existing safety methods built for products that use DALL·E 3.
“For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.”
Lastly, OpenAI says it will be engaging policymakers, educators, and artists around the world to understand their concerns and identify positive use cases for the new technology.