How OpenAI’s “Sora” Affects The Future of Filmmaking

Last month, OpenAI, the creator of ChatGPT, announced its next big project in artificial intelligence. Called “Sora,” the program is a text-to-video generation model capable of producing short videos from a brief written prompt. The results, at least the ones OpenAI chose to present, are shockingly accurate, if not outright disturbing. Visiting the website, you are greeted by a slideshow of nine incredibly high-resolution videos, most of them under thirty seconds long: a stylish Japanese woman walking down a busy street in Tokyo, a group of woolly mammoths running toward the screen, a man reading a book in the clouds. Given the sheer amount of detail in each video, you’d expect the prompts used to create them to have a higher word count than this article does right now. Instead, the prompts, which are written out underneath each video, are merely a few sentences in plain English, not a bunch of jargon only an AI would understand. For the woman walking through the busy Tokyo street, the prompt reads as follows:

“A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”

OpenAI Sora (Image: OpenAI)

If OpenAI is being truthful, and the model really is producing this level of quality and accuracy from a prompt as simple as the one above, then the results are even more mind-blowing. As I scrolled through each video, watching them over and over and trying to find inconsistencies, I couldn’t help but be awestruck. Only a year ago, we had the pleasure of seeing this technology in its infancy through what the kids would call “nightmare fuel”: an AI-generated video of Will Smith eating spaghetti. The video, uploaded by the channel Robot Named Roy, made clear that although this technology was moving at a rapid pace, it was nowhere near ready to make “actual” videos that accurately depicted humans and our world. Now, less than a year later, this software is generating content the likes of which I could hardly believe. However, it didn’t take long for that slack-jawed amazement and curiosity to turn into fear: fear of what the future may hold, particularly for film and television, if this technology isn’t moderated with caution.

The fear that artificial intelligence will take actual human jobs is a dystopian reality we are already having to face. OpenAI’s other model, ChatGPT, has disrupted numerous fields and made its way into classrooms, where students use it to write entire essays. David Holz’s Midjourney, an AI image generation tool, has put some smaller artists out of work, since the software can create genuinely beautiful pieces of art (well, most of the time) from a simple prompt, using tens of thousands of real artists’ works as reference. Amazon is littered with “original” novels written entirely by AI, and some opportunistic individuals have gone out of their way to “write” entire biographies of celebrities, selling them on the platform as if they were official books penned or financed by those figures. Safeguards and protocols have been put in place, particularly in the education sector, but of all the areas where we expected AI to be beneficial, it seems to be the arts that are suffering, and artists who remain most at risk.

OpenAI Sora (Image: OpenAI)

As a writer, I’ve been concerned about my future with this technology for a while now. As a filmmaker, though, I felt I could breathe easy, especially after seeing that haunting Will Smith video. But after seeing Sora, I’d be lying if I said my breathing wasn’t a tad more stunted. One of the videos on OpenAI’s website comes from the prompt, “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” The resulting seventeen-second video translates that prompt into something that genuinely looks as if it were made for a television commercial. It moves from a dolly close-up to a swooping wide shot to inserts of a man opening a latch, all with a fairly coherent mise-en-scène. That’s to say nothing of the two men in the video, who look nearly lifelike, like actual actors. Sure, closer scrutiny reveals some uncanny-valley quirks, but if no one had told me this video was generated by AI, I likely wouldn’t have batted an eye or gone looking for them.

Using the tool to create entire short films or commercials with convincing “humans” shouldn’t be a concern, at least not yet. Where this technology can have a genuine impact, however, is with establishing shots and b-roll. As Marques Brownlee points out in his video, some of Sora’s establishing shots of mountainous regions could easily be used in a commercial with no discernible sign that the footage was generated by AI. That means the drone pilot who would previously have been hired is no longer needed, and neither are all the resources it would have taken to get shots like these. He goes on to give a couple more examples: a retro shot of a pile of old television sets that looks like something out of Blade Runner, and an old-timey, documentary-style shot of California during the gold rush. Both of these shots would normally require props and prop masters, production and costume designers, directors of photography, and much more. Now, shots like these can be made with a tool like Sora in minutes, from a simply written prompt.

One could argue that this is great for filmmaking because it makes the art form, once inaccessible to aspiring artists due to its heavy cost, accessible in ways we never thought possible. But this isn’t just making the medium accessible. This isn’t like going from having to buy a camera for tens of thousands of dollars to being able to shoot 4K footage on your phone. This outright takes away the creative aspects of actually making a film. It’s no better than someone touting themselves as a writer, only to have ChatGPT author their upcoming fantasy novel. Sora is seemingly far from directing entire Kubrick epics (though at this rate, I’m not even sure anymore), but using the software to effectively write, direct, and “film” entire sequences for a narrative-driven, cinematic experience is a practice unbefitting anyone who calls themselves a filmmaker. This isn’t to say the software can’t be used in some form during the creative process; it simply shouldn’t be the entire creative process.

Shaz Mohsin
