The actor, filmmaker and studio owner is raising the alarm about the impact of the tech, saying, "I feel like everybody in the industry is running a hundred miles an hour to try and catch up, to try and put in guardrails."
Tyler Perry Puts $800M Studio Expansion On Hold After Seeing OpenAI’s Sora: “Jobs Are Going to Be Lost”::Tyler Perry is raising the alarm about the impact of OpenAI's Sora on Hollywood.
Sora can sometimes do 1 minute clips that mostly look ok as long as you don't pay too close attention. We are incredibly far away from coherent, feature-length narratives and even those aren't likely to be thematically interesting or engaging.
Yep. I watched their demo clips, and the "good" ones are full of errors, have lots of thematically incoherent content, and - this is the biggie - can't be fixed.
Say you're a 3D animator and build an animation with thousands of different assets and individual, alterable elements. Your editor comes to you and says, "This furry guy over here is looking in the wrong direction, he should be looking at the kangaroo king over there, but it looks like he's just glaring at his own hand."
So you just fix it. You go in, tweak the furry guy's animation, and now he's looking in the right direction.
Now say you made that animation with Sora. You have no manipulatable assets, just a set of generated frames that made the furry guy look in the wrong direction.
So you fire up Sora and try to fine-tune its instructions, and it generates a completely new animation that shares none of the elements of the previous one, and has all sorts of new, similarly unfixable errors.
If I use an AI assistant while coding, I can correct its coding errors. But you can't just "correct" frames of video it has created. If you try, you're looking at painstakingly hand-painting every frame where there's an error. You'll spend more time trying to fix an AI-generated animation that's 90% good and 10% wrong than you will just doing the animation with 3D assets from scratch.
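To make the difference concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the scene dictionary, `fix_gaze`, and `toy_generate` are hypothetical names, and `toy_generate` is only a stand-in that seeds a random stream from the prompt. It is not Sora's API and says nothing about how Sora works internally. It only shows the workflow contrast: an asset-based edit touches one field, while a reworded prompt gives you a whole new output.

```python
import copy
import hashlib
import random

# Asset-based scene: every element is a separate, editable record.
scene = {
    "furry_guy": {"position": (2, 0), "looking_at": "own_hand"},
    "kangaroo_king": {"position": (8, 0), "looking_at": "crowd"},
    "backdrop": {"position": (0, 0), "looking_at": None},
}

def fix_gaze(scene, character, target):
    """Return a copy of the scene with only one character's gaze changed."""
    fixed = copy.deepcopy(scene)
    fixed[character]["looking_at"] = target
    return fixed

def toy_generate(prompt, n_values=8):
    """Hypothetical stand-in for a prompt-to-video model (an assumption,
    not Sora). The prompt seeds a random stream, so any change in wording
    reshuffles the entire output."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n_values)]

# Workflow 1: tweak the asset; everything else stays exactly as it was.
fixed_scene = fix_gaze(scene, "furry_guy", "kangaroo_king")
untouched = [name for name in scene if scene[name] == fixed_scene[name]]

# Workflow 2: reword the prompt; count how many output values survive.
before = toy_generate("furry guy glares at his own hand")
after = toy_generate("furry guy looks at the kangaroo king")
kept = sum(a == b for a, b in zip(before, after))

print("assets untouched by the fix:", untouched)
print("generated values unchanged after re-prompting:", kept, "of", len(before))
```

In the first workflow the fix is local by construction. In the second, nothing ties the new output to the old one, which is the "completely new animation" problem described above.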
Now say you made that animation with Sora. You have no manipulatable assets, just a set of generated frames that made the furry guy look in the wrong direction.
"Sora, regenerate $Scene153 with $Character looking at $OtherCharacter. Same Style."
Or "Sora, regenerate $Scene153 from time mark X to time mark Y with $Character looking at $OtherCharacter. Same Style."
It's a new model. You won't work with frames anymore; you'll work with scenes, and when the tools get a bit smarter you'll be working with scene layers.
"Sora, regenerate $Scene153 with $Character in Layer1 looking at $OtherCharacter in Layer2. Same Style, both layers."
I give it 36 months or less before that's the norm.
This seems like a fundamental misunderstanding of how generative AI works. To accomplish what you're describing you'd need:
An instance of generative AI running for each asset.
An enclosing instance of generative AI running for each scene.
A means for each AI instance to discard its own model and recreate exactly the same asset, tweaked in precisely the manner requested, while still being able to immediately reincorporate the model for subsequent generation.
A coordinating AI instance to keep it all working together, performing actions such as mediating asset collisions.
The whole system would need to be able to rewind to specific trouble spots, correct them, and still generate everything that comes after unchanged. We're talking orders of magnitude more complexity and difficulty.
And in the meantime, artists creating 3D assets the regular way would suddenly look a lot less expensive and a lot less difficult.
If all you have is a hammer, everything looks like a nail. Right now, generative AI is everyone's really attractive hammer. But I don't see it working here in 36 months. Or 48. Or even 60.
The first 90% is easy. The last 10% is really fucking hard.
And ironically, when we do get to the point where an AI can string together a semi-coherent narrative, the first things it'll start to produce will probably be exactly the sort of mid-level dross that Tyler Perry likes to make.
This won’t get used for key narrative content. This will be used for a lot of b-roll and the quick cuts that audiences don’t examine closely. A lot of a movie is content like that, and since the dawn of the effects industry, editors and effects artists have known that they can get away with janky stuff in certain places. The audience won’t know it’s there because they’re not watching the film frame by frame.
It seems pretty good with backgrounds though, and it’s only going to get better. I think the threats of job losses are a lot more imminent than people are ready to admit.