This is still so bizarre to me. I've worked on 3D rendering engines trying to create realistic lighting and even the most advanced 3D games are pretty artificial. And now all of a sudden this stuff is just BAM super realistic. Not just that, but as a game designer you could create an entire game by writing text and some logic.
In my experience as a game designer, the code that LLMs spit out is pretty shit. It won't even compile half the time, and when it does, it won't do what you want without significant changes.
The correct usage of LLMs in coding, imo, is for a single use case at a time, building up to what you need from scratch. It requires skill: talking to the AI so it gives you what you want, knowing how to build up to it, reading the code it spits out so you know when it goes south, and actually knowing how to assemble the bigger-picture software from little pieces. But if you are an intermediate dev who is stuck on something, it is a great help.
That, or for rubber-duck debugging; it's also great at that.
I just want to dump some thoughts that might be interesting to someone:
I wasn't really thinking of normal code, more like "story beats" - like that dog on the ledge. The model would output something more like a "move here" / "activate this" language for characters and environments: a simple stream of output tokens that can't "syntax error".
Think of a combination with something like Dwarf Fortress, which generates story beats based on a simulation and outputs mostly text that is turned into visual elements. That's just one example; basically any game / sim engine can be reduced to outputting text story beats. Then the LLM turns the synthetic "A [...] appears and wants to [...]" statement into something of a story and then into graphics. Basically a giant lever for storytelling. Something like that is going to be the new Minecraft.
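To make the "can't syntax error" idea concrete: one way is to restrict the output language to a closed vocabulary of action tokens and clamp anything outside it to a safe default, so no model output can ever break the engine. A minimal Python sketch of such a beat language (all verbs, targets, and names here are hypothetical illustrations, not any real engine's API):

```python
# Minimal sketch of a "story beat" language whose every output is valid
# by construction: a closed vocabulary of (verb, target) action tokens.
# All names are hypothetical, for illustration only.

VERBS = {"appear", "move", "activate", "say"}   # closed verb set
TARGETS = {"dog", "ledge", "door", "player"}    # closed argument set

def parse_beat(raw: str):
    """Turn one raw model-output line into a valid (verb, target) pair.

    Unknown verbs or targets are clamped to safe defaults instead of
    raising, so malformed output can never "syntax error" the engine.
    """
    parts = raw.lower().split()
    verb = parts[0] if parts and parts[0] in VERBS else "appear"
    target = parts[1] if len(parts) > 1 and parts[1] in TARGETS else "player"
    return (verb, target)

# A simulation would emit lines like these; the LLM dresses the resulting
# beats up into prose, and the engine turns them into visuals.
beats = [parse_beat(line) for line in ["move dog", "activate door", "frobnicate zorp"]]
print(beats)  # the nonsense line is clamped to a default beat, never rejected
```

The design choice is that validation happens by clamping rather than rejection: the worst a confused model can do is produce a boring beat, not crash the game.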
And it's ALSO going to make streamers more redundant. Instead of some centralized dude talking in your ear you'd have a commentator a bit like in Portal or Bastion. Or you have a party of people / friends that comment or interact with the world.
Or e.g. a sim game based on the world of Pride and Prejudice where social interaction has important meaning and you manage your estate and dynasty and try to marry off your sons and daughters. All "filled out" with AI generated content based on relatively simple text output. Or more like a settler game.
Basically you could overcome the problem with "sameness" in procedurally generated games (e.g. Elite Dangerous) and fill it up with interesting things. But you would need some kind of basic narrative language that an LLM can generate without syntax errors and that can be turned into logic facing the player. Like generating puzzles as in Myst.
Like your simulation spits out "you arrive at the temple of Ninkompub. The entrance is locked." and the game engine generates visuals and puzzles and enemies or some weird guy demanding you answer 3 questions. You'd need to adapt and train special AI models and need training data but I can see it being possible in the near future.
Basically a game engine / game development system with a live "game master" to torture the player and try to keep him in a punishment simulation for as long as possible. And whenever the player wins or stops playing the AI is punished itself and mutated so it learns constantly and evolves under "theatrical selection". So when AI finally becomes sentient and evolves into an AGI it already knows exactly what to do :D
It could also allow for easier multiplayer, because the story lines and game progress can be generated "on the fly" without both players having to be on the same linear / narrative path.
Now combine that with VR and something like an AI girlfriend in your party that is designed to manipulate the player and create emotional bonds like a Tamagotchi and you have something really disruptive.
You should refine your thoughts more instead of dumping a stream of consciousness on people.
Essentially what this stream of consciousness boils down to is "Wouldn't it be neat if AI generated all the content in the game you are playing on the fly?" Would it be neat? I guess so but I find that incredibly unappealing very similar to how AI art, stories and now video is unappealing. There's no creativity involved. There's no meaning to any of it. Sentient AI could probably have creativity but what people like you who get overly excited about this stuff don't seem to understand is how fundamentally limited our AI actually is currently. LLMs are basically one of the most advanced AI things rn and yet all it does is predict text. It has no knowledge, no capacity for learning. It's very advanced auto correct.
We've seen this kind of hype with Crypto, with NFTs, and with Metaverse bullshit. You should take a step back and understand what we currently have and how incredibly far away what has you excited actually is.
I don't mean to be dismissive of your entire train of thought (I can't follow a lot of it, probably because I'm not a dev and not familiar with a lot of the concepts you're talking about) but all the things you've described that I can understand would require these tools to be a fuckload better, on an order we haven't even begun to get close to yet, in order to not be super predictable.
It's all wonderful in theory, but we're not even close to what would be needed to even half-ass this stuff.
Keep in mind that this isn't creating 3D volumes at all. While immensely impressive, the thing being created by this architecture is a series of 2D frames.
Lol you don't know how cruel that is. For decades programmers have devoted their passion to creating hyperrealistic games and 3D graphics in general, and now poof it's here like with a magic wand and people say "yeah well you should have made your 3D engine look like the real world, not to look like shit" :D
Welcome to the club, my friend... Expert after expert has been having this experience as AI has developed over the past couple of years and we discover that the job can be automated way more than we thought.
First it was the customer service chat agents. Then it was the writers. Then it was the programmers. Then it was the graphic design artists. Now it's the animators.
Another programmer here. The bottleneck in most jobs isn't in getting boilerplate out, which is where AI excels; it's in that first and/or last 10-20%, alongside deciding what patterns are suitable for your problem, what proprietary tooling you'll need to use, what APIs you're hitting, and what has changed in recent weeks/months.
What AI is achieving is impressive, but as someone that works in AI, I think that we're seeing a two-fold problem: we're seeing a limit of what these models can accomplish with their training data, and we're seeing employers hedge their bets on weaker output with AI over specialist workers.
The former is a great problem, because this tooling could be adjusted to make workers' lives far easier/faster, in the same way that many tools have done so already. The latter is a huge problem, as in many skilled worker industries we've seen waves of layoffs, and years of enshittification resulting in poorer products.
The latter is also where I think we'll see a huge change in culture. IMO, we'll see existing companies bet it all and die from backing AI over people, and a new wave of companies focus on putting out work of a certain standard to take on the larger companies.
Writer here, absolutely not having this experience. Generative AI tools are bad at writing, but people generally have a pretty low bar for what they think is good enough.
These things are great if you care about tech demos and not quality of output. If you actually need the end result to be good though, you’re gonna be waiting a while.
That remains to be seen. We have yet to see one of these things actually get good at anything, so we don’t know how hard that last part is to do. I don’t think we can assume there will be continuous linear progress. Maybe it’ll take one year, maybe it’ll take 10, maybe it’ll just never reach that point.
Yeah a real problem here is how you get an AI which doesn't understand what it is doing to create something complete and still coherent. These clips are cool and all, and so are the tiny essays put out by LLMs, but what you see is literally all you are getting; there are no thoughts, ideas or abstract concepts underlying any of it. There is no meaning or narrative to be found which connects one scene or paragraph to another. It's a puzzle laid out by an idiot following generic instructions.
That which created the woman walking down that street doesn't know what either of those things are, and so it can simply not use those concepts to create a coherent narrative. That job still falls onto the human instructing the AI, and nothing suggests that we are anywhere close to replacing that human glue.
Current AI can not conceptualise -- much less realise -- ideas, and so they can not be creative or create art by any sensible definition. That isn't to say that what is produced using AI can't be posed as, mistaken for, or used to make art. I'd like to see more of that last part and less of the former two, personally.
Current AI can not conceptualise – much less realise – ideas, and so they can not be creative or create art by any sensible definition.
I kinda 100% agree with you on the art part since it can't understand what it's doing... On the other hand, I could swear that if you look at some AI-generated images it's kind of mocking us. It's a reflection of our society in a weird mirror. Like a completely mad or autistic artist that is creating interesting imagery but has no clue what it means. Of course that exists only in my perception.
But in the sense of "inventive" or "imaginative" or "fertile" I find AI images absolutely creative. As such it's telling us something about the nature of the creative process, about the "limits" of human creativity - which is in itself art.
When you sit there thinking up or refining prompts you're basically outsourcing the imaginative, visualizing part of your brain. An "AI artist" might not be able to draw well or even have the imagination, but he might have a purpose or meaning that he's trying to visualize with the help of AI. So AI generation is at least some portion of the artistic or creative process, but not all of it.
Imagine we could have a brain computer interface that lets us perceive virtual reality like an extra pair of eyes. It could scan our thoughts and allow us to "write text" with our brain, and then immediately feed back a visual AI-generated stream that we "see". You'd be a kind of creative superman. Seeing / imagining things in their head is of course what many people do their whole life, but not in that quantity or breadth. You'd hear a joke and you would not just imagine it, you'd see it visualized in many different ways. Or you'd hear a tragedy and...
Like a completely mad or autistic artist that is creating interesting imagery but has no clue what it means.
Autists usually have no trouble understanding the world around them. Many are just unable to interface with it the way people normally do.
It’s a reflection of our society in a weird mirror.
Well yes, it's trained on human output. Cultural biases and shortcomings in our species will be reflected in what such an AI spits out.
When you sit there thinking up or refining prompts you’re basically outsourcing the imaginative visualizing part of your brain. [...] So AI generation is at least some portion of the artistic or creative process but not all of it.
We use a lot of devices in our daily lives, whether for creative purposes or practical. Every such device is an extension of ourselves; some supplement our intellectual shortcomings, others physical. That doesn't make the devices capable of doing any of the things we do. We just don't attribute actions or agency to our tools the way we do to living things. Current AI possesses no more agency than a keyboard does, and since we don't consider our keyboards to be capable of authoring an essay, I don't think one can reasonably say that current AI is, either.
A keyboard doesn't understand the content of our essay, it's just there to translate physical action into digital signals representing keypresses; likewise, an LLM doesn't understand the content of our essay, it's just translating a small body of text into a statistically related (often larger) body of text. An LLM can't create a story any more than our keyboard can create characters on a screen.
Only if and when we observe AI behaviour indicative of agency can we start to use words like "creative" in describing that behaviour. For now (and I suspect for quite some time into the future), all we have are sophisticated statistical random content generators.
Still waiting on the programmer part. In a nutshell, AI being, say, 90% perfect means you have 90% working code, i.e. 10% broken code. Images and video (but not sound) are way easier because human eyes kind of just suck. A couple of the videos they've released pass even at a pretty long glance. You only notice the funny business once you look closer.
I can't imagine that digital artists/animators have reason to worry. At the upper end, animated movies will simply get flashier, eating up all the productivity gains. In live action, more effects will be pure CGI. At the bottom end, we may see productions hiring VFX artists, just as naturally as they hire makeup artists now.
When something becomes cheaper, people buy more of it, until their demand is satisfied. With food, we are well past that point. I don't think we are anywhere near that point with visual effects.
It seems to me that AI won't completely replace jobs yet (though it will in 10-20 years). But it will reduce demand because of oversaturation plus ultra-productivity with AI. Moreover, AI will continue to improve. The work of a team of 30 people will be done by just 3 people.
Yeah. And it's not just how good the images look, it's also the creativity. Everyone tries to downplay this, but I've read the texts and those videos, and just from the prompts there is a "creative spark" there. It's not a very bright spark lol but it's there.
I should get into this stuff but I feel old lol. I imagine you could generate interesting levels with obstacles and riddles and "story beats" too.
Because sometimes the generator just replicates bits of its training data wholesale. The "creative spark" isn't its own, it's from a human artist left uncredited and uncompensated.
Artists are "inspired" by existing art or things they see in real life all the time. So the fact that these models can replicate art doesn't mean they can't generate art. It's a non sequitur. But I'm sure people are going to keep insisting on this, so let's not argue back and forth on it :D