
14 comments
  • I think that the technology just isn't there for most generated dialog.

    What we're doing today is taking a training corpus and, given a prompt, directly producing more text like it, without any higher-level processing.

    What limitations exist here?

    • Written and spoken language are not the same, and a lot of training data is from written language. In English, written sentences are longer than spoken ones. People use some different words and grammatical structures. Try reading a play or transcribing what someone says, and it kind of drives the point home. That's not an unsolvable problem, but it's a good argument that gluing something like ChatGPT to a speech synthesizer is a long way from where you want to be.

    • You need a training corpus similar to the way a given character would speak. Maybe if you want "a generic American", you're okay. And there are legitimate uses for that in games, certainly. But what if you want to have Celechir, high-elven guard in the kingdom of Arandie? How do you build up a list of things that Celechir would talk about? How much germane training data is there out there?

    • There is no strong association between generated text and game world state, and you normally want one. Let's say that I'm creating a character in Fallout 4. How am I going to get them to talk about the world around them? I can encode some world state and state about that character in a prompt, but that's sharply bounded using existing mechanisms -- I can provide maybe a couple of hundred prompt terms, which is not a lot to try to describe the world and relevant characters and all that. I'd guess that any such generation mechanism is going to require some level of pre-processing of world state that doesn't exist today.
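    A back-of-envelope sketch of that budget problem: even a handful of world facts eats into a couple-hundred-term prompt quickly, so something has to decide what gets dropped. Everything below (the world-state fields, `build_prompt`, the crude word-count token estimate) is invented for illustration, not any real game's or API's mechanism:

```python
# Hypothetical sketch: packing game world state into a bounded prompt.
# All names here are made up; tokens are approximated as whitespace words.

MAX_PROMPT_TOKENS = 200  # the "couple of hundred terms" budget

WORLD_STATE = {
    "location": "Diamond City market",
    "time_of_day": "night",
    "recent_events": [
        "player cleared the Corvega plant",
        "raiders attacked the gate yesterday",
        "a caravan arrived with fresh tatos",
    ],
    "npc": {"name": "Myrna", "mood": "suspicious", "job": "shopkeeper"},
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def build_prompt(state: dict, budget: int) -> str:
    """Flatten world state into prompt lines, adding events newest-first
    and stopping once the token budget would be exceeded."""
    npc = state["npc"]
    header = (
        f"You are {npc['name']}, a {npc['mood']} {npc['job']} "
        f"in {state['location']}. It is {state['time_of_day']}."
    )
    lines = [header]
    for event in reversed(state["recent_events"]):
        candidate = lines + [f"Known event: {event}"]
        if estimate_tokens("\n".join(candidate)) > budget:
            break  # oldest events silently fall off -- the core problem
        lines = candidate
    return "\n".join(lines)

prompt = build_prompt(WORLD_STATE, MAX_PROMPT_TOKENS)
```

    The point of the sketch is what it leaves out: anything not selected into `lines` simply doesn't exist as far as the model is concerned, which is why some pre-processing layer has to pick the few facts that matter right now.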

    I am all for using generative AI to do speech synth. I've been impressed with the output there. We may not be quite to the point of good, emotive speech yet, but it's good enough for a lot of uses, and it lets one do things that cannot be done with pre-recorded, static samples from a voice actor, like speaking dynamically-generated text.

    But for writing dialog via generative AI? I'm a lot more hesitant there in the near future, given what I've seen so far.

    Now, I am sure that you can make video games in certain limited genres that do leverage what's there. But I think that it's far enough from a drop-in replacement for hand-written text that it's not a great option. Maybe you can make a so-so sexy chatbot or something like that that's isolated from a broader video game world. Maybe you can create characters that speak in fairly constrained ways. But I don't think that we can create NPCs on par with human-written characters just by gluing ChatGPT to them and providing a handful of human-language directives about how the character should act, the way we could for a human writer -- which I think is what some people are dreaming of. Further down the line, maybe, but I think that it's still a fair way from where we are in 2024.
