The authors added that OpenAI’s LLMs could result in derivative work “that is based on, mimics, summarizes, or paraphrases” their books, which could harm their market.
Ok, so why not wait until those hypothetical violations occur and then sue?
Because the point of suing first is to address what could happen based on what OpenAI is doing right now. Kind of like how safety regulations are intended to prevent future problems based on what has happened previously, but expanded to cover similar potential dangers instead of waiting for each exact scenario to happen.
The difference is that you're trying to sue someone based on what could happen. That's like suing some random author because they read your book and could potentially make a story that would be a copy of it.
LLMs are trained on writings in the language and understand how to structure sentences based on their training data. Do AI models plagiarize any more than someone using their understanding of the English language is plagiarizing when they construct a brand new sentence? After all, we learn how to write by reading the language and learning the rules. Is the training data we read when we were kids being infringed whenever we write about similar topics?
When someone uses AI to plagiarize, you can sue them into eternity for all I care, but no one seems concerned with the implications of trying to sue someone/something because they trained an intelligence by letting it read publicly available written works. Reading and learning isn't allowed because you could maybe one day possibly use that information to break copyright law.
I see this more like suing a musician for using a sample of your recording or a certain amount of notes or lyrics from your song without your consent. The musician created a new work but it was based on your previous songs. I'm sure if a publisher asked ChatGPT to produce a GRRM-like novel, it would create a plagiarism-lite mashup of his works that were used as writing samples, using pieces of his plots and characters, maybe even quoting directly. Sampling GRRM's writing, in other words.
Except doing all of that is perfectly legal. With music it's called a remix or a cover. With stories it's called fanfic.
If the AI is exactly replicating an artist's works then that is copyright infringement without a doubt. But the AI isn't doing that and it likely isn't even capable of doing that.
But wouldn't the person who made the remix/cover or fanfic have to pay if they made money off of their work? Don't they need permission of the writer to sell that work? That is what I have always known, unless the original work is in the public domain. I'm not talking about someone creating an inspired work for their own private or not for sale use - in my example I was talking about a publishing company creating a work for sale.
Nope. Those are all transformative works and are fair use. The remix, cover, or fanfic are all considered new works as far as copyright is concerned, and the writer of them can do whatever they want with them, including sell them. People get their fanfics published all the time; they just usually don't sell well. People make covers of songs and sell them all the time. I can think of several YouTube channels that only do exactly that. Anyone can just go record themselves playing Wonderwall and try to sell it, because them playing that song is a unique work. I think trademarked stuff is more restricted on what you can do with it, but I'm not sure on that.
AI is also even more limited in regards to transformative works than humans because you can't copyright the direct output of an AI. So if, for example, you made an AI output a cover of a song you could still do whatever you want with it but you couldn't own the rights to it. Anyone else could also take it and profit off of it. The only way to copyright AI output is to create a transformative work based on that output. You can use the AI output to create a new work but you can't just call the AI output your work. In my opinion that's exactly where the law should be. You can use AI as a creative tool but you can't just have one generate every possible picture of something and copyright them all.
Suing anyone for copyright infringement based on current infringement always includes justification that includes current and future potential losses. You don't get paid for the potential losses, but they are still justification for them to stop infringing right now.
There is no current infringement unless they've discovered some knockoff series that was created with AI and is being used for profit. That's what I'm saying.
The copyright holders did not give OpenAI permission to copy their text into OpenAI, whether as direct text or an abstracted copy of the text, for commercial purposes.
Do AI models plagiarize any more than someone using their understanding of the English language is plagiarizing when they construct a brand new sentence?
The way I've heard it described: If I check out a home repair book and use that knowledge to do some handy-man work on the side, do I owe the publisher a cut of my profits?
If, without asking for permission, one person used my work to learn from it and taught themselves to replicate it, I'd be honoured. If somebody is teaching a class full of people that, I'd have objections. So when a company is training a machine to do that very same thing, and it will be able to do that thousands of times per second, again, without asking for permission first, I'd be pissed.
So how about someone who loves to read books wants to become a writer, and uses the plot twists, characters, environments, writing style of books they already read.
Inspiration is something we do through conscious experience. Just because some statistical analysis of a word cloud can produce sentences that trick a casual observer into thinking a person wrote them doesn’t make it a creative process.
In fact, I can prove to you that (so-called) AI can never be creative.
To get an AI to do anything, we have to establish a goal to measure against. You have to quantify it.
If you tell a human being “this is what it means to be creative; we have an objective measure of it”, do you know what they tend to do? They say “fuck your definition” and try to make something that breaks the rules in an interesting way. That’s the entire history of art.
You can even see that playing out with respect to AI. Artists going “You say AI art can’t be art, so I’m gonna enter AI pieces and see if you can even tell.”
That’s a creative act. But it’s not creative because of what the AI is doing. Much like Duchamp’s urinal wasn’t a creative object, but the act of signing it R Mutt and submitting it to a show was.
The kinds of AIs we design right now will never have a transformative R Mutt moment, because they are fundamentally bounded by their training. They would have to be trained to use novel input to dismantle and question their training (and have that change stick around), but even that training would then become another method of imitation that they could not escape. They can’t question quantification itself, because they are just quantitative processes — nothing more than word calculators.
Those rules or objectives exist for human artists too. They're just liquid, and human artists try to break them, or test the limits of stated rules to find the edges of the envelope of what counts as art. And more often than not (95% according to Theodore Sturgeon) they fail to sell, which could be from exceeding the boundaries of the expected art threshold, or just by doing it poorly.
Now you could argue (and I think you might be arguing) that creative acts or inspiration are both properties of personhood: That which we regard as a person can do art. If it's done by nature, by a non-person animal (e.g. the Monkey Selfie) or by a mechanical process, it doesn't count as a creative act, as inspiration, or as art. I get it, just as someone who uses a toaster to warm up pop-tarts is not regarded as actually cooking. That said:
a) you'd have to make that assertion by fiat. And your definition doesn't count for anyone else, unless you pass a law or convince art-defining social groups to adhere to your definitions.
b) Capitalist interests don't care. If it's cheaper to make AI design their website or edit their film, and it does an adequate job cheaper than hiring an artist, they're going to do it. Even if we make it illegal to use some works to train AI, that won't stop it from leaking through via information technology services that scrape the web. It's similar with ALPR companies, which use traffic cameras to track you in your car, determine your driving habits, and then sell that information to law enforcement. Those agencies are totally violating your fourth-amendment rights when they do it, but that doesn't stop them, and the information is used in court to secure convictions.
c) It's not artists that control intellectual property, but publishing companies, and they've already been looking to churn out content as product, the results of which we've seen in most blockbuster cinema offerings. The question is not if Fast & Furious XXIII is art but if people will pay to watch it. And IP law has so long been rotted to deny the public a robust public domain that we can expect they'll lobby our representatives until they can still copyright content that is awash with AI elements.
Ultimately the problem is also not whether artists get paid for their work doing art. It's that most of us are desperate to get paid for anything and so it's a right privilege when that anything is doing something arty. The strikes, the lawsuits, these are survival precarity talking. If we didn't have to worry about that (say in an alternate reality where we had a robust UBI program) AI replacing artists would be a non-issue. People would continue to create for the sake of creation as we saw during the epidemic lockdown of 2020 and the consequential Great Resignation.
Generative AI is not at the magical level that managers and OG artists and capitalists think it is, but short of war, a food crisis or our servers getting overrun by compound foul weather, it's going to get better and eventually AI will outpace Theodore Sturgeon's threshold of quality material to crap. This isn't what is going to confine human-crafted content to the fan-art section. It's that our shareholder-primacy-minded capitalist masters are looking to replace anyone they pay with a cheaper robot, and will at first opportunity. That's the problem we have to face right now.
I'm not the same person as @snooggums@kbin.social, but it did look like they were replying on my behalf, so I understand the assumption. No worries there.
I agree with what you're saying.
I would just wanna clarify that you're primarily talking about "art as a marketable commodity" and the societal problems with how that interacts with AI development, where I was talking primarily about "art as a cultural message" and the fundamental inability of AI to cross the threshold from "art as a product" to "art as a message" because the model itself has nothing to message about. (With the caveat that a person may use the AI's product as a message, but then the meaning comes from the person, not the AI.) I think we agree with each other here.
Btw, and you probably already know this, Cory Doctorow has some really sharp insights and recommendations when it comes to the past, present, and future of IP law and how we might be able to protect creators going forward.
I do wanna respond to something that wasn't really directed at me, just cuz it overlaps with my original comment and I think it's kind of interesting:
Again, you can say by fiat an AI has the personhood of a toaster, but that doesn’t make the content it creates less quality or less real. And given in the past how often we’ve disparaged art for being made by women, by non-whites, by Jews, we as a social collective have demonstrated our opinion is easily biased to arbitrarily favor those sources we like.
You’re not going to find any way to objectively justify including only human beings as qualified to make art.
You're right that, without an objective measure of what counts as an artistic endeavor, we're permitted to be as discriminatory as we feel like being. Which seems... not great, right?
But I don't think you ever can make an objective measure of what counts as art, because art is like the observable physical effect of something that's going on in our consciousness -- an immaterial world that can't directly map 1:1 with the physical world.
So I think art is always destined to be this amorphous thing that you can't exactly pin down. It's maybe more of a verb than a noun. Like I can't look at an inert object sitting on a table and figure out that it's art. But if someone tells me that this is the last sculpture their aunt made before she died and she started it when she felt fine, but by the end she could barely hold her hands still, and she never finished it... Well, suddenly I catch a glimpse of the conscious experience of that person. And it's not that her conscious experience was baked into the object, but that I can imagine being in her place and I can feel the frustration of the half-finished angles and the resignation of staring at it after touching it for the last time.
Yes, there is a real history of people saying "Those savages aren't conscious", or that they are technically conscious but a "lower" kind of consciousness. And I know it makes us uncomfortable to think we might do that again, and so I think some of us have developed a reflex to say we need to make an objective rational view of the world so that human subjectivity doesn't come into it and poison things... But I don't think it's possible, as long as the nature of consciousness remains a mystery to us.
And I also think if we do come to agree on a rationalist framework for living, we will have lost something. Once you have rules and measures, there's no room for... well, for lack of a better word, "soul". I'm an atheist, but I'm also conscious. And I don't think that the totality of my conscious experience is somehow quantifiable, or especially that if we could replay those exact quantities then it's just as good as consciousness. Like, I am experiencing something here, and there's no good reason to think that matter precedes consciousness and not the other way around.
Ultimately the problem is also not whether artists get paid for their work doing art. It’s that most of us are desperate to get paid for anything and so it’s a right privilege when that anything is doing something arty. The strikes, the lawsuits, these are survival precarity talking. If we didn’t have to worry about that (say in an alternate reality where we had a robust UBI program) AI replacing artists would be a non-issue. People would continue to create for the sake of creation as we saw during the epidemic lockdown of 2020 and the consequential Great Resignation.
This is a perfect framing for this discussion. I think people are pissed that AI disrupts this economic model of compensating creators, but the problem isn't AI, it's the economic model.
I think this is also the conversation people like Altman were hoping to have around AI (sorry if that's too much benefit of the doubt for him), I think enthusiasts hope AI can help transition us to a more equitable economy. People are (rightly) concerned that instead of bringing about economic change, AI will further consolidate economic forces and make life even more miserable for everyone. Throwing copyright law at the problem to me seems like a desperate attempt to keep the boat afloat.
I am saying AI won't have biological living experiences, only abstract concepts of biological living experiences that are fed into it.
You are reading way more into my point than my actual point. Another way of saying it is that we can try to understand a dog and explain why dogs do what they do, but we are not actual dogs and cannot use the actual experience of being a dog when creating art. Or how someone will never know the exact experience of someone of another race even though they can understand the concepts of differences. Experience is different than understanding an abstract.
Firstly, @snooggums@kbin.social = @kibiz0r@midwest.social ? I was responding to the latter, so when you say I am saying (implicit format, to clarify, when I said X, I was [meaning to say] Y. ) I don't know which part of what reply fulfills X, unless you just mean to be emphatic. (e.g. He's mad! Mad, I tell you! ) So my thread context is lost.
Secondly the AI's lack of human experience seems irrelevant. Human artists commonly guess at what dogs think / feel, what it is to be a racial minority, another sex or whatever it is to not be themselves. And we're not great at it. AI, guessing at what it is to be human doesn't have a high bar to overcome. We depend on abstracts and third-party information all the time to create empathizable characters.
For that matter, among those empathizable characters, synthetic beings are included. The whole point of Blade Runner 2049 is that everyone, synthetic or otherwise, is valid, is deserving of personhood.
Again, you can say by fiat an AI has the personhood of a toaster, but that doesn't make the content it creates less quality or less real. And given in the past how often we've disparaged art for being made by women, by non-whites, by Jews, we as a social collective have demonstrated our opinion is easily biased to arbitrarily favor those sources we like.
You're not going to find any way to objectively justify including only human beings as qualified to make art.
Well, I am not saying that only humans can make art. I think a lot of other animals are fully capable of making art, even if we frequently call it instinct. Hell, bird mating rituals are better displays of physical dancing than humans in a lot of cases!
I am saying what we currently call AI, which is just mishmashing existing art and not creating anything new or with any kind of complex emotions, will make technical art that has no depth or background that is commonly associated with art.
I really wish you lot would educate yourselves on AI and the history of AI creativity and art before convincing yourselves you know what you're talking about and giving everyone your Hot Take.
Can you elaborate? "AI and the history of AI creativity and art" is a pretty broad scope, so I'm sure I have some massive blind spots within it, and I'd love some links or summaries of the areas I might be missing.
Generative AI training is not the same thing as human inspiration. And transformative work has thus far only been performed by a human. Not by a machine used by a human.
Clearly using a machine that simply duplicates a work to resell runs afoul of copyright.
What about using a machine that slightly rewords that work? Is that still allowed? Or a machine that does a fairly significant rewording? What if it sort of does a mashup of two works? Or a mashup of a dozen? Or of a thousand?
Under what conditions does it become ok to feed a machine with someone's art (without their permission) and sell the resulting output?
That's the point, it's almost a ship of Theseus situation.
At what point does the work become its own compared to a copy? How much variation is required? How many works are needed for sampling before it's designing information based on large scale sentence structures instead of just copying exactly what it's reading?
Legislation can't occur until a benchmark is reached or we just say wholesale that AI is copyright infringement based purely on its existence and training.
The difference is that, to sue someone, you have to demonstrate that they were acting outside of existing laws and caused you real harm. Case law was never intended to proactively address hypothetical future scenarios—that’s what lawmakers and regulators are for.
In this case they are suing based on current copyright infringement by OpenAI, with the justification of predictable outcomes. Like how you can sue someone who is violating zoning ordinances and use predictable negative outcomes based on similar cases to justify the urgency of making them stop now instead of just trying to get money back when things get even worse.
Because that is far harder to prove than showing OpenAI used his IP without permission.
In my opinion, it should not be allowed to train a generative model on data without permission of the rights holder. So at the very least, OpenAI should publish (references to) the training data they used so far, and probably restrict the dataset to public-domain and opt-in works for future models.
I don't see why they (authors/copyright holders) have any right to prevent use of their product beyond purchasing. If I legally own a copy of Game of Thrones, I should be able to do whatever the crap I want with it.
And basically, I can. I can quote parts of it, I can give it to a friend to read, I can rip out a page and tape it to the wall, I can teach my kid how to read with it.
Why should I not be allowed to train my AI with it? Why do you think it's unethical?
Next, if you come up with some ideas for your own fantasy environment after watching Game of Thrones, they'll want to chase you down, considering they didn't give you express permission to be "inspired" by their work 🙄
And basically, I can. I can quote parts of it, I can give it to a friend to read, I can rip out a page and tape it to the wall, I can teach my kid how to read with it.
These are things you're allowed to do with your copy of the book. But you are not allowed to, for example, create a copy of it and give that to a friend, or create a play or a movie out of it. You don't own the story, you own a copy of it on a specific medium.
However, just as a person does not own the work of an author, the authors do not own words, grammar, sentences or even their own style. Similarly, they do not own the names of the characters in their books or the universe in which the plot is happening. They even do not "own" their own name.
So the only question remaining becomes whether AI is allowed to "read" a book. In the future authors might prohibit it, but hey, we're just going to end up with a slightly more archaic-speaking GPT over time because it will not train on new releases. And that's fine by me.
I think that in the end it should be a matter of licensing. The author might give you the right to train a model on it, if you pay them for it. Just like you'd have to get permission if you want to turn their work into a play or a show.
I don't think the argument (not yours, but often seen in discussions like these) about "humans can be inspired by a work, so a computer should be allowed to be as well" holds any ground. For it would take a human much more time to make a style their own, as well as to recreate large amounts of it. For an AI model the same is a matter of minutes and seconds, respectively. So any comparison is moot, imho.
But the thing is, it's not similar to turning their work into a play or a TV show. You aren't replicating their story at all, they put words in a logical order and you are using that to teach the AI what the next word logically could be.
As for humans taking much more time to properly mimic style, of course that's true (assuming untrained). But an AI requires far more memory and data to do that. A human can replicate a style with just examples of that style given time. An AI needs to scrape basically the entire internet (and label it, which takes quite some time) to be able to do so. They may need different things but it's ridiculous to say that they're completely incomparable. Besides, you make it sound like AI is its own entity that wasn't created, trained, and used by humans in the first place.
It's not the same as turning it into a play, but it's doing something with it beyond its intended purpose, specifically with the intention to produce derivatives of it at an enormous scale.
Whether or not a computer needs more or less of it than a human is not a factor, in my opinion. Actually, the fact that more input is required than for a human only makes it worse, since more of the creators' work has to be used without their permission.
Again, the reason why I think it's incomparable is that when a human learns to do this, the damage is relatively limited. Even the best writer can only produce so many pages per day. But when a model learns to do it, the ability to apply it is effectively unlimited. The scale of the infraction is so exponentially more extreme, that I don't think it's reasonable to compare them.
Lastly, if I made it sound like that, I apologise, that was not my intention. I don't think it's the model's fault, but that of the people who decided to (directly or indirectly by not vetting their input data) take somebody's copyrighted work and train an LLM on it.
I don't think the potential difference between how much damage can be caused is a reasonable argument. After all, economic damages to writers from others copying, plagiarizing their work or style or world is limited not because it's hard for humans to do so, but because we made it illegal to make something so similar to another person's copyrighted work.
For example, Harry Potter has absolutely been copied to the extent legally allowed, but no one cares about any of those books because they're not so similar that they affect the sales of Harry Potter at all. And that's also true for AI. It doesn't matter how closely it can replicate someone's style or story if that replication can never be used or sold due to copyright infringement, which is already the case right now. Sure you can use it to generate thousands of books that are just different enough to not get struck down, but that wouldn't affect the original book at all.
Now, to be fair, with art you can be more similar to others' art, because of how art works. But also, to be fair, the art market was never about how good an artist was, it was about how expensive the rich people who bought your art wanted it to be for tax purposes. And I doubt AI art is valuable for that.
Ownership is never absolute. Just like with music - you are not allowed to use it commercially i.e. in your restaurant, club, beauty salon, etc. without paying extra. You are also not allowed to do the same with books - for example, you shouldn't share scans online, although it's "your" book.
However, it is not clear how AI infringes on the rights of authors in this case. Because a human may read a book and produce a similar book in the same style legally.
Assuming that books used for GPT training were indeed purchased, not pirated, and since "AI training" was not prohibited at the time of the purchase, the engineers had every right to use them. Maybe authors in the future could prohibit "AI training" but for the books purchased before they do, "AI training" is a fair usage.
Okay, the problem is there are only about three companies with either enough data or enough money to buy it. Any open source or small time AI model is completely dead in the water. Since our economy is quickly moving towards being AI driven, it would basically guarantee our economy is completely owned by a handful of companies like Getty Images.
Any artist with less weight than GRRM and Taylor Swift is still screwed, they might get a peanut or two at most.
I'd rather get an explosion of culture, even if it means GRRM doesn't get a last fat paycheck and Hollywood loses control of its monopoly.
I get it. I download movies without paying for them too. It's super convenient, and much cheaper than doing the right thing.
But I don't pretend it's ethical. And I certainly don't charge other people money to benefit from it.
Either there are plenty of people who are fine with their work being used for AI purposes (especially in an open source model), or they don't agree to it - in which case it would be unethical to do so.
Just because something is practical, doesn't mean it's right.
There's so much more at stake, it's not remotely the same as pirating. AI is poised to take over any kind of job that requires only a computer and a telephone. I'd rather have robust open source options than a handful of companies exerting a subscription tax on half the economy.
Any overt legislation will only hurt us, the consumers, while 99.9% of the actual artists and contributors won't see any benefit whatsoever.
Short of aggressively nationalizing any kind of AI endeavour, making it as free and accessible as possible is the best option imo.