OpenAI declares AI race “over” if training on copyrighted works isn’t fair use
That's a good litmus test. If asking/paying artists to train your AI destroys your business model, maybe you're the arsehole. ;)
Not only that, but their business model doesn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
There's also an argument that if the business was that reliant on free things to start with, then it shouldn't be a business.
No one would bat an eye if the CEO of a real estate company was sobbing that it's the end of the rental market, because the company is no longer allowed to get houses for free.
even the top phds can learn things off the amount of books that openai could easily purchase, assuming they can convince a judge that if the works aren't pirated the "learning" is fair use. however, they're all pirating and then regurgitating the works which wouldn't really be legal even if a human did it.
also, they can't really say how they need fair use and open standards and shit and in the next breath be begging trump to ban chinese models. the cool thing about allowing china to have global influence is that they will start to respect IP more... or the US can just copy their shit until they do.
imo that would have been the play against tik tok etc. just straight up we will not protect the IP of your company (as in technical IP not logo, etc.) until you do the same. even if it never happens, we could at least have a direct tik tok knock off and it could "compete" for american eyes rather than some blanket ban bullshit.
This particular vein of "pro-copyright" thought continuously baffles me. Copyright has not, was not intended to, and does not currently, pay artists.
It's totally valid to hate these AI companies. But it's absolutely just industry propaganda to think that copyright was protecting your data on your behalf.
Copyright has not, was not intended to, and does not currently, pay artists.
You are correct, copyright is ownership, not income. I own the copyright for all my work (but not work for hire) and what I do with it is my discretion.
What is income, is the content I sell for the price acceptable to the buyer. Copyright (as originally conceived) is my protection so someone doesn't take my work and use it to undermine my skillset. One of the reasons why penalties for copyright infringement don't need actual damages and why Facebook (and other AI companies) are starting to sweat bullets and hire lawyers.
That said, as a creative who relied on artistic income and pays other creatives appropriately, modern copyright law is far, far overreaching and in need of major overhaul. Gatekeeping was never the intent of early copyright and can fuck right off; if I paid for it, they don't get to say no.
Copyright has not, was not intended to, and does not currently, pay artists.
Wrong in all points.
Copyright has paid artists (though maybe not enough). Copyright was intended to do that (though maybe not that alone). Copyright does currently pay artists (maybe not in your country, I don't know that).
Interesting copyright question: if I own a copy of a book, can I feed it to a local AI installation for personal use?
Can a library train a local AI installation on everything it has and then allow use of that on their library computers? <— this one could breathe new life into libraries
First off, I'm by far no lawyer, but it was covered in a couple classes.
According to law as I know it, question 1 yes if there is no encryption, and question 2 no.
In reality, if you keep it for personal use, artists don't care. A library however, isn't personal use and they have to jump through more hoops than a circus especially when it comes to digital media.
But you raise a great point! I'd love to see a law library train AI for in-house use and test the system!
I wonder if there's some validity to what OpenAI is saying though (but I certainly don't completely agree with them).
If the US makes it too costly to train AI models, then maybe China will relax any copyright laws so that Chinese AI models can be trained quickly and cheaply. This might result in China developing better AI models than the US.
Maybe the US should require AI companies to pay a large chunk of their profits to copyright holders. So copyright holders would be compensated, but an AI company would only have to pay if they generate profits.
Maybe someone more knowledgeable in this field will tell me I'm totally wrong.
No, it means that copyrights should not exist in the first place.
I'm fine with this. "We can't succeed without breaking the law" isn't much of an argument.
Do I think the current copyright laws around the world are fine? No, far from it.
But why do they merit an exception to the rules that will make them billions, but the rest of us can be prosecuted in severe and dramatic fashion for much less. Try letting the RIAA know you have a song you've downloaded on your PC that you didn't pay for - tell them it's for "research and training purposes", just like AI uses stuff it didn't pay for - and see what I mean by severe and dramatic.
It should not be one rule for the rich guys to get even richer and the rest of us can eat dirt.
Figure out how to fix the laws in a way that they're fair for everyone, including figuring out a way to compensate the people whose IP you've been stealing.
Until then, deal with the same legal landscape as everyone else. Boo hoo
🌏👨🚀🔫👨🚀🌌
I also think it's really rich that at the same time they're whining about copyright they're trying to go private. I feel like the 'Open' part of OpenAI is the only thing that could possibly begin to offset their rampant theft and even then they're not nearly open enough.
They are not releasing anything of value in open source recently.
Sam Altman said they were on the wrong side of history about this when DeepSeek released.
They are not open anymore, I want that to be clear. They decided to stop releasing open source because 💵💵💵💵💵💵💵💵.
So yeah I can have huge fines for downloading copyrighted material where I live, and they get to make money out of that same material without even releasing anything open source? Fuck no.
But I can't pirate copyrighted materials to "train" my own real intelligence.
Now you get why we were all told to hate AI. It's a patriot act for copyright and IP laws. We should be able to. But that isn't where our discussions were steered, was it?
That's because the elites don't want you to think for yourself, and instead are designing tools that will tell you what to think.
True!
I mean, if they are allowed to go forward then we should be allowed to freely pirate as well.
In the end, we're just training some non-artificial intelligence.
Yeah, you can train your own neural network on pirated content, all right, but you better not enjoy that content at the same time or have any feelings while watching it, because that's not covered by "training".
Don't worry: the law will be very carefully crafted so that it will be legal only if they do it, not us.
Fine by me. Can it be over today?
I'll get the champagne for us and tissues for Sam.
Unfortunately, the tissues have a 1000% tariff. Perhaps sandpaper will do?
Shit, save your $$$ and get some GPUs since the market would crash.
I'll bring the meth
Training that AI is absolutely fair use.
Selling that AI service that was trained on copyrighted material is absolutely not fair use.
Agreed... although I would go a step further and say distributing the LLM model or the results of use (even if done without cost) is not fair use, as the training materials weren't licensed.
Ultimately it's "Doing Research that advances knowledge for everybody" that should be allowed free use of copyrighted materials, whilst activities for direct or indirect commercial gain (including Research whose results are Patented and then licensed for a fee) should not, IMHO.
"We can't succeed without breaking the law. We can't succeed without operating unethically."
I'm so sick of this bullshit. They pretend to love a free market until it's not in their favor and then they ask us to bend over backwards for them.
Too many people think they're superior. Which is ironic, because they're also the ones asking for handouts and rule bending. If you were superior, you wouldn't need all the unethical things that you're asking for.
Sounds like you are describing the orange baboon in the white house.
these kinds of asshats are all the same. Only difference is the size of the hat.
That sounds like a you problem.
"Our business is so bad and barely viable that it can only survive if you allow us to be overtly unethical", great pitch guys.
I mean that's like arguing "our economy is based on slave plantations! If you abolish the practice, you'll destroy our nation!"
Good point. I've never seen it framed this way before. Poignant.
Thanks, heh, I just came back to look at what I'd written again, as it was 6am when I posted that, and sometimes I say some stupid shit when I'm still sleepy. Nice to know that I wasn't spouting nonsense.
Good if AI fails because it can't abuse copyright. Fuck AI.
*except the stuff used for science that isn't trained on copyrighted scraped data, that use is fine
Yeah unfortunately we’ve started calling any LLM “AI”
In ye olde notation, ML was a subset of AI, and thus all LLMs would be considered AI. It's why manual decision trees that codify NPC behaviour are also called AI, because they are.
Now people use AI to refer only to generative ML, but that's wrong and I'm willing to complain every time.
Come on guys, his company is only worth $157 billion.
Of course he can't pay for content he needs for his automated bullshit machine. He's not made of money!
Company burning stacks of hundred dollar bills to generate power to run hallucination machine worth $157 billion. What a world.
Sam Altman is a grifter, but on this topic he is right.
The reality is that IP laws in their current form hamper innovation and technological development. Stephan Kinsella has written on this topic for the past 25 years or so and has argued to reform the system.
Here in the Netherlands, we know that it's true. Philips became a great company because they could produce lightbulbs here, which were patented in the UK. We also had a booming margarine business, because we weren't respecting British and French patents and that business laid the foundation for what became Unilever.
And now China is using those exact same tactics to build up their industry. And it gives them a huge competitive advantage.
A good reform would be to revert back to the way copyright and patent law were originally developed, with much shorter terms and requiring a significant fee for a one time extension.
The current terms, lobbied by Disney, are way too restrictive.
I totally agree. Patents and copyright have their place, but through greed they have been morphed into monstrous abominations that hold back society. I also think that if you build your business on crawled content, society has a right to the result at a fair price. If you cannot provide that without the company failing, then it deserves to fail because the business model obviously was built on exploitation.
I agree, which is why I advocate for reform, not abolishment.
Perhaps AI companies should pay a 15% surcharge on their services and that money goes directly into the arts.
But Sam is talking about copyright and all your examples are patents
It just so happens that in AI it's about copyright and with margarine (and most other technologies) it's about patents.
But the point is the same. Technological development is held back by law in both cases.
If all IP laws were reformed 50 years ago, we would probably have the technology from 2050, today.
It's all the same shit. No patents and copyrights should exist.
That's not fair to change the system only when businesses require it. I received a fuckin' letter from a government entity where I live for having downloaded the trash tier movie "Demolition".
I agree copyright and patents are bad but it's so infuriating that only the rich and powerful can choose not to respect it.
So I think OpenAI has to pay, because as of now that shitty copyright and patent system is still there and has hurt many individuals around the world.
We should try to change the laws for copyright but after the big businesses pay their due.
I mean, I'd say there's a qualitative difference between industrial products and a novel, for example.
Lmao Sam Altman doesn't want the rules changed for you. He wants them changed for him.
You will still be beholden to the laws.
Slave owners might go broke after abolition? 😂
I'm going to have to remember this
So pirating full works for commercial use suddenly is "fair use", or what? Let's see what e.g. Disney says about this.
That's like calling stealing from shops essential for my existence and it would be "over" for me if they stop me. The shit these clowns say is just astounding. It's like they have no morals and no self awareness and awareness for people around them.
That's like calling stealing from shops essential for my existence and it would be "over" for me if they stop me.
What's really fucked up is that for some people this is not far from their reality at all
I think they are either completely delusional, or they know very well how important AI is for the government and the military. The same cannot be said for regular people and their daily struggles.
In America, companies have more rights than the human person.
If companies say that they need to do something to survive, that makes it ok. If a human needs to do something to survive, that's a crime.
Know the difference. (/s)
It’s like stealing from shops except the shops didn’t lose anything. You’re up a stolen widget, but they have just as many as before.
The only way this would be ok is if openai was actually open. make the entire damn thing free and open source, and most of the complaints will go away.
Truly open is the only way LLMs make sense.
They're using us and our content openly. The relationship should be reciprocal. Now, they need to somehow keep the servers running.
Perhaps a SETI like model?
“The plagiarism machine will break without more things to plagiarize.”
Copyrights should never have been extended longer than 5 years in the first place. Either remove draconian copyright laws or outlaw LLM-style models using copyrighted material; corpos can't have both.
Bro, what? Some books take more than 5 years to write and you want their authors to only have authorship of it for 5 years? Wtf. I have published books that are a dozen years old and I'm in my mid-30s. This is an insane take.
The one I thought was a good compromise was 14 years, with the option to file again for a single renewal for a second 14 years. That was the basic system in the US for quite a while, and it has the benefit of being a good fit for the human life span--it means that the stuff that was popular with our parents when we were kids, i.e. the cultural milieu in which we were raised, would be public domain by the time we were adults, and we'd be free to remix it and revisit it. It also covers the vast majority of the sales lifetime of a work, and makes preservation and archiving more generally feasible.
5 years may be an overcorrection, but I think very limited terms like that are closer to the right solution than our current system is.
You don't have to stop selling when a book becomes public domain, publishers and authors sell public domain/commons books frequently, it's just you won't have a monopoly on the contents after the copyright expires.
Thanks, that's very insightful and I'll amend my position to 15 years; 5 may be just a little zealous. 100 year US copyrights have been choking innovation due to things like Disney-led trade group lobbyists. 15 years would be a huge boost to many creators, who could leverage more IPs and advancements that are being held in limbo, unused or poorly used, by corpo entities.
I think copyright lasting 20 years or so is not unreasonable in our current society. I'd obviously love to live in a society where we could get away with lower. As a compromise, I'd like to see compulsory licensing applied to all written work. (E.g., after n years, anyone can use it if they pay royalties and you can't stop them; the amount of royalties gradually decreases until it's in the public domain.)
I agree that copyright is far too long, but at 5 years there's hardly incentive to produce. You could write a novel and have it only starting to get popular after 5 years.
I think 5 years is a bit short.
Send This comment To the top
the issue is that foreign companies aren't subject to US copyright law, so if we hobble US AI companies, our country loses the AI war
I get that AI seems unfair, but there isn't really a way to prevent AI scraping (domestic and foreign) aside from removing all public content on the internet
If I had to pay tuition for education (buying text books, pay for classes and stuff), then you have to pay me to train your stupid AI using my materials.
Whoever brings Aaron Swartz back gets to violate all the copyright laws
Aaron Swartz was 100% opposed to all copyright laws, you remember that yah?
Yes, and he killed himself after the FBI was throwing the book at him for doing exactly what these AI assholes are doing without repercussion
I'm not just a copyright abolitionist, I also abhor all intellectual property. Yes, even trademark.
And he also said "child pornography is not necessarily abuse."
In the US, it is illegal to possess or distribute child pornography, apparently because doing so will encourage people to sexually abuse children.
This is absurd logic. Child pornography is not necessarily abuse. Even if it was, preventing the distribution or posession of the evidence won't make the abuse go away. We don't arrest everyone with videotapes of murders, or make it illegal for TV stations to show people being killed.
Wired has an article on how these laws destroy honest people's lives.
https://web.archive.org/web/20130116210225/http://bits.are.notabug.com/
Big yikes from me whenever I see him venerated.
TLDR: "we should be able to steal other people's work, or we'll go crying to daddy Trump. But DeepSeek shouldn't be able to steal from the stuff we stole, because China and open source"
Fuck these psychos. They should pay the copyright they stole with the billions they already made. Governments should protect people, MDF
Good. I hope this is what happens.
I am good with that.
dont threaten me with a good time
Look, we may have driven Aaron Swartz to suicide for doing basically the same thing on a smaller scale, but dammit we are getting very rich off this. And, if we are getting rich, then it is okay to break the law while actively fucking over actually creative people. Trust us. We are tech bros and we know what is best for you is for us to become incredibly rich and out of touch. You need us.
In case anyone is unfamiliar, Aaron Swartz downloaded a bunch of academic journals from JSTOR. This wasn't for training AI, though. Swartz was an advocate for open access to scientific knowledge. Many papers are "open access" and yet are not readily available to the public.
Much of what he downloaded was open-access, and he had legitimate access to the system via his university affiliation. The entire case was a sham. They charged him with wire fraud, unauthorized access to a computer system, breaking and entering, and a host of other trumped-up charges, because he...opened an unlocked closet door and used an ethernet jack from there. The fucking Secret Service was involved.
https://en.wikipedia.org/wiki/Aaron_Swartz#Arrest_and_prosecution
The federal prosecution involved what was characterized by numerous critics (such as former Nixon White House counsel John Dean) as an "overcharging" 13-count indictment and "overzealous", "Nixonian" prosecution for alleged computer crimes, brought by then U.S. Attorney for Massachusetts Carmen Ortiz.
Nothing Swartz did is anywhere close to the abuse by OpenAI, Meta, etc., who openly admit they pirated all their shit.
You're correct that their piracy was on a much more egregious scale than what Aaron did, but they don't openly admit to their piracy. Meta just argued that it isn't piracy because they didn't seed.
Edit: to be clear, I don't think that Aaron Swartz did anything wrong. Unlike ChatGPT, Meta, etc.
Where are the copyright lawsuits by Nintendo and Disney when you need them lol
At the end of the day, the fact that OpenAI lost their collective shit when a Chinese company used their data and model to make its own more efficient model is all the proof I need that they don't care about being fair or equitable. They get mad at people doing the exact thing they did, and would aggressively oppose others using their work to advance their own.
Then die. I don't know what else to tell you.
If your business model is predicated on breaking the law then you don't deserve to exist.
You can't send people to prison for 5 years and charge them $100,000 for downloading a movie and then turn around and let big business do it for free because they need to "train their AI model", and call one a thief but not the other...
Absolutely. But in this case the law is also shit and needs to be reformed. I still want to see Altman fail, because he's an asshole. But copyright law in its current form is awful and does hold back society.
If your business model is predicated on breaking the law then you don’t deserve to exist.
All of Wall Street sweating nervously
The law isn't automatically moral.
This issue just exposes how ridiculous copyright law is and how much it needs to be changed. It exists specifically to allow companies to own, for hundreds of years, intellectual property.
It was originally intended to protect individual artists but has slowly mutated to being a tool of corporate ownership and control.
But, people would rather use this as an opportunity to dunk on companies trying to develop a new technology rather than as an object lesson in why copyright rules are ridiculous.
I don't disagree, but the idea is that the law is made by supposedly moral men and is at least moral within the perspective and context of society at the time.
It's literally worse than piracy, since the AI companies are also trying to sell shittier versions of the works they copy from
Like selling camrips, except it's done by multi-billion dollar companies ripping off individuals, and stores are trying to put them right next to the original DVDs.
over it is then. Buh bye!
If artificial intelligence can be trained on stolen information, then so should be "natural" intelligence.
Oh, wait. One is owned by oligarchs raking in billions, the other just serves the plebs.
couldn't have said it better...the irony...
So pirating full works suddenly is fair use, or what?
Only if you're doing it to learn, I guess
Wait until all those expensive scientific journals hear about this
Business that stole everyone's information to train a model complains that businesses can steal information to train models.
Yeah I'll pour one out for folks who promised to open-source their model and then backed out the moment the money appeared... Wankers.
Oh no! How will I generate a picture of Sam Altman blowing himself now!?
Photoshop, just like the rest of us.
Wdym? He removed his rib or something?
I was thinking more of a Sam 1 and Sam 2 type situation.
Good.
Fuck Sam Altman's greed. Pay the fucking artists you're robbing.
Sounds like another way of saying "there actually isn't a profitable business in this."
But since we live in crazy world, once he gets his exemption to copyright laws for AI, someone needs to come up with a good self hosted AI toolset that makes it legal for the average person to pirate stuff at scale as well.
I mean, pirating media at scale for your own consumption can be considered "training of a neural network" as well..
First step, be a business. Second step, accept Trump's dick in your ass. Congratulations, here's your "get out of jail free" card.
Also, pirating media at scale isn't that hard to do right now anyway lol
If training an ai on copyrighted material is fair use, then piracy is archiving
I'm fine with that haha
God forbid you offer to PAY for access to works that people create like everyone else has to. University students have to pay out the nose for their books that they "train" on, why can't billion dollar AI companies?
This is basically a veiled admission that OpenAI are falling behind in the very arms race they started. Good, fuck Altman. We need less ultra-corpo tech bro bullshit in prevailing technology.
If everyone can 'train' themselves on copyrighted works, then I say "fair game.''
Otherwise, get fucked.
Sounds fair, shut it down.
What if we had taken the billions of dollars invested in AI and invested that into public education instead?
Imagine the return on investment of the information being used to train actual humans who can reason and don’t lie 60% of the time instead of using it to train a computer that is useless more than it is useful.
But you have to pay humans, and give them bathroom breaks, and allow them time off work to spend with their loved ones. Where's the profit in that? Surely it's more clever and efficient to shovel time and money into replacing them with something that will never be able to practically develop beyond current human understanding. After all, we're living in the golden age of humanity and history has ended! No new knowledge will ever be made so let's just make machines that regurgitate our infallible and complete knowledge.
I have conflicting feelings about this whole thing. If you are selling the result of training like OpenAI does (and every other company), then I feel like it’s absolutely and clearly not fair use. It’s just theft with extra steps.
On the other hand, what about open source projects and individuals who aren’t selling or competing with the owners of the training material? I feel like that would be fair use.
What keeps me up at night is if training is never fair use, then the natural result is that AI becomes monopolized by big companies with deep pockets who can pay for an infinite amount of random content licensing, and then we are all forever at their mercy for this entire branch of technology.
The practical, socioeconomic, and ethical considerations are really complex, but all I ever see discussed are these hard-line binary stances that would only have awful corporate-empowering consequences, either because they can steal content freely or because they are the only ones that will have the resources to control the technology.
No amigo, it's not fair if you're profiting from it in the long run.
I'm fine with them using copyrighted material, provided that everyone can do the same without repercussions. Fuck double standards. Fuck IP. People should have access to knowledge without having to pay.
PS. I know this might be an unpopular opinion
Edit: typos
On the other side, creators should be paid for their labor.
I couldn't agree more. The thing with IP is that it tends to last almost forever, thus it almost never enters the public domain, at least in a man's lifetime. The result is that it stifles innovation and keeps knowledge and entertainment from the masses. Lastly, it's almost always not the creator that benefits from it, but rather a huge corp.
For Sam:
Then let it be over then.
If AI gets to use copyrighted material for free and makes a profit off of the results, that means piracy is 1000% Legal. Excuse me while I go and download a car!!
All you have to do is present credible evidence that these companies are distributing copyrighted works or a direct substitute for those copyrighted works. They have filters to specifically exclude matches though, so it doesn’t really happen.
So Deepmind is good to train on your models then right?
Oh, so now you're just going to surrender our precious natural resources to the Imperialist Chinese?!
Guys, I think we've got a Wumao over here. Someone get what's left of the FBI to arrest him and show his ass the fucking door.
Good. Fuck AI