Pressure grows on artificial intelligence firms over the content used to train their products
‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models it should be OK for John/Jane Doe to pirate software for private use.
But that would never happen. Almost like the whole of copyright has been perverted into a scam.
I guess the lesson here is pirate everything under the sun and as long as you establish a company and train a bot everything is a-ok. I wish we knew this when everyone was getting dinged for torrenting The Hurt Locker back when.
Remember when the RIAA got caught with pirated mp3s and nothing happened?
Wow! You’re telling me that onerous and crony copyright laws stifle innovation and creativity? Thanks for solving the mystery guys, we never knew that!
I'm dumbfounded that any Lemmy user supports OpenAI in this.
We're mostly refugees from Reddit, right?
Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content's real home: YouTube, a random WordPress blog, a GitHub project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn't a huge problem, because that's the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.
But as Reddit started to dominate Google search results, it displaced results that might have linked to the "real home" of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.
At the same time, Reddit slowly moved from a place where something may get posted by the author of the original thing to a place where you'll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original instance, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced of whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.
This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.
--
There are genuine problems with copyright law. Don't get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don't even own the copyright to the stuff they make. It was invented to protect creators, but in practice this "protection" gets assigned to a publisher immediately after the protected work comes into being.
And then that copyright -- the very same thing that was intended to protect creators -- is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in-between the two, and they squeeze as hard as they desire, wringing it of every drop of profit, keeping creators and audiences far away from each other. Creators can't speak out of turn. Fans can't remix their favorite content and share it back to the community.
This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can't pay for admission. Creators are underpaid, and their creative ambitions are redirected to what's popular. We end up with an auto-tuned culture -- insular, uncritical, and predictable. Creativity reduced to a product.
But.
If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won't solve it by giving OpenAI a free pass to do the exact same thing at massive scale.
It's not "impossible". It's expensive and will take years to produce material under an encompassing license in the quantity needed to make the model "large". Their argument is basically "but we can have it quickly if you allow legal shortcuts."
I can't make a Jellyfin server full of content without copyrighted material either, but the key difference here is I'm not then trying to sell that to investors.
This situation seems analogous to when air travel started to take off (pun intended) and existing legal notions of property rights had to be adjusted. IIRC, a farmer sued an airline for trespassing because they were flying over his land. The court ruled against the farmer because to do otherwise would have killed the airline industry.
It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or piracy is caused by fragmentation and difficulty accessing information (streaming sites).
Then whenever this topic comes up everyone completely flips. But in my mind all this would do is fragment the ai market much like streaming services (suddenly you have 10 different models with different licenses), and make it harder for non mega corps without infinite money to fund their own llms (of good quality).
Like seriously, can't we just stay consistent and keep saying copyright bad even in this case? It's not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets and tax these big AI companies and we wouldn't even have to worry about the artist's well-being.
If the copyright people had their way we wouldn't be able to write a single word without paying them. This whole thing is clearly a fucking money grab. It is not struggling artists being wiped out, it is big corporations suing a well funded startup.
But our current copyright model is so robust and fair! They will only have to wait 95y after the author died, which is a completely normal period.
If you want to control your creations, you are completely free to NOT publish them. Nowhere is it stated that to be valuable or beautiful, a work has to be shared on the world podium.
We'll have a very restrictive Copyright for non globally transmitted/published works, and one for where the owner of the copyright DID choose to broadcast those works globally. They have a couple years to cash in, and then after I dunno, 5 years, we can all use the work as we see fit. If you use mass media to broadcast creative works but then become mad when the public transforms or remixes your work, you are part of the problem.
Current copyright is just a tool for folks with power to control that power. It's what a boomer would make driving their tractor / SUV while chanting to themselves: I have earned this.
A ton of people need to read some basic background on how copyright, trademark, and patents protect people. Having none of those things would be horrible for modern society. Wiping out millions of jobs, medical advancements, and putting control into the hands of companies who can steal and strongarm the best. If you want to live in a world run by Mafia style big business then sure.
"Impossible"? They just need to ask for permission from each source. It's not like they don't already know who the sources are, since the AIs are issuing HTTP(S) requests to fetch them.
Burglary is impossible without breaking some doors and locks. So you have to make it legal to break doors and locks now, because otherwise I cannot go on with my profession.
TBH I only use LLMs when traditional search fails, and even then I'm not sure if I'm getting something useful or a hallucination. I need better search engines, not fancy AI bullshitters.
Why do they have free rein to store and use copyrighted material as training data? AIs don’t learn as a human would, and comparisons can’t be made between the learning processes.
I wonder: if the act of picking cotton had been copyrighted, would we have gotten the cotton gin? We have automated most non-creative pursuits and displaced their workers. Is it because people can take joy in creative pursuits that we balk at the automation? If you have a particular style in picking items to fulfill Amazon orders, should that be copyrighted and protected from being used elsewhere?
The developer OpenAI has said it would be impossible to create tools like its groundbreaking chatbot ChatGPT without access to copyrighted material, as pressure grows on artificial intelligence firms over the content used to train their products.
Chatbots such as ChatGPT and image generators like Stable Diffusion are “trained” on a vast trove of data taken from the internet, with much of it covered by copyright – a legal protection against someone’s work being used without permission.
AI companies’ defence of using copyrighted material tends to lean on the legal doctrine of “fair use”, which allows use of content in certain circumstances without seeking the owner’s permission.
John Grisham, Jodi Picoult and George RR Martin were among 17 authors who sued OpenAI in September alleging “systematic theft on a mass scale”.
Getty Images, which owns one of the largest photo libraries in the world, is suing the creator of Stable Diffusion, Stability AI, in the US and in England and Wales for alleged copyright breaches.
The submission said it backed “red-teaming” of AI systems, where third-party researchers test the safety of a product by emulating the behaviour of rogue actors.
The original article contains 530 words, the summary contains 190 words. Saved 64%. I'm a bot and I'm open source!
My hot take is that it's not like most of those independent artists are getting compensated fairly by the companies that own them anyway if at all. Stealing ai training content is just stealing from corporations. Corporations who are probably politically fighting to keep things worse for the average person in your country.
Theft is "a crime" but I never saw anyone complaining about how unfair it was all those times I myself got fucked over by google bullshitting their way out of giving me my ad revenue. If normal people can't profit from stuff like this, we shouldn't be doing anything to protect the profits of evil corporations.
So if I look at a painting study it and then emulate the original painter's artstyle, then I'm in breach of their copyright?
Or if I read a lot of fantasy like GRRM or JK Rowling and I also write a fantasy book and say that they were my inspiration, I'm breaching their copyright??
That's not how it works, and if it is, it shouldn't be!
Sure, if I start reproducing work, i.e. plagiarizing the work of others, then I'm doing something wrong.
And to spin this further: If I raise a child on children's books by a specific author, am I breaching copyright, when my child enters the workforce and starts to earn money????
Stupid, yes! But so are the copyright claims against LLMs, in my opinion.
Copyright protection only exists in the context of generating profit from someone else's work. If you were to figure out cold fusion and I looked at your research and said "That's cool, but I am going to go do some woodworking," I would not be infringing any copyrights. It's only ever an issue if the financial incentive to trace the profits back to their copyrighted source outweighs the cost of doing so. That's why China has had free rein to steal any Western technology: fighting them in their courts is not worth it. But with AI it's way easier to trace the output back to its source (especially for art), so the incentive is there.
The main issue is the extraction of value from the original data. If I were to steal some bricks from your infinite brick pile and build a house out of them, do you have a right to my house? Technically I never stole a house from you.