A group of authors filed a lawsuit against Meta, alleging the unlawful use of copyrighted material in developing its Llama 1 and Llama 2 large language models....
The Internet Archive is currently fighting in the courts to maintain free digital library access to over 500,000 books they own from their own collection, yet Meta uses a pirated dataset of nearly 200,000 books to train their proprietary AI and is just allowed to get away with that??
Publishers will go after a charity making fair use of their content, but not the corporation outright stealing from them. What utter bollocks.
Easy solution. "The Internet Archive" should rebrand itself to "Archiving the Internet" to confuse everyone who talks about how "AI" should be able to steal books.
It'd be better if they went after literally every other AI corp than Meta in this case. Meta is the only one that's ironically releasing open-source models and leading the way for open-source LLMs. I don't want Meta to stop doing this.
Yeah, but we're not looking at the root cause here. Their purpose is to train energy-gluttonous, error-prone "AI" even if experience teaches us that those ML models fuck up more often than confirmation bias allows.
"AI" is a bourgeois and capitalist tool and, same as with cryptocurrency, we cannot dismantle the master's house with the master's tools. Fuck AI down the drain. Make things with your own minds, your own hands.
Meta has acknowledged using parts of the Books3 dataset but argued that its use of copyrighted works to train LLMs did not require "consent, credit, or compensation." The company refutes claims of infringing the plaintiffs' "alleged" copyrights, contending that any unauthorized copies of copyrighted works in Books3 should be considered fair use.
Furthermore, Meta is disputing the validity of maintaining the legal action as a class action lawsuit and refuses to provide any monetary "relief" to the suing authors or others involved in the Books3 controversy. The dataset, which includes copyrighted material sourced from the pirate site Bibliotik, was targeted in 2023 by the Danish anti-piracy group Rights Alliance, which demanded that digital archiving of the Books3 dataset be banned and is using DMCA notices to enforce those takedowns.
What sort of crack are they on that they think unauthorized use of an entire work for commercial gain is fair use? I think copyright laws are ridiculous, but that is a pretty low bar they are trying to set.
They should have to pay for their usage or retrain the model without it. Going to guess they would prefer to pay up.
Training an AI does not involve copying anything so why would you think that fair use is even a factor here? It's outside of copyright altogether. You can't copyright concepts.
Downloading pirated books to your computer does involve copyright violation, sure, but it's a violation by the uploader. And look at what community we're in, are we going to get all high and mighty about that?
Cranky enough to demand satisfaction (in the courts if not the dueling field), but no one in the company will think their own ire warrants empathy for those from whom they pirate.
I just asked it about this and it denied it. Then I said "Meta acknowledged it and you are lying," and it apologised and said it did use copyrighted material without permission. Fuck, I hate AI.
For anyone else that was curious. This makes me feel sick. People are already treating AI as some unbiased font of all knowledge, training it to lie to people is surely not going to cause any issues at all (stares at HAL 9000).
Internal documents on how the AI was trained were obviously not part of the training data; why would they be? So it doesn't know how it was trained, and as this tech always does, it just hallucinates an English-sounding answer. It's not "lying", it's just glorified autocomplete.
Saying things like "it's lying" is overselling what it is. As much as any other thing that doesn't work is not malicious, it just sucks.
Sure, they are working to solve these concerns by teaching their LLM to lie and obfuscate, and by becoming so big nobody sues them anymore. I'm sick of this.
I'm no fan of megacorps, and I definitely know that they are breaking the law. However, copyright laws should change so that any schmuck can use any text to train any AI. I'm all for punishing mega corporations and I understand that they play by their own set of rules (that is unfair), but piracy is piracy even when mega corporations do it and I believe that piracy is the moral choice. Meta then choosing to make their model not fully open I definitely have a problem with and that does not meet my bar for okay, but I strongly believe that all information for all people or entities should be free to transfer without restriction.
Meta's Llama models are generally open. In fact, Meta is the main megacorp that's driving open-source AI right now. Everyone else keeps their models proprietary.
Until that happens though, they must not be allowed to have it both ways - call us "pirates" when we copy their shit without paying for it, and tell us that paying for shit they copy is "impossible".
Meta trains open LLMs, and only big tech companies can train AI... Go pursue OpenAI or Google and let Meta (I'm really not a fan of Meta, but their "open" AIs are great examples of good work) do their work!