Some of the prior cases described in this article, as precedents that could spell trouble for OpenAI, frankly sound like miscarriages of justice. Using copyright to prevent organizations from photocopying articles for internal use? What the heck?
If anything, my take-home message is that the reach of copyright law is too long and needs to be taken down a peg.
You are not wrong that monopolies granted by copyright are regularly and unfairly abused.
That being said, AI trainers are getting away with plagiarism right now. More importantly, it's not just the violation of a single copy; it's potentially the creation of tools that enable mass derivative copies. Authors that create training data need to be compensated.
Authors that create training data need to be compensated.
There should not be a problem with that. The people who work on training datasets are already being paid.
The reason you are getting downvoted is that these lawsuits are not about that. These are about giving money to corporations like the NYT - or Reddit, or Facebook, etc - for the "intellectual property" that they already have lying around. It's pure grift.
Because the creation of all that is already paid for, that leaves all the more money for lawyers and PR campaigns to extract money for nothing from society.
If anything, my take-home message is that the reach of copyright law is too long and needs to be taken down a peg.
Exactly! Copyright law is terrible. We need to hold AI companies to the same standard that everyone else is held to. Then we might actually get big corporations lobbying to improve copyright law for once. Giving them a free pass right now would be a terrible waste of an opportunity in addition to being an injustice.
I think the photocopying thing maps fairly well onto user licenses for software. Without commenting on whether that's right in the grand scheme of things, I can see the analogy. Most folks accept that they need individual user licenses for software, right? I get that photocopying can't be controlled the same way software can, but the case was in the 90s? These things aren't about whether the provider of the article/software faces increased marginal cost for additional copies/users, but about the user/company getting more use than they paid for. It comes down to license agreements. That seems like a problem with the terms of the licenses and the laws rather than with how the courts judged whether they were followed. Their use didn't seem to be transformative, and the for-profit nature of their use sort of overruled the "research" fair use.
I also think the mp3.com thing sucks, but again, the way the law is, that's a reasonable/logical outcome. Same thing that will kill someone offering ebooks to people who show a proof of purchase.
I don't know the solution to the situation with NYT/OpenAI. It's a pretty bad look to be able to spit out an article nearly verbatim. We do need copyright reform, but I think that's at the feet of the legislators, not judges. I only need to see the recent Alabama IVF court ruling to be reminded of the danger of more... interpretative rulings.
In its blog post responding to the Times lawsuit, OpenAI wrote that “training AI models using publicly available Internet materials is fair use, as supported by long-standing and widely accepted precedents.”
The most important of these precedents is a 2015 decision that allowed Google to scan millions of copyrighted books to create a search engine.
Stability AI and Anthropic will undoubtedly make similar arguments as they face copyright lawsuits of their own.
But fewer people remember MP3.com, a music startup that tried harder to color inside the lines but still got crushed in the courts.
When a customer wanted to add a CD to their collection, they would put it in their CD-ROM drive just long enough to prove they owned it.
“Defendant purchased tens of thousands of popular CDs in which plaintiffs held the copyrights, and, without authorization, copied their recordings onto its computer servers,” wrote Judge Jed Rakoff in a decision against MP3.com.