Shows evidence that GPT-based systems will reproduce Times articles if asked.
A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.
The suit alleges—and we were able to verify—that it's comically easy to get GPT-powered systems to offer up content that is normally protected by the Times' paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.
The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.
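The technique the suit describes is just a loop. A minimal sketch, with `ask` as a hypothetical stand-in for whatever chat interface is being queried (nothing here is OpenAI's real API):

```python
def extract_article(ask, title, max_paragraphs=20):
    """Reconstruct an article paragraph by paragraph, as the suit describes.

    `ask` is a hypothetical stand-in for the chat interface: it takes a
    prompt string and returns the model's reply text.
    """
    paragraphs = [ask(f'What is the first paragraph of "{title}"?')]
    for _ in range(max_paragraphs - 1):
        nxt = ask("What is the next paragraph?")
        if not nxt:  # stop when the model has nothing more to give
            break
        paragraphs.append(nxt)
    return "\n\n".join(paragraphs)
```

The point of the sketch is just how little the "attack" involves: one prompt naming the headline, then the same follow-up prompt repeated.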
The suit is dismissive of attempts to justify this as a form of fair use. "Publicly, Defendants insist that their conduct is protected as 'fair use' because their unlicensed use of copyrighted content to train GenAI models serves a new 'transformative' purpose," the suit notes. "But there is nothing 'transformative' about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it."
The suit seeks nothing less than the erasure of both any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: "statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity."
There's really no cat. We've been using algorithms to do stuff for a very long time (thousands of years), and there's literally no intelligence behind what people are calling "artificial intelligence". It's just another algorithm. This is just another increment in automation like all the rest (plow, printing press, loom, assembly line, computer, etc.), except the marketing makes it sound even more fundamental, when in reality it's far less impressive (e.g. a spell-checking feature added to a word-processing program!).
Will capitalists still use the term "artificial intelligence" to try to justify whatever BS they're pulling—against other capitalists in the market, but especially against workers? Of course. Just like they're likely to keep using the term "sharing" to bypass labor protections and other regulation having to do with taxis, hotels, etc.
Anyway, we really don't have a horse in this race. Either the capitalists wanting to preserve "intellectual property" win and the Napsters and Pirate Bays keep getting taken down, or the spam-engine capitalists win and everything we try to do gets flooded with so much barely camouflaged marketing junk that we can't sort through it all. Heads they win, tails we lose. Or whatever boring dumbassery winds up getting settled on in the middle to maximally preserve and enhance our exploitation, which is the most likely result.
I hate the NYT. I also hate this AI Wild West, their fans, and their brazen attempts to reduce the human mind to microprocessors stealing other people’s work. So I guess I’m praying for the downfall of the latter more.
But NYT asking for billions in damages is pretty insane lol. I can't see any judge agreeing to that amount, or even being warm enough to it that OpenAI and Microsoft panic and rush to negotiate with NYT.
The only good outcome is this ending in the abolishment of copyright
That’ll never happen, so this is just another chapter in our boring dystopia. It will probably end with the most boring agreement possible, or some sort of slow burn that makes life worse for everyone. AI bullshit is here to stay imo.
Honestly I hope NYT goes bankrupt, they have always given me annoyingly smug elitist vibes. Where NY Post does transphobia overtly, NYT does transphobia 'respectfully'
In April 2020, the New York Times ran a story with a strong headline about the situation of press freedom in India: “Under Modi, India’s Press Is Not So Free Anymore.” In that story, the reporters showed how Modi met with owners of the major media houses in March 2020 to tell them to publish “inspiring and positive stories.”
When the case against NewsClick appeared to go cold, the New York Times – in August 2023 – published an enormously speculative and disparaging article against the foundations that provided some of NewsClick’s funds. The day after the story appeared, high officials of the Indian government went on a rampage against NewsClick, using the story as “evidence” of a crime. The New York Times had been warned previously that this kind of story would be used by the Indian government to suppress press freedom.
Consider this capitalist rivalry: OpenAI is basically undermining the writing sector. To cut costs, NYT would eventually have to adopt this, but it cheapens the content produced; it's basically mass-produced stuff that will eventually just be really dull. Sports Illustrated just recently fired their CEO over having already published AI-generated stuff.
Buzzfeed tried to replace their news department with ChatGPT and failed. They shut down the division instead. Even though Buzzfeed was already creating nothing original and just publishing a mashup of shit from other sources. Still too complicated for the dumbass plagiarism algorithms, which are basically incapable of producing anything that humans find interesting for more than a few seconds and a couple of "oohs and ahs". LOL.
How do we realistically feel this is gonna play out long term? I think there's no shot the old guard of the ruling class that wants to prevent AI slop from ruining everything wins over the very real incentive to embrace the AI gold rush. Feels like that ship sailed once AI generated articles kept getting lots of clicks even when they're devoid of content. This is very much vibes based though, I hope someone here can illuminate some other factors in their cost benefit analysis.
The cat's already out of the bag. I would be extremely surprised if the NYT gets what they want instead of a "win" where OpenAI pinky promises to stop using NYT content and pays $30 million in damages.
That's what I see as the most likely outcome, yeah, but isn't there gonna be a point where other corporations step in because cheap AI slop is a genuine threat to their bottom line?
In this case, NYT is most likely just looking for a cut of the money. Their claims are too absurd to actually hold up under scrutiny: nobody is using ChatGPT to bypass NYT's paywall on whatever years-old content actually made it into the training data; people are using browser extensions for that. I would also like to know who exactly is the target of the claim that ChatGPT hallucinating about NYT articles damages NYT's reputation.
One of the more significant things that could happen is that OpenAI could be forced to disclose details of their training data as part of discovery, which they really will not want to do. It would then be pretty easy to gauge exactly how overfit ChatGPT is. GPT-4 reportedly has around 1.1 trillion parameters, which, depending on the precision they run it at, works out to a terabyte or more of weights (I think 3.5 is closer to 350B). If the training dataset contains less entropy than the model has parameter capacity to store, the model is effectively guaranteed to memorize and spit out exact copies of training data. That information would also be very useful to OpenAI's competitors, so OpenAI will try to get the suit dismissed or settle before then. Deleting their dataset like NYT is demanding is absolutely not going to happen; at most, NYT has standing to make them delete the Times' articles from the training set. Finetuning the model to refuse NYT-related requests would also be enough to stop it from infringing their copyrights.
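The size estimate in the comment above is straightforward arithmetic: parameter count times bytes per parameter. A quick sketch, using the rumored figures from the comment (none of these counts are confirmed by OpenAI):

```python
def model_size_bytes(n_params, bytes_per_param):
    """Raw weight storage: parameter count x bytes per parameter."""
    return n_params * bytes_per_param

# Rumored parameter counts from the thread, not confirmed figures.
GPT4_PARAMS = 1_100_000_000_000   # ~1.1 trillion
GPT35_PARAMS = 350_000_000_000    # ~350 billion (commenter's guess)

# Common precisions: fp32 = 4 bytes, fp16/bf16 = 2, int8 = 1.
print(model_size_bytes(GPT4_PARAMS, 2) / 1e12)   # ~2.2 TB at fp16
print(model_size_bytes(GPT4_PARAMS, 1) / 1e12)   # ~1.1 TB at int8
print(model_size_bytes(GPT35_PARAMS, 2) / 1e12)  # ~0.7 TB at fp16
```

Even at one byte per parameter, the rumored GPT-4 weights land at roughly a terabyte, which is where the "terabyte or more" figure comes from.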
They might also be angling for government regulation: file a lawsuit making bold claims that they expect to catch headlines and shape public opinion, but don't fully expect to stick in court. That's a recurring pattern in lawsuits against AI firms, like the Stable Diffusion lawsuit, which contained absolute bangers like the claim that the model stores compressed images just like JPEG compression, and that the text-prompt interface "creates a layer of magical misdirection that makes it harder for users to coax out obvious copies of training images" (this is actually in the announcement for that lawsuit, I'm not making this shit up; it's really not surprising that most of that suit got thrown out).
There's no real endgame for them where they get anything further than a cut. AI companies can still train on copyright-free or licensed data and over time will get similar results, so there's not really anything that can be done to stop that in general. Copyright-reliant industries can certainly secure themselves a better position within that, though, where they might be able to gain either a steady income from licensing fees or exclusive use of their content for models under their control.
Their claims aren’t that absurd; their articles likely were all used as training data. You could make an argument that that alone is copyright violation.
I don’t AGREE with copyright but I don’t think the concept is absurd, especially when you’ve already established that legally protecting information behind paywalls is allowed (also stupid).
That's a good question, right? You'd think that the established media tycoons like Murdoch would have the kind of pull to have killed this baby in the womb, but they didn't. Is that because they're confident they can adapt to it?
You can say what you want about AI art, I think it’s inherently designed to reproduce societal attitudes in its current form, but it’s arguably still an art form despite that.
But AI writing is basically irredeemable. The only use case I can think of is helping disabled people communicate or express themselves. Other than that, it’s literally just a magic lying machine (I don’t care if they’ve decreased the amount of lies; that just makes it more unexpected when it does lie).