Google was hit with a wide-ranging lawsuit on Tuesday alleging the tech giant scraped data from millions of users without their consent and violated copyright laws in order to train and develop its artificial intelligence products.
This line of attack against LLMs just seems foolish. The data was put out in public for public consumption. There is no right to control whether the data is used to train something; that's just something people are making up.
Google and the others have been indexing all the public content on the web for years, and nobody complained about search engines “stealing” public data to make the web more usable. Turn it into a chatbot and suddenly it’s some heinous crime against humanity.
They did complain a bit when Google started pulling the answers to queries out of the sources and displaying them directly in the search results, which is probably what they're concerned about now: Google (et al.) is no longer driving traffic to the sites, so the benefit to the sites is no longer there.
However, this still does not magically make it illegal. Intellectual property laws have, imo, always been of dubious value to society, especially in the last 100 years or so, and we shouldn't just roll over when rightsholders make up a new "right" they think they should have.