This is what the third-party API access fight was really all about.
While API access was allowed, all Reddit content was effectively free:
they needed to ban third-party apps so they could sell the accumulated content.
I expect using content to train AI also factors into it.
Reddit is a trove of user-built content under the guise of community. What Spez did was say "thanks for all the free work, suckers!", put a price sticker on it, and laugh all the way to the bank.
And this is why I'm not active in any Internet community anymore. Never mind, I guess I just can't help myself...
Considering some of the very wrong yet heavily upvoted domain-specific knowledge I've seen on Reddit over the years, I'm not sure the training data is going to be useful for much beyond what every other model can do.
Out of all the things to hate Reddit for, giving data to AI isn't something fediverse users can really criticize it for, though making money from it, perhaps.
Remember: all data on federated platforms is available for free and is likely already being compiled into datasets. Don't be surprised if this post and its comments end up in GPT-5 or GPT-6 training data.
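To illustrate how low the bar is: public posts on Mastodon-compatible servers are served as plain JSON, and flattening them into a training-style dataset takes a few lines. This is just a sketch, the field names follow the general Mastodon status shape and the sample records are made up for the demo, it's not any real scraper's code:

```python
import json
import re

def statuses_to_records(statuses):
    """Flatten Mastodon-style status objects into (author, text) training records."""
    records = []
    for status in statuses:
        # Status bodies are HTML; strip the tags crudely for illustration.
        text = re.sub(r"<[^>]+>", "", status.get("content", "")).strip()
        author = status.get("account", {}).get("acct", "unknown")
        if text:  # skip empty/media-only posts
            records.append({"author": author, "text": text})
    return records

# Fabricated sample shaped roughly like a public-timeline API response.
sample = [
    {"account": {"acct": "alice@example.social"},
     "content": "<p>Everything you post federates everywhere.</p>"},
    {"account": {"acct": "bob@example.social"}, "content": ""},
]

dataset = statuses_to_records(sample)
print(json.dumps(dataset))
```

No auth, no contract, no terms-of-service gate: that's the point about federated data.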
I wish there were a license for content like the GPL, stating that if you use this content to train generative AI, the model must be open source. Not sure that would be legally enforceable, though (due to fair use).
Good thing I had multiple bots overwrite my content before I deleted it all. Not that someone couldn't recover it, I'm not naive. But the AI bots should miss me.
I'm not certain about this, but I think that LLMs are a technological dead end. They might get some use now, but eventually the industry will shift towards better models for machine text generation. And if those models rely on a small corpus of hand-reviewed data, instead of shoving as much text as possible into the model (the first "L" in "LLM" is "large"), then Reddit posts and comments will become outright useless.
In other words: Reddit is further degrading its userbase's trust, and it might not even get much in return.
I feel like AI companies have been scraping Reddit for their datasets since the beginning, without permission. In fact, unless there's been a regulation change that I'm not aware of, I'm not sure why they would pay Reddit to "sign away" the data when they can just scrape it.
It's also dubious whether the current form of AI has a future. It seems like it should revolutionize every sector when you look at its capabilities, but in practice the applications might be more limited than we thought.
Anyway, if Reddit does go public, I will be deleting my account within the hour. The only reason I haven't yet is that I've been a moderator of the same subreddit for eight years, and it's the only thing that's been consistent in my life in that time; I'm kind of attached. The reason I will is that I didn't sign up to create value for shareholders, I signed up to create value for a community.
They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.
Maybe it’s the AI Act in the EU; that might cause trouble in that regard. The US is seeing a lot of rent-seeker PR too, of course, which might cause some companies to hedge their bets.
Maybe some people haven't realized it yet, but limiting fair use doesn't just benefit the traditional media corporations; it also benefits the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.
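For context on what a "legally binding robots.txt" would formalize: today it's just an advisory text file that well-behaved crawlers check voluntarily, e.g. with Python's standard urllib.robotparser. The rules below are a made-up example in the style sites like Reddit serve:

```python
import urllib.robotparser

# Hypothetical robots.txt: block one AI crawler, restrict little else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /login
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/r/all"))       # the AI crawler is asked to stay out
print(rp.can_fetch("SomeBrowser", "https://example.com/r/all"))  # everyone else may fetch
```

Nothing enforces the `False` there; a scraper that ignores the file faces no technical barrier, which is exactly why "make it law" appeals to the platforms sitting on the data.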
Just like that? No thought or anything put into what makes good vs bad training data?
Good luck lmfao.
Makes you wonder how hard it would be to clog up the training data with outputs from other AI models, to bake in that echo defect they all seem to have to some extent as fast as possible. Wouldn’t that suck!