TL;DR: I got a response from Reddit that basically says they’re not violating anything.
There was a post here 3 weeks ago that talked about the GDPR violations Reddit is committing.
Reddit is telling its future investors, in recent news and in further information about its IPO, that it is currently selling, and looking to sell, its users' data to companies that want to train their LLMs, including Google.
I’m not sure if anyone else has gotten a response from them yet, so I thought I’d share the email.
The Email:
Hello,
Thank you for contacting Reddit.
As stated in Reddit's Privacy policy much of the information on the Services is public and accessible to everyone, even without an account. By using the Services, you are directing us to share this information publicly and freely.
Reddit prohibits use of its service to infringe people’s intellectual property rights or any other proprietary rights, and prohibits unauthorized scraping of Reddit content. Please note, however, that when you submit content (including a post, comment, or chat message) to a public part of the Services, any visitors to and users of our Services will be able to see that content, the username associated with the content, and the date and time you originally submitted the content.
Reddit allows moderators to access Reddit content using moderator bots and tools. Reddit also allows other third parties to access public Reddit content using Reddit's developer services, including Reddit Embeds, our APIs, Developer Platform, and similar technologies. We limit third-party access to this content. Reddit's Developer Terms are our standard terms governing how these services are used by third parties.
Please note that you can use the Services without choosing to share information publicly and freely on them, and you can also remove your content from Reddit at your discretion. For more information, please check out our help center articles for more information here
I've worked at tech companies that were doing obviously illegal things. They will actively deny it to anybody outside the company, but when they finally get fined, they tell employees, "It's the cost of doing business."
1 Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her,
[...]
3 Silence, pre-ticked boxes or inactivity should not therefore constitute consent.
4 Consent should cover all processing activities carried out for the same purpose or purposes.
5 When the processing has multiple purposes, consent should be given for all of them.
6 If the data subject’s consent is to be given following a request by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided.
Long story short, it depends, but likely yes unless Reddit stops what it is doing.
Almost every post will contain experiences that could identify someone, so the wisest move would be to assume yes. The alternative is to naively try to classify each post as 'bread-crumb' or 'not bread-crumb' for each specific kind of processing, then store and sell each class separately. A non-exhaustive list of personal data criteria:
If the comments are tied to, or not stored separately from, your identifiers (email, IP, handle, site ID, location, etc.), then yes.
If your comments are not anonymous or include details about you, then yes.
If the data will be processed to identify you, then yes.
If the data will be used to profile you, then yes.
Unique information about you, such as your subscribed subreddits, your browsing habits, the time spent on each link, your writing style, etc., may also count as personal data if used to identify or target you.
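To make the checklist above concrete, here is a rough sketch in Python. It is only an illustration of the "any one of these is enough" logic, not legal advice and not anything Reddit actually runs. The `StoredComment` class and all of its field names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class StoredComment:
    """Hypothetical flags describing how a comment is stored and used."""
    linked_to_identifiers: bool      # kept with email, IP, handle, site ID, location, ...
    reveals_author_details: bool     # not anonymous, or the text includes details about the author
    used_to_identify: bool           # the processing aims to identify the author
    used_to_profile: bool            # the processing aims to profile the author
    behaviour_used_to_target: bool   # subscriptions, browsing habits, writing style used to identify or target


def likely_personal_data(comment: StoredComment) -> bool:
    """Return True if any criterion from the list applies (one is enough)."""
    return any((
        comment.linked_to_identifiers,
        comment.reveals_author_details,
        comment.used_to_identify,
        comment.used_to_profile,
        comment.behaviour_used_to_target,
    ))
```

A comment only comes out as "probably not personal data" when every flag is false, which is why assuming yes is the safer default.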
(1) ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
[...]
(4) ‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements;
(5) ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;
[...]
(15) ‘data concerning health’ means personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status;
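For what definition (5) could look like in practice, here is a minimal sketch, assuming a keyed hash (HMAC-SHA256) is used to replace usernames and that the key is the "additional information" kept separately. The function names and the record layout (`author`, `body`) are hypothetical. Note that pseudonymised data still counts as personal data under the GDPR, and the comment text itself can still give the author away.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a secret key; it must be stored apart from the dataset."""
    return secrets.token_bytes(32)


def pseudonymise(username: str, key: bytes) -> str:
    """Replace a username with a keyed hash that cannot be reversed without the key."""
    return hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with the 'author' field pseudonymised."""
    out = dict(record)
    out["author"] = pseudonymise(record["author"], key)
    return out
```

The same username always maps to the same pseudonym under one key, so comments by one person stay linkable to each other. That is exactly why the key has to be protected, and why this is weaker than true anonymisation.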
If you dislike this, keep in mind Lemmy provides a wide-open API for free scraping from pretty much any server (including yours). And if that wasn't enough, people can also set up their own servers to pull upvote and downvote counts, all without vetting.
Yup. Lemmy beats Reddit in a lot of areas, but privacy isn't one of them. In fact, federated services value transparency instead. Lemmy also goes somewhat against the idea of anonymity, since many instances require email validation (you can use a temporary email though).
People really find it hard to grasp that stuff you willingly post online in a public way can be seen by everyone. There was a thread here earlier about people flabbergasted that the admins of email services can read the unencrypted emails sent through their servers. The top response was from said admins: "yes we can read your emails, no we don't, we have better things to be doing with our time."
I do not see where the violation can be if all this data sharing / selling has been explained by Reddit and the only info that is shared is your posts and comments, not your email address or IP address.
Why would you even think that a platform where you publicly post things would not be able to do something with that info? Is anyone being able to read this comment also a violation?
No way they can form a proper response to you on GDPR without citing GDPR. This is either utter incompetence or a lie. Wondering if one could sue them just for this reply message.
They're not infringing on your copyright, because you agreed to the following:
[...] you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
Is that an EULA? I thought that was for buying software? I mean I'm pretty sure we have other forms of contracts here in the EU?! Like Terms of service.
Is that a known fact about Reddit's terms of service / "EULA", or something you made up?
And some EULAs are valid in the EU. Just not the American ones that you get to read after you bought something.
I think this is the issue here. OP is mixing content copyright with the GDPR. But the GDPR regulates personal data, not copyright on text. And that's what Reddit is trying to sell: the content of posts, not its users' personal data... So the GDPR doesn't apply to that. Hence Reddit says they aren't violating anything, because the copyright license is in the ToS.
I think that's also my issue with the original letter. It wants to sound official and legalese, but it confuses several things: intellectual property, copyright, and privacy / data protection law. I don't think the author(s) understand the GDPR. It includes a definition of what personal data is, and the letter is mostly talking about something unrelated. There are also additional requirements, for example identifiability, and they fail to address any of that... I also don't like some of the things Reddit does, but I think this is just not a well-reasoned argument. If I were in customer support or a lawyer, I'd brush it off, too.
Public posts on the Internet can be scraped by anyone for free. What Reddit is really selling is easy-to-consume access to that information via structured, high-bandwidth APIs. You should do as they said, and tell them to delete all your data so they aren't allowed to host or profit off it anymore.
GDPR doesn't care if they make money on the data. But in practice regulators do go after the bigger offenders, who often make billions of euros (and have been fined over a billion euros).