Nothing sinister, we just don't delete what we say we delete. Instead we keep it in your profile to feed the algorithms and set the "deleted" flag to make you think it's gone.
But clearly the data is not overwritten, and this was intentional. How do I know? Because that would amount to a massive amount of data. If it were due to a bug in Apple software or the underlying filesystems, it would be caught by monitoring systems: "Hey, we're using 10x the data we should be, maybe we should look into it."
The mistake was in the flag code that was supposed to fool us.
No, when I say "overwritten" I mean that the area is marked as deleted in the filesystem, and the next time something writes to that area, the data that was there before is replaced.
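To make "marked as deleted, then eventually overwritten" concrete, here is a toy model. It is hypothetical and not how any real filesystem is laid out. The "disk" is a list of fixed-size blocks with a free list. Deleting a file only returns its blocks to the free list, so the bytes stay readable on the raw device until a later write reuses those blocks.

```python
from typing import Dict, List

BLOCK_SIZE = 8  # hypothetical tiny block size, just for the demo


class ToyDisk:
    """Hypothetical toy disk: raw blocks plus a free list and a file table."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [b"\x00" * BLOCK_SIZE for _ in range(num_blocks)]
        self.free: List[int] = list(range(num_blocks))  # blocks available for writing
        self.files: Dict[str, List[int]] = {}  # file name -> block numbers

    def write_file(self, name: str, data: bytes) -> None:
        used = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = self.free.pop(0)  # reuse the lowest-numbered free block
            self.blocks[block] = data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
            used.append(block)
        self.files[name] = used

    def delete_file(self, name: str) -> None:
        # Only metadata changes: the blocks are marked free, not wiped.
        self.free.extend(self.files.pop(name))
        self.free.sort()

    def raw_scan(self) -> bytes:
        # What a recovery tool reading the raw device would see.
        return b"".join(self.blocks)


disk = ToyDisk(num_blocks=4)
disk.write_file("photo.jpg", b"SECRET-PHOTO")
disk.delete_file("photo.jpg")
print(b"SECRET" in disk.raw_scan())  # True: deleted, but still on the disk

disk.write_file("notes.txt", b"grocery list")  # reuses the freed blocks
print(b"SECRET" in disk.raw_scan())  # False: now actually overwritten
```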
A single overwrite might not be enough to defeat physical forensics because shadows of the old data persist in how the new data is stored. Also when it comes to SSDs you might be waiting a long time for the data to get overwritten as the drive will wear-level its erm sectors (what are those things called with SSDs?).
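The SSD point can be sketched with a toy flash translation layer (FTL). This is hypothetical: real SSD firmware is vendor-specific and much more complex. NAND pages can't be rewritten in place, so a logical overwrite lands on a fresh physical page and only the mapping changes. The old page keeps its contents until the drive erases it later.

```python
from typing import Dict, List, Optional


class ToyFTL:
    """Hypothetical toy FTL illustrating out-of-place writes."""

    def __init__(self, num_pages: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * num_pages  # physical NAND pages
        self.mapping: Dict[int, int] = {}  # logical block -> physical page
        self.next_free = 0  # naive allocator: always use the next unused page

    def write(self, lba: int, data: bytes) -> None:
        if self.next_free >= len(self.pages):
            raise RuntimeError("no free pages (a real drive would garbage-collect)")
        page = self.next_free
        self.next_free += 1
        self.pages[page] = data      # new data goes to a fresh page...
        self.mapping[lba] = page     # ...and the logical block is repointed

    def read(self, lba: int) -> Optional[bytes]:
        page = self.mapping.get(lba)
        return None if page is None else self.pages[page]


ftl = ToyFTL(num_pages=8)
ftl.write(0, b"secret")
ftl.write(0, b"\x00" * 6)    # a shred-style overwrite of logical block 0
print(ftl.read(0))           # the OS sees zeros
print(b"secret" in ftl.pages)  # True: the old physical page is untouched
```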
So are you saying that they suffered from a filesystem bug that caused deletion failure? I'd imagine they use standard filesystems on their backend, and I haven't heard about any bugs like this.
If you ask me what's more likely, that a company known for shitty behavior lies about deleting files so it can keep using that information to profit -- OR -- that they are experiencing a filesystem bug on their backend, I'll choose the former.
No, I don't believe a damn word of what Apple's gonna say on this. I just wanted to get the message out there that file deletion generally works by allowing data to be overwritten. So if the images are local, this could very well be that it's either showing data that hasn't been overwritten yet, or it accidentally brought things back out of "Recently Deleted", depending on how long ago they were deleted.
Seriously: I don’t think the cost-benefit is there to intentionally make a maneuver like this. Any crap they pull needs to have a perfectly proper explanation, with our agreement to a specific term buried somewhere in their policies. I can only imagine how much money they blew throwing these billboards up all over the San Francisco Bay Area. We have to buy Apple over Google for ostensible privacy gains, and Apple has to lock us in to their walled gardens to make up for their comparatively smaller ad/data business.
This post assumes Apple is aethical (that’s like amoral but for ethics right?) but still a self-interested economic actor. They can’t let short-term greed get in the way of long-term greed!
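shred doesn't even necessarily work at the OS level. With ext3 (and I assume ext4) mounted in data=journal mode, file data goes through the journal before reaching its final location, so copies of what you "overwrote" can linger there. shred's own documentation warns about this, though in the default data=ordered mode it works as intended. Copy-on-write filesystems like Btrfs and ZFS go further: when you overwrite data in a file, you're not overwriting it even at the logical level of the block device. The new data is committed somewhere else on the disk, and then the metadata is atomically updated to reference it.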
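The "write the new data elsewhere, then atomically repoint the metadata" pattern is the copy-on-write style used by filesystems such as Btrfs and ZFS. Here is a minimal sketch using a hypothetical toy volume, not either filesystem's real on-disk format. It shows why an in-place overwrite like shred's never touches the old block.

```python
from typing import Dict, List


class CowVolume:
    """Hypothetical toy copy-on-write volume: file name -> current block."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []    # append-only block store
        self.inode: Dict[str, int] = {}  # metadata: file name -> block number

    def write(self, name: str, data: bytes) -> None:
        self.blocks.append(data)                 # 1. write new data to a fresh block
        self.inode[name] = len(self.blocks) - 1  # 2. repoint metadata in one step

    def read(self, name: str) -> bytes:
        return self.blocks[self.inode[name]]


vol = CowVolume()
vol.write("diary.txt", b"my secrets")
vol.write("diary.txt", b"\x00" * 10)  # what shred does: overwrite with zeros
print(vol.read("diary.txt"))  # the file now reads as zeros
print(vol.blocks[0])          # b'my secrets' is still physically present
```

The toy never frees old blocks. A real filesystem eventually reclaims them, but it does so on its own schedule, not when you overwrite the file.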
It was more practical in the era of older filesystems.
Only necessary on ol' spinning rust. With SSDs, not only is it completely unnecessary, but it also burns extra writes.
Spinnies store data magnetically on the platter as 1s and 0s; SSDs store data in NAND flash cells as a held charge. In the simplest case (one bit per cell), an erased cell with no charge reads as 1 and a charged, programmed cell reads as 0.
With spinnies, a file gets marked as "deleted", but the residual magnetic 1s and 0s remain on the platter until they're eventually overwritten.
With SSDs, a file gets marked "deleted", and soon afterwards (immediately with continuous discard, or at the next scheduled trim) the OS sends TRIM. This tells the drive that those blocks no longer hold live data, and the controller erases them during garbage collection. There are no residuals to worry about like with spinnies, and TRIM is in fact necessary to ensure decent lifespans.
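Here is a toy model of TRIM followed by garbage collection. It is hypothetical: real drives erase whole blocks of many pages, and when garbage collection runs is up to vendor firmware. The sketch shows the two-step nature. TRIM only marks data as invalid, and the actual erase happens later.

```python
from typing import Dict, List, Optional, Set


class ToySSD:
    """Hypothetical toy SSD: TRIM marks pages invalid, GC erases them later."""

    def __init__(self, num_pages: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * num_pages  # None = erased
        self.mapping: Dict[int, int] = {}  # logical block address -> physical page
        self.invalid: Set[int] = set()     # pages holding stale or trimmed data
        self.free: List[int] = list(range(num_pages))

    def write(self, lba: int, data: bytes) -> None:
        page = self.free.pop(0)
        if lba in self.mapping:
            # The previous copy becomes stale but is not erased yet.
            self.invalid.add(self.mapping[lba])
        self.pages[page] = data
        self.mapping[lba] = page

    def trim(self, lba: int) -> None:
        # The OS tells the drive this logical block no longer holds live data.
        page = self.mapping.pop(lba, None)
        if page is not None:
            self.invalid.add(page)

    def garbage_collect(self) -> None:
        # Runs later, on the controller's own schedule: erase invalid pages
        # and return them to the free pool.
        for page in sorted(self.invalid):
            self.pages[page] = None
            self.free.append(page)
        self.invalid.clear()


ssd = ToySSD(num_pages=4)
ssd.write(0, b"photo")
ssd.trim(0)                   # file deleted, OS issues TRIM
print(b"photo" in ssd.pages)  # True: still in NAND until GC runs
ssd.garbage_collect()
print(b"photo" in ssd.pages)  # False: erased
```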
Wow, the SSD can hold the charges perfectly while unplugged for ages? Amazing.
In a post-apocalyptic world where I am in charge of building a storage drive and I’m given all the instructions and fabs, the world is going without storage.
That's skipping over the fact that recovering deleted data, even if it isn't overwritten, is not an "oops". It takes extra effort, and if that data weren't being protected, it would be overwritten incidentally as the drives are used.
There is a big difference in a database between "flagging" data and actually removing the association of the data to the database.
That's how a lot of people handle deleted data in databases: it's literally just a flag. That's why people recommend editing Reddit posts before deleting them, so that the text is actually overwritten and can't just be restored.
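Here is the difference between a "deleted" flag (soft delete) and actually removing the row (hard delete), sketched with Python's built-in `sqlite3`. The table and column names are hypothetical and not Reddit's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO comments (id, body) VALUES (1, 'original comment')")

# Soft delete: the UI stops showing the row, but the body is untouched.
conn.execute("UPDATE comments SET deleted = 1 WHERE id = 1")
visible = conn.execute("SELECT body FROM comments WHERE deleted = 0").fetchall()
still_stored = conn.execute("SELECT body FROM comments WHERE id = 1").fetchone()
print(visible)       # [] : nothing shown to users
print(still_stored)  # ('original comment',) : still in the table

# "Restoring" is just flipping the flag back.
conn.execute("UPDATE comments SET deleted = 0 WHERE id = 1")

# Hard delete: the row is removed from the table.
conn.execute("DELETE FROM comments WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM comments").fetchone())  # (0,)
```

Even a hard `DELETE` doesn't necessarily scrub the bytes from the database file itself. SQLite, for example, has a `secure_delete` pragma that overwrites deleted content with zeros.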
Every time someone says something like this, I have to explain CDC and regular old backups. There’s no way in hell Reddit doesn’t keep cold and hot backups of their shit. And while Reddit is unlikely to be doing CDC for SOC 2 or other compliance reasons, it’s the easiest method to capture data for analytics purposes.
CDC stands for change data capture. It’s generally done by streaming the database's change log (e.g. a write-ahead log or binlog) to a bucket or to a service like Kafka, where you can fast-forward and rewind through the log to see the state of the DB at any point in time. Even if you edit your comments, the original is likely sitting in a Kafka topic or in Snowflake, outside of the DB or cache used for the presentation layer.
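Here is a minimal sketch of the "rewind the log" idea. It uses a hypothetical in-memory list of change events in place of a real Kafka topic, and list indices stand in for offsets. Replaying the log up to any offset rebuilds the table as it was at that moment. Edits and deletes in the live DB don't remove the earlier events.

```python
from typing import Dict, List, NamedTuple, Optional


class Change(NamedTuple):
    op: str               # "insert", "update" or "delete"
    key: int              # primary key of the row
    value: Optional[str]  # new value (None for delete)


# Hypothetical captured change log for one comment.
log: List[Change] = [
    Change("insert", 1, "original comment"),
    Change("update", 1, "edited to nonsense"),
    Change("delete", 1, None),
]


def state_at(log: List[Change], offset: int) -> Dict[int, str]:
    """Replay events [0, offset) to rebuild the table as of that offset."""
    table: Dict[int, str] = {}
    for change in log[:offset]:
        if change.op == "delete":
            table.pop(change.key, None)
        else:
            table[change.key] = change.value
    return table


print(state_at(log, 3))  # {} : what the live DB shows now
print(state_at(log, 1))  # {1: 'original comment'} : recovered by rewinding
```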
Zero large-scale websites operate with a truly single data store. There is always another layer that your user operations don’t impact.
Yes, that's certainly possible, but it's also out of my control. I have basically three options:
Delete account - we know this doesn't delete comments
Delete comment - "seems" to delete comments, but we've seen comments get restored - so probably using a "deleted" flag
Edit comment with nonsense and then delete - should poison the comment if they're just using the deleted flag
That's it. There's no guarantee it works, but it has a much higher chance of working than the other two.
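Here is why option 3 helps against a plain "deleted" flag, sketched with a hypothetical in-memory comment record. If the live row is edited before it's flagged, a later restore brings back only the nonsense. As the replies point out, this does nothing about backups or CDC copies taken before the edit.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    body: str
    deleted: bool = False  # hypothetical soft-delete flag


def edit(comment: Comment, new_body: str) -> None:
    comment.body = new_body  # overwrites the stored text in the live record


def soft_delete(comment: Comment) -> None:
    comment.deleted = True  # hides the record; the body is left as-is


def restore(comment: Comment) -> None:
    comment.deleted = False  # the kind of "undelete" people have observed


c = Comment("original comment")
edit(c, "asdf qwerty")  # option 3: poison the text first...
soft_delete(c)          # ...then delete
restore(c)
print(c.body)  # asdf qwerty : a restore brings back only the nonsense
```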
And there's a good chance they delete old backups. Hosting every edit is expensive, so there's a decent chance they clean up old data after some months.
In 2019 the total size of the text stored by Reddit was only 50TB. A petabyte of data in cold storage is only $12k a year, so even if they've grown 500x since 2019 (very unlikely), it’s a drop in their ARR. Given that they sell the data for advertising and for AI, they are not deleting it. Reddit also self-hosts a lot of their infra (they used to present their architecture at KubeCon), so the storage costs would be even lower.
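Here is the back-of-the-envelope arithmetic behind "a drop in their ARR", using only the figures in the comment: 50TB, 500x growth, and $12k per petabyte-year. It assumes decimal units (1 PB = 1000 TB). The comment doesn't name a storage provider, so the price is taken as given.

```python
TB_2019 = 50                # text stored in 2019, per the comment
GROWTH = 500                # the "very unlikely" worst-case growth factor
COST_PER_PB_YEAR = 12_000   # USD per petabyte per year in cold storage, as quoted

total_pb = TB_2019 * GROWTH / 1000      # decimal units: 1 PB = 1000 TB
annual_cost = total_pb * COST_PER_PB_YEAR
print(total_pb, annual_cost)  # 25.0 300000.0
```

So even the worst case works out to roughly 25 PB and about $300k a year.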