So if you're the only user (let's assume that for simplicity), then that represents all the updates (posts, comments, votes) from each community that you are subscribed to?
Yeah, and I purposely subscribe to (or sometimes have a dedicated "federation helper bot" account that I run subscribe to) most of the most popular communities on the most popular instances so I can get a decent sampling of what's going on in the fediverse on the "All" feed. So I assume my storage usage is maybe a bit higher than that of an "average" single-user instance...
Yeah, it's not automated or anything, I just pop an incognito window and use it when there is a community I think is worth seeing sometimes in "All" (or just for archiving purposes) but don't want to clutter "Subscribed". I may make something to auto-subscribe to communities meeting some criteria or something at some point in the future...
Do you also post stuff? I mean my instance is only about an hour old, but I've subscribed to some communities, yet I don't see the picture service consuming the S3 storage I've configured
Lemmy caches every thumbnail of every post for like a month or something using Pictrs, so that storage will eventually hit a sort of equilibrium and start growing much more slowly (only reflecting post/thumbnail volume during the cache time).
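To put rough numbers on that equilibrium, here is a back-of-the-envelope sketch. It assumes a constant daily thumbnail volume and a fixed retention window; the 50 MB/day and 30-day figures are made up for illustration, not measured from any instance or taken from Pictrs defaults.

```python
def cache_size_over_time(daily_mb, retention_days, total_days):
    """Return the cache size (in MB) at the end of each day.

    Assumes every day adds `daily_mb` of thumbnails and anything older
    than `retention_days` is purged. Both values are hypothetical.
    """
    sizes = []
    for day in range(1, total_days + 1):
        # Only the most recent `retention_days` days are still cached.
        sizes.append(daily_mb * min(day, retention_days))
    return sizes


sizes = cache_size_over_time(daily_mb=50, retention_days=30, total_days=60)
print(sizes[0], sizes[29], sizes[59])
```

Under those assumptions the cache grows for the first 30 days and then stays flat. In practice it would keep drifting up or down with however much gets posted in the communities you follow.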
Between profile images, community banners/icons, post images etc. there are probably a few dozen images that will be sticking around for the long haul at the moment.
Your instance only caches thumbnails, so it won't take much space. The full images are served from the remote instance. So you basically only store whatever your users upload.
It won't scale linearly. A lot of those users will be subscribed to subs the instance is already replicating. It would only be new subs that would add to the growth.
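A toy way to picture it: what the instance replicates is the union of everyone's subscriptions, not the sum. The user and community names below are hypothetical, and this ignores locally uploaded media, which does grow with user count.

```python
def replicated_communities(subscriptions_by_user):
    """Return the set of distinct communities the instance has to replicate."""
    replicated = set()
    for subs in subscriptions_by_user.values():
        replicated |= set(subs)
    return replicated


# Hypothetical users and communities, purely for illustration.
subscriptions = {
    "alice": ["technology@example.social", "selfhosted@example.social"],
    "bob": ["technology@example.social", "linux@example.social"],
    "carol": ["selfhosted@example.social", "linux@example.social"],
}

total_subscriptions = sum(len(s) for s in subscriptions.values())
distinct = replicated_communities(subscriptions)
print(total_subscriptions, len(distinct))
```

Three users hold six subscriptions between them, but the instance only has to replicate three communities.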
Question if you know: does a Lemmy instance have to be publicly accessible to work? Like, if I make an instance on my homelab can the instance "fetch" content and serve it faster locally? Could I reply to a post and have others see it? Etc
At the end of the day the vast majority of what needs to be saved is text. If media content is embedded, the server just has to save the path to the file, not the file itself.
Feels like this would benefit from some sort of fuzzy deduplication in the pictrs storage. I bet there are a lot of similar pics in there. E.g. if one pic or a gif is very similar to another, say just different quality, size, or compression, it should keep only one copy. It might already do this for identical files uploaded by different people, as those can be compared trivially via hashing, but I doubt it does similarity-based deduplication.
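For anyone curious what the difference looks like, here is a minimal sketch comparing exact hashing with a simple "average hash" style perceptual hash. This is not how pictrs works as far as I know; the function names and the tiny 2x2 pixel grids are made up for illustration. A real version would first decode each image and shrink it to a small grayscale grid (say 8x8) with an image library.

```python
import hashlib


def exact_digest(pixels):
    """SHA-256 over the raw pixel values: only byte-identical data matches."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.sha256(flat).hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical grayscale grids (values 0-255), standing in for downscaled images.
original = [[10, 200], [220, 30]]
recompressed = [[12, 198], [215, 33]]  # same picture, slightly different values
different = [[200, 10], [30, 220]]     # a different picture

same_bytes = exact_digest(original) == exact_digest(recompressed)
near = hamming(average_hash(original), average_hash(recompressed))
far = hamming(average_hash(original), average_hash(different))
print(same_bytes, near, far)
```

The exact digests of the original and the recompressed copy differ, so plain hashing treats them as two files, while their perceptual hashes are identical. Treating anything under some small Hamming distance as a duplicate is the fuzzy part, and picking that threshold is where the false positives come in.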