Say you decide to self-host a Lemmy instance. When you create that instance, do you immediately need to download and store all the data that has ever been posted to all federated Lemmy instances? Or perhaps you only need to download and store everything that is posted to the federated Lemmy instances from that point forward? Or better yet, do you only store what the users on that instance do (i.e. their posts, and posts to the communities hosted on that instance)?
When you create that instance, do you immediately need to download and store all the data that has ever been posted to all federated Lemmy instances?
I run my own instance. @Candelestine@lemmy.world is right, but there are more details. Federation is not a "sync." When your instance needs to fetch something from another instance it will, but it does not pull in history. You can, however, fetch a specific comment or post from any point in time.
Or perhaps you only need to download and store everything that is posted to the federated Lemmy instances from that point forward?
This is not the default either. Only communities that your users subscribe to will be updated by their "origin" instances.
Or better yet, do you only store what the users on that instance do (i.e. their posts, and posts to the communities hosted on that instance)?
This does happen, but it also stores what your users do on remote instances as well as "copies" of what they interact with. Images (currently the only media hosted by Lemmy servers) are linked to their "origin" as well. So you are storing the text of posts and comments.
When your instance needs to fetch from another instance it will
Meaning it will only fetch what is being actively looked at?
Only communities that your users subscribe to will be updated by their “origin” instances.
So when an external community is subscribed to from an account on your local instance, from the point of subscribing forward, your local instance will begin downloading every single post that will ever be made to that subscribed community, regardless of who posted it?
Or better yet, do you only store what the users on that instance do (i.e. their posts, and posts to the communities hosted on that instance)?
This does happen, but it also stores what your users do on remote instances as well as “copies” of what they interact with. Images (currently the only media hosted by Lemmy servers) are linked to their “origin” as well. So you are storing the text of posts and comments.
This is the main point of confusion to me. From my current understanding, it feels as if it contradicts what you had previously said:
Only communities that your users subscribe to will be updated by their “origin” instances.
If it's already pulling in all posts and comments on that community, what use is specifically storing anything that the users do on that community? Would it not be already stored?
It works a lot like email between instances. Let’s call your self-hosted instance “A” and the popular remote instance “B.”
User on A searches for “poodles” and finds a community !poodles@B. When they click the search results: A sends B mail saying “send me the last 10 posts for poodles.” B sends A mail with the posts and the user sees the posts, but none have comments.
If nothing else happens then those 10 posts will just hang out doing nothing on A, but if the user clicks subscribe then A sends another mail to B saying “my user wants to follow poodles.” B replies saying “cool, I’ll send you everything from poodles now.” Now, any time a post or comment happens, B checks its list of subscribing instances and sends copies to them.
If user on A comments on !poodles@B or posts, it creates it on A but sends a mail to B saying “here is some new stuff for poodles!”
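If it helps to see the mail analogy as code, here is a rough toy model in Python. It is only a sketch of the flow described above, not Lemmy's actual code or its real federation protocol; the `Instance` class and the `fetch_recent`, `follow`, `publish`, `receive`, and `post_remote` names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy stand-in for a Lemmy instance (hypothetical, for illustration only)."""
    name: str
    # community name -> posts for communities this instance is the origin of
    hosted: dict = field(default_factory=dict)
    # "community@origin" -> local copies of posts from remote communities
    copies: dict = field(default_factory=dict)
    # community name -> list of instances that asked to follow it
    followers: dict = field(default_factory=dict)

    def fetch_recent(self, origin, community, limit=10):
        """'Send me the last 10 posts': a one-off fetch, no history, no updates."""
        key = f"{community}@{origin.name}"
        self.copies[key] = list(origin.hosted.get(community, [])[-limit:])
        return self.copies[key]

    def follow(self, origin, community):
        """'My user wants to follow poodles': origin adds us to its list."""
        subscribers = origin.followers.setdefault(community, [])
        if self not in subscribers:
            subscribers.append(self)

    def receive(self, key, post):
        """Store a copy of a remote post unless we already have it."""
        stored = self.copies.setdefault(key, [])
        if post not in stored:
            stored.append(post)

    def publish(self, community, post):
        """Origin stores the post, then sends copies to every follower."""
        self.hosted.setdefault(community, []).append(post)
        for subscriber in self.followers.get(community, []):
            subscriber.receive(f"{community}@{self.name}", post)

    def post_remote(self, origin, community, post):
        """A local user posts to a remote community: keep it here, tell origin."""
        self.receive(f"{community}@{origin.name}", post)
        origin.publish(community, post)


# Walk through the example from the reply above.
b = Instance("B")
for i in range(12):
    b.publish("poodles", f"post {i}")

a = Instance("A")
a.fetch_recent(b, "poodles")           # A now holds only the last 10 posts
b.publish("poodles", "post 12")        # not subscribed yet, so A gets nothing
a.follow(b, "poodles")
b.publish("poodles", "post 13")        # now B sends A a copy
a.post_remote(b, "poodles", "hi from A")
print(len(a.copies["poodles@B"]))
```

In this toy version the copy on A and the copy B sends back are the same post, which is roughly why storing what your users do and receiving community updates do not conflict: A just ends up with one stored copy either way.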
Your lemmy instance starts off blank. Then you create local communities and maybe post there. To connect to other communities, you search for the URL of that community using the search function. At that point, it pulls the current posts but none of the comments, and if you subscribe then you start seeing the comments on the posts.
One thing, joining a bunch of remote instances takes some time at first since it's a manual process. Soon you have a really solid timeline though.