Update on federation issues with Lemmy.world
Update on federation issues with Lemmy.world
A while back I made a post about Lemmy.world federation and a lot has changed since then so I thought I'd do an update post.
TL;DR Lemmy.world posts, comments, votes coming to lemmy.nz have been delayed by a gradually increasing amount over recent months, peaking at about 4 days behind, but we should be back on track soon.
Background
Check the above post for some background, but TL;DR our server is in Auckland, NZ and Lemmy.world's server is in Helsinki, Finland. We are about as far apart as we can get from each other. Because Lemmy can currently only send one action at a time (Post, comment, vote), we can only accept about 4 or 5 actions per second as it takes around 1/5 of a second to make the round trip. Lemmy.world is now creating more than this on average, which means we have been falling behind more and more.
Pre-fetcher
After that post, things continued to get worse. While hanging out on Lemmy Matrix rooms discussing the problem, someone (I'm not actually sure of their Lemmy account) offered to set up a pre-fetcher. Roughly how it works is it monitors lemmy.world for new posts and comments, then sends a request to lemmy.nz to see that content. Lemmy.nz then requests it from lemmy.world because it doesn't exist.
This helps because when lemmy.world sends lemmy.nz a post or comment, lemmy.nz then needs to make other requests. e.g. it might not know about the user, so it needs to request the user from their home instance, it may need to generate a thumbnail, etc. By pre-fetching the posts, this means when lemmy.world sends it's normal outbound federation lemmy.nz already has it so can move straight on to the next one. We can't prefetch everything, notably votes, so for a long time we have had lemmy.world posts show with zero votes until the federation activities start coming through. Unfortunately comments from lemmy.world users on lemmy.nz posts can't be pre-fetched so they were still taking a long time to come through.
Prior to this pre-fetcher being turned on, Lemmy.nz was doing a lot worse than aussie.zone (as seen in the above post). After turning it on, within not too long we were in better shape than aussie.zone, but unfortunately we were both still getting worse.
This is the state we have been in until yesterday. Gradually things were getting worse and worse until we were at about 4 days behind lemmy.world, so if a lemmy.world user posted on one of our posts then it took 4 days to show up (you might have noticed this if you got a notification of a reply to your post or comment that then said it was from days ago).
Batcher
The same user who created the pre-fetcher was also working on a batching process. The basic idea was that instead of lemmy.world sending each item halfway across the world, instead you add an extra server that is hosted close to lemmy.world. Lemmy.world sends their federation items to that server, then that server collects them up into one batch, which gets sent to some software running on the lemmy.nz server. That software then unbundles them into separate pieces again then feeds them into lemmy.
The idea here is that you greatly reduce the lag. Lemmy.world gets a very quick response from the extra server and so can send the next activity almost straight away. The software that passes it to lemmy.nz is on the same server as lemmy.nz so communication is very quick. And collecting up the items into a batch for the trip across the world saves a lot of time in back and forths, so we can keep our goal of receiving things in the correct order while also not having to send one at a time. Receiving in the correct order is important, for example, if you accedentally downvoted then quickly changed it to an upvote, you wouldn't want another instance to receive the upvote first and then the downvote as it would show you downvoted instead of upvoted.
This batcher I have set up (with a lot of help!) over the weekend, and turned on yesterday morning once testing had been completed and I could get lemmy.world to redirect their lemmy.nz traffic to this new server (this was done through a change in something called the hosts file, long story short it tells your server "ignore what anyone says, lemmy.nz is actually over here").
In the last 24 hours or so we have got from 1.5 million activities behind to about 970k activities behind. This puts us at about 2.3 days behind now, a huge improvement!
Pictures
Graph of activities behind lemmy.world
Here we have lemmy.nz in yellow and aussie.zone in green. The hump is from a large number of activities generated on the lemmy.world side, they didn't need federating but the way this is measured means they show up until it's worked out that they aren't needed.
You can see a sharp fall after the batcher was turned on yesterday.
Graph of time behind lemmy.world
Same colours, lemmy.nz in yellow underneath and aussie.zone in green on top. This one shows how long the delay is, or more accurately it looks at the last activity that was received from lemmy.world and checks what time that activity actually happened. So if the last activity was a comment from 4 days ago, it shows 4 days here.
Conclusion
So that's a breakdown of everything that has happened the last few months, hopefully this new batching process will bring us back in line with lemmy.world. If you see anything weird happening, please let me know!
Also this is a shout out to all the people who made this happen, and who are building all sorts of tools that we use. We have a selection of different front-end websites you can access lemmy through, an automod, prefetcher, batcher, and all sorts of help from others! I definitely couldn't do this stuff without help 😆
And as always, if you have any questions or want more detail on any of this, feel free to ask!
Edit:
We are now up to date! Yay!