Skip Navigation

Lemmy World is down once again.

Ugh.

133

You're viewing a single thread.

133 comments
  • 5801ms, terrible

    • This is what happens when people don't understand federation.

      • Yep. Sitting on Lemmy.today browsing Lemmy.world posts right now...so I don't know. Really advice people to not have just one account. :)

      • Do you know of the site_aggregates federation TRIGGER issue lemmy.ca exposed?

        • No. Care to explain please?

          • No. Care to explain please?

            On Saturday July 22, 2023... the SysOp of Lemmy.ca got so frustrated with constant overload crashes they cloned their PostgreSQL database and ran AUTO_EXPLAIN on it. They found 1675 rows being written to disk (missive I/O, PostgreSQL WAL activity) for every single UPDATE SQL to a comment/post. They shared details on Github and the PostgreSQL TRIGGER that Lemmy 0.18.2 and earlier had was scrutinized.

            • You've become fixated on this issue but if you look at the original bug, phiresky says it's fixed in 0.18.3

              • The issue isn't who fixed it it, the issue is the lack of testing to find these bugs. It was there for years before anyone noticed it was hammering PostgreSQL on every new comment and post to update data that the code never read back.

                There have been multiple data overrun situations, wasting server resources.

                • But now Lemmy has you and Phiresky looking over the database and optimizing things so things like this should be found a lot quicker. I think you probably underestimate your value and the gratitude people feel for your insight and input.

            • In layman's terms please?

            • I don't know that it's a DB design flaw if we're talking about federation messages to other instances inboxes (which created rows of that magnitude for updates does sound like federation messages outbound to me). Those need to be added somewhere. On kbin, if installed using the instructions as-is, we're using rabbitmq (but there is an option to write to db). But failures do end up hitting sql still and rabbit is still storing this on the drive. So unless you have a dedicated separate rabbitmq server it makes little difference in terms of hits to storage.

              It's hard to avoid storing them somewhere, you need to be able to know when they've been sent or if there are temporary errors store them until they can be sent. There needs to be a way to recover from a crash/reboot/restart of services and handle other instances being offline for a short time.

              EDIT: Just read the issue (it's linked a few comments down) it actually looks like a weird pgsql reaction to a trigger. Not based on the number of connected instances like I thought.

              • (which created rows of that magnitude for updates does sound like federation messages outbound to me)

                rows=1675 from lemmy.ca here: https://github.com/LemmyNet/lemmy/issues/3165#issuecomment-1646673946

                It was not about outbound federation messages. It was about counting the number of comments and posts for the sidebar on the right of lemmy-ui to show statistics about the content. site_aggregates is about counting.

                • Yep I read through it in the end. Looks like they were applying changes to all rows in a table instead of just one on a trigger. The first part of my comment was based on reading comments here. I'd not seen the link to the issue at that stage. Hence the edit I made.

    • Latest, at the time of this comment: still over 4 SECONDS

You've viewed 133 comments.