Is there an open source package that the Internet Archive runs? What is it? I assume sites like archive.is run the same software. I'd like to know if I can also run it for self-hosted archiving.
I don't think OP is looking to mirror archive.org. My take was that they wanted something like archive.org, but self-hosted and for personal / small-scale use.
It seems like it's written in Python too, which means I can maintain it if need be.
Oh boy, I wish I had set this up many years ago. I wouldn't have to resort to scouring !antiquememesroadshow@lemmy.world for the top-quality memes of the past when I need them...
On a far-side-of-the-moon note, I wonder if ActivityPub could be used to federate multiple ArchiveBox instances to create a more resilient Internet Archive alternative. 🤔 Then integrate that with Lemmy to auto-archive links from posts. Aaand lemmy.world ran out of disk space. 🤣
I believe they used Heritrix at one point. The important bit is that they use a special archive format that is a standard (WARC, standardized as ISO 28500). Several tools support it, both for capturing to it and for viewing it. It allows capturing a website in a 'working' condition, with history or something like that. I'm a bit fuzzy on it since it's been some time since I looked into it.
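To give a rough idea of what the WARC format looks like on disk: a WARC file is a sequence of records, each with a plain-text header block (a version line plus `Name: value` fields such as `WARC-Type`, `WARC-Record-ID`, `WARC-Date` and `Content-Length`), then the captured bytes (for a `response` record, the raw HTTP response), then two CRLFs. The sketch below writes and reads back uncompressed WARC/1.0 `response` records using only the Python standard library. The function names are hypothetical, only a minimal set of header fields is written, and real crawlers add digests, request records, per-record gzip compression and more. For actual use, a maintained library such as warcio is the safer choice.

```python
import uuid
from datetime import datetime, timezone


def make_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Build one WARC/1.0 'response' record wrapping a raw HTTP response.

    Hypothetical minimal sketch: writes only the mandatory fields plus
    WARC-Target-URI and Content-Type; no digests, no compression.
    """
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_response))),
    ]
    header = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # The captured block is followed by two CRLFs before the next record.
    return header.encode("utf-8") + http_response + b"\r\n\r\n"


def read_warc_records(data: bytes):
    """Yield (fields, block) pairs from concatenated, uncompressed WARC records."""
    pos = 0
    while pos < len(data):
        # The header block ends at the first blank line (CRLF CRLF).
        header_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        if not lines[0].startswith("WARC/"):
            raise ValueError("not a WARC record")
        fields = dict(line.split(": ", 1) for line in lines[1:])
        start = header_end + 4
        length = int(fields["Content-Length"])
        # Slice by Content-Length, since the block itself may contain CRLF CRLF.
        yield fields, data[start:start + length]
        pos = start + length + 4  # skip the trailing CRLF CRLF
```

Usage: `make_warc_response_record("http://example.com/", b"HTTP/1.1 200 OK\r\n\r\n<p>hi</p>")` returns bytes that can be appended to a `.warc` file, and `read_warc_records` walks such a file record by record. Keeping the full HTTP response (status line and headers included) is what lets replay tools serve the page back as it was originally delivered.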
Kind of. Linkwarden seems to save pages as PDFs. That's better than nothing, but preserving a functional copy of the pages would be better. ArchiveBox seems to do this.
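Part of why a functional copy is harder than a PDF: a page's HTML references stylesheets, scripts and images that are fetched separately, and all of them have to be saved for the copy to keep working. The hypothetical helper below, using only Python's standard library, lists those sub-resources from static HTML and resolves relative paths against the page URL. It is an illustration, not how ArchiveBox or Linkwarden work internally, and the set of tag/attribute pairs is an assumption that covers common cases only. Resources loaded later by JavaScript are invisible to a static parse like this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect URLs of sub-resources referenced by a page (hypothetical helper)."""

    # (tag, attribute) pairs that commonly point at page dependencies.
    ASSET_ATTRS = {
        ("img", "src"),
        ("script", "src"),
        ("link", "href"),
        ("iframe", "src"),
        ("source", "src"),
    }

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.ASSET_ATTRS:
                # Resolve relative references against the page URL.
                url = urljoin(self.base_url, value)
                if url not in self.assets:  # keep first-seen order, no duplicates
                    self.assets.append(url)


def collect_assets(html: str, base_url: str) -> list[str]:
    """Return the absolute URLs an archiver would need to fetch alongside the HTML."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.assets
```

Plain links (`<a href>`) are deliberately skipped, since they point at other pages rather than at parts of this one.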