
Self hosting is hard. How do you overcome?

Not exactly self hosting but maintaining/backing it up is hard for me. So many “what if”s are coming to my mind. Like what if DB gets corrupted? What if the device breaks? If on cloud provider, what if they decide to remove the server?

I need a local server and a remote one that are synced to confidentially self-host things and setting this up is a hassle I don’t want to take.

So my question is how safe is your setup? Are you still enthusiastic with it?

76 comments
  • Right now I just play with things at a level that I don't care if they pop out of existence tomorrow.

    If you want to be truly safe (at an individual level, not an institutional level where there's someone with an interest in fucking your stuff up), you need to make sure things are recoverable unless 3 completely separate things go wrong at the same time (an outage at a remote data centre, your server fails and your local backup fails). Very unlikely for all 3 to happen simultaneously, but 1 is likely to fail and 2 is foreseeable, so you can fix it before the 3rd also fails.
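
To put rough numbers on the "3 separate things" point, here is a minimal sketch. It assumes the three failures are independent, and the yearly probabilities are made up for illustration (the comment gives none). A fire or a lost password that takes out two copies at once breaks the independence assumption.

```python
# Hypothetical yearly failure probabilities; the thread gives no numbers.
P_REMOTE_OUTAGE = 0.05        # remote data centre loses your copy
P_SERVER_FAILS = 0.10         # your main server dies
P_LOCAL_BACKUP_FAILS = 0.10   # your local backup turns out unreadable


def all_fail(*probabilities: float) -> float:
    """Probability that every independent failure happens in the same period."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def at_least_one_fails(*probabilities: float) -> float:
    """Probability that at least one of the independent failures happens."""
    none_fail = 1.0
    for p in probabilities:
        none_fail *= 1.0 - p
    return 1.0 - none_fail


risks = (P_REMOTE_OUTAGE, P_SERVER_FAILS, P_LOCAL_BACKUP_FAILS)
print(f"at least one fails: {at_least_one_fails(*risks):.4f}")
print(f"all three fail:     {all_fail(*risks):.4f}")
```

With these assumed inputs a single failure is fairly likely while losing all three copies is rare, which is the gap that gives you time to repair one copy before the next one goes.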

    • Exactly right there with the not worrying. Getting started can be brutal. I always recommend people start without worrying about it, be okay with the idea that you're going to lose everything.

      When you start really understanding how the tech works, then start playing with backups and how to recover. By that time you've probably set up enough that you are ready for a solution that doesn't require setting everything up again. When you're starting though? Getting it up and running is enough

      • Gonna just stream of consciousness some stuff here:

        Been thinking lately, especially as I have been self-hosting more, how much work is just managing data on disk.

        Which disk? Where does it live? How does the data transit from here to there? Why isn't the data moving properly?

        I am not sure what this means, but it makes me feel like we are missing some important ideas around data management at personal scale.

  • My professional experience is in systems administration, cloud architecture, and automation, with considerations for corporate disaster recovery and regular 3rd party audits.

    The short answer to all of your questions boils down to two things:

    1: If you're going to maintain a system, write a script to build it, then use the script (I'll expand this below).

    2: Expect a catastrophic failure. Total loss, server gone. As such; backup all unique or user-generated data regularly, and practice restoring it.

    Okay back to #1; I prefer shell scripts (pick your favorite shell, doesn't matter which), because there are basically zero requirements. Your system will have your preferred shell installed within minutes of existing, there is no possibility that it won't. But why shell? Because then you don't need docker, or python, or a specific version of a specific module/plugin/library/etc.

    So okay, we're gonna write a script. "I should install by hand as I'm taking down notes" right? Hell, "I can write the script as I'm manually installing", "why can't that be my notes?". All totally valid, I do that too. But don't use the manually installed one and call it done. Set the server on fire, make a new one, run the script. If everything works, you didn't forget that "oh right, this thing real quick" requirement. You know your script will bring you from blank OS to working server.

    Once you have those, the worst case scenario is "shit, it's gone... build new server, run script, restore backup". The penalty for critical loss of infrastructure is some downtime. If you want to avoid that, see if you can install the app on two servers, the DB on another two (with replication), and set up a cluster. Worst case (say the whole region is deleted) is the same; make new server, run script, restore backups.

    If you really want to get into docker or etc after that, there's no blocker. You know how to build the system "bare metal", all that's left is describing it to docker. Or cloudformation, terraform, etc, etc, etc. I highly recommend doing it with shell first, because A: You learn a lot about the system and B: you're ready to troubleshoot it (if you want to figure out why it failed and try to mitigate it before it happens again, rather than just hitting "reset" every time).
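
The "practice restoring it" half of #2 above can be sketched roughly like this. It is written in Python purely for illustration (the comment itself recommends shell), and the function names are hypothetical. It assumes the data sits still while you test and that the archive is stored outside the directory being backed up.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def backup(data_dir: Path, archive: Path) -> None:
    """Write data_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=".")


def restore_matches(data_dir: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with the source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return checksums(Path(scratch)) == checksums(data_dir)
```

For data that changes between backup and test, you would more likely store the checksums next to the archive at backup time and compare the restore against those instead of against the live directory.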

  • I started as more "homelab" than "selfhosted" at first - so I was just stuffing around playing with things, but then that seemed sort of pointless and I wanted to run real workloads, then I discovered that was super useful and I loved extracting myself from commercial cloud services (dropbox etc). The point of this story is that I sort of built most of the infrastructure before I was running services that I (or family) depended on - which is where it can become a source of stress rather than fun, which is what I'm guessing you're finding yourself in.

    There's no real way around this (the pressure you're feeling), if you are running real services it is going to take some sysadmin work to get to the point where you feel relaxed that you can quickly deal with any problems. There's lots of good advice elsewhere in this thread about bits and pieces to do this - the exact methods are going to vary according to your needs. Here's mine (which is not perfect!).

    • I'm running on a single mini PC & a Synology NAS setup for RAID 5
    • I've got a nearly identical spare mini PC, and swap over to it for a couple of weeks (originally every month, but stretched out when I'm busy). That tests my ability to recover from that hardware failure.
    • All my local workloads are in LXC containers or VM's on Proxmox with automated snapshots that are my (bulky) backups, but allow for restoration in minutes if needed.
    • The NAS is backed up locally to an external USB that's not usually plugged in, and to a lower-specced similar setup 300km away.
    • All the workloads are dockerised, and I have a standard directory structure and compose approach so if I need to upgrade something or do some other maintenance of something I don't often touch, I know where everything is without looking back to the playbook
    • I don't use a script or Terraform to set those up, I've got a proxmox template with docker and tailscale etc installed that I use, so the only bit of unique infrastructure is the docker compose file which is source controlled on Forgejo
    • Everything's on UPSs
    • I have a bunch of ansible playbooks for routine maintenance such as apt updates, also in source control
    • all the VPS workloads are dockerised with the same directory structure, and behind NGINX PM. I've gotten super comfortable with one VPS provider, so that's a weakness. I should try moving them one day. They are mostly static websites, plus one important web app that I have a tested backup strategy for, but not an automated one, so that needs addressed.
    • I use a local and an external UptimeKuma for monitoring, enhanced by running a tiny server on every instance that just exposes a disk free and memory free api that can be consumed by Uptime.
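
The "tiny server" from the last bullet could look something like the sketch below. This is a guess at the idea, not the commenter's actual code: the JSON field names, the port, and the function names are all assumptions, and the memory figure is read from `/proc/meminfo`, so that part only works on Linux.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def memory_available_bytes(meminfo_path: str = "/proc/meminfo"):
    """Return MemAvailable in bytes on Linux, or None if it cannot be read."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # value is reported in kB
    except OSError:
        pass
    return None


def build_status(path: str = "/") -> dict:
    """Collect free disk space for `path` and available memory."""
    usage = shutil.disk_usage(path)
    return {
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
        "memory_available_bytes": memory_available_bytes(),
    }


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET returns the same small JSON document.
        body = json.dumps(build_status()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve the status JSON until interrupted. The port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), StatusHandler).serve_forever()
```

Calling `serve()` starts it. Since it listens on every interface and has no authentication, you would probably only want it reachable over something private like the Tailscale network mentioned above.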

    I still have lots of single points of failure - Tailscale, my internet provider, my domain provider etc, but I think I've addressed the most common which would be hardware failures at home. My monitoring is also probably sub-par, I'm not really looking at logs unless I'm investigating a problem. Maybe there's a Netdata or something in my future.

    You've mentioned that syncing to a remote server for backups is a step you don't want to take. If you mean that managing your own remote server is the step you don't want to take, then your solutions are a paid backup service like Backblaze, or physically shuffling external USB drives (or extra NASs) back and forth to somewhere - depending on what downtime you can tolerate.

  • TrueNAS scale helps a lot, as it makes many popular apps just a few clicks away. Or for more power-users, stuff like the linux cockpit also really helps.

    To directly answer your questions...

    • In the event of DB corruption (which hasn't happened to me yet) I would probably rollback that app to the previous snapshot. I suspect that TrueNAS having ZFS as an underlayment may help in this regard, as it actually detects bitrot and bitflips, which may be the underlying cause of such corruption.
    • In the case where a device breaks... if it's a hard drive that broke, I just pop in a new one and add it to the degraded mirror set. If it's "something else" that broke, my plan is to pop one of the mirror shards into a spare PoS computer (as truenas scale runs on common x86 hardware) and deal with the ugly-factor until I repair or replace the bigger issue.
    • The only way to defend against a cloud provider is replication, so plan accordingly if that is a concern.
    • If by "sync'd confidentially" you mean encrypted in transit, I'm pretty sure that TrueNAS has built in replication over SSH. If you meant TNO, then you probably want to build your setup over a cryfs filesystem so no cleartext bits hit the cloud, although on second thought... it's not really meant for multi-master synchronization... my case just happens to fit it (only one device writes)... so there is probably a better choice for this.
    • Setup is a hassle? Yes... just be sure that you invest that hassle into something permanent, if not something like a TrueNAS configuration (where the config gets carried along for the ride with the data) then maybe something like ansible scripts (which is machine-readable documentation). Depending on your organization skills, even hand-written notes or making your own "meta" software packages (with only dependencies & install scripts) might work. What you don't want to do is manually tweak a linux install, and then forget what is "special" about that server or what is relying on it.
    • How safe is my setup? Depends... I still need to start rotating a mirror shard as an offsite backup, so not very robust against a site disaster; Security-wise... I've got a lot of private bits, and it works for my needs... as far as I know :)
    • Still enthusiastic? I try to see everything as both temporary and a work-in-progress. This can be good in ways because nothing has to be perfect, but can be bad in ways that my setup at any given time is an ugly amalgamation of different experimental ideas that may or may not survive the next "iteration". For example, I still have centos 7 & python 2 stuff that needs to be migrated or obsoleted.
  • Automate as much as possible. I rsync to both an online and home NAS for all of my hosted stuff, both at home and in the cloud. Updates for the OS and low level libraries are automated. The other updates are generally manual, that allows me to set aside time for fixing problems that updates might cause while still getting most of the critical security updates. And my update schedules are generally during the day, so that if something doesn't restart properly, I can fix it.
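
A rough sketch of the "rsync to both an online and home NAS" part, wrapped in Python so it can be scheduled and checked. The host names and paths are hypothetical placeholders, and only the `-a` and `--dry-run` rsync options are used.

```python
import subprocess

# Hypothetical destinations: one NAS at home, one storage box online.
DESTINATIONS = [
    "backup@home-nas.example:/volume1/backups/web/",
    "backup@offsite.example:/backups/web/",
]


def rsync_command(source: str, destination: str, dry_run: bool = False) -> list[str]:
    """Build an rsync invocation that copies source into destination."""
    command = ["rsync", "-a"]  # -a: archive mode (recursive, keeps times and permissions)
    if dry_run:
        command.append("--dry-run")  # report what would change without copying
    return command + [source, destination]


def sync_all(source: str, destinations: list[str], dry_run: bool = False) -> list[str]:
    """Run rsync once per destination and return the destinations that failed."""
    failed = []
    for destination in destinations:
        result = subprocess.run(rsync_command(source, destination, dry_run))
        if result.returncode != 0:
            failed.append(destination)
    return failed
```

Something like `sync_all("/srv/web/", DESTINATIONS, dry_run=True)` is a cheap first run, and a non-empty return value is what you would alert on.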

    Also, whenever possible I assume a fair amount of time for updates, far beyond what it should actually take. That way I won't be rushed to fix the problem and end up having to revert to a backup and find time later to redo it. Then most of the time I have extra time for analyzing stats to see if I can improve performance or save money with optimizations.

    I've never had a remote provider just suddenly vanish though I use fairly well known hosts. And as for local hardware, I just have to do without until I can buy a replacement. Or if it's going to be some time, I do have old hardware that I could set up as a makeshift, temporary replacement like old desktop computers and some hardware that I use for experimenting like my Le Potato that isn't powerful enough for much, but ok for the short term.

    And finally I've been moving to more container-based setups that are easier to get up and running again. I've been experimenting with Nomad, Docker Swarm, K3s, etc., along with Traefik and some other reverse proxies so I can keep the workers air-gapped for security.

  • I try to balance things between what I find enjoyable/ worth the effort, and what ends up becoming more of a recurring headache

  • I have limited my usecases for selfhosting and thrown money at the problem. The usecases are:

    • image hosting and sharing with the family
    • backups of our family computers
    • digital file hosting
    • media hosting

    The last one is expendable. The first three are backed up into the cloud. I use a Synology, thus throwing money at the problem. Their cloud backup just works.

    Edit: use cases I do not self host are a mail server for example. The stress outweighs the 12€/year I pay for the service.

  • It doesn't have to be hard - you just need to think methodically through each of your services and assess the cost of creating/storing the backup strategy you want versus the cost (in time, effort, inconvenience, etc) if you had to rebuild it from scratch.

    For me, that means my photo and video library (currently Immich) and my digital records (Paperless) are backed up using a 2N+C strategy: a copy on each of 2 NASes locally, and another copy stored in the cloud.

    Ditto for backups of my important homelab data. I have some important services (like Home Assistant, Node-RED, etc) that push their configs into a personal Gitlab instance each time there's a change. So, I simply back that Gitlab instance up using the same strategy. It's mainly raw text in files and a small database of git metadata, so it all compresses really nicely.

    For other services/data that I'm less attached to, I only backup the metadata.

    Say, for example, I'm hosting a media library that might replace my personal use of services that rhyme with "GetDicks" and "Slime Video". I won't necessarily backup the media files themselves - that would take way more space than I'm prepared to pay for. But I do backup the databases for that service that tells me what media files I had, and even the exact name of the media files when I "found" them.

    In a total loss of all local data, even though the inconvenience factor would be quite high, the cost of storing backups would far outweigh that. Using the metadata I do backup, I could theoretically just set about rebuilding the media library from there. If I were hosting something like that, that is...
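
The "only back up the metadata" idea can be sketched as below. Note the swap: the comment describes backing up the service's own databases, while this sketch just records file names and sizes in a JSON manifest. The function names and manifest layout are made up for illustration.

```python
import json
from pathlib import Path


def build_manifest(media_dir: Path) -> list[dict]:
    """List every file under media_dir with its relative path and size in bytes."""
    return [
        {
            "path": path.relative_to(media_dir).as_posix(),
            "size_bytes": path.stat().st_size,
        }
        for path in sorted(media_dir.rglob("*"))
        if path.is_file()
    ]


def write_manifest(media_dir: Path, manifest_file: Path) -> int:
    """Write the manifest as JSON and return the number of files recorded."""
    manifest = build_manifest(media_dir)
    manifest_file.write_text(json.dumps(manifest, indent=2))
    return len(manifest)
```

The manifest is a few kilobytes even for a large library, so it can ride along with the backups that do get the full treatment.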

  • @iso@lemy.lol I think we need to accept that unless self-hosting is your full time job, things can and will break. At some point you have to accept it and let it go.

    Finally I know when I die, my spouse won't take care of my homelab and servers, all of it will go to the recycler.

  • Snapshots are the first line of defense for recovery from software errors. For hardware use ZFS raid.

    That still isn't a proper backup. Have a separate backup that can not easily be destroyed.
