How to avoid "things going wrong"? (Immutable distros?)
I'm working on starting up my first home server which I'm trying to make relatively foolproof and easily recoverable. What is some common maintenance people do to avoid dire problems, including those that accumulate over time, and what are ways to recover a server when issues pop up?
At first, I figured I'd just use debian with some kind of snapshot system and monitor changelogs to update manually when needed, but then I started hearing that immutable distros like microOS and coreOS have some benefits in terms of long term "os drift", security, and recovering from botched updates or conflicts? I don't even know if I'm going to install any native packages, I'm pretty certain every service I want to run has a docker image already, so does it matter? I should also mention, I'm going to use this as a file server with snapraid, so I'm trying to figure out if there will be conflicts to look out for there or with hardware acceleration for video transcoding.
First and foremost, backups. Back up everything and back up often. Immutability can’t do anything for critical hardware failure.
Issues on a machine that only runs container workloads aren’t common, but I think it’s worth the extra little effort to reduce the risk even further. Fedora CoreOS or Flatcar is ideal, since their declarative nature makes them easily reproducible. Fedora IoT can get you there too, but it doesn’t use Ignition, so you’ll be setting the server up manually.
Immutability is good. Declarative configuration is good. Manage cattle, not a pet.
I see a lot of posts like this and it's always people overthinking something they haven't tried to do yet.
So my advice is to just do it.
You may lose everything at some point in the future, Satan knows I have a few times, but because you've actually done it, you can do it again.
Now, because you're just thinking about doing it, it seems like a massive deal because you've not gone out and done it yet.
As for recommendations, I use a Proxmox VM with Debian and Docker. My Proxmox does backups, but my Docker compose is also a text document on my PC so I can recreate it all from scratch from that. I also have an idea what I did when I was learning how to do it, and have retained a good bit of that info so I could probably do it without either the backups or the Docker Compose, it would just take longer.
This! And, baby-steps: don't go about installing every app you see. Try backup strategies, put them to test (bring service down and up again with data from backup). Play, have fun.
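One way to make "put them to test" concrete is to restore a backup into a scratch directory and compare it against the live data, file by file. Below is a minimal sketch of that idea, not a full backup tool. The function names and the example paths are my own, and it reads each file fully into memory, so it only suits modest data sets.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or different in the restored copy."""
    before = tree_digests(original)
    after = tree_digests(restored)
    return {
        "missing": sorted(before.keys() - after.keys()),
        "extra": sorted(after.keys() - before.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys() if before[name] != after[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical paths: the live data and a scratch directory you restored into.
    print(compare_restore(Path("/srv/appdata"), Path("/tmp/restore-test")))
```

Stop the service before comparing, otherwise files that changed since the backup will show up as differences. An empty report in all three lists means the restore matched.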
Yeah I mean I get it because I was also thinking about self hosting for a long time and had a bunch of questions myself.
The problem is that a lot of the questions were not needed, and a bunch of the other questions I answered myself by just tooling around with the stuff.
Great comment btw, it's a good idea to have a list of the services you'd like to run, in order of importance, then work through it.
I did that, then found ways to combine a bunch of services, to the point where what used to be multiple standalone VMs is now just two: one for Home Assistant and a second for Plex and Docker.
dire problems, including those that accumulate over time
That's not a thing. You create problems over time by experimenting on what is, effectively, a production load. If all you ever do is install a distro and keep it up to date, not much can break. Granted, shit happens, but it's incredibly rare.
As an example: I set up my mail server in May 2019. I chose Arch Linux because I never wanted to go through a big release upgrade. The only extra software installed there is mail-server related, straight from the repos. I've become confident enough that there's now a nightly cronjob to update the system, with a hook to reboot if the kernel or init gets updated.
In those five-and-a-bit years I've had one issue, where I had to revert a kernel update.
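The commenter doesn't say how their reboot hook works, so the following is only one possible shape for the decision it has to make: given the names of the packages an update touched, is a reboot needed? The package names in `REBOOT_PACKAGES` are my assumption for an Arch box, and `needs_reboot` is a made-up helper, not part of any package manager.

```python
# Assumed set of packages whose upgrade only takes effect after a reboot.
REBOOT_PACKAGES = frozenset({"linux", "linux-lts", "systemd"})


def needs_reboot(upgraded: list[str], reboot_packages: frozenset[str] = REBOOT_PACKAGES) -> bool:
    """Return True if any upgraded package is in the reboot-worthy set."""
    return any(name in reboot_packages for name in upgraded)


if __name__ == "__main__":
    # The caller is expected to supply the list of upgraded package names.
    print(needs_reboot(["linux", "openssl"]))
    print(needs_reboot(["openssl"]))
```

Keeping the decision in a small pure function means it can be tested without actually updating or rebooting anything.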
Another example is tang on an ubuntu server. This was at a previous workplace, but essentially it's a piece of software from the repos. Originally installed on 16.04, has gone without reprovisioning all the way to 22.04. I've now left the company, but I hear it's still running.
Upgrading an ubuntu desktop fleet with a myriad of custom software, on the other hand... let's just not talk about it.
If your stuff is all Docker then yeah, immutable makes sense as it makes the entire box declarative and immutable: you can get back the exact same operating Docker environment on the server, and then you can get back the exact same Docker workloads going with the Docker compose configurations.
If you ever need to run stuff you'd run on Debian, you can just shove it in a Debian container.
That said, if most of the stuff is containers, the risk of just the core Debian breaking is fairly low. Pick whatever is easiest for you to deal with based on your needs. Immutable distros have a bit of a learning curve.
I mostly use debian + docker or alpine + docker for this kind of thing (usually running as VMs on a proxmox server). Both are utterly reliable in my experience, though I've been tending more often toward alpine these days, because it's just so light and simple. I haven't tried any of the immutable systems, in the general spirit of why fix what's not broken. I don't even bother with snapshotting either, though that's mostly because I use some of the proxmox tools for backing up the VMs.
I used proxmox and have played a little with nix and guix, but simplest is just use debian, put /home on a separate logical partition from the system partition so you can reinstall the system without clobbering user files, and as people keep saying, backup early and often.
Be cautious with the answers when asking things like this. On discussion boards like this one, many people are (rightfully) very excited about selfhosting and eager to share what they learned, but may ("just") have very few years of experience. There's a LOT to learn in this space, and it also takes a very long time to find out what is truly foolproof and easily recoverable.
First off, you want your OS to be disposable. And just as the OS should be decoupled, all the things you run should be decoupled from one another. Don't build complex structures that take effort to rebuild. When you build something, that is state. You want to minimize the amount of state you need to keep track of. Ideally, the only state you should have is your payload data. That is impossible of course, but you get the idea.
Immutable distros are indeed the way to go for long term reliability. And ideally you want immutability by booting images (like coreOS or Fedora IoT). Distros like microOS are not really immutable, they still use the regular package manager. They only make it a little more reliable by encouraging flatpak/docker/etc (and therefore cutting down on packages managed by the package manager) and a slightly more controlled update procedure (making updates transactional). But ultimately, once your system is defective in some way, the package manager will build on top of that defect. So you keep carrying that fault along. In that sense it is not immune to "os drift" (well expressed), it is just that drift happens slower. "Proper" immutable distros that work with images are self healing, because once you rebase to another image (could be an update or a completely different distribution, doesn't matter), you have a fresh system that doesn't have anything to do with the previous image. Furthermore, the new image does not get composed on your computer, it gets put together upstream. You only run the final result, which you know is bit for bit what was tested by the distro maintainers. So microOS is like receiving a puzzle and a manual for how to put it together, and gluing it in a frame is the "immutability". Updates are like loosening the glue of specific pieces and gluing in new ones. In coreOS you receive the glued puzzle and do not have to do anything yourself. Updates are like receiving an entire new glued puzzle. This also comes down to the state idea: some mutable system that was set up a long time ago and even drifted a bit has a ton of state. A truly immutable distro has a very tiny state, it is merely the hash of the image you run, and your changes in /etc (which should be minimal and well documented!).
Also you want to steer clear of things like Proxmox and generally LXC containers and VMs. All these are not immutable (let alone immune to drift), and you only burden yourself with maintaining more mutable things with tons of state.
Docker is a good way to run your stuff. Just make sure to put all the persistent data that belongs together in subfolders of a subvolume, snapshot that, and then back up these snapshots. That way you ensure that you meet the requirements for the data(base)'s ACID properties to work. Your "backups" will be corrupted otherwise, since they would be a wild mosaic from different points in time. To be able to roll back cleanly if an update goes wrong, you should also snapshot the image hash together with the persistent data. This way you can preserve the complete state of a docker service before updating. Here you also minimize the state: you only have your payload data, the image hash and your compose.yml.
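To illustrate "snapshot the image hash together with the persistent data", here is a rough sketch under a few assumptions of mine: the subvolume is btrfs, the service's compose.yml lives inside that subvolume, and the snapshot directory already exists. The file name `state.json` and the function names are invented for the example. The `run` parameter only exists so the commands can be swapped out in a dry run or test.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(service: str, image_id: str, compose_text: str, taken_at: datetime) -> dict:
    """Describe the non-payload state of a service: image ID and compose file hash."""
    return {
        "service": service,
        "image_id": image_id,
        "compose_sha256": hashlib.sha256(compose_text.encode()).hexdigest(),
        "taken_at": taken_at.isoformat(),
    }


def snapshot_service(service: str, image: str, subvolume: Path, snapshot_root: Path,
                     run=subprocess.run) -> Path:
    """Record the current image ID next to the data, then take a read-only snapshot."""
    # Ask Docker which image ID the given image reference currently resolves to.
    inspect = run(["docker", "image", "inspect", "--format", "{{.Id}}", image],
                  check=True, capture_output=True, text=True)
    now = datetime.now(timezone.utc)
    compose_text = (subvolume / "compose.yml").read_text()
    manifest = build_manifest(service, inspect.stdout.strip(), compose_text, now)
    # Writing the manifest inside the subvolume means the snapshot captures it too.
    (subvolume / "state.json").write_text(json.dumps(manifest, indent=2))
    target = snapshot_root / f"{service}-{now:%Y%m%dT%H%M%SZ}"
    run(["btrfs", "subvolume", "snapshot", "-r", str(subvolume), str(target)], check=True)
    return target
```

Calling this right before pulling a new image gives you one read-only snapshot holding the data, the compose file, and the ID of the image that was running against them, which is what a clean rollback needs.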
The answer to your overarching question is not "common maintenance procedures", but "change management processes"
When things change, things can break. Immutable OSes and declarative configuration notwithstanding.
OS and Configuration drift only actually matter if you've got a documented baseline. That's what your declaratives can solve. However they don't help when you're tinkering in a home server and drifting your declaratives.
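A documented baseline can be as simple as a saved package list. As a sketch of what checking drift against it might look like: the code below assumes both lists are plain text with one "name version" pair per line, separated by whitespace, which you would export from your package manager yourself. The function names and the `+`/`-`/`~` report format are my own.

```python
def parse_package_list(text: str) -> dict[str, str]:
    """Parse lines of 'name version' (whitespace separated) into a dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def drift_report(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """List packages added (+), removed (-), or changed (~) relative to the baseline."""
    report = []
    for name in sorted(baseline.keys() | current.keys()):
        old, new = baseline.get(name), current.get(name)
        if old is None:
            report.append(f"+ {name} {new}")
        elif new is None:
            report.append(f"- {name} {old}")
        elif old != new:
            report.append(f"~ {name} {old} -> {new}")
    return report
```

An empty report means the box still matches what you wrote down. Anything else is either drift to undo or a change to record in the baseline.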
I’m pretty certain every service I want to run has a docker image already, so does it matter?
This right here is the attitude that's going to undermine everything you're asking. There's nothing about containers that is inherently "safer" than running native OS packages or even building your own. Containerization is about scalability and repeatability, not availability or reliability. It's still up to you to monitor changelogs and determine exactly what is going to break when you pull the latest docker image. That's no different than a native package.
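One small guard against "pull the latest image and see what breaks" is to flag compose entries that don't pin a version at all. This is a rough sketch: it finds `image:` lines with a regular expression rather than a real YAML parser, so it assumes a conventionally formatted compose file, and the helper names are mine.

```python
import re

# Matches lines like `image: nginx:1.27` or `image: "postgres:16.3"`.
IMAGE_LINE = re.compile(r"""^[ \t]*image:[ \t]*['"]?([^\s'"#]+)""", re.MULTILINE)


def is_pinned(image: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@" in image:
        return True  # digest-pinned, e.g. name@sha256:...
    # Only look after the final slash so a registry port is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all, which Docker treats as 'latest'
    return last_segment.rsplit(":", 1)[1] != "latest"


def unpinned_images(compose_text: str) -> list[str]:
    """Return image references in the compose text that float on 'latest'."""
    return [image for image in IMAGE_LINE.findall(compose_text) if not is_pinned(image)]
```

A pinned tag still doesn't replace reading the changelog, and tags can be moved by the publisher, so a digest is the stricter pin. What it does give you is that the upgrade happens when you edit the file, not whenever you happen to pull.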