Selfhosted @lemmy.world mhz @lemm.ee 11 mo. ago

Seeking advice about BTRFS RAID

Hi, I got a tiny Lenovo M720Q (i5-8400T / 8 GB RAM / 128 GB NVMe / 1 TB 2.5" HDD) that I want to set up as my home server, with the ability to add 2 more drives later (for RAID5 if possible) using its two USB 3.1 Gen 2 (10 Gbps) ports.

The OS (Debian 12 + Docker) will live exclusively on the NVMe. I expect to use about 40 GB of its 128 GB and have no idea how to make use of the rest.
My data (media, documents and ISO files) will reside on the HDD pool, and I will keep a copy of my docs on my home PC.

I read a bit about BTRFS RAID and even experimented with it in a VM. It really got me interested because of its flexibility: rebalancing between RAID levels and hot swapping unequally sized drives in both striped and mirrored arrays. However, most of what I read online predates kernel 6.2 (which improved BTRFS RAID56 reliability). So here I am, asking if anyone is using BTRFS RAID and whether it is stable enough for a mostly idle server, or whether I should stick with LVM instead. What good practices should I follow, and which bad ones should I avoid?
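
Since the question hinges on kernel 6.2, a quick way to see which side of that line a machine sits on is to compare the running kernel release against it. The following is only a minimal sketch: the helper names `parse_kernel_release` and `has_raid56_improvements` are made up for illustration, and the 6.2 threshold is taken from the post itself rather than from any official compatibility table. Note that Debian 12 ships a 6.1-series kernel by default, so a stock install would fall below that threshold unless a newer kernel (for example from backports) is installed.

```python
import platform
import re

# Threshold taken from the post above (kernel 6.2); an assumption, not an official cutoff.
RAID56_IMPROVED_SINCE = (6, 2)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.1.0-18-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_raid56_improvements(release: str) -> bool:
    """True if the release is at least the version named in the post."""
    return parse_kernel_release(release) >= RAID56_IMPROVED_SINCE


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    status = "at or above" if has_raid56_improvements(running) else "below"
    print(f"kernel {running} is {status} 6.2")
```

Passing a check like this says nothing about whether RAID5/6 is safe for a given setup; it only shows whether older write-ups are describing the same kernel generation.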

Thank you.

14 comments

I understood that software RAID over USB is dangerous because the drives can sometimes go offline for a few seconds due to current fluctuations and then fall out of sync. Maybe it's OK for files that don't get accessed too often, like video file backups.
- In my experience there are often issues with SATA SSDs over USB, but slower HDDs seem to work fine. With btrfs I would set up a regular scrubbing job to find and fix possible data errors automatically.
  
  > With btrfs I would set up a regular scrubbing job to find and fix possible data errors automatically.
  
  This only works for minor errors caused by tiny physical changes. A buggy USB drive dropping out and losing writes it claimed to have written can kill a btrfs filesystem (sometimes beyond repair), especially in a multi-device scenario.
  
  How so if the second drive in the raid1 retains a working copy and the checksum is correct? I have had USB drives drop out on me before for longer periods and it was never a problem after reconnecting them and doing a scrub.
  
  But of course raid is not a backup, so that is only the first line of defense against data loss 😉
  
  The problem is on the logic level. What happens when a drive drops out but the other does not? Well, it will continue to receive writes because a setup like this is tolerant to such a fault.
  
  Now imagine both connections are flaky and the currently available drive drops out as well. Our setup isn't that fault tolerant, so the FS goes read-only and throws IO errors on read.
  But, as the sysadmin takes a look, the drive that first dropped out reappears, so they mount the filesystem again from that drive and continue the workload.
  
  Now we have a split brain. The drive that dropped out first missed the changes that happened to the other drive. When the other drive comes back, they'll have diverged states. Btrfs can't merge these.
  
  That's just one possible way this can go wrong. A simpler one I alluded to is a lost write, where a drive claims to have permanently written something, but if power is cut at that moment and the same sector is read upon restart, it does not actually contain the new data. If that happens to all copies of a metadata chunk, goodbye btrfs.
  
  > A buggy USB drive dropping out and losing writes it claimed to have written can kill a btrfs
  
  Mixing USB and SATA drives sounds like a very bad idea, so I'm holding off on using an array of drives connected over USB. Thank you for your comment.
  
  It's not the mixing that's bad, it's using USB in any kind of multi-device setup or even using USB drives for active workloads at all.
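
The split-brain sequence described earlier in this thread can be made concrete with a toy model. This is only an illustrative sketch and not how btrfs tracks state internally: each hypothetical `Drive` keeps a plain list of the writes it has seen, and `diverged` reports whether two drives have histories that disagree (as opposed to one simply being behind the other). All names here are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    online: bool = True
    log: list[str] = field(default_factory=list)  # writes this drive has seen


def write(drives: list[Drive], data: str) -> None:
    """Apply a write to every online drive; fail if none is reachable."""
    targets = [d for d in drives if d.online]
    if not targets:
        raise OSError("no drive online: filesystem would go read-only")
    for d in targets:
        d.log.append(data)


def diverged(a: Drive, b: Drive) -> bool:
    """True when the histories disagree on their common length.

    If one log is merely a prefix of the other, the shorter drive is
    stale and could be caught up; otherwise the two states conflict.
    """
    n = min(len(a.log), len(b.log))
    return a.log[:n] != b.log[:n]


def simulate() -> tuple[bool, bool, list[str], list[str]]:
    a, b = Drive("usb-a"), Drive("usb-b")
    mirror = [a, b]
    write(mirror, "w1")        # both drives in sync
    a.online = False           # first drive drops out
    write(mirror, "w2")        # only b receives this write
    stale_only = not diverged(a, b)  # a is behind, but nothing conflicts yet
    b.online = False           # second drive drops out too
    a.online = True            # admin mounts again from the stale drive
    write(mirror, "w3")        # a now holds a write that b never saw
    b.online = True            # b comes back with its own history
    return stale_only, diverged(a, b), a.log, b.log


if __name__ == "__main__":
    stale_only, split, log_a, log_b = simulate()
    print("after first dropout, merely stale:", stale_only)
    print("after remount and second return, diverged:", split)
    print("usb-a:", log_a, "| usb-b:", log_b)
```

In this model the first dropout alone leaves one drive merely stale, which matches the experience reported above of reconnecting and scrubbing without trouble. The conflict only appears once each drive has accepted a write the other never saw.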
