Man, you’ll never know pain in computing until you have to work with disks that are misbehaving. It’s really unlike anything else I’ve ever experienced. Some things take forever while others fly by like usual. I’ve had an ‘ls’ command take over 10 seconds to run on an active disk. The worst part is that it’s inconsistent.
The most recent annoyance was when I rebalanced my btrfs cluster, so there went two weeks of my life waiting for that to run. Some things managed to run quickly and others failed to run at all. It was probably the week where I’ve had the most errors from programs, ever. Repeating the same commands didn’t help either: running ‘ls’ a second time took just as long as the first, showing no advantage from caching.
The whole thing really ground everything to a halt, and anything using those disks was slowed down too. My setup didn’t help, especially since I have all my data sitting on the same pool.
So far, I’ve been able to pin problems down as disk problems when the same command responds differently depending on where it runs. ‘ls’ would respond normally on the boot disk but take forever on the problem drive. I’ve found that to be pretty common with loaded disks. Hard disks seem to be particularly affected; my guess is that it’s due to their slower response times. Yet I’ve also had problems with SSDs. Guess I’m really getting to the point where I’m good at breaking computers.
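If you want to do that comparison with numbers instead of a gut feeling, something like the rough Python sketch below would work. It times a plain directory listing a few times on each path, which is more or less what ‘ls’ does. To be clear, this isn’t a tool I used at the time; the function names are made up for illustration, and "/mnt/pool" is a hypothetical mount point standing in for wherever the problem drive lives.

```python
import os
import time


def time_listing(path, repeats=3):
    """Return the wall-clock seconds taken by each of `repeats` listings of `path`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Walk the directory entries without doing anything with them,
        # so we mostly measure how long the disk takes to answer.
        with os.scandir(path) as entries:
            for _entry in entries:
                pass
        timings.append(time.perf_counter() - start)
    return timings


def compare_listings(paths, repeats=3):
    """Map each path to its list of listing times."""
    return {path: time_listing(path, repeats) for path in paths}


if __name__ == "__main__":
    # "/" stands in for the boot disk; "/mnt/pool" is a hypothetical
    # mount point for the suspect drive. Swap in your own paths.
    candidates = ["/", "/mnt/pool"]
    existing = [p for p in candidates if os.path.isdir(p)]
    for path, timings in compare_listings(existing).items():
        print(path, " ".join(f"{t:.3f}s" for t in timings))
```

On a healthy disk I’d expect the repeat runs to be much quicker than the first thanks to caching. If one path stays slow on every repeat while the other is fine, that points at the disk rather than the command.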
It’s really impressive how many things expect a disk to be instant. Things just start erroring out and throwing hissy fits when they aren’t fed data in time. It’d be interesting to see if/how that could be used as an attack vector, just going by what I saw. Maybe I’ll have a look if I ever learn anything about infosec.