Home server - power saving HDDs with caches?

I have 5 HDDs currently in a USB 3.0 enclosure, and I find power managing them difficult. Their outright performance is poor, as expected from bulk spinning disks.

I am pondering options to address both the power management AND the performance by using a big fat SSD cache and clever logical filesystem arrangement so the HDDs can spend more time spun down entirely.

I looked into LVM cache. Thankfully I'm used to the low-level Linux shell, so I was able to set up a tutorial-sized VM, build the array and cache, and it worked. Although it was rather pointless as a benchmark: all of its disks were virtual disks on my Windows PC, which has 32GB of RAM, so all of those disks were probably sitting in RAM anyway. Silly throughput speeds across the board etc.
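For reference, the tutorial-sized setup was roughly the following. Device names, volume names and sizes here are illustrative, not my real layout:

    # /dev/sdb = slow HDD (origin), /dev/sdc = SSD (cache); names are examples
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_nas /dev/sdb /dev/sdc
    # Origin LV placed on the HDD only
    lvcreate -n lv_data -l 100%PVS vg_nas /dev/sdb
    # Cache pool on the SSD, then attach it to the origin LV
    lvcreate --type cache-pool -n lv_cache -L 100G vg_nas /dev/sdc
    lvconvert --type cache --cachepool vg_nas/lv_cache vg_nas/lv_data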

The theory is: a large SSD, 1-2TB or a RAID 1 pair, is assigned as the cache pool for all of the spinning disks. It might need to be divided up, I'm not sure. Anything which is intermittently pinging stuff on those drives and waking them up will end up with those files cached pretty quickly. With writeback caching (a known risk) even writes will go into the cache and not wake the spinning metal until a scheduled time when all disks can wake up and sync.
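If I understand the lvmcache docs correctly, the cache mode is switchable on a live volume, something like this (volume names as in the sketch above):

    # Default is writethrough; writeback absorbs writes into the SSD first
    lvchange --cachemode writeback vg_nas/lv_data
    # Check the current mode
    lvs -a -o name,cache_mode vg_nas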

Of course, if I open a movie that hasn't been accessed in who knows how long, I will need to wait on the metal spinning up.

I haven't even fully looked into whether this will work or not.

I thought I would ask in here if any enterprise folks are familiar with this "high speed cache pool for slow disks" architecture and could give me tips, hints or other options entirely.

I'd like to steer away from "off the shelf", "canned" NAS OSes, I expect; since I already have the Linux shell knowledge, they will just get in the way and annoy me.
 
I'm not that worried about that use case. The only way to avoid that is to run the disks 24/7 or pre-cache things ahead of time.

An example of how annoying this can be: I have a Raspberry Pi saving a photo every 10 minutes for a time lapse. This seems to proceed fine for a while; then, I imagine, the disk spins down because the Linux write cache doesn't actually write to it every 10 minutes and it's set to spin down after 15. When the next photo comes in over SFTP, Linux wakes up the drive, but the sftp process blocks in kernel space until that completes, by which time the client has timed out. It seems I lost about 50% of photos yesterday, although they can be recovered off the Pi for up to a week.

So I set standby OFF on them all and put my 4-minute cron job back to work, pinging one disk to keep the enclosure from shutting down.
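For anyone curious, the workaround looks something like this. Device path is illustrative, and note that many USB-SATA bridges don't pass hdparm's ATA commands through, so your mileage may vary:

    # Disable the drive's own standby timer
    hdparm -S 0 /dev/sdb
    # /etc/cron.d/enclosure-keepalive: a small direct read every 4 minutes,
    # bypassing the page cache so it actually touches the disk
    */4 * * * * root dd if=/dev/sdb of=/dev/null bs=4k count=1 iflag=direct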

Putting the machine to sleep completely isn't really an option; it has a dozen Docker services running 24/7. Splitting things up, with a tiny solid-state Raspberry Pi or NUC running the 24/7 stuff and just a NAS for the disks, might allow me to shut down the NAS part when not in use and use Wake-on-LAN when it's needed. Interesting option.
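If I go that route, the always-on box would wake the NAS on demand, roughly like this (interface name and MAC address are placeholders):

    # On the NAS, enable magic-packet wake on the NIC ("g" = magic packet)
    ethtool -s eth0 wol g
    # From the Pi/NUC, send the magic packet when the disks are needed
    wakeonlan 00:11:22:33:44:55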

I suppose I can try the LVM approach. I have at least one spare disk and I can probably make room on an SSD for an LVM cache volume, then run a few tests to see whether that disk can stay spun down while reads and writes happen in the cache. That is the big question, I feel. If LVM still wakes up the origin volume's drives on a read/write that hits the cache pool, then it just won't work this way. There are other filesystem options, but I get the feeling they are not exactly solid and maybe a bit exotic, like overlayfs and some user-space FUSE dynamic filesystems. Just sounds like a 3am problem waiting to happen though.
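The test I have in mind is roughly this (again, device, mount point and volume names are illustrative):

    # Force the backing HDD into standby, then read a file known to be cached
    hdparm -y /dev/sdb
    hdparm -C /dev/sdb              # should report "standby"
    cat /mnt/nas/recent-file > /dev/null
    hdparm -C /dev/sdb              # still "standby"? then the SSD served it
    # Cache hit/miss and dirty-block counters for the cached LV
    lvs -a -o name,cache_dirty_blocks,cache_read_hits,cache_read_misses vg_nas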
 
I had an old 3rd-gen Intel USFF PC returned to me. It had a 120GB SSD. So I put a pair of old 2.5" 500GB spinners into a dual enclosure, replicated the setup and tested.

Results:
Wow. I dumped a 500MB GoPro file onto it and it ran at 100MB/s for basically the whole file. The copy completed, and on the server side the cache said it was 50% "dirty", slowly falling. So in "writeback" cache mode it acknowledges writes immediately but syncs to the HDD asynchronously.

Pulling back recently uploaded items initially gave odd results, like 30MB/s. Then I checked, and the cache was still 25% dirty. Perhaps 50% of the drive's bandwidth was allocated to syncing the cache and 50% to serving reads? Speculation.
After the cache settled, it was back to 60MB/s+.
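Watching the dirty count drain is easy enough, something like:

    # Refresh the cache counters every 5 seconds while the writeback drains
    watch -n5 'lvs -a -o name,data_percent,cache_dirty_blocks vg_nas'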

So on performance, it works, undeniably. When a part of a movie exists in the cache, you can hold down the right-arrow key and it will scan/scrub live. When it hits a part on the spinner, it mutes the live playback and only updates the time OSD until you let go. Even that feels smoother.

Power saving, well... I tried putting both drives into standby while the "dirty" value was 0%, but trying to access a random file caused client and server (smbd) to lock up. Even lvdisplay locked up. A reboot was needed.

Again, I expect a lot of this is to do with the USB enclosure messing with drive power-saving settings in the background; forcing the drives into standby causes the enclosure to get out of sync and ignore spin-up requests. Speculation.
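Next time, rather than trusting the 0% dirty figure, I might flush and detach the cache entirely before spinning anything down, something along these lines (names as in the earlier sketch):

    # Flush dirty blocks and detach the cache pool from the origin LV
    lvconvert --splitcache vg_nas/lv_data
    # Now the drives should be safe to spin down
    hdparm -y /dev/sdb /dev/sdc
    # Reattach the cache pool when waking everything back up
    lvconvert --type cache --cachepool vg_nas/lv_cache vg_nas/lv_data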
 
Well, it turns out, if I had checked properly: the disk enclosure was running at USB 2.0. Once I fixed that with a new cable, the HDDs alone can saturate the 1Gb link.

I still intend to investigate whether LVM caching can or cannot be used for power saving, powering off the backing drives for long periods.
 