In some circumstances, a particular pattern of drive activity can cause the drive head to be repeatedly parked and un-parked at short intervals, possibly* resulting in excess wear on the drive. Apparently* each head park is recorded in the S.M.A.R.T. "Load Cycle Count" attribute.
I have two WD Red drives in my NAS, one for live data and one for backup. The latter drive is basically unused most of the day until scheduled backup jobs kick in and those jobs are all clustered together. I already unmount the backup filesystems when the jobs are not active (I wrote about this in mount-on-demand backups).
Inspecting the S.M.A.R.T. attributes was surprising:
|drive||power on hours||load cycle count|
It certainly looks like my backup drive has a much higher load cycle count than you might expect for a mostly-idle drive. I checked the attributes again 24 hours later: the live drive's count had gone up by a single cycle, whilst the backup drive's had gone up by 56.
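To keep an eye on this over time, something like the following could pull the two attributes out of `smartctl -A` output and work out a rough cycles-per-hour figure. This is only a minimal sketch: the sample text is made-up illustrative data (not my drives' real numbers), the function names are my own invention, and it assumes the usual ten-column attribute table with plain integer raw values.

```python
import subprocess

# Hypothetical sample in the style of `smartctl -A` output; the values are made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1000
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2500
"""

def parse_attributes(text, wanted=("Power_On_Hours", "Load_Cycle_Count")):
    """Return {attribute_name: raw_value} for the wanted attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is assumed to be a plain integer in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in wanted:
            found[fields[1]] = int(fields[9])
    return found

def cycles_per_hour(attrs):
    """Average load cycles per powered-on hour over the drive's lifetime."""
    return attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]

def read_drive(device):
    """Run smartctl against a real device (needs root and smartmontools)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)
```

Calling `cycles_per_hour(parse_attributes(SAMPLE))` exercises the parser without touching a real drive; `read_drive("/dev/sdX")` (with a real device name substituted) would do the same against live hardware.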
There are some official tools from Western Digital that adjust the idle timeout for head parking on the drives. There's also an unofficial tool, idle3ctl, which does the same and is packaged in Debian. The unofficial tool sets a particular value in the drive firmware; I don't know for sure* that the official tool does exactly the same thing, and nothing else. One advantage of the unofficial tool is that it lets you read the value as well as write it.
I used the unofficial tool to read the drive's default value, which was 0x8a, and bumped it to the maximum of 0xff. I then ran the official tool and fetched the value again: interestingly, the official tool had reset the value to 0x8a. I haven't yet managed to assess the impact of these changes on the attrition rate, because the change only takes effect after a cold boot and that isn't convenient just now.
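For what those raw values might mean in practice, here is a small sketch that converts a raw timer byte to seconds. The encoding is an assumption based on my reading of the idle3-tools documentation (0 disables the timer, 1 to 128 count in tenths of a second, 129 to 255 count in 30-second steps above 128); I can't confirm that every drive firmware interprets the byte this way, and `idle3_seconds` is just a name I made up.

```python
def idle3_seconds(raw):
    """Convert a raw idle3 timer byte to seconds (None means disabled).

    Assumed encoding (not verified against any particular firmware):
      0        -> timer disabled
      1..128   -> units of 0.1 s
      129..255 -> units of 30 s, counted from 128
    """
    if not 0 <= raw <= 255:
        raise ValueError("raw idle3 value must fit in one byte")
    if raw == 0:
        return None
    if raw <= 128:
        return raw / 10
    return (raw - 128) * 30

print(idle3_seconds(0x8a))  # 300
print(idle3_seconds(0xff))  # 3810
```

If that encoding holds, the default of 0x8a would be a five-minute timeout and 0xff a little over an hour.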
My plan is to try and disable the feature completely via the unofficial tool.
If that rectifies the issue I will then investigate changing the power
management settings by hand at backup start/end time, perhaps via
(The problem with this kind of issue is that there is precious little in the way of reliable documentation as to the real issue, real drive behaviour, etc. I've marked a few sections of this blog post with * asterisks to indicate where I am having to make informed guesses.)