I'm curious what this data would look like collated by drive birth date rather than (or in addition to) age. I wouldn't use that as the "primary" way to look at things, but it could pop some interesting bits. Maybe one of the manufacturers had a shipload of subpar grease? Slightly shittier magnets? Poor quality silicon? There's all kinds of things that could cause a few months of hard drive manufacture to be slightly less reliable…
(Also: "Accumulated power on time, hours:minutes 37451*:12, Manufactured in week 27 of year 2014" — I might want to replace these :D — * pretty sure that overflowed at 16 bit, they were powered on almost continuously & adding 65536 makes it 11.7 years.)
Over the past couple of years, I've been side hustling a project that requires buying ingredients from multiple vendors. The quantities never work out 1:1, so some ingredients from the first order get used with some from a new order from a different vendor. Each item has its own batch number which when used together for the final product yields a batch number on my end. I logged my batch number with the batch number for each of the ingredients in my product. As a solo person, it is a mountain of work, but nerdy me goes to that effort.
I'd assume that a drive manufacturer does something similar, knowing which batch from which vendor the magnets, grease, or silicon came from. You hope you never need to use these records for any kind of forensic research, but the one time you do, it makes a huge difference. So many people making products similar to mine look at me with a tilted head while their eyes go wide and glaze over, as if I'm speaking an alien language, when I discuss lineage tracking.
Are you using a merkle tree for batch ids?:
…where f = hash for a merkle tree with fixed size (but huge!) batch numbers, and f = repr for increasingly large but technically decipherable pie IDs.

I think it's helpful to put on our statistics hats when looking at data like this... We have some observed values and a number of available covariates, which, perhaps, help explain the observed variability. Some legitimate sources of variation (eg, proximity to cooling in the NFS box, whether the hard drive was dropped as a child, stray cosmic rays) will remain obscured to us - we cannot fully explain all the variation. But when we average over more instances, those unexplainable sources of variation are captured as a residual to the explanations we can make, given the available covariates. The averaging acts as a kind of low-pass filter over the data, which helps reveal meaningful trends.
Meanwhile, if we slice the data up three ways to hell and back, /all/ we see is unexplainable variation - every point is unique.
This is where PCA is helpful - given our set of covariates, what combination of variables best explains the variation, and how much of the residual remains? If there's a lot of residual, we should look for other covariates. If it's a tiny residual, we don't care, and can work on optimizing the known major axes.
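As a rough illustration of that idea, a minimal sketch: run PCA over whatever per-drive covariates are on hand and see how much of their variance the leading components capture. The CSV name and column names are hypothetical, not Backblaze's actual schema.

```python
# Minimal sketch of the PCA check above: how much of the drive-to-drive covariate
# variation do a few combined axes capture? Column names here are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed layout: one row per drive, numeric covariates.
df = pd.read_csv("drive_observations.csv")
covariates = df[["power_on_hours", "avg_temp_c", "load_cycle_count", "rack_position"]]

# Standardize so no single covariate dominates the components by scale alone.
X = StandardScaler().fit_transform(covariates)

pca = PCA()
pca.fit(X)

# Cumulative explained variance: if the first few components explain most of it,
# the known covariates carry the signal; a large leftover suggests missing covariates.
cumulative = pca.explained_variance_ratio_.cumsum()
for i, frac in enumerate(cumulative, start=1):
    print(f"first {i} component(s): {frac:.1%} of covariate variance explained")
```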
Exactly. I used to pore over the Backblaze data but so much of it is in the form of “we got 1,200 drives four months ago and so far none have failed”. That is a relatively small number over a small amount of time.
On top of that it seems like by the time there is a clear winner for reliability, the manufacturer no longer makes that particular model and the newer models are just not a part of the dataset yet. Basically, you can’t just go “Hitachi good, Seagate bad”. You have to look at specific models and there are what? Hundreds? Thousands?
> On top of that it seems like by the time there is a clear winner for reliability, the manufacturer no longer makes that particular model and the newer models are just not a part of the dataset yet.
That's how things work in general. Even if it is the same model, likely parts have changed anyway. For data storage, you can expect all devices to fail, so redundancy and backup plans are key, and once you have that set, reliability is mostly just an input into your cost calculations. (Ideally you do something to mitigate correlated failures from bad manufacturing or bad firmware.)
I find it more straightforward to just model the failure rate with the variables directly, and look at metrics like AUC on out-of-sample data.
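A minimal sketch of that more direct approach, assuming a hypothetical per-drive table with a binary failure label; the point is just that out-of-sample AUC tells you whether the covariates carry any signal.

```python
# Sketch of modelling failure directly and judging it by out-of-sample AUC.
# The CSV and column names are assumptions, not Backblaze's actual schema.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("drive_observations.csv")
X = df[["power_on_hours", "avg_temp_c", "load_cycle_count", "model_age_years"]]
y = df["failed"]  # 1 if the drive failed during the observation window

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC on held-out drives: ~0.5 means the covariates explain nothing,
# values near 1.0 mean they separate failing from surviving drives well.
probs = clf.predict_proba(X_test)[:, 1]
print(f"out-of-sample AUC: {roc_auc_score(y_test, probs):.3f}")
```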
Well said, and made me want to go review my stats text.
I personally am looking forward to BackBlaze inventing error bars and statistical tests.
(with a tinfoil hat on) I'm convinced that Backblaze is intentionally withholding and obfuscating data to avoid producing a too-easily-understood visualization showing that Seagate is consistently the worst of the 3 remaining drive manufacturers.
Seagate's online notoriety only started after the flooding in Thailand that contaminated every spindle-motor manufacturing clean room in existence, causing a bunch of post-flood ST3000DM001 drives to fail quickly. That probably motivated enough people for the Backblaze stat tracking to gain recognition and to continue to this day.
But even if one puts aside the models affected by that shared problem, Seagate drives have always exhibited shorter real-world MTBF. Since it's not in the interest of Backblaze or anyone else to smear the brand, they must be tweaking the data processing to leave out some of those obvious figures.
I don't think so, their posts still have all the details and the Seagates stick out like a very sore thumb in their tables:
https://backblazeprod.wpenginepowered.com/wp-content/uploads...
and graphs:
https://backblazeprod.wpenginepowered.com/wp-content/uploads...
> Since it's not in the interest of Backblaze or anyone else to smear the brand
It is if they want to negotiate pricing; and even in the past, Seagates were usually priced lower than HGST or WD drives. To me, it looks like they just aren't as consistent, as they have some very low failure rate models but also some very high ones; and naturally everyone will be concerned about the latter.
OTOH, Seagate never sold customers SMR drives mislabeled for NAS use.
Agreed, these types of analyses benefit from grouping by cohort years. Standard practice in analytics.
Right. Does the trouble at year 8 reflect bad manufacturing 8 years ago?
Honestly, at 8 years, I'd be leaning towards dirty power on the user's end. For a company like BackBlaze, I'd assume a data center would have conditioned power. For someone at home running a NAS with the same drive connected straight to mains, they may not receive the same life span for a drive from the same batch. Undervolting when the power dips is gnarly on equipment. It's amazing to me how the use of a UPS is not as ubiquitous at home.
I work there. Can't go into much detail, but we have absolutely had various adventures with power and cooling that were entirely out of our control. There was even an "unmooring" event that nearly gave us a collective heart attack, which I'll leave you to guess at :)
Drives run off the regulated 12V supply, not the raw power line. "Dirty power" should not be a problem.
It would depend on how well done the regulation was in the power supply, wouldn't it?
Why people continue to misunderstand this befuddles me. If you bought a budget PSU, then who knows what the voltages really are coming down the +3/+5V lines. You hope they are only +3/+5, but what happens when the power dips? Is the circuitry in the bargain-priced PSU going to keep the voltages within tolerance, or does it even have the necessary caps in place to handle the slightest change in mains? We've seen way too many teardowns showing that's not a reliable thing to bank your gear on.
> It's amazing to me how the use of a UPS is not as ubiquitous at home.
Most users don't see enough failures that they can attribute to bad power to justify the cost in their mind. Furthermore, UPSes are extremely expensive per unit of energy storage, so the more obviously useful use case (not having your gaming session interrupted by a power outage) simply isn't there.
UPSes are a PITA. I have frequent enough outages that I use them on all of my desktops, and they need a new battery every couple years, and now I'm reaching the point where the whole thing needs replacement.
When they fail, they turn short dips, which a power supply might have been able to ride through, into an instant failure, and they make terrible beeping at the same time. At least the models I have do their self-test with the protected load, so if you test regularly, a worn-out unit fails the test by causing an unscheduled shutdown, which isn't great either. And there aren't many vendors, and my vendor is starting to push dumb cloud shit. Ugh.
Sounds like you have some APC model. I had those issues, and switched to Cyberpower. The alarm can be muted and the battery lasts for many years.
A UPS is a must for me. When I lived in the Midwest, a lightning strike near me fried all my equipment, including the phones. I now live in Florida and summer outages and dips (brownouts) are frequent.
Many years ago I had the same thing happen - actually came in the phone line, fried my modem and everything connected to the motherboard. More recently I had lightning strike a security camera - took out everything connected to the same network switch, plus everything connected to the two network switches one hop away. Also lit up my office with a shower of sparks. Lightning is no joke.
I've got Cyberpowers actually. The alarm can be muted, but it doesn't stay muted, especially when the battery (or UPS circuitry) is worn out, so a power dip turns into infinite beeping. But also if the computer is turned off.
Yes, this is fairly standard in manufacturing environments. Bills of material and lot numbers, or even individual serial numbers, are tracked for production of complex goods.
I have a 13-year-old NAS with 4x1TB consumer drives with over 10 years of head-flying hours and 600,000 head unloads. Only 1 drive failed, at around 7 years. The remaining 3 are still spinning and pass the long self-test. I do manually set the hdparm -B and -S values to balance head-flying time against unloads, and I keep the NAS in my basement so everything is thermally cool.

I'm kind of hoping the other drives will fail so I can get a new NAS, but no such luck yet :-(
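For reference, a sketch of the kind of hdparm tuning described above, wrapped in Python; the device list and the specific -B/-S values are illustrative assumptions, not recommendations.

```python
# Rough sketch of hdparm -B / -S tuning on a small NAS. Needs root, and
# behaviour varies by drive firmware; values below are examples only.
import subprocess

DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]  # assumption: 4-bay NAS

APM_LEVEL = "127"        # -B: 1-127 permit head parking/spin-down, 128-254 do not
STANDBY_TIMEOUT = "242"  # -S: 241-251 are units of 30 minutes (242 ≈ 1 hour idle)

for dev in DRIVES:
    subprocess.run(["hdparm", "-B", APM_LEVEL, dev], check=True)
    subprocess.run(["hdparm", "-S", STANDBY_TIMEOUT, dev], check=True)
```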
I admire the "use it until it dies" lifestyle. My NAS is at 7 years and I have no plans to upgrade anytime soon!
The problem with setting up a nearly maintenance-free NAS is that you tend to forget about it just running away in the background.
Then a drive fails spectacularly.
And that's the story of how I thought I lost all our home movies. Luckily the home movies and pictures were backed up.
I built my home NAS in 2017; the two original drives were replaced after developing bad blocks (at 4 and 5 years, respectively). The two expansion drives (2018, 2021) are still fine.
I built a NAS for a client, which currently has 22 drives in it (growing bit by bit over the years, 270 TB of raw capacity), and since 2018 it has lost only 3 drives.
I’d have thought 2 new drives to replace all that would be worth the investment in power savings alone.
So is that high usage compared to backblaze?
Is the 10y head flying for each head? Is it for heads actually reading/writing, or just for spinning drives/aloft heads?
I only skimmed the charts, they seemed to just measure time/years, but not necessarily drive use over time.
When I am projecting prices I tend to assume a 5-year life for a consumer hard drive. I do wonder, from this data and the change in purchasing from Backblaze, if the enterprise-class drives might pay for their extra price if they survive out to more like 9 years: 20% extra cost per TB versus about 30%+ more lifetime. They do tend to consume a bit more power and make more noise as well. I wish they had more data on why the drives were surviving longer; if it's purchasing in pallets, there isn't a lot we can do, but if it's that enterprise drives are a lot better than NAS or basic consumer drives, then we can compare them cost-wise.
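A quick back-of-envelope version of that trade-off; the prices and lifetimes are stated assumptions, not data.

```python
# Back-of-envelope check of the trade-off above: cost per TB-year of service.
# Prices and lifetimes are assumptions for illustration only.
consumer_price_per_tb = 20.0    # $/TB, hypothetical
enterprise_premium = 1.20       # "20% extra cost per TB"

consumer_life_years = 5.0
enterprise_life_years = consumer_life_years * 1.30  # "about 30%+ more lifetime"

consumer_cost = consumer_price_per_tb / consumer_life_years
enterprise_cost = (consumer_price_per_tb * enterprise_premium) / enterprise_life_years

print(f"consumer:   ${consumer_cost:.2f} per TB-year")
print(f"enterprise: ${enterprise_cost:.2f} per TB-year")
# With these numbers the enterprise drive comes out slightly cheaper per TB-year
# (4.00 vs ~3.69), before accounting for extra power draw and noise.
```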
Personal anecdote - I would say (a cautious) yes. I bought 3 WD hard drives (1 external, 2 internal, during different periods over the last 10+ years) for personal use, and 2 failed almost exactly after the 5-year warranty period ended (within a month or so). One failed just a few weeks before the warranty period ended, and so WD had to replace it (and I got a replacement HDD that I could use for another 5 years). That's good engineering! (I also have an old 500GB external Seagate drive that has now lasted 10+ years and still works perfectly - probably an outlier.)
That said, one thing that I do find very attractive in Seagate HDDs now is that they are also offering free data recovery within the warranty period, with some models. Anybody who has lost data (i.e. idiots like me who didn't care about backups) and had to use such services knows how expensive they can be.
I've bought a lot of WD drives over the years and my experience is they used to last 3 years (back when there was a 3-year warranty) and die right after the warranty expired. I think Western Digital does a very good job making their drives last to the end of the warranty and not a minute longer.
HDD manufacturers offering data recovery...kind of makes sense, and I'm surprised it's never been offered before. They're in a much better position to recover data than anyone else.
> replacement HDD that I could use for another 5 years
But the warranty lasts only 5 years since the purchase of the drive, doesn't it?
Yes, but the warranty is "irrelevant" when the drive actually lasts the whole 5 years (in other words, I am hoping the replacement drive is as well-engineered as its predecessor and lasts the whole 5 years - and it has so far, 3+ years in).
Per charts in TFA, it looks like some disks are failing less overall, and failing after a longer period of time.
I'm still not sure how to confidently store decent amounts of (personal) data for over 5 years without
1- giving to cloud,
2- burning to M-disk, or
3- replacing multiple HDD every 5 years on average
All whilst regularly checking for bitrot and not overwriting good files with bad corrupted files.
Who has the easy, self-service, cost-effective solution for basic, durable file storage? Synology? TrueNAS? Debian? UGreen?
(1) and (2) both have their annoyances, so (3) seems "best" still, but seems "too complex" for most? I'd consider myself pretty technical, and I'd say (3) presents real challenges if I don't want it to become a somewhat significant hobby.
Offline data storage is a good option for files you don't need to access constantly. A hard drive sitting on a shelf in a good environment (not much humidity, reasonable temperature, not a lot of vibration) will last a very, very long time. The same can't be said for SSDs, which will lose their stored data in a matter of a year or two.
One method that seems appealing:
1. Use ZFS with raidz
2. Scrub regularly to catch the bitrot
3. Park a small reasonably low-power computer at a friend's house across town or somewhere a little further out -- it can be single-disk or raidz1. Send ZFS snapshots to it using Tailscale or whatever. (And scrub that regularly, too.)
4. Bring over pizza or something from time to time.
As to brands: This method is independent of brand or distro.
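A minimal sketch of what steps 1-3 can look like in practice, assuming hypothetical dataset and host names; the exact zfs/ssh invocations vary by setup, so treat this as an outline rather than a finished script.

```python
# Sketch: snapshot a ZFS dataset and send it incrementally to the offsite box
# over SSH (Tailscale just provides the network path). Names are assumptions.
import datetime
import subprocess

DATASET = "tank/data"               # local ZFS dataset (assumed name)
REMOTE = "backup-box"               # SSH/Tailscale hostname of the offsite machine
REMOTE_DATASET = "backuppool/data"  # dataset on the remote pool (assumed name)

today = datetime.date.today().isoformat()
new_snap = f"{DATASET}@{today}"

# Find the most recent existing snapshot to use as the incremental base.
existing = subprocess.run(
    ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation", DATASET],
    check=True, capture_output=True, text=True,
).stdout.split()
prev_snap = existing[-1] if existing else None

subprocess.run(["zfs", "snapshot", new_snap], check=True)

# Incremental send if we have a base snapshot, otherwise a full send.
send_cmd = ["zfs", "send", "-i", prev_snap, new_snap] if prev_snap else ["zfs", "send", new_snap]
send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
subprocess.run(["ssh", REMOTE, "zfs", "receive", "-F", REMOTE_DATASET],
               stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```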
I had to check for data integrity due to a recent system switch, and was surprised not to find any bitrot after 4y+.
It took ages to compute and verify those hashes between different disks. Certainly an inconvenience.
I am not sure a NAS is really the right solution for smaller data sets. An SSD for quick hashing and a set of N hashed cold storage HDDs - N depends on your appetite for risk - will do.
I've hosted my own data for twenty something years - and bitrot occurs but it is basically caused by two things.
1) Randomness <- this is rare
2) HW-failures <- much more common
So if you catch HW failures early you can live a long life with very little bitrot... Little != none, so ZFS is really great.
Don’t get me wrong: IMHO a ZFS mirror setup sounds very tempting, but its strengths lie in active data storage. Due to the rarity of bitrot I would argue it can be replaced with manual file hashing (and replacing, if needed) and used in cold-storage mode for months.
What worries me more than bitrot is that consumer disks (with enclosure, SWR) do not give access to SMART values over USB via smartctl. Disk failures are real and have a strong impact on available data redundancy.
Data storage activities are an exercise in paranoia management: What is truly critical data, what can be replaced, what are the failure points in my strategy?
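A minimal sketch of that manual file-hashing approach: build a SHA-256 manifest for the live copy, then re-verify the cold-storage copy against it months later. Paths are placeholders.

```python
# Build a SHA-256 manifest once, then compare another copy of the data against it.
import hashlib
import json
import pathlib

def hash_file(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: str) -> dict[str, str]:
    rootp = pathlib.Path(root)
    return {str(p.relative_to(rootp)): hash_file(p)
            for p in sorted(rootp.rglob("*")) if p.is_file()}

def verify(root: str, manifest: dict[str, str]) -> None:
    current = build_manifest(root)
    for name, digest in manifest.items():
        if name not in current:
            print(f"MISSING  {name}")
        elif current[name] != digest:
            print(f"CHANGED  {name}")   # candidate bitrot (or a legitimate edit)

if __name__ == "__main__":
    m = build_manifest("/mnt/cold_disk_1/photos")            # placeholder path
    pathlib.Path("manifest.json").write_text(json.dumps(m))
    # months later, against another copy of the same data:
    verify("/mnt/cold_disk_2/photos",
           json.loads(pathlib.Path("manifest.json").read_text()))
```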
I have a simpler approach that I've used at home for about 2 decades now pretty much unchanged.
I have two raid1 pairs - "the old one" and "the new one" - plus a third drive the same size as "the old pair". The new pair is always larger than the old pair; in the early days it was usually well over twice as big, but drive growth rates have slowed since then. About every three years I buy a new "new pair" + third drive, and downgrade the current "new pair" to be the "old pair". The old pair is my primary storage, and gets rsynced to a partition that's the same size on the new pair. The remainder of the new pair is used for data I'm OK with not being backed up (umm, all my BitTorrented Linux isos...). The third drive is on a switched powerpoint and spins up late Sunday night, rsyncs the data copy on the new pair, then powers back down for the week.
>3. Park a small reasonably low-power computer at a friend's house across town or somewhere a little further out -- it can be single-disk or raidz1. Send ZFS snapshots to it using Tailscale or whatever. (And scrub that regularly, too.)
Unless you're storing terabyte levels of data, surely it's more straightforward and more reliable to store on backblaze or aws glacier? The only advantage of the DIY solution is if you value your time at zero and/or want to "homelab".
A chief advantage of storing backup data across town is that a person can just head over and get it (or ideally, a copy of it) in the unlikely event that it becomes necessary to recover from a local disaster that wasn't handled by raidz and local snapshots.
The time required to set this stuff up is...not very big.
Things like ZFS and Tailscale may sound daunting, but they're very light processes on even the most garbage-tier levels of vaguely-modern PC hardware and are simple to get working.
I'd much rather just have a backblaze solution and maybe redundant local backups with Time Machine or your local backup of choice (which work fine for terabytes at this point). Maybe create a clone data drive and drop it off with a friend every now and then which should capture most important archive stuff.
> 3. Park a small reasonably low-power computer at a friend's house across town or somewhere a little further out -- it can be single-disk or raidz1. Send ZFS snapshots to it using Tailscale or whatever. (And scrub that regularly, too.)
Maybe I’m hanging out in the wrong circles, but I would never think it appropriate to make such a proposal to a friend; “hey let me set up a computer in your network, it will run 24/7 on your power and internet and I’ll expect you to make sure it’s always online, also it provides zero value to you. In exchange I’ll give you some unspecified amount of pizza, like a pointy haired boss motivating some new interns”.
> In exchange I’ll give you some unspecified amount of pizza
You mean, in exchange we will have genuine social interactions that you will value much more highly than the electricity bill or the pizza.
Plus you will be able to tease me about my overengineered homelab for the next decade or more.
About the worst I can imagine happening (other than the new-found ability to rickroll someone's TV as a prank) is that said friend might take an interest in how I manage my data and want a hand with setting up a similar thing for themselves.
And that's all fine too. I like my friends quite a lot, and we often help each other do stuff that is useful: lending tools or an ear to vent at, helping to fix cars and houses, teaching new things or learning them together, helping with backups -- whatever. We've all got our own needs and abilities. It's all good.
Except... oh man: The electric bill! I forgot about that.
A small computer like what I'm thinking would consume an average of less than 10 Watts without optimization. That's up to nearly $16 per year at the average price of power in the US! I should be more cognizant of the favors I request, lest they cause my friends to go bankrupt.
/s, of course, but power can be a concern if "small" is misinterpreted.
Or find someone else with a similar backup need and then both just agree to have enough space to host remote backups for the other. I would have to increase my ZFS from N to 2N TB, but that would be less work and cheaper than setting up a backup computer for N TB somewhere else.
This works great although I should really do step 4 :)
Get yourself a Xeon powered workstation that supports at least 4 drives. One will be your boot system drive and three or more will be a ZFS mirror. You will use ECC RAM (hence Xeon). I bought a Lenovo workstation like this for $35 on eBay.
ZFS with a three way mirror will be incredibly unlikely to fail. You only need one drive for your data to survive.
Then get a second setup exactly like this for your backup server. I use rsnapshot for that.
For your third copy you can use S3 like a block device, which means you can use an encrypted file system. Use FreeBSD for your base OS.
I don't understand what you're worried about with 3.
Make a box, hide it in a closet with power, and every 3 months look at your drive stats to see if any have a bunch of uncorrectable errors. If we estimate half an hour per checkup and one hour per replacement, that's under three hours per year to maintain your data.
Hard drive failure seems like more of a cost and annoyance problem than a data preservation issue. Even with incredible reliability you still need backups if your house burns down. And if you have a backup system then drive failure matters little.
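A sketch of what that quarterly check might look like, assuming smartmontools is installed and run as root; the device names and the watched attribute list are assumptions.

```python
# Ask smartctl for SMART attributes and flag drives reporting nonzero counts on
# the attributes people usually watch for impending failure.
import subprocess

DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]
WATCH = ("Reallocated_Sector_Ct", "Offline_Uncorrectable", "Current_Pending_Sector")

for dev in DRIVES:
    out = subprocess.run(["smartctl", "-A", dev], capture_output=True, text=True).stdout
    for line in out.splitlines():
        if any(attr in line for attr in WATCH):
            raw_value = line.split()[-1]          # RAW_VALUE is the last column
            if raw_value.isdigit() and int(raw_value) > 0:
                print(f"{dev}: {line.strip()}")
```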
If you don't have too much stuff, you could probably do ok with mirroring across N+1 (distributed) disks, where N is enough that you're comfortable. Monitor for failure/pre-failure indicators and replace promptly.
When building up initially, make a point of trying to stagger purchases and service entry dates. After that, chances are failures will be staggered as well, so you naturally get staggered service entry dates. You can likely hit better than 5 year time in service if you run until failure, and don't accumulate much additional storage.
But I just did a 5 year replacement, so I dunno. Not a whole lot of work to replace disks that work.
> 2- burning to M-disk, or
You can't buy those anymore. I've tried.
IIRC, the things currently marketed as MDisc are just regular BD-R discs (perhaps made to a higher standard, and maybe with a slower write speed programmed into them, but still regular BD-Rs).
Would tapes not be an option?
Not great for easy read access but other than that it might be decent storage.
>Would tapes not be an option?
AFAIK someone on reddit did the math and the break-even for tapes is somewhere between 50TB and 100TB. Any less and it's cheaper to get a bunch of hard drives.
Unless you're basically a serious data hoarder or otherwise have unusual storage requirements, an 18TB drive (or maybe 2) gets you a lot of the way to handling most normal home requirements.
Tapes would be great for backups - but the tape drive market's all "enterprise-y", and the pricing reflects that. There really isn't any affordable retail consumer option (which is surprising as there definitely is a market for it).
I looked at tape a little while ago and decided it wasn't gonna work out for me reliability-wise at home without a more controlled environment (especially humidity).
I feel like I’d like to see graphs in the shape you see in some medical trials – time on the x axis and % still alive on the y. You could group drives by the year they were purchased and have multiple lines for different years on there.
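Something like this could be produced with a Kaplan-Meier fit per purchase-year cohort; a sketch using the lifelines package, with hypothetical column names (drives that are still alive are handled as right-censored).

```python
# One survival curve per purchase-year cohort: age on x, fraction still alive on y.
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Assumed columns: purchase_year, age_years (age at failure or at last observation),
# failed (1 if the drive actually failed, 0 if still running).
df = pd.read_csv("drive_lifetimes.csv")

ax = plt.subplot(111)
for year, cohort in df.groupby("purchase_year"):
    kmf = KaplanMeierFitter()
    kmf.fit(cohort["age_years"], event_observed=cohort["failed"], label=str(year))
    kmf.plot_survival_function(ax=ax)

ax.set_xlabel("drive age (years)")
ax.set_ylabel("fraction still alive")
plt.show()
```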
> The issue isn’t that the bathtub curve is wrong—it’s that it’s incomplete.
Well, yeah. The bathtub curve is a simplified model that is ‘wrong’, but it is also a very useful concept regarding time to failure (with some pretty big and obvious caveats) that you can broadly apply to many manufactured things.
Just like Newtonian physics breaks down when you get closer to the speed of light, the bathtub curve breaks down when you introduce firmware into the mix or create dependencies between units so they can fail together.
I know the article mentions these things, and I hate to be pedantic, but the bathtub curve is still a useful construct and is alive and well. Just use it properly.
Connected to, but quite different from, this subject is how to store photos long term (cloud does not count). HDDs still seem to be the best solution, but I'm not sure how often I should rewrite them.
M-DISC. It's more expensive by size but for (private I'm assuming) pictures it doesn't make a difference.
Print out the ones you like and put them in an album or on the wall. Think how many photos you have that are like that in a family and still around, when all the rest are gone on dead phones or computers somewhere.
You can't get a perfect digital copy of a printed out photo. You're subjecting yourself to generational losses for no good reason.
If you're a fan of paper, you could base64 encode the digital photo and print that out onto paper with a small font, or store the digital data in several QR codes. You can include a small preview too. But a couple hard drives or microSD cards will hold many millions of times as many photos in less physical space.
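A sketch of that idea using the third-party qrcode package; the chunk size is a conservative guess at per-code capacity, and for any realistically sized photo the sheer number of codes quickly makes the hard-drive option look good.

```python
# Base64 a photo and split the text across numbered QR codes for printing.
import base64
import pathlib
import qrcode

photo = pathlib.Path("family_photo.jpg")          # placeholder filename
encoded = base64.b64encode(photo.read_bytes()).decode("ascii")

CHUNK = 1800  # characters per code, comfortably below a large QR code's capacity
chunks = [encoded[i:i + CHUNK] for i in range(0, len(encoded), CHUNK)]

for idx, chunk in enumerate(chunks):
    # Prefix each chunk with its index so the pieces can be reassembled in order.
    img = qrcode.make(f"{idx:04d}:{chunk}")
    img.save(f"photo_part_{idx:04d}.png")

print(f"{len(chunks)} QR codes for a {photo.stat().st_size} byte photo")
```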
Does this take into account the scandal of old drives being sold as new?
So I had a random thought about the most platters any hard drive has ever had. I looked it up and it seems that the Western Digital Ultrastar® DC HC690 has eleven platters in a 3.5” form factor. That certainly gives you a lot more bandwidth, though it doesn't help much with seek time (unless you do the half-allocated trick).
It seems odd to look at failure rate in isolation, without considering cost and density; at scale, improved cost and density can be converted to lower failure rates via more aggressive RAID redundancy, no?
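As a toy illustration of that conversion, under the (unrealistic) assumptions of independent failures and a fixed rebuild window:

```python
# Toy model: a cheaper, less reliable drive plus an extra parity disk vs a
# pricier, more reliable drive with less redundancy. Real arrays violate the
# independence assumption, so treat the numbers as illustrative only.
from math import comb

def p_array_loss(n_drives: int, parity: int, afr: float, rebuild_days: float) -> float:
    """Probability that more than `parity` drives fail within one rebuild window."""
    q = 1 - (1 - afr) ** (rebuild_days / 365.0)   # per-drive failure prob in the window
    return sum(comb(n_drives, k) * q**k * (1 - q)**(n_drives - k)
               for k in range(parity + 1, n_drives + 1))

# Hypothetical numbers: 12-wide array, 3-day rebuild window.
print(p_array_loss(12, parity=1, afr=0.005, rebuild_days=3))  # reliable drives, single parity
print(p_array_loss(12, parity=2, afr=0.015, rebuild_days=3))  # cheaper drives, double parity
```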
Is Backblaze single-handedly driving QC on hard drive manufacturers with their yearly report?
Might be.
Not from the prices I'm seeing.
Recent and related:
Disk Prices https://news.ycombinator.com/item?id=45587280 - 1 day ago, 67 comments
Ah I haven’t seen the yearly backblaze post in some time now, glad it’s back.
Hard drives are not getting better.
Hard drives you can conveniently buy as a consumer - yes. There's a difference.
Do we have enough rare earth metals to provide storage for the AI boom?
The question is, do we have enough capacity to mine and refine them at a reasonable price? They're there, in the dirt for the taking.
Future generations will blame us for damning them out of rare earths to build yet another cellphone. This is like us today with severely diminished whale populations just so Victorians could read the bible for another 2 hours a night. Was it worth it? Most would say no, save for the people who made a fortune off of it I'm sure.
That makes no sense whatsoever. We are not consuming rare earths; only moving them from one place to another.
Arguably, future generations would find it easier to mine them from former landfill sites, where they would be present in concentrated form, than from some distant mine in the middle of nowhere.
Sounds mighty expensive if not impossible for extraction.
A pleasant contradiction of Betteridge's law.