Several months ago, I was alerted by a colleague to a critical bulletin released by the Hewlett Packard Enterprise Support Center. The bulletin warned about a firmware defect that had been detected in certain models of solid-state drives (SSDs) used in a number of different HP systems and appliances. The title of the bulletin was difficult enough to parse at first: “HPE SAS Solid State Drives – Critical Firmware Upgrade Required for Certain HPE SAS Solid State Drive Models to Prevent Drive Failure at 32,768 Hours of Operation.” The bulletin was initially released in November and has since been updated four times, including late last month. You can read the full bulletin here.
What it all boils down to is that if you bought one of the affected HP systems and turned it on, you can expect the SSD in it to fail catastrophically after exactly 3 years, 270 days, and 8 hours (32,768 hours of operation). Well, at least it’s good to know when something is going to fail so you won’t be shocked when it happens.
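If you want to check that arithmetic, or estimate how long a particular drive has left, here is a minimal Python sketch. The function names are my own, the 365-day year is a simplifying assumption, and the power-on hours figure is something you would have to read from the drive yourself (for example, from its SMART data); none of this comes from the bulletin.

```python
from datetime import datetime, timedelta

# Threshold named in the bulletin title; note that it equals 2**15.
FAILURE_HOURS = 32_768


def split_hours(total_hours):
    """Break an hour count into (years, days, hours), assuming 365-day years."""
    days, hours = divmod(total_hours, 24)
    years, days = divmod(days, 365)
    return years, days, hours


def hours_remaining(power_on_hours):
    """Hours of operation left before the threshold (0 if already reached)."""
    return max(FAILURE_HOURS - power_on_hours, 0)


def projected_failure(power_on_hours, now):
    """Date the threshold is reached, assuming the drive stays powered on 24/7."""
    return now + timedelta(hours=hours_remaining(power_on_hours))


if __name__ == "__main__":
    print(split_hours(FAILURE_HOURS))   # (3, 270, 8)
    print(hours_remaining(30_000))      # 2768
    print(projected_failure(30_000, datetime.now()))
```

The projection is only a rough guide: a drive that is powered down part of the time will reach the threshold later than the date printed.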
Of course, faulty firmware isn’t the only thing that can cause problems with SSDs. It’s a well-known fact that even SSDs that have had only minimal use can suddenly and unexpectedly fail when they are subjected to certain kinds of loads. With hard disk drives (HDDs), at least you might get SMART errors warning you that your drive was in danger of failing fairly soon. SSDs, on the other hand, can fail prematurely without generating any SMART error conditions. Still, the incredible speed advantage that SSD technologies have over slower “spinning rust” technologies has led many companies to migrate much of their storage from HDDs to SSDs where their budget has allowed. And SSD prices continue to fall and are closing in fast on parity with the cost of HDDs.
But the question remains: How can you prepare your datacenter so a firmware issue like this won’t take down your servers and other appliances? I talked with several colleagues about this and have distilled their consensus below as a series of best practices or recommendations you should follow.
Sign me up!
The first thing you should do if you use SSDs, or have systems or appliances deployed that have SSDs inside them, is to sign up for your vendor’s support alerts mailing list if they have one. And don’t buy anything from a vendor that doesn’t have a mailing list you can join that provides alerts concerning issues with their products. Unfortunately, with some vendors it can be hard to find out where you can sign up for these kinds of support alerts or bulletins. For example, HP lets you sign up for Driver and Support eAlerts on this page, as well as other bulletins that are more marketing oriented, simply by specifying your name, company, and email address. Dell lets you subscribe here to receive driver and firmware update notifications, but this requires that you first create a Dell account on MyAccount. For other vendors, however, you either have to Google for various phrases like “support bulletins” or “subscribe to alerts” and so on, or just go digging around on their website for information on how to subscribe (and whether they even have a list you can subscribe to).
Befriend your TAM
If you are an enterprise customer, then you have probably been assigned a technical account manager (TAM) who works at the vendor and whose job is to help you get answers when you need them (and persuade you to buy more of their products). My advice is that you try to build a good working relationship with your TAM and not just treat them as another grasping appendage of your vendor’s sales department. A good TAM can be a lifesaver in many kinds of difficult situations, and a TAM who you feel comfortable talking with (and who feels comfortable that they can reach out to you as well without feeling they’re intruding or being seen as too pushy) is just the person you need on your side when something like a critical firmware problem is discovered in one of their products. Ask your TAM to notify you if anything like this should come up on their radar, and tell them that you’d appreciate them texting or calling you directly right away if it does. A good TAM can not only alert you when there’s a firmware issue but can also help you find and possibly even deploy the needed firmware update once your vendor has released it. Or at least your TAM can connect you with someone on your vendor’s support team who actually knows their stuff and isn’t just following a script that was provided to them.
Make regular backups
My final word of advice should be a no-brainer since it applies to anything in computing or networking that’s storage-related. That advice is to make sure you regularly back up the storage on all your systems. With server systems, this should be easy and there’s no need to discuss it any further. Network appliances are a different kettle of fish, however, because some of them may have SSD storage embedded inside them but may not expose any access to their storage externally, except perhaps to the vendor’s own authorized support personnel. In such cases, you may need to build some kind of load balancing capability into where your device is positioned in your network so that if the device unexpectedly fails, its workload can be handled by another device on your network. But just don’t forget the importance of doing backups wherever they’re possible.
Featured image: Shutterstock