MTTF – What hard drive reliability really means
Reliability and operating conditions determine the choice of the right HDD
10/2019 – Author: Rainer W. Kaese*
There is still no way around HDDs for the cost-effective provision of storage capacity. Different applications require different HDDs. The decisive selection criteria are reliability and operating conditions.
The volume of data continues to grow unchecked, and the data storage challenges facing businesses and private users are constantly increasing. In terms of efficiency, security and cost, it is essential to use the right drives for the different use cases. A decisive criterion is the reliability of an HDD, which depends on several factors such as product specification, ambient conditions, workload and specific application. In particular, operating time, manufacturer warranty, mean time to failure (MTTF) and annualized failure rate (AFR) must be considered in depth.
A major reliability-related criterion for the selection of storage components is the operating duty, which refers to how many hours per day a drive is designed to be active. Client drives for desktop or laptop computers are typically designed for an average of 8 hours of operation per day. This reflects a typical use case for these types of machines. HDDs for enterprise-class applications, on the other hand, are optimized for continuous use: 24 hours a day, seven days a week, 365 days a year (24/7/365 or 24/7).
There are also differences in terms of warranty. The reliability of a product is warranted by the manufacturer for a defined period of time. For enterprise components this will typically be 5 years. Client drives will offer a warranty of somewhere between 1 and 3 years. This warranty is, however, dependent upon correct usage and deployment, meaning that the operation time as well as the environmental conditions need to be observed.
Furthermore, with regard to the reliability of a hard disk, the manufacturer's information on the MTTF must be taken into account. The MTTF is a statistical value, measured in hours, that indicates how frequently failures can be expected within a population of devices. If the MTTF is given as 1 million hours, and the drives are operated within the specifications, one drive failure per hour can be expected for a population of 1 million drives. For the more realistic quantity of 1,000 drives, a Managed Service Provider (MSP) should plan for a failure every 1,000 hours (almost 42 days).
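The population arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a manufacturer tool: the function name `hours_between_failures` is hypothetical, and the calculation assumes that all drives run continuously and within their specifications.

```python
def hours_between_failures(mttf_hours: float, population: int) -> float:
    """Average expected interval between failures across a drive population.

    Assumes every drive runs 24/7 within its specification (illustrative only).
    """
    if mttf_hours <= 0 or population <= 0:
        raise ValueError("MTTF and population must be positive")
    return mttf_hours / population


# Example from the text: MTTF of 1 million hours, 1,000 drives.
interval = hours_between_failures(1_000_000, 1_000)
print(f"{interval:.0f} hours, about {interval / 24:.1f} days")
# prints "1000 hours, about 41.7 days"
```

The result is a planning average for the whole population; it says nothing about when any individual drive will fail.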
The expected statistical failure rate per year (annualized failure rate, AFR) for drives in 24/7 operation can be calculated from the MTTF by the following formula:
AFR = 1 − exp(−8760 h / MTTF)
Here, 8760 is the number of hours in a year. The reduction by an exponential term is required because the drives that have failed during this timeframe have to be considered in the statistics. However, for small AFR values, accounting for already-failed drives has a negligible impact, allowing the formula to be approximated as:
AFR ≈ 8760 h / MTTF
For an MTTF of 1 million hours, this gives an AFR of about 0.88%. This means that roughly 9 out of every 1,000 drives may fail per year within the warranty period.
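The relationship between MTTF and AFR can be checked numerically. The sketch below assumes the usual exponential failure model, AFR = 1 − exp(−8760 / MTTF), together with its linear approximation 8760 / MTTF. The function names are illustrative, and the MTTF values are those quoted in this article for drives in 24/7 operation.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days, i.e. 24/7 operation


def afr_exact(mttf_hours: float) -> float:
    """AFR under an exponential failure model (assumption of this sketch)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def afr_approx(mttf_hours: float) -> float:
    """Linear approximation, valid while the AFR is small."""
    return HOURS_PER_YEAR / mttf_hours


for mttf in (1_000_000, 2_000_000, 2_500_000):
    exact = afr_exact(mttf)
    approx = afr_approx(mttf)
    print(f"MTTF {mttf:>9,} h: exact {exact:.4%}, approx {approx:.4%}, "
          f"about {approx * 1000:.1f} failures per 1,000 drives per year")
```

For an MTTF of 1 million hours, both variants round to roughly 0.9%, which is why the simpler approximation is normally sufficient for capacity planning.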
Operating and environmental conditions
In addition to the reliability criteria of a hard disk, the specific operating and environmental conditions must also be taken into account, mainly operating temperature, rated workload, load/unload cycles and start/stop cycles. If a manufacturer guarantees a concrete MTTF value over a certain period of time, it does so only on condition that the drive is operated under certain environmental conditions and workloads.
- Operating temperature
In order to provide the longest possible warranty period and highest MTTF, the operating temperature range is defined to best match the target application. For MSPs running cooled data centers, enterprise drives are usually specified for use from 5°C to 55°C. Consumer or client drives are rated for 0°C to 60°C, and special industrial drives are specified for an extended temperature range of -40°C to 85°C.
The temperature specification is typically defined as either the ambient temperature, Ta, or the case temperature, Tc. Ambient temperature is the temperature of the air in the immediate vicinity of the drive, whereas case temperature is measured on the surface of the drive itself. Operating drives outside of the temperature specification will increase component wear and reduce the MTTF value, negatively impacting the AFR.
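A monitoring script could compare measured temperatures against these ranges. The following sketch uses only the typical ranges quoted above; the dictionary keys and the function name are hypothetical, and whether a limit refers to Ta or Tc has to be taken from the datasheet of the actual drive.

```python
# Typical operating ranges from the text, in degrees Celsius.
# Real limits (and whether they refer to Ta or Tc) come from the datasheet.
TEMP_RANGES_C = {
    "enterprise": (5, 55),
    "client": (0, 60),
    "industrial": (-40, 85),
}


def within_temperature_spec(drive_class: str, temp_c: float) -> bool:
    """Return True if temp_c lies inside the typical range for drive_class."""
    low, high = TEMP_RANGES_C[drive_class]
    return low <= temp_c <= high


print(within_temperature_spec("enterprise", 58))  # False: above 55 degrees C
print(within_temperature_spec("client", 58))      # True: within 0 to 60
```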
- Rated workload
With their spinning platters and moving heads, hard disk drives (HDDs) have a number of components that can suffer wear. It should be clear then that the workload, i.e. the amount of data written and read, will have an impact on reliability. Drive manufacturers typically define a maximum workload per year for which the MTTF and AFR values remain valid.
NAS drives are typically rated at up to 180 TB/year. This is less than enterprise drives (550 TB/year) but significantly more than client drives (55 TB/year). The split between read and write operations has no impact on the rated workload.
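To relate a rated workload to day-to-day traffic, it helps to convert an average transfer rate into TB per year. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a 365-day year; the function and dictionary names are illustrative, and the ratings are the typical values quoted above.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds

# Typical rated workloads from the text, in TB per year (reads plus writes).
RATED_TB_PER_YEAR = {"client": 55, "nas": 180, "enterprise": 550}


def annual_workload_tb(avg_mb_per_s: float) -> float:
    """TB transferred per year at a sustained average rate in MB/s."""
    return avg_mb_per_s * SECONDS_PER_YEAR / 1_000_000


def classes_within_rating(workload_tb: float) -> list[str]:
    """Drive classes whose typical rating covers the given yearly workload."""
    return [name for name, rated in RATED_TB_PER_YEAR.items()
            if workload_tb <= rated]


workload = annual_workload_tb(10)  # 10 MB/s averaged over the whole year
print(f"{workload:.2f} TB/year -> {classes_within_rating(workload)}")
```

Even a modest average of 10 MB/s around the clock adds up to more than 300 TB per year, which exceeds the typical NAS rating and leaves only the enterprise class in this sketch.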
- Load/unload cycles
HDDs support an idle mode. In this mode the read/write head is parked on a mechanical ramp while the spinning platters are brought to a standstill. When access to the drive is required again, the platters are spun up and the head is brought out of its parked position. This is known as a load/unload cycle. The latest HDD models can support several hundred thousand load/unload cycles, so there are almost no restrictions in this regard.
- Start/Stop Cycle (HDD)
For drives that are not specified for 24/7 operation, the maximum number of start/stop cycles for the spindle motor will be defined. This normally lies between 10,000 and 50,000 start-stop cycles.
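A rated cycle count becomes easier to judge when it is spread over the planned service life. The sketch below assumes, purely for illustration, that the cycles are consumed evenly over a given number of years. The 300,000 load/unload figure is a hypothetical stand-in for "several hundred thousand", and the function name is not taken from any vendor tool.

```python
def cycles_per_day(rated_cycles: int, service_years: float) -> float:
    """Average daily cycle budget if the rating is spread evenly over the period."""
    if rated_cycles <= 0 or service_years <= 0:
        raise ValueError("rated_cycles and service_years must be positive")
    return rated_cycles / (service_years * 365)


# Start/stop range from the text, spread over an assumed two-year period.
print(f"{cycles_per_day(10_000, 2):.1f} start/stop cycles per day")
print(f"{cycles_per_day(50_000, 2):.1f} start/stop cycles per day")

# Hypothetical load/unload rating of 300,000 cycles over five years.
print(f"{cycles_per_day(300_000, 5):.1f} load/unload cycles per day")
```

A desktop machine that is switched on once or twice a day stays far below even the lower start/stop figure under these assumptions.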
Different HDD classes at a glance
HDDs are designed with a specific application in mind: enterprise-performance, enterprise-capacity, NAS, video and surveillance, as well as consumer and desktop.
The enterprise performance class HDDs are designed for mission-critical applications in 24/7 operation. These 2.5-inch hard disk drives with a Serial Attached SCSI (SAS) interface offer 10,000 to 15,000 revolutions per minute (rpm), 500 input/output operations per second (IOPS) and up to 2.5TB of storage capacity. Manufacturers specify an MTTF of up to two million hours. The hard drives are designed for a workload of 550TB per year. With a continuous data transfer rate of 200MB/s over the five-year warranty period, the rated workload can even be unlimited.
The Enterprise Capacity Class SAS or Serial ATA (SATA) HDDs, also designed for 24/7 operation, provide up to 16TB of storage capacity in 2019. They are mainly for nearline storage applications such as shared drives, cloud storage or archiving. In addition to the 550TB rated workload per year, they are characterized by high availability. The MTTF can be up to 2.5 million hours.
The NAS HDDs with SATA interface and up to 14TB are suitable for use in private NAS systems. Basic parameters include 24/7 operation, a rated workload of 180TB per year over the three-year warranty period and an MTTF of 1 million hours.
HDDs for video cameras and surveillance systems must support 24/7 operation. In 2019, they are available with up to 10TB. The hard drives with SATA interfaces are designed for a workload of 180TB per year (based on the three-year warranty) and offer an MTTF of 1 million hours. Typically, HDDs of this category are designed for a wider temperature range, since surveillance systems are often used in locations that are not cooled as precisely as server rooms in data centers. The most critical factors for this type of drive include firmware features that support video- and streaming-specific requirements. For example, the maximum possible time for error correction is limited to avoid interruptions of the video stream.
The last category is consumer or desktop hard drives. They feature a SATA interface and are available with up to 10TB in 2019. This is the cheapest HDD class and accordingly comes with some restrictions: the operating time is only 8 hours per day, the workload is just 55TB per year over the two-year warranty period, and the MTTF is 600,000 hours.
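The class parameters described above lend themselves to a simple lookup when shortlisting drives. The following sketch encodes only the typical values quoted in this article (the five-year warranty for enterprise capacity drives is taken from the general enterprise figure given earlier). The `HddClass` type and the `suitable_classes` function are hypothetical helpers, and actual datasheet values should replace these numbers for real decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HddClass:
    name: str
    hours_per_day: int          # designed operating time per day
    workload_tb_per_year: int   # rated workload
    warranty_years: int
    mttf_hours: int


# Typical values as quoted in the article; illustrative, not datasheet data.
HDD_CLASSES = [
    HddClass("enterprise performance", 24, 550, 5, 2_000_000),
    HddClass("enterprise capacity", 24, 550, 5, 2_500_000),
    HddClass("NAS", 24, 180, 3, 1_000_000),
    HddClass("surveillance", 24, 180, 3, 1_000_000),
    HddClass("desktop", 8, 55, 2, 600_000),
]


def suitable_classes(needs_24_7: bool, workload_tb_per_year: float,
                     min_mttf_hours: int = 0) -> list[str]:
    """Names of classes that meet the operating time, workload and MTTF needs."""
    return [
        c.name for c in HDD_CLASSES
        if (c.hours_per_day == 24 or not needs_24_7)
        and c.workload_tb_per_year >= workload_tb_per_year
        and c.mttf_hours >= min_mttf_hours
    ]


print(suitable_classes(needs_24_7=True, workload_tb_per_year=150))
print(suitable_classes(needs_24_7=True, workload_tb_per_year=300))
```

Such a filter only narrows the field; interface, capacity, temperature range and firmware features still have to be checked against the application.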
Overall, it can be stated that hard drives are not just about capacity and price. When storing data, the reliability specifications and operating and environmental conditions of the hard drives should always play an important role. If the user chooses the “right” version, nothing stands in the way of efficient, secure HDD deployment.
Central parameters of the different HDD classes in the overview.
*The author Rainer W. Kaese is Senior Manager Business Development Storage Products at Toshiba Electronics Europe.