In the coming years, tens of billions of Industrial Internet of Things (IIoT) devices will be connected and will generate massive amounts of data from sensors and applications. A significant portion of this IIoT data will eventually be stored, processed, and even analyzed at the edge, which requires storage devices there to respond more quickly while maintaining high data integrity.
A major challenge facing IIoT edge computing is the harsh environments these systems will inevitably encounter, in particular extended temperatures. Unfortunately, a common misconception holds that simply using off-the-shelf industrial-grade NAND components will allow the storage systems serving IIoT devices to operate reliably at often-extreme temperatures, and that this is enough to guarantee the reliability of mission-critical systems. In practice, such an approach can result in unacceptable levels of device performance and fault tolerance in NAND flash storage, for the reasons explained below.
NAND characteristics, die shrink, and the influence of extreme temperatures
In manufacturing, lithographic node shrinkage, or “die shrink,” tends to raise the number of defective dies, resulting in unstable quality in NAND flash modules and ICs. Each shrink also leaves fewer electrons stored per memory cell, which can increase the number of bit errors and reduce both data retention and endurance.
Extreme temperatures can further exacerbate the deterioration of NAND flash and change the momentum of electrons in modules and ICs, leading to data retention issues or even data loss. For example, Raw Bit Error Rate (RBER) and Early Life Failure Rate (ELFR) are two metrics that reflect electron leakage or retention issues in the tunnel oxide layer of a memory cell. During program/erase (P/E) cycles, high temperature can accelerate electrons into or out of a cell gate and make P/E easier, but at the same time it increases the buildup of charge traps (trapped electrons) in the tunnel oxide layer. Over time, the de-trapping of these charges can result in threshold voltage (Vt) shifts that generate bit flips and retention failures.
At the other extreme, at low temperatures, cell gates may end up with lower charges, and increasing tunnel oxide degradation can cause dielectric leakage even though data retention itself improves.
The only method for safeguarding against such incidents in NAND flash devices is through rigorous reliability testing procedures.
IC-level test and product-level reliability demonstration testing for enhanced reliability
NAND flash IC tests can be employed to verify how error-correcting code (ECC) and temperature influence P/E endurance, data retention, and the operating life of NAND flash devices. For example, different levels of ECC per 1 KB of memory can be tested across temperature ranges in a reliability demonstration test (RDT) to determine how much ECC is sufficient under given environmental conditions.
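To make the ECC sizing question concrete, the sketch below estimates how often a 1 KB codeword would exceed a given correction strength at each temperature corner. It is an illustration under stated assumptions, not a description of any vendor's test flow: bit errors are treated as independent, parity bits are ignored, and the RBER figures, candidate ECC strengths, and function names are hypothetical placeholders rather than measured data.

```python
import math

CODEWORD_BITS = 1024 * 8  # 1 KB of user data; parity bits are ignored for simplicity


def codeword_failure_prob(rber: float, t: int, n_bits: int = CODEWORD_BITS) -> float:
    """Probability that more than t of n_bits are in error, so ECC cannot correct them.

    Assumes independent bit errors, i.e. a binomial error count per codeword.
    """
    if rber <= 0.0:
        return 0.0
    if rber >= 1.0:
        return 1.0 if t < n_bits else 0.0
    log_p, log_q = math.log(rber), math.log1p(-rber)
    terms = []
    for k in range(t + 1, n_bits + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), kept in log space to avoid overflow
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        terms.append(math.exp(log_term))
    return min(1.0, math.fsum(terms))


def min_ecc_strength(rber_by_temp, target, candidates):
    """Smallest candidate strength (correctable bits per codeword) meeting the target.

    The decision is driven by the worst temperature corner. Returns None if no
    candidate is strong enough.
    """
    worst_rber = max(rber_by_temp.values())
    for t in sorted(candidates):
        if codeword_failure_prob(worst_rber, t) <= target:
            return t
    return None


# Hypothetical RBER values per temperature corner (placeholders, not measurements).
example_rber = {"-40C": 2e-4, "25C": 5e-5, "85C": 8e-4}
chosen = min_ecc_strength(example_rber, target=1e-15, candidates=[8, 24, 40, 72])
print("ECC bits per 1 KB needed at the worst corner:", chosen)
```

In an actual RDT the RBER inputs would come from measurements at each temperature and P/E cycle count; the point of the sketch is only that the required ECC level is set by the worst corner rather than by room-temperature behavior.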
For product-level testing, this same RDT process can be applied through burn-in tests that check read/write quality at temperatures from -40 °C to +85 °C, with block-by-block evaluations of entire drives, including firmware, user area, and other memory spaces. Verified weak blocks can be filtered out and replaced with spare blocks to strengthen the overall endurance of a NAND device throughout its lifecycle, and further validation testing can verify signal integrity across a SATA interface.
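The weak-block screening step described above can be sketched roughly as follows. This is a simplified model, not ATP's actual firmware or test procedure: the data layout, the `max_errors` threshold, and all names are assumptions made for illustration, and the error counts in the usage example are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningResult:
    remap: dict[int, int] = field(default_factory=dict)   # weak user block -> spare block
    retired: set[int] = field(default_factory=set)        # every block taken out of service
    spares_left: list[int] = field(default_factory=list)  # healthy spares still unused


def is_weak(errors_by_temp: dict[str, int], max_errors: int) -> bool:
    # A block is judged on its worst temperature corner, not on its average.
    return max(errors_by_temp.values()) > max_errors


def screen_blocks(user_blocks, spare_blocks, errors, max_errors):
    """Retire weak blocks found during burn-in and remap them to healthy spares.

    errors maps block id -> {temperature label: worst bit-error count observed}.
    Raises RuntimeError if the drive runs out of healthy spare blocks.
    """
    result = ScreeningResult()
    healthy_spares = []
    for block in spare_blocks:
        # Spares are screened too, so a weak spare never replaces a weak block.
        if is_weak(errors[block], max_errors):
            result.retired.add(block)
        else:
            healthy_spares.append(block)
    for block in user_blocks:
        if not is_weak(errors[block], max_errors):
            continue
        if not healthy_spares:
            raise RuntimeError(f"no healthy spare left for weak block {block}")
        result.retired.add(block)
        result.remap[block] = healthy_spares.pop(0)
    result.spares_left = healthy_spares
    return result


# Hypothetical burn-in readings (placeholders, not measurements).
example_errors = {
    0: {"-40C": 3, "25C": 1, "85C": 4},
    1: {"-40C": 2, "25C": 2, "85C": 30},    # weak at the hot corner
    100: {"-40C": 50, "25C": 1, "85C": 1},  # weak spare, must not be used
    101: {"-40C": 1, "25C": 0, "85C": 2},
}
outcome = screen_blocks([0, 1], [100, 101], example_errors, max_errors=24)
print(outcome.remap, outcome.retired, outcome.spares_left)
```

Evaluating each block at its worst temperature corner is the design choice that matters here: a block that passes at room temperature but fails at -40 °C or +85 °C is still retired before the drive ships.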
ATP’s ITemp MLC NAND flash solutions have adopted such validations to support high product reliability and long-term product lifecycle requirements in harsh temperatures.
To achieve the reliability required by IIoT applications, general testing methods for NAND IC components are insufficient. Advanced RDT against high/low temperatures can deliver enhanced reliability, prolonged product life, and lower total cost of ownership. Is your storage solution up to the task in harsh environments?
ATP Electronics, Inc.