The invention, subsequent perfection and continued advancement of monolithic integrated semiconductor microelectronics have, without a doubt, shaped our world for at least the last five decades. While research into new materials, such as carbon nanotubes, has the potential to revolutionise the industry, conventional semiconductor technology (as it may soon be called) will likely continue to dominate for the foreseeable future.
Our familiarity with, and dependence on, semiconductors rests on many things, but high among them has to be reliability. The semiconductor industry goes to great lengths to fully characterise each new process – still driven by Moore’s Law – before it becomes mainstream. As a result, it is extremely rare for an integrated device to fail in the field; such devices are inherently robust and extremely resilient when used correctly.
The same cannot be assumed about mass storage – an unfortunate truth, but one that is widely understood and accepted. It comes down to the physics involved: because magnetic storage includes moving parts, it naturally has a finite life expectancy based on the level of use. Solid-state memory has no moving parts, however, so it may be less obvious that it too has a finite lifetime, one that also depends on usage.
The fundamental principle behind non-volatile, erasable memory such as Flash is the concept of a floating gate. Conventional transistors have a gate, of course, and it is the potential on the gate that influences the flow of charge carriers through the channel from source to drain. In Flash memory, the charge stored on the floating gate indicates the binary status of the cell: 0 or 1 (or, in the case of a multi-level cell, 00, 01, 10 or 11). Maintaining this state after power is removed defines the non-volatility of Flash memory, and it is the floating gate’s ability to retain charge without power that underpins Flash functionality. Unfortunately, the process of setting the charge on the floating gate is relatively stressful for the cell, which experiences fatigue analogous to the moving parts in a hard disk drive wearing out.
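The relationship between a cell’s state and the bits it stores can be sketched in code. The following is an illustrative model only – the reference voltages are hypothetical, not taken from any real device – showing how one read threshold yields a single bit (SLC) while three thresholds yield the four two-bit states of an MLC cell.

```python
# Illustrative sketch of Flash cell read-out (hypothetical thresholds,
# not real device parameters).

def read_cell(v_threshold):
    """Return the bits stored in a single cell, for an SLC scheme and
    a 2-bit MLC scheme, given the cell's threshold voltage."""
    # SLC: one reference voltage splits the window into two states.
    slc = '1' if v_threshold < 2.0 else '0'
    # MLC: three reference voltages split the window into four states,
    # so each cell stores two bits (11, 10, 01, 00).
    if v_threshold < 1.0:
        mlc = '11'
    elif v_threshold < 2.0:
        mlc = '10'
    elif v_threshold < 3.0:
        mlc = '01'
    else:
        mlc = '00'
    return slc, mlc

print(read_cell(0.5))   # ('1', '11')
print(read_cell(3.5))   # ('0', '00')
```

Because MLC packs more states into the same voltage window, the margin between states is smaller, which is one reason MLC typically tolerates fewer program/erase cycles than SLC.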
It is important to accept that Flash memory will wear out and can fail, so it is sensible to mitigate the impact of a potential failure. Part of the challenge, however, is that not all memory wears out at the same rate, even under the same conditions. Compounding this is the fact that no two applications use memory in exactly the same way, so even if wear were broadly predictable, it would still be subject to many factors, some of which may not be known.
This is where advanced monitoring and diagnostic tools can help. Because the nature of Flash wear is understood, it is possible to take precautions against the impact of a failure, as well as use the information gathered to make more informed decisions about the kind of Flash memory to design in. However, most tools offer no insight at the physical level, which is hidden behind both the file system and the Flash Translation Layer (FTL).
Understanding your use-case
In addition to how a system is designed at a high level, the interdependency of functions at a low level will dynamically change the way it operates. This is where the skill and experience of the firmware engineers come into play, as it is the ability to react to external conditions and stimuli in a reliable way that really dictates any system’s operation.
Of course, this almost random nature means that the way mass storage is used effectively depends on the use-case at a system level, which makes predicting its behaviour particularly difficult. Gaining insight into exactly how a system functions at a low level – down to how often memory is accessed, the frequency of program/erase cycles and the long-term endurance this results in – can be extremely useful in predicting the overall lifetime of the memory subsystem.
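To see why program/erase frequency matters so much, consider a back-of-envelope projection. The sketch below is a hypothetical illustration (not a Hyperstone formula): it divides a device’s total erase budget – blocks multiplied by the rated P/E cycles per block – by the observed erase rate, with an optional factor to model imperfect wear levelling.

```python
# Hypothetical lifetime projection from observed program/erase activity.
# All figures in the example are invented for illustration.

def estimated_lifetime_years(total_blocks, rated_pe_cycles,
                             erases_per_day, wear_leveling_factor=1.0):
    """Rough lifetime in years, assuming erases are spread evenly
    across blocks; wear_leveling_factor < 1.0 models uneven wear."""
    total_erase_budget = total_blocks * rated_pe_cycles * wear_leveling_factor
    days = total_erase_budget / erases_per_day
    return days / 365.0

# Example: 4096 blocks rated for 3000 P/E cycles, 10,000 erases per day.
years = estimated_lifetime_years(4096, 3000, 10_000)
```

Even this crude model shows how sensitive lifetime is to the erase rate, which is exactly the kind of figure that only real-world trace data can supply.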
Hyperstone has developed methods for recording this low level activity in real-world applications, and using the data this generates to analyse exactly how the memory is being used. This analysis helps designers understand their own specific use-case and how the memory type chosen influences the overall system performance, as well as predict when a fault may occur.
The health-monitoring technology developed by Hyperstone is provided as the hySMART tool (see below). Using a standard memory card, it is possible to capture the transactions that pass between the host and the Flash controller. These include the type of transfer, such as sequential or random reads, and the amount of data written or read. The nature of this traffic defines the customer’s specific use-case, but the content of the data remains hidden from the tool.
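The kind of summary such a trace enables can be sketched as follows. The record format here is invented for illustration (it is not the hySMART capture format): each entry is an operation, a start address and a sector count, and the sketch totals the bytes moved and classifies each transfer as sequential or random.

```python
# Illustrative trace summary (hypothetical record format, not
# Hyperstone's capture format).

def summarise_trace(trace, sector_size=512):
    """trace: list of (op, start_lba, sector_count) tuples,
    where op is 'read' or 'write'."""
    stats = {'read_bytes': 0, 'write_bytes': 0,
             'sequential': 0, 'random': 0}
    last_end = {}
    for op, lba, count in trace:
        stats[f'{op}_bytes'] += count * sector_size
        # A transfer counts as sequential if it starts where the
        # previous transfer of the same kind ended.
        if last_end.get(op) == lba:
            stats['sequential'] += 1
        else:
            stats['random'] += 1
        last_end[op] = lba + count
    return stats

trace = [('read', 0, 8), ('read', 8, 8), ('write', 1000, 4)]
stats = summarise_trace(trace)
```

Note that only addresses, directions and sizes are needed for this analysis; the payload itself never has to be inspected, which matches the tool’s behaviour of keeping data content hidden.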
Getting SMART about storage
The main tool Hyperstone uses to carry out this use-case analysis is its proprietary hySMART. It extends the industry-standard SMART (Self-Monitoring, Analysis and Reporting Technology) that is embedded in most, if not all, hard disk drives and accessed using ATA commands.
Although SMART was originally developed to analyse the health of hard disk drives, it has been widely adopted by solid-state drive manufacturers and, for largely the same reasons, is now applied to NAND Flash memories. Many of the commands apply to both technologies and are now supported by a large number of Hyperstone’s Flash controllers.
The analysis made possible using SMART ATA commands is extensive. Hyperstone supports ATA standard commands as well as vendor-specific commands. Typically, issuing a command returns data in the form of an ASCII string or raw hexadecimal data. Interpreting this data falls to the hySMART tool, which decodes it and presents it in a more useful format that helps developers understand the operating status of the connected memory.
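As an idea of what such decoding involves, the sketch below unpacks one 12-byte SMART attribute entry following the de facto layout used by most ATA drives (attribute ID, flags, normalised value, worst value, six raw bytes). The exact layout and the meaning of each raw counter are vendor-specific, so this should be checked against the controller’s documentation; the example entry is fabricated.

```python
import struct

# Hedged sketch: decoding one 12-byte SMART attribute entry using the
# common de facto layout (ID, flags, normalized value, worst value,
# 6 raw bytes, 1 reserved byte). Vendor layouts may differ.

def decode_attribute(entry):
    attr_id, flags, value, worst = struct.unpack_from('<BHBB', entry, 0)
    raw = int.from_bytes(entry[5:11], 'little')  # 48-bit raw counter
    return {'id': attr_id, 'flags': flags,
            'value': value, 'worst': worst, 'raw': raw}

# Fabricated example: attribute 0xAB, normalized value 100, raw count 42.
entry = (bytes([0xAB, 0x32, 0x00, 100, 100])
         + (42).to_bytes(6, 'little') + b'\x00')
print(decode_attribute(entry))
```

Turning dozens of such raw counters into trends and graphs, rather than leaving the developer with hexadecimal dumps, is precisely the gap a GUI front-end fills.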
The hySMART tool is effectively a GUI for interpreting SMART data gathered from a drive connected to the host computer. By collecting log data provided by the drive, it can graphically represent crucial information, such as the number of spare blocks remaining. It also presents the number of block erases and uses this to predict the drive’s lifetime. ECC error information is presented graphically as a histogram, covering both correctable and uncorrectable errors.
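The split between correctable and uncorrectable errors in such a histogram can be illustrated with a short sketch. The data and the correction limit below are hypothetical (this is not hySMART code): given bit-error counts per read, anything at or below the ECC scheme’s correction capability is binned as correctable, anything above it as uncorrectable.

```python
from collections import Counter

# Illustrative ECC histogram binning (hypothetical data, not hySMART
# output): split error counts by the code's correction capability.

def ecc_histogram(bit_errors_per_page, max_correctable):
    hist = Counter(bit_errors_per_page)
    correctable = {n: c for n, c in hist.items() if 0 < n <= max_correctable}
    uncorrectable = {n: c for n, c in hist.items() if n > max_correctable}
    return correctable, uncorrectable

# Example: an ECC scheme able to correct up to 8 bit errors per page.
errors = [0, 0, 1, 1, 2, 3, 8, 9, 12]
ok, bad = ecc_histogram(errors, max_correctable=8)
```

A rising tail of correctable errors that creeps towards the correction limit is an early warning sign, which is why presenting both populations side by side is useful.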
Flash memory is a fundamental part of all modern systems, and the amount of data being generated increases daily. The era of the IoT and Big Data relies heavily on the endurance of memory devices, under operating conditions that can involve repeated program/erase cycles over the intended lifetime of the system.
Advanced analysis tools like those developed by Hyperstone are becoming more important, as they enable design teams to better understand how their designs use Flash memory and the long-term impact of that use-case on the memory chosen.
By working closely with Hyperstone, manufacturers can receive the help and support they now need to develop more reliable products that deliver higher performance throughout their intended lifecycle.