A new era of storage: Open compute with RISC-V and memory fabrics

By Zvonimir Bandic

Sr. Director, Next Gen Platforms, Western Digital; Chairman of the CHIPS Alliance; member of the RISC-V Board of Directors

Western Digital Corporation

February 21, 2018

In the last few years, we have witnessed a massive change in how data is generated, processed, and further leveraged to garner additional value and intelligence, all driven by the emergence of new computational models based on deep learning and neural networks. This profound change started in the data center, where deep learning techniques were used to extract insights from vast data volumes: classifying and recognizing images, enabling natural language and speech processing, and learning to play complex strategy games. The change has also brought a wave of more power-efficient compute devices built specifically for these classes of problems, first GP-GPUs and FPGAs and later fully customized ASICs, further accelerating and expanding the compute capabilities of these deep learning-based systems.

Big Data and Fast Data

Big Data applications use specialty GP-GPU, FPGA, and ASIC processors to analyze large datasets with deep learning techniques, unmasking trends, patterns, and associations to enable capabilities such as image and speech recognition. As such, Big Data is based largely on information from the past, data at rest that typically resides in a cloud. A frequent outcome of a Big Data analysis is a “trained” neural network capable of executing a specific task, such as recognizing and tagging all faces in an image or video sequence, or recognizing speech.

That task is best executed by specialized inference engines, which reside directly on the edge device and are driven by a Fast Data application (Figure 1). By processing data captured locally at the edge, Fast Data leverages algorithms derived from Big Data to provide real-time decisions and results. Where Big Data provides insights, moving from ‘what happened’ to ‘what will likely happen’ (predictive analysis), Fast Data delivers real-time actions that improve business decisions and operations and reduce inefficiencies, directly affecting bottom-line results. These methods apply to a variety of edge and storage devices, such as cameras, smartphones, and SSDs.
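
To make this concrete, below is a minimal sketch of such a Fast Data loop on an edge device. The capture_frame and infer_class helpers are hypothetical stand-ins for a sensor driver and a pre-trained inference engine, not any particular product's API; only compact results (labels), not raw frames, need to leave the device.

/* Minimal Fast Data loop: a model trained offline on Big Data is applied
 * locally on the edge device. All helper names are hypothetical stubs. */
#include <stdio.h>

#define FRAME_PIXELS (640 * 480)

/* Stub sensor driver: pretend the camera yields three frames, then stops. */
static int capture_frame(float frame[FRAME_PIXELS]) {
    (void)frame;
    static int n = 0;
    return (n++ < 3) ? 0 : -1;
}

/* Stub inference engine: would run the Big-Data-trained network on-device. */
static int infer_class(const float frame[FRAME_PIXELS]) {
    (void)frame;
    return 0; /* e.g. class 0 = "no face detected" */
}

int main(void) {
    float frame[FRAME_PIXELS];
    while (capture_frame(frame) == 0) {
        int label = infer_class(frame);             /* real-time, local decision */
        printf("frame classified as %d\n", label);  /* tag, alert, or log */
    }
    return 0;
}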

Compute on data

The new workloads are based on two scenarios: (1) training a large neural network on a specific workload, such as image or voice recognition; and (2) applying the trained (or ‘fitted’) neural network on edge devices. Both workloads require massively parallel data processing, including the multiplication and convolution of large matrices. Optimal implementations of these compute functions require vector instructions that operate on large vectors or data arrays. RISC-V is an architecture and ecosystem well suited to this type of application, as it offers a standardization process supported by open source software that gives developers complete freedom to adopt, modify, or even add proprietary vector instructions. Prominent RISC-V compute architecture opportunities are outlined in Figure 1.
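
As an illustration (not any particular vendor's implementation), both scenarios reduce to dense kernels like the matrix-vector product below. On a RISC-V core with vector support, the inner reduction loop is exactly what vector loads and fused multiply-add instructions, whether standardized or proprietary, are designed to accelerate.

/* Dense matrix-vector product, the core kernel of both neural network
 * training and inference. A vectorizing compiler (or hand-written
 * assembly) can strip-mine the inner loop into vector loads and fused
 * multiply-adds on a RISC-V vector unit. */
#include <stddef.h>

void matvec(size_t rows, size_t cols,
            const float *a,  /* rows x cols weight matrix, row-major */
            const float *x,  /* input vector of length cols */
            float *y)        /* output vector of length rows */
{
    for (size_t i = 0; i < rows; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < cols; j++)  /* vectorizable reduction */
            acc += a[i * cols + j] * x[j];
        y[i] = acc;
    }
}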

Move data

The emergence of Fast Data and computation at the edge makes one consequence clear: moving all of the data back and forth to the cloud for computational analysis is not very efficient. First, it incurs high-latency transfers over long distances across mobile networks and Ethernet, which is unacceptable for image- or speech-recognition applications that must operate in real time. Second, computing at the edge allows for more scalable architectures, where image and speech processing, or in-memory compute operations on SSDs, can be executed in a scalable fashion: each edge device added to the system brings with it an incremental increase in computational power. Optimizing how and when data moves is therefore a key factor in the scalability of these new architectures.

[Figure 1 | Big Data, Fast Data and RISC-V opportunities]
In Figure 1a, cloud data center servers execute machine learning using deep learning neural network training on large Big Data sets. In Figure 1b, security cameras at the edge employ inference engines that are trained on Big Data and recognize images in real time (Fast Data). In Figure 1c, a smart SSD uses an inference engine for data recognition and classification, effectively utilizing the device’s internal bandwidth. Each is a potential opportunity for RISC-V cores, which offer the freedom to add both proprietary and future standardized vector instructions instrumental to deep learning and inference.

Another similar and important trend in how data is moved and accessed exists on the Big Data side, in the cloud (Figure 2). The traditional computer architecture (Figure 2a) attaches a number of devices (e.g. dedicated machine learning accelerators, graphics cards, fast SSDs, smart networking controllers) to a slow peripheral bus. The slow bus limits device utilization by constraining communication among the devices, the main CPU, and main (potentially persistent) memory. It is also impossible for this new class of computational devices to share memory with one another, or with the main CPU, which results in wasteful and limited data movement across a slow bus.

Several important industry trends are emerging around how to improve data movement between different compute devices, such as the CPU and compute and network accelerators, as well as how data is accessed in memory or fast storage. These trends are focused on open standardization efforts that deliver faster, lower-latency serial fabrics and smarter logical protocols, enabling coherent access to shared memory.

Next generation data-centric computing

Future architectures will need to deploy open interfaces to persistent memory and cache-coherency-enabled fast busses (e.g. TileLink, RapidIO, OpenCAPI, and Gen-Z) that connect to compute accelerators, not only to substantially improve performance, but also to enable all devices to share memory and reduce unnecessary data movement.

[Figure 2 | Data movement and access in compute architectures]
In Figure 2a, a traditional compute architecture has reached its limits due to a slow peripheral bus used for fast storage and compute acceleration devices. In Figure 2b, future compute architectures deploy open interfaces that provide uniform cache-coherent access of all compute resources in the platform to shared persistent memory (referred to as a data-centric architecture). In Figure 2c, the deployed devices are able to utilize the same shared memory, reducing unnecessary copying of data.

The role of the CPU uncore and network interface controllers will grow as key enablers of data movement. The CPU uncore component will have to support key memory and persistent memory interfaces (e.g. NVDIMM-P), as well as memory that resides close to the CPU. Smart and fast busses for compute accelerators, smart networking, and remote persistent memory will also need to be implemented. Any device on the bus (e.g. CPUs, general-purpose or purpose-built compute accelerators, network adapters, storage, or memory) can include its own compute resources with the ability to access shared memory (Figures 2b and 2c).
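
The following is a minimal sketch of that pattern, with two POSIX threads standing in for two devices on a coherent fabric; it illustrates the idea, not any specific fabric's API. The “accelerator” computes directly in shared memory and raises a flag, and the “CPU” reads the result in place, with no bulk copy across a peripheral bus.

/* Coherent shared-memory hand-off, sketched with C11 atomics and POSIX
 * threads as stand-ins for devices on a cache-coherent fabric. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 1024

static float shared_buf[N];  /* region visible to both agents */
static atomic_int ready;     /* coherent "doorbell" flag */

/* The "accelerator": computes its result directly in shared memory. */
static void *accelerator(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        shared_buf[i] = 0.5f * (float)i;
    atomic_store_explicit(&ready, 1, memory_order_release); /* publish */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, accelerator, NULL);
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                        /* wait for the doorbell */
    printf("first result: %f\n", shared_buf[0]); /* read in place, no copy */
    pthread_join(t, NULL);
    return 0;
}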

RISC-V technology can be a key enabler of this optimized data movement: it allows vector instructions for new machine learning workloads to be implemented on all of the compute accelerator devices, enables open source CPU technologies that support open memory and smart bus interfaces, and underpins a new data-centric architecture with coherently shared memory.

Solving the challenges with RISC-V

Big Data and Fast Data pose significant data movement challenges, paving the way for the RISC-V instruction set architecture (ISA). Its open, modular approach is ideally suited to serve as the foundation for data-centric compute architectures, providing the ability to:

  • Scale compute resources for edge compute devices
  • Add new instructions, such as vector instructions for key machine learning workloads
  • Locate small compute cores very close to storage and memory media
  • Enable new compute paradigms and modular chip designs
  • Enable new data-centric architectures where all of the processing elements can coherently access shared persistent memory, optimizing data movement

RISC-V is developed by a membership of over one hundred organizations, a collaborative community of software and hardware innovators who can adapt the ISA to a specific purpose or project. Anyone who joins the organization can design, manufacture, and/or sell RISC-V chips and software under a Berkeley Software Distribution (BSD) license.

Final thoughts

To realize its value and possibilities, data needs to be captured, preserved, accessed, and transformed to its full potential. Environments with Big Data and Fast Data applications have exceeded the processing capabilities of general-purpose compute architectures. The extreme data-centric applications of tomorrow will require purpose-built processing that supports independent scaling of data resources in an open manner.

Having a common open computer architecture centered on data stored in persistent memory, while allowing all devices to play a compute role, is a key enabler of these new scalable architectures driven by a new class of machine learning workloads. The next generation of applications across all cloud and edge segments will need this new class of low-energy processing, with specialty compute acceleration processors focused on the task at hand, reducing time wasted moving data or performing excess computation that is not relevant to the data. People, communities, and our planet thrive through the power, potential, and possibilities of data.

Zvonimir Z. Bandić is a Research Staff Member and Senior Director of the Next Generation Platform Technologies Department at Western Digital Corporation. He received his Bachelor of Science (BS) degree in electrical engineering from the University of Belgrade, Yugoslavia, and his Master of Science (MS) and Doctor of Philosophy (PhD) degrees in applied physics from the California Institute of Technology (Caltech) in the field of novel electronic devices based on wide-bandgap semiconductors. Bandić is currently focusing on emerging non-volatile memory (PCM, ReRAM, MRAM) applications for data center distributed computing, including RISC-V-based CPU technologies, in-memory compute, RDMA networking, and machine learning hardware acceleration. He has been awarded over 50 patents in the fields of solid state electronics, solid state disk controller technology, security architecture, and storage systems, and has published over 50 peer-reviewed papers. Zvonimir is a member of the boards of directors of the RISC-V, OpenCAPI, and RapidIO standards organizations.
