We are rapidly approaching the exascale era with few tools and little infrastructure to help us build software/hardware co-designed ecosystems for the future. We can no longer wait for new hardware before developing software. Likewise, to meet the performance and power targets of exascale and post-exascale systems, we must leverage specialization in the form of a co-designed system, meaning hardware and software must be designed together rather than in isolation.
Specialization is also extending the open source ecosystem into hardware with open ISAs, like RISC-V, that define a base standard for interoperability at the software level while leaving the flexibility to specialize the architecture with new application-specific features supported in hardware. This new capability is ushering in an exciting new era of system design and full-stack research and development, including building specialized hardware. The MareNostrum Exascale Experimental Platform (MEEP) is a first step: a flexible hardware platform onto which we can map, or emulate, a variety of architectures for developing both hardware and software.
BSC is building the various software and hardware components needed to enable an open source software and hardware HPC ecosystem. We are building on the rich history of open source software and extending it to open source hardware based on the RISC-V Instruction Set Architecture (ISA). MEEP enables the rapid evaluation of hardware architectures using FPGAs. This is one of the many steps along the path to an open HPC ecosystem for both hardware and software. Furthermore, MEEP is also a software development vehicle that, unlike software simulation, allows us to run the entire software stack at a reasonable speed for interactive development. MEEP is our digital laboratory for designing, testing, and evaluating future exascale accelerators and systems: true software/hardware co-design.
Unlike software development, hardware development is very expensive and mistakes are very costly, especially in chip fabrication. MEEP provides a mechanism that trades some performance for flexibility. We are building MEEP using FPGAs, a flexible hardware component, or fabric, that can be reprogrammed over and over again. This allows us to treat hardware with the same flexibility as software: we can recompile the hardware description language to create a new hardware design and map it onto the FPGA fabric. The MEEP infrastructure software and the FPGAs make hardware emulation more like traditional software development. Furthermore, MEEP can be used for much more than a single emulation project such as its first demonstrator, an exascale accelerator. MEEP can emulate other accelerator designs, as well as CPUs. We can also support FPGA-based accelerators using the same FPGA infrastructure.
Building an exascale accelerator emulator
MEEP combines three main components to form a complete system emulation platform: software, architecture and RTL, and hardware. The first deployment will be an exascale accelerator for both HPC and High Performance Data Analytics (HPDA) applications. Thus, we have assembled a collection of HPC, AI, ML, and DL applications to target for acceleration.
Based on this benchmark suite, we analyze the applications and define an architecture that is optimized for them. With a defined architecture, we can write the RTL, the code that describes the hardware. In true co-design practice, we have the flexibility to make changes at any level of the stack, in any layer of the software as well as in the hardware. This is a new level of flexibility that enables the best overall solution for a problem, rather than being constrained to software-only changes and/or simulation-only validation.
Finally, we combine the software with the architecture and RTL and map them onto the emulator. This is a system composed of approximately 100 CPUs and FPGAs. This scale enables larger system studies that go beyond the usual single-chip evaluation. The combination of CPUs and FPGAs provides additional flexibility for mapping the logical emulator onto the physical resources. We can blur the lines between the physical hardware and the logical definition of the exascale accelerator or any other system we map to MEEP. Furthermore, we are using a traditional accelerator architecture that can be used for studies, at scale, beyond this initial project. The FPGA is the fundamental building block that provides this flexibility. We deploy the FPGA in two different ways: as the FPGA Shell and as the FPGA Emulator. We define the FPGA Shell as the FPGA interfaces to memory and I/O. The FPGA Shell encompasses all the infrastructure that is common across all FPGA designs. The FPGA resources left over in the rest of the fabric can be used for emulation and/or accelerator designs. Figure 1 illustrates the software stack and the FPGA emulator that runs the RTL code describing the hardware architecture.
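The split between the FPGA Shell and the emulator region can be pictured as a simple resource budget. The short Python sketch below is only an illustration of that idea: the Resources class, its fields, and every number in it are hypothetical and are not taken from MEEP or from any real FPGA device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical FPGA resource budget (made-up units, not real device data)."""
    luts: int
    bram_blocks: int
    dsp_slices: int

    def minus(self, other: "Resources") -> "Resources":
        """Return what is left after reserving `other` (for example, the shell)."""
        remaining = Resources(
            self.luts - other.luts,
            self.bram_blocks - other.bram_blocks,
            self.dsp_slices - other.dsp_slices,
        )
        if min(remaining.luts, remaining.bram_blocks, remaining.dsp_slices) < 0:
            raise ValueError("reserved region does not fit in the device")
        return remaining

    def fits(self, design: "Resources") -> bool:
        """Check whether `design` fits within this budget."""
        return (
            design.luts <= self.luts
            and design.bram_blocks <= self.bram_blocks
            and design.dsp_slices <= self.dsp_slices
        )


# Illustrative numbers only.
device = Resources(luts=1_000_000, bram_blocks=2_000, dsp_slices=9_000)
shell = Resources(luts=150_000, bram_blocks=300, dsp_slices=0)  # memory and I/O interfaces

# Whatever the shell does not use is available to the emulated design.
emulator_budget = device.minus(shell)
accelerator_rtl = Resources(luts=700_000, bram_blocks=1_500, dsp_slices=4_000)

print(emulator_budget)
print(emulator_budget.fits(accelerator_rtl))
```

Because the shell is common to all designs, its cost is paid once per FPGA, and only the emulator budget changes meaning from one mapped design to the next.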
MEEP provides a unique opportunity to enable system-scale development of new hardware and its associated software. We are building a tightly-coupled accelerator that extends beyond traditional HPC applications to include emerging AI, ML, and DL workloads. MEEP will enable us to peer into the future and demonstrate how things can work before the new hardware is available. This pre-silicon validation will save significant amounts of money by improving the quality of the RTL and removing bugs. MEEP will also enable software development for new systems, allowing the new hardware and software to be developed concurrently. This is especially crucial for software development that would otherwise have to wait for the new hardware to be available. In both cases, we can reduce development time by parallelizing hardware and software development, and improve overall software and hardware quality by running the system at a much larger scale.
The MEEP project started in January 2020 and has a duration of 36 months. For more information, please visit: http://meep-project.eu/.
About the Author
John D. Davis is the director of LOCA, the Laboratory for Open Computer Architecture, and the PI for MEEP at the BSC. He has published over 30 refereed conference and journal papers in Computer Architecture (ASIC and FPGA-based domain-specific accelerators, non-volatile memories and processor design), Distributed Systems, and Bioinformatics. He also holds over 35 issued or pending patents in the USA and multiple international filings. He has designed and built distributed storage systems in research and as products. John has led the entire product strategy, roadmap, and execution for a big data and analytics company. He has worked in research at Microsoft Research, where he also co-advised 4 PhDs, as well as at large and small companies like Sun Microsystems, Pure Storage, and Bigstream. John holds a B.S. in Computer Science and Engineering from the University of Washington and an M.S. and Ph.D. in Electrical Engineering from Stanford University.