Many people talk about hardware architecture as if it’s the most important part of a new platform. It’s true that hardware architecture matters for performance, as discussed at length in a previous blog post. As a refresher, the pillars of the Heterogeneous System Architecture (HSA) are unified and shared virtual memory, user-mode dispatch, platform atomics, architected signals, a strict memory model, quality of service, and cache coherency.
However, these features are not included in the platform architecture for their own sake: they allow software to be written easily and to run efficiently. Even more, they enable existing software to be ported easily, and ideally automatically, onto the new architecture.
While hardware typically has a limited lifespan of a few years at most, software may live almost forever. Sure, almost no one uses an actual VT100 text terminal anymore, yet much of today’s software relies on libraries and application frameworks whose origins go back as far as the 1970s. That software laid the foundation of high-performance computing, the Internet, and the security protocols used today, usually behind a shiny user interface. Even the good old VT100 lives on in the command lines of many popular operating systems (OSs), where the control sequences still behave as they did 40 years ago.
This is one reason why some platform architectures have endured for decades. While the hardware design and implementation may have changed substantially internally, the software-visible instruction set architecture (ISA) has endured and has been incrementally extended without breaking backward compatibility with old programs. Other, more modern architectures were popular for a time but ultimately withered away as their performance advantage diminished: software-compatible platforms came close enough to their performance levels to make binary software compatibility the overwhelming factor. Good examples are the x86 ISA, the ARM ISA, and IBM’s System/360 ISA, the latter celebrating its 53rd anniversary and still in use.
How do you ensure the long-term viability of a platform architecture? You ensure that software written for traditional architectures can run well, and ideally faster, on it, while keeping the software development toolchain (compilers, linkers, and the development process) familiar, so that programmers don’t have to juggle two or more toolchains to get performant software running on the platform.
Today’s extensive use of open-source software is an important factor, especially the GNU and LLVM-based compiler toolchains, readily available in open-source repositories, and OSs like Linux, which serve as a foundation in embedded systems in various forms, sometimes “hidden away” (as in the case of Android). However, applications need to start and run without much delay, so it’s important that compilation and the time-expensive compiler optimization of code for the accelerator don’t happen at application load time (as they often do with many current accelerator APIs).
Most code optimization should happen once, when the application binary is produced; the result can then be readily loaded and mapped onto the accelerator. This requires a portable, accelerator-neutral ISA that can be quickly translated to the target accelerator’s ISA instead of fully compiled. Hence, it’s important to define a vendor-neutral ISA, which in the case of HSA is called the HSA Intermediate Language (IL), or HSAIL. This IL represents a common ISA for compilers to target and is designed to be close to a data-parallel accelerator such as a GPU, DSP, or other hardware.
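As a loose analogy for this split, and not the actual HSAIL toolchain, Python also separates an expensive one-time compilation step from a cheap load step: a module can be compiled and serialized once, then deserialized and executed quickly, without re-parsing or re-optimizing. The kernel name and code below are purely illustrative:

```python
# Analogy only: "offline" compilation to a neutral form happens once;
# "load time" just deserializes and runs, mirroring how HSAIL code is
# finalized quickly to the target ISA rather than recompiled.
import marshal

source = """
def saxpy(a, x, y):
    # data-parallel-style kernel: element-wise y = a*x + y
    return [a * xi + yi for xi, yi in zip(x, y)]
"""

# "Offline" step: parse and compile once, serialize the neutral form.
code_obj = compile(source, "<kernel>", "exec")
blob = marshal.dumps(code_obj)          # portable intermediate blob

# "Load time": deserializing is fast; no re-parsing or re-optimizing.
ns = {}
exec(marshal.loads(blob), ns)
print(ns["saxpy"](2.0, [1.0, 2.0], [10.0, 20.0]))   # [12.0, 24.0]
```

The point of the analogy is only the division of labor: the heavy lifting happens when the blob is produced, and the consumer of the blob does cheap translation work at start-up.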
Source code written in a common high-level language like C++ or Python, be it an application framework or a popular application, is then compiled to code defined in the IL. The compiler can apply all of its extensive optimization steps when generating the intermediate code, which can then be linked with other libraries, and even with modules written in different languages, such as C++, for some functions.
By integrating the IL as a binary section in the application binary (in an object format called BRIG), the program loader can load both the host-ISA and accelerator code blocks in parallel and let each execute the program as the programmer wrote it, without the end user seeing any difference from a regular program load. Using the HSA run-time functionality, the software engineer can either target the HSA run-time directly or use an application interface or framework sitting on top of it, such as OpenCL.
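In pseudocode, the load path sketched above might look like this; all names are illustrative, not the actual HSA run-time API or real section names:

```
# Illustrative pseudocode only; the real entry points live in the HSA run-time.
binary       = load_executable("app")
host_code    = binary.section(".text")        # host ISA, loaded as usual
brig         = binary.section(".brig")        # HSAIL kernels in BRIG format
gpu_code     = finalize(brig, target_agent)   # fast translation, not full compile
run(host_code, gpu_code)                      # both halves execute as one program
```

The essential property is that the finalize step is cheap enough to happen at load time without a user-visible delay.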
But that’s not all. AMD has developed an open-source HSA run-time called Radeon Open Compute (ROCm) and added a portability layer called the Heterogeneous Interface for Portability (HIP), which allows source code using proprietary CUDA APIs to compile and run on top of the ROCm run-time while preserving source-code compatibility. Together with CodeXL, an open-source tool for profiling and debugging data-parallel applications, this is a powerful toolset for automatically porting and running large application frameworks. While it doesn’t use all ROCm features, it’s an easy way to take advantage of AMD’s HSA implementation without refactoring legacy code.
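HIP’s porting story rests on the fact that most CUDA run-time calls have direct HIP equivalents, so much of the translation is textual. The toy sketch below uses a handful of real CUDA-to-HIP name correspondences; the actual hipify tools are far more thorough (they also handle kernel-launch syntax, headers, and types):

```python
import re

# A few real CUDA-to-HIP API correspondences; the real hipify tools
# cover the full run-time API, not just this handful.
CUDA_TO_HIP = {
    "cudaMalloc":             "hipMalloc",
    "cudaMemcpy":             "hipMemcpy",
    "cudaFree":               "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

def hipify(src: str) -> str:
    """Textually translate known CUDA run-time calls to HIP equivalents."""
    # Longest names first, so cudaMemcpyHostToDevice wins over cudaMemcpy.
    pattern = re.compile("|".join(sorted(CUDA_TO_HIP, key=len, reverse=True)))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], src)

cuda_line = "cudaMemcpy(d_x, h_x, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_line))  # hipMemcpy(d_x, h_x, n, hipMemcpyHostToDevice);
```

Because the translated source is still plain C++ against a CUDA-shaped API, the same code can then be compiled for either vendor’s run-time, which is what makes the porting largely automatic.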
More information can be found in a half-day HSA-focused tutorial at the HPCA/CGO conference in a couple of weeks.