GPU compute is making an impact in embedded applications

May 12, 2016

A graphics processing unit (GPU) is better suited than a central processing unit (CPU) to rendering images at high resolution and fast frame rates, because the GPU features hundreds of compute units that can process thousands of data elements in parallel. This massively parallel architecture and high thread count make GPUs inherently well suited to applications such as medical imaging and video games, which demand compute-heavy features like concurrent visualization and interactive segmentation.
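To make the data-parallel model concrete, here is a minimal CPU-only sketch. The kernel `threshold_kernel`, the `launch` helper, and the sample pixel values are all hypothetical. On a real GPU each kernel call would run on its own hardware thread; here a plain loop only mimics that behavior, not the speed.

```python
# CPU-only model of the GPU execution style: one small "kernel" runs once
# per work-item, identified only by its global index (gid). On a GPU each
# call would be a hardware thread; here a plain loop stands in for that.
from typing import Callable, List


def threshold_kernel(gid: int, pixels: List[int], out: List[int], level: int) -> None:
    """Hypothetical segmentation kernel: label pixel `gid` foreground (1)
    if its intensity is at least `level`, otherwise background (0)."""
    out[gid] = 1 if pixels[gid] >= level else 0


def launch(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical stand-in for a kernel launch over a 1-D grid.
    Each work-item writes only its own output slot, so the calls are
    independent of one another."""
    for gid in range(global_size):
        kernel(gid, *args)


if __name__ == "__main__":
    image = [12, 200, 97, 180, 45, 255, 130, 8]  # a tiny 1-D "image"
    mask = [0] * len(image)
    launch(threshold_kernel, len(image), image, mask, 128)
    print(mask)  # [0, 1, 0, 1, 0, 1, 1, 0]
```

Because no work-item reads another's result, a GPU is free to run all of them at once, which is where its advantage over a CPU comes from.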

Multi-core processor designs that house both a CPU and a GPU have existed for many years. In fact, almost every notebook, smartphone, and tablet PC now has a multi-core processor with an integrated GPU and many other accelerators for audio, networking, and other functions. However, in these designs the GPU usually does not access application memory directly and thus acts as a slave to the CPU.

A few years ago, AMD introduced the concept of an accelerated processing unit (APU), which gives the CPU and GPU inside the processor cache-coherent access to the same memory. The idea of combining the two processing units on the same bus to increase processor throughput eventually led to the creation of the Heterogeneous System Architecture (HSA) Foundation in 2012.

The set of standards and specifications within HSA defines a common bus and shared memory for the CPU, GPU, and other accelerators so that these vastly different architectures can work in tandem. Industry leaders such as AMD, ARM, MediaTek, and Texas Instruments are part of this effort, which marks a significant break from the existing multi-core processor design approach.

1. HSA takes existing heterogeneous computing to the next level.

For a start, HSA 1.0 aimed to unlock the potential of the GPU in embedded computing by automating the offload of calculations from the CPU to the GPU, and vice versa. HSA enables software to dispatch tasks to the GPU efficiently, with much lower latency and dramatically reduced overhead. It also allows GPU tasks to access data in system memory directly and securely through shared virtual memory (SVM), and to walk data structures in application process memory (ptr-is-ptr, meaning a pointer refers to the same data on the CPU and the GPU). All of this can now be done without the host CPU having to provision data buffers, as legacy GPU compute APIs required.
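The practical difference that SVM and ptr-is-ptr make can be illustrated on the CPU alone. The sketch below is an analogy, not HSA code. Python object references stand in for pointers, and every name (`Node`, `marshal_to_buffer`, `device_sum_from_buffer`, `device_sum_shared`) is hypothetical. In the legacy path the host flattens a linked list into an indexed buffer before the "device" can use it. In the shared path the "device" follows the same links the CPU built.

```python
# CPU-only analogy for HSA shared virtual memory (SVM) and ptr-is-ptr.
# Python object references stand in for pointers; the "device" functions
# are ordinary Python functions, not GPU code. All names are hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values: List[int]) -> Optional[Node]:
    """Build a singly linked list on the 'host' side."""
    head: Optional[Node] = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def marshal_to_buffer(head: Optional[Node]) -> List[Tuple[int, int]]:
    """Legacy model: the host copies the list into a flat buffer and
    replaces each pointer with the index of the next entry (-1 = end)."""
    buf: List[Tuple[int, int]] = []
    node = head
    while node is not None:
        next_index = len(buf) + 1 if node.next is not None else -1
        buf.append((node.value, next_index))
        node = node.next
    return buf


def device_sum_from_buffer(buf: List[Tuple[int, int]]) -> int:
    """Legacy 'device' code: can only follow indices inside the copied buffer."""
    total, i = 0, (0 if buf else -1)
    while i != -1:
        value, i = buf[i]
        total += value
    return total


def device_sum_shared(head: Optional[Node]) -> int:
    """SVM-style 'device' code: receives the same reference the host built
    and follows the links directly, with no copy or pointer translation."""
    total, node = 0, head
    while node is not None:
        total += node.value
        node = node.next
    return total


if __name__ == "__main__":
    head = build_list([3, 1, 4, 1, 5])
    print(device_sum_from_buffer(marshal_to_buffer(head)))  # 14
    print(device_sum_shared(head))                          # 14
```

Both paths compute the same sum. The legacy path, however, needs an extra host-side copy and a change of representation, and that marshalling step becomes more awkward for trees and graphs.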

Upcoming releases of the HSA standards will integrate digital signal processors (DSPs) into the architecture and improve interoperation with programmable and fixed-function accelerators in the system that are not HSA-enabled.

Next, HSA is a strong foundation for general-purpose GPU (GPGPU) APIs such as OpenCL, including its fine-grained and coarse-grained shared virtual memory features, but it is not limited to them: many high-level languages and toolchains have been ported and optimized to target HSA platforms natively, including C++17, Python, GCC, and LLVM/Clang. Work is also under way to optimize software frameworks and libraries such as Caffe, BLAS, Charm++, FFT, Sparse, FLAME, and Docker so that developers can program and use heterogeneous parallel devices directly and efficiently.

The new level of processor efficiency created by these heterogeneous compute environments is reinvigorating industries such as medical and print imaging. Until recently, medical imaging products, which run compute-intensive jobs such as image registration, image segmentation, and image de-noising, largely had to trade off frame rate against image quality.
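Image de-noising shows why such jobs suit a GPU: each output pixel depends only on the input image. The sketch below assumes a simple 3x3 mean (box) filter, chosen for brevity rather than as a clinical de-noising method, and the names `mean3x3_at` and `denoise` are hypothetical. Visiting the pixels in a shuffled order leaves the result unchanged, which is what allows a GPU to compute them all concurrently.

```python
# De-noising as a data-parallel stencil. Assumption: a simple 3x3 mean
# (box) filter stands in for real medical de-noising algorithms.
import random
from typing import List, Optional

Image = List[List[float]]


def mean3x3_at(img: Image, y: int, x: int) -> float:
    """Average of pixel (y, x) and its in-bounds neighbours. It reads only
    the input image, so every output pixel can be computed independently."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += img[yy][xx]
                count += 1
    return total / count


def denoise(img: Image, shuffle_seed: Optional[int] = None) -> Image:
    """Apply the filter to every pixel. If `shuffle_seed` is given, visit
    pixels in a random order to show the result does not depend on order."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    coords = [(y, x) for y in range(h) for x in range(w)]
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(coords)
    for y, x in coords:
        out[y][x] = mean3x3_at(img, y, x)
    return out


if __name__ == "__main__":
    noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # one bright "noise" spike
    print(denoise(noisy))  # [[2.25, 1.5, 2.25], [1.5, 1.0, 1.5], [2.25, 1.5, 2.25]]
    print(denoise(noisy) == denoise(noisy, shuffle_seed=7))  # True
```

The spike is spread over its neighbourhood, and the identical output from the shuffled run reflects that no pixel's result depends on another's.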

Enter HSA, with its mechanisms for assigning different workloads to different processing cores, enabling efficient computing with strong visualization and image fidelity. Extensive resources are now available to help developers adapt existing applications, or create new ones, to take advantage of heterogeneous architectures. These include the HSA Foundation GitHub repository and the Radeon Open Compute Solutions GitHub. The latter extends the HSA programming model to high-performance discrete GPUs and includes the open-source debugging and profiling tools in CodeXL 2.0.

The computationally intensive medical segment can use GPU acceleration to speed up the algorithms specific to applications such as MRI, PET, ultrasound, and microscopy.

2. GPU acceleration delivers the speed needed to meet medical imaging’s unique data-throughput and post-processing demands efficiently.

Embedded TechCon, taking place June 7-8 in Austin, Texas, offers two tutorials and an expert panel on how HSA is transforming the heterogeneous design environment by handling professional workloads effectively, and on how it will affect the medical and print imaging segments.

Specifically, the tutorials are "The Heterogeneous System Architecture – A foundation for the next generation of heterogeneous computing" and "GPU compute in medical and print imaging," while the panel is titled "Heterogeneous Systems Architectures: Power, performance, and programming for the future."

Paul Blinzer is an AMD Fellow in the System Software group and chairperson of the System Architecture Workgroup of the HSA Foundation. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Paul Blinzer, Fellow, AMD, Chairperson, System Architecture Workgroup of the HSA Foundation