For many years, Moore's law accurately predicted improvements in chip manufacturing: transistor counts doubled roughly every 18 to 24 months, and performance grew along with them. Manufacturing improvements brought us smaller transistors, allowing more of them to fit on a die without an increase in size. Currently, chips are available with structures as small as 10 nm, and foundries are preparing to introduce 7-nm technology.
As transistors continue to shrink, new challenges appear that make it harder for the industry to keep up this pace. But there are other ways to increase system performance without increasing cost or power consumption.
For example, instead of using one universal computer architecture for all the tasks in an embedded system, a variety of specialized hardware components can be used. Consider the common task of rendering graphics. While CPUs are capable of rendering graphics, graphics processing units (GPUs), built specifically for this purpose, handle the task more efficiently.
ARM SoCs take full advantage of the concept of combining CPUs with additional hardware accelerators. Typically, such systems contain a GPU, video encoders and decoders, audio processors, cryptographic accelerators and other specialized hardware. There are many different SoC combinations from a range of manufacturers.
Some accelerators inside SoCs, like video decoders, have been commonplace for many years. They allow video playback with very little load on the CPU. Other components and concepts are newer, like the big.LITTLE concept from ARM. In this case, a single SoC contains different Cortex-A CPU cores. Some are optimized for maximum performance, like the Cortex-A72; others are optimized for low power consumption, like the Cortex-A53. All of them are Cortex-A application processors, so program code can be seamlessly moved from one core to another, depending on the workload.
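On Linux, the scheduler migrates threads between big and LITTLE cores transparently, but an application can also request a specific core with the standard sched_setaffinity() call. Below is a minimal sketch; the helper function name is ours, and which core ids belong to the big or LITTLE cluster is board-specific, so the core id is purely illustrative:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to one CPU core. On a big.LITTLE SoC the
 * mapping of core ids to the "big" and "LITTLE" clusters depends on
 * the SoC and device tree, so the id passed in is only illustrative. */
int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* first argument 0 = the calling thread; returns 0 on success */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

In practice, most applications leave migration to the scheduler; explicit pinning is useful mainly for benchmarking or for latency-sensitive threads.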
Other SoCs, like the NXP i.MX7, take it a step further and deploy different classes of ARM cores. On the NXP i.MX7, two ARM Cortex-A7 application processor cores handle the general computing load. These cores are ideal to run an operating system (OS) like Linux with or without a graphical user interface.
The SoC also contains an ARM Cortex-M4 microcontroller core. This core is less powerful than the A7; however, due to its simpler architecture, it draws less power and its execution time is more predictable, with more deterministic latency. This makes it well suited to running a real-time OS (RTOS) that handles critical real-time tasks. Traditionally, this kind of architecture required an external microcontroller, which complicates the hardware design. Alternatively, the real-time task can share the A7 cores with the rest of the system; however, technologies like the Linux RT patch or hypervisors require trade-offs (read more on this topic in the blog Developing Real-Time Systems on Application Processors).
Another interesting use case for this architecture is low-power IoT applications. It’s possible to switch off the higher-performance Cortex-A7 cores while continuing to monitor the environment with the lower-power M4 core, which can wake up the rest of the system when required. This strategy combines the low power consumption of a microcontroller with the versatility of an application processor. See this concept in action in a video.
This form of heterogeneous multi-core system is becoming increasingly popular. The upcoming NXP i.MX8 QuadMax will contain two Cortex-A72 cores, four Cortex-A53 cores, and two Cortex-M4 cores. Other SoC providers have also produced conceptually similar designs.
This kind of ARM SoC is a good fit for many embedded applications due to its efficiency. However, integrating SoCs requires several external components, such as high-speed DDR RAM, flash storage, Ethernet PHYs, and complex power management circuitry to power the different HW accelerators and cores separately. This increases the initial development cost and risk of a project and has the largest impact on small- and medium-volume products. System on modules (SoMs), such as the Toradex Colibri and Apalis families, provide the latest ARM SoCs in an easy-to-use package and include all the common external components that accompany a SoC.
It's important to note that hardware alone is insufficient to take advantage of a heterogeneous system architecture. Most software is designed to run on a single CPU architecture, rather than leveraging the multiple hardware architectures of a heterogeneous SoC. To get the most out of such an SoC, the software must be designed with its heterogeneous architecture in mind. Toradex recently produced a webinar that shows how to use a heterogeneous multi-core system combining ARM Cortex-A and Cortex-M cores.
Another application of heterogeneous system architecture that’s gaining interest is general-purpose computing on graphics processing units (GPGPU), as many modern SoCs come with powerful GPUs. The task of rendering graphics can be easily parallelized, so GPUs contain many simple processing cores. Interestingly, there are many other tasks that can be parallelized using these processing cores. NVIDIA's CUDA and the open standard OpenCL are common frameworks for programming GPUs for general-purpose tasks.
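The property that makes a task GPU-friendly is that each output element can be computed independently of the others. Vector addition is the classic example; the plain-C sketch below shows the structure, and on a GPU a CUDA or OpenCL kernel would execute each iteration of this loop as a separate hardware thread:

```c
#include <stddef.h>

/* Each c[i] depends only on a[i] and b[i], so all n computations are
 * independent -- exactly the structure a GPU exploits by assigning
 * one lightweight thread per element. */
void vector_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```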
One technology that benefits greatly from GPGPU-accelerated computing is neural networks. Deep neural networks are currently in the spotlight within AI and machine learning. For example, DeepMind's AlphaGo has beaten the best human players at go, an ancient East Asian board game. Current self-driving car technologies rely on deep neural networks. And the technology powers voice assistants such as Siri, Alexa, and Cortana, as well as IBM's Watson.
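The connection between neural networks and GPGPU is that the bulk of a network's work is large matrix-vector products, which have the same data-parallel structure as the graphics workloads GPUs were built for. A minimal sketch of one fully connected layer with a ReLU activation (the function name, row-major weight layout, and shapes are illustrative):

```c
#include <stddef.h>

/* out[j] = relu(bias[j] + sum_i w[j*n_in + i] * in[i])
 * Every output neuron j is independent of the others, so a GPU can
 * compute all n_out of them in parallel. */
void dense_relu(const float *w, const float *bias,
                const float *in, float *out,
                size_t n_in, size_t n_out) {
    for (size_t j = 0; j < n_out; j++) {
        float acc = bias[j];
        for (size_t i = 0; i < n_in; i++)
            acc += w[j * n_in + i] * in[i];
        out[j] = acc > 0.0f ? acc : 0.0f;  /* ReLU activation */
    }
}
```

A deep network is simply many such layers chained together, which is why frameworks offload them to the GPU via CUDA or OpenCL backends.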
It’s clear that the industry is adapting to new processing workloads. Despite Moore’s law slowing down, innovation in other areas promises a steady stream of exciting new developments.