2019 Embedded Processor Report: The Evolving State of Signal Processing

January 24, 2019

IoT and AI benefit from the cost and flexibility of general-purpose CPUs, but also need advanced signal processing to clean analog inputs and perform MAC operations with high precision and efficiency.

The DSP revolutionized the field of signal processing back in the 1980s by reducing noise, improving accuracy, and easing programming for engineers working with analog signals. Over the next few decades, DSPs advanced to provide greater performance, floating-point computation, and extreme optimization for specific types of workloads.

Then the IP licensing model took off and embedded developers quickly realized the versatility and cost benefits of off-the-shelf CPU cores. Just the CPU and some application-specific peripherals sufficed for many applications, while more highly specialized systems could integrate chips with soft cores for functions like FFTs, encoding, and decoding.

Fast forward, and many of today’s IoT and machine learning devices require a blend of old and new. Not only do these systems benefit from the low cost and flexibility enabled by general-purpose CPUs, they also need advanced signal processing capabilities to clean analog signals and perform fast MAC operations at high levels of precision and power efficiency.
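To make the MAC requirement concrete, here is a minimal Python sketch of a fixed-point multiply-accumulate loop. It assumes the common Q15 format (signed 16-bit values representing the range -1 to just under 1) and a wide accumulator. The helper names `to_q15` and `mac_dot_q15` are hypothetical and used only for illustration; they do not come from any vendor library.

```python
Q15_ONE = 1 << 15  # Q15 scale factor: 1.0 maps to 32768 (assumed format)

def to_q15(x: float) -> int:
    """Quantize a float to a signed 16-bit Q15 integer, saturating at the limits."""
    return max(-32768, min(32767, int(round(x * Q15_ONE))))

def mac_dot_q15(a, b):
    """Dot product built from a chain of multiply-accumulate (MAC) steps.

    Each Q15 x Q15 product is a Q30 value. Accumulating at full width and
    rescaling once at the end keeps rounding error out of the inner loop.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                          # one MAC per sample/coefficient pair
    result = acc >> 15                        # rescale Q30 back to Q15
    return max(-32768, min(32767, result))    # saturate to 16 bits

samples = [to_q15(v) for v in (0.5, -0.25, 0.125)]
coeffs = [to_q15(v) for v in (0.5, 0.5, 0.5)]
result = mac_dot_q15(samples, coeffs)         # 6144, i.e. 0.1875 in Q15
```

A hardware MAC unit performs the multiply and the add of each loop iteration as a single operation, which is why filter and transform workloads map onto it so efficiently.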

So where does that leave DSP solutions today?

“It’s no secret that the market for discrete DSPs has not flourished like other, more heterogeneous processor types through the last several years,” says Dan Mandell, senior analyst for IoT and Embedded Technology at VDC Research. “The biggest players in embedded DSPs such as Analog Devices, NXP, and Texas Instruments have all placed a much greater focus on their offerings for microcontrollers, which often provide cost-effective mixed-signal processing and control at a low cost for high volumes.

“However, DSPs provide much greater performance at low power for emerging applications ranging from automotive radar and other imaging or sensing, V2X and general functional communications, machine vision, video surveillance, and more,” Mandell continues. “Automotive and industrial applications requiring mid- to high-performance fixed- and floating-point signal processing are looking more and more towards DSP solutions.

“We are seeing growing demand for DSPs again in a number of industries,” he adds.

A DSP architecture for every use case

One reason for the decline in discrete DSP solutions over the past two decades is that low-cost CPU cores have become increasingly proficient in handling signal processing tasks themselves. For instance, Arm’s Cortex-M4, Cortex-M7, Cortex-M33, and Cortex-M35P processors include DSP extensions and an optional floating-point unit (FPU). The cores also support 2 x 16-bit or 4 x 8-bit SIMD instructions, enabling parallelism in the computation of video, audio, speech and other signals.
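The idea behind 2 x 16-bit SIMD can be sketched in plain Python by emulating two signed 16-bit lanes packed into one 32-bit word. This is an illustration only: the function names are hypothetical, the behavior is loosely modeled on lane-wise add and dual multiply-accumulate operations of the kind found in the Cortex-M DSP extensions, and Python's unbounded integers stand in for what would be a fixed-width accumulator in hardware.

```python
def pack16x2(lo: int, hi: int) -> int:
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def _s16(v: int) -> int:
    """Interpret the low 16 bits of v as a signed two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def unpack16x2(word: int):
    """Return the (lo, hi) signed 16-bit lanes of a packed 32-bit word."""
    return _s16(word), _s16(word >> 16)

def add16x2(a: int, b: int) -> int:
    """Lane-wise add: each lane wraps on its own, and no carry crosses lanes."""
    a_lo, a_hi = unpack16x2(a)
    b_lo, b_hi = unpack16x2(b)
    return pack16x2(a_lo + b_lo, a_hi + b_hi)

def dual_mac(a: int, b: int, acc: int = 0) -> int:
    """Multiply both lane pairs and add the two products to the accumulator.

    Note: acc is an unbounded Python int here; real hardware would use a
    fixed-width register.
    """
    a_lo, a_hi = unpack16x2(a)
    b_lo, b_hi = unpack16x2(b)
    return acc + a_lo * b_lo + a_hi * b_hi

x = pack16x2(100, -200)
y = pack16x2(-50, 300)
lane_sums = unpack16x2(add16x2(x, y))   # (50, 100)
mac_result = dual_mac(x, y)             # 100*-50 + -200*300 = -65000
```

On a core with such instructions, each of these functions corresponds to roughly one operation on a 32-bit register, so two audio samples or two filter taps are processed per step instead of one.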

“The performance level of today’s microprocessors can easily handle many signal processing workloads that have been, in the past, relegated to specialized DSPs,” says Rhonda Dirvin, senior director of marketing programs for the Embedded and Automotive Line of Business at Arm. “There is a growing need to have these processing capabilities on low-power, lower-cost microcontrollers.

“There are unique use cases out there today that play well to the microcontroller space,” she continues. “For example, the always-on keyword spotting of smart speakers is a very good use case. The microcontroller is running various echo- and noise-cancellation algorithms, and once the keyword is detected, it can use microphone beamforming to gather the rest of the audio more clearly. This scenario requires the low-power nature of a microcontroller in addition to the processing capabilities for signal processing.”
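Microphone beamforming of the kind mentioned above is often introduced through the delay-and-sum technique. The sketch below is a simplified illustration, not a description of any product's implementation: it assumes integer sample delays that are already known, and the function name `delay_and_sum` is hypothetical.

```python
import math

def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay, then average.

    delays[m] is how many samples later the target sound reaches microphone m.
    Sound from the target direction adds coherently; sound from other
    directions is misaligned and partially cancels.
    """
    max_delay = max(delays)
    length = min(len(ch) for ch in channels) - max_delay
    out = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out

# Target: reaches mic 1 three samples after mic 0.
target = [math.sin(2 * math.pi * n / 16) for n in range(64)]
mic0_target = target
mic1_target = [0.0, 0.0, 0.0] + target[:-3]
steered = delay_and_sum([mic0_target, mic1_target], [0, 3])

# Interferer: arrives at both mics simultaneously, with a 6-sample period,
# so the 3-sample steering delay puts the two copies in antiphase.
interferer = [math.sin(2 * math.pi * n / 6) for n in range(64)]
rejected = delay_and_sum([interferer, interferer], [0, 3])
```

Here `steered` reproduces the target waveform while `rejected` is close to zero. Real arrays deal with fractional delays, more microphones, and broadband signals, so cancellation is rarely this complete.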

For moderate signal processing workloads, integrating specialized DSPs alongside CPU cores in a custom SoC is an increasingly popular option. Mike Demler, senior analyst at The Linley Group, notes that CEVA’s recently released CEVA-BX hybrid DSPs bring higher-performance digital signal processing capabilities alongside CPU cores that are “equivalent or even superior to some Arm cores in [the EEMBC’s] CoreMark per MHz” rating. Demler also notes that Synopsys has continually enhanced the DSP capabilities of its ARC cores in recent years, most notably by adding DSP options to its ARC HS CPU family.

A good indicator of the trend toward heterogeneous compute architectures can be found at Texas Instruments, one of the pioneers of digital signal processing. Today, SoCs like the Sitara AM57x bring a heterogeneous multicore design based on a combination of Arm Cortex-A15 application processors, C66x Series DSPs, real-time microcontrollers, GPUs, and machine learning accelerators that can all be tuned for various tasks (Figure 1).

Figure 1. The Texas Instruments Sitara AM574x SoCs are a line of massively heterogeneous processors with multiple CPU, DSP, GPU, microcontroller, and specialized accelerator cores.

“Heterogeneous architectures that have both Arm and DSP processors are becoming increasingly popular where high computation blocks can be offloaded to the DSP,” says Mark Nadeski, marketing manager for Catalog Processors at Texas Instruments. “Optimized performance to reduce cost or power are still key care-abouts in the embedded space. This can often be done by offloading work to a DSP.”

Solving the programming problem

One potential area of concern with heterogeneous architectures is a lack of programming familiarity within the embedded-engineering workforce. Engineers trained when discrete DSPs were the rage are beginning to age into retirement, which could leave a gap in signal processing experience among developers educated on general-purpose processors.

To supplement signal processing development on its general-purpose CPU platforms, Arm offers a suite of CMSIS-DSP libraries that include basic math functions like vector add/multiply; fast math functions like sine, cosine, and square root; transforms, such as FFT functions; matrix functions; filters; motor control functions; and so on.

However, Dirvin realizes that “there is still a gap in the general developer community on knowing which algorithms to use where and how to apply those algorithms in their system.” In response, Arm and its ecosystem partners offer tools that run “what-if” scenarios, provide code generation, and build coefficient tables to help engineers work with the various algorithms.

MathWorks’ MATLAB, for example, offers a graphical tool that can help developers ease into the intricacies of signal processing. Figure 2 shows an FIR filter running on an Arm Cortex-M device to filter two sine waves of different frequencies.

Figure 2. MathWorks’ MATLAB is one of many tools that help ease the development of signal processing applications. Shown here in MATLAB is an FIR filter running on an Arm Cortex-M device.
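For readers without access to such tools, the FIR scenario described above (separating two sine waves of different frequencies) can be approximated in a few lines of plain Python. This is a sketch under stated assumptions, not the MATLAB or Cortex-M implementation shown in the figure: the sample rate, tone frequencies, tap count, and cutoff are illustrative choices, and all function names are hypothetical.

```python
import math

def lowpass_taps(num_taps: int, cutoff: float):
    """Windowed-sinc low-pass design; cutoff is in cycles per sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC

def fir_filter(taps, signal):
    """Direct-form FIR: every output sample is a sum of tap x delayed-input products."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def tone_amplitude(signal, freq, fs):
    """Estimate the amplitude of one frequency by correlating with sine and cosine."""
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

FS = 1000.0  # assumed sample rate in Hz
x = [math.sin(2 * math.pi * 50 * i / FS) + math.sin(2 * math.pi * 300 * i / FS)
     for i in range(1000)]
y = fir_filter(lowpass_taps(51, 120 / FS), x)
steady = y[100:]                       # skip the filter's start-up transient
low = tone_amplitude(steady, 50, FS)   # 50 Hz tone passes nearly unchanged
high = tone_amplitude(steady, 300, FS) # 300 Hz tone is strongly attenuated
```

The inner loop of `fir_filter` is a chain of multiply-accumulates, one per tap, which is the operation DSP extensions and discrete DSPs are built to accelerate.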

“The good news is that there are a lot of software tools that make developing applications easier than ever, but there’s still no replacing an analog/signal-processing or neural-network expert for the tough problems,” Demler observes.

Programming discrete DSPs has also evolved over the years and become easier for designers. Texas Instruments’ C66x family of DSPs, for example, can be programmed entirely in C code or with C libraries, Nadeski says.

“The C compiler is so efficient that programming at the assembly level isn’t necessary. For times when programming at the assembly level is necessary, designers can use optimized libraries that are callable through an open software architecture like OpenCL,” he explains.

Deep neural nets and next-gen signal processing

Precision inputs are pivotal to many of the advanced applications mentioned earlier and have enabled a renaissance of sorts in DSP technology. Currently, no field of engineering demands high-accuracy signals more than artificial intelligence and machine learning. Because the market for neural network processing is still in its infancy, DSPs have an opportunity to provide stopgap functionality in the short term and the potential to carve out a larger niche in the future.

“Traditional AI and ML algorithms like decision trees, random forests, and so on require a lot of if/then-type processing, which fits well on general purpose processors,” Nadeski says. “However, neural networks are needed to perform deep learning – the next evolution of AI and ML – functions, and are heavily based on convolutions. These run much more efficiently on DSPs and dedicated accelerators like the Embedded Vision Engine (EVE) subsystems available in the Sitara AM57x processors than they do on general purpose processors.

“As neural networks work their way into more and more embedded products, architectures like DSPs that can perform efficient implementations of a neural network inference may become more popular,” Nadeski suggests.
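The point about convolutions can be illustrated with a small Python sketch that computes a 2D convolution the way a neural network layer typically does and counts the multiply-accumulate operations involved. The function name and the tiny 4 x 4 input are hypothetical and chosen only for illustration; as in most deep learning frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1) with a MAC counter.

    Returns (feature_map, mac_count). The kernel is applied without
    flipping, following the usual neural-network convention.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    macs = 0
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]  # one MAC
                    macs += 1
            row.append(acc)
        out.append(row)
    return out, macs

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]
feature_map, mac_count = conv2d_valid(image, kernel)
# feature_map is 3 x 3 and mac_count is 36 (9 outputs x 4 MACs each)
```

Even this toy case needs 36 MACs and no branching on the data, and the count grows with image size, kernel size, and channel count. That regular, arithmetic-heavy pattern is what makes DSPs and dedicated accelerators a better fit than the if/then-style processing of decision trees.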