Multiprocessing systems have been around since the beginning of time, or so it seems. Led by system and processor companies such as Intel, SGI, and Sun, multiprocessing has always had a home in workstations and servers. Recently, AMD and Intel have started down the multiprocessing path for PCs as well, with innovations that make it practical to put more than one processor core on a single die.
In the embedded market, the multicore approach has long been a viable technology. For example, Freescale and Texas Instruments have their dual-core MXC and OMAP architectures, respectively, that combine an ARM core and a DSP core. PMC-Sierra has its dual-core MIPS64-compatible RM9000 and recently announced the 1.8 GHz RM11200. Freescale has recently jumped on board with its dual-core PowerPC 8641D that integrates two e600 CPU cores. The list of multiprocessor devices goes on and on.
The need for multiprocessing
The best-known methods of multiprocessing are Symmetric MultiProcessing (SMP) and homogeneous multiprocessing. Both are represented by systems with two or more identical processors, or by systems with a single processor that contains two or more identical cores.
The obvious benefit of these methods is a theoretical doubling of performance with two processors (or an n-fold increase, where n is the number of processors). This is useful for increasing scalability, improving system density, boosting processing power without the incremental cost of support chips, and providing true concurrency. SMP has recently grown in popularity for PC applications because AMD and Intel have hit a clock-frequency barrier imposed by power consumption.
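The word "theoretical" matters here: an n-fold gain assumes every part of the workload can be split across the processors. As a rough illustration, the Python sketch below applies Amdahl's law, a standard speedup model that this column does not itself cite, to a hypothetical `serial_fraction` of work that cannot be parallelized.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Ideal speedup on n identical processors when `serial_fraction`
    of the work cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part takes the same time on any number of processors;
    # only the parallel part shrinks by a factor of n.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


if __name__ == "__main__":
    for s in (0.0, 0.05, 0.25):
        row = ", ".join(f"n={n}: {amdahl_speedup(s, n):.2f}x" for n in (1, 2, 4, 8))
        print(f"serial fraction {s:.2f} -> {row}")
```

With just 5% serial work, eight processors yield roughly 5.9x rather than 8x.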
Multiprocessing can also be implemented using an asymmetrical, or heterogeneous, method. Asymmetrical MultiProcessing (AMP) allows the system designer to use the processor best suited to a specific task or group of tasks. In mobile phone applications, for example, the previously mentioned MXC and OMAP architectures use the ARM core to run the application code and user interface, and the DSP core to handle modem functions and accelerate multimedia algorithms such as MPEG-4 decoding.
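One way to picture AMP is as a static mapping from task types to the core best suited to run them. The sketch below is hypothetical: the task names and the `TASK_AFFINITY` table are illustrative, loosely following the ARM/DSP split described for MXC and OMAP, and do not represent either vendor's software.

```python
from enum import Enum


class Core(Enum):
    ARM = "ARM"  # general-purpose core: application code, user interface
    DSP = "DSP"  # signal-processing core: modem, multimedia codecs


# Hypothetical static partition, loosely following the ARM/DSP split in the text.
TASK_AFFINITY = {
    "application": Core.ARM,
    "user_interface": Core.ARM,
    "modem": Core.DSP,
    "mpeg4_decode": Core.DSP,
}


def assign(tasks):
    """Group task names by the core type each one is bound to."""
    plan = {core: [] for core in Core}
    for task in tasks:
        if task not in TASK_AFFINITY:
            raise ValueError(f"no core assigned for task {task!r}")
        plan[TASK_AFFINITY[task]].append(task)
    return plan


if __name__ == "__main__":
    for core, names in assign(["user_interface", "modem", "mpeg4_decode"]).items():
        print(core.value, names)
```

In the sketch the partition is fixed up front, and unknown task types are rejected rather than silently placed on some core.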
Hardware-based multithreading is a virtual method of multiprocessing that takes advantage of a single processor’s built-in hardware support to run multiple tasks concurrently. Useful for implementing fine-grained multithreading, this method is a good answer to the mismatch between processor speed and memory latency. The basic premise is to minimize idle CPU cycles by interleaving several instruction streams. When one thread encounters a cache miss, the processor switches to another ready thread, so the pipeline keeps doing useful work instead of stalling while the miss is serviced.
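To make the latency-hiding argument concrete, here is a toy cycle-count model in Python; it does not describe any particular processor. It assumes a single-issue pipeline, a fixed hypothetical miss penalty (`miss_latency`), and round-robin selection among ready threads. Instruction streams are strings of `c` (compute) and `m` (cache miss).

```python
def cycles_to_finish(streams, miss_latency, multithreaded=True):
    """Toy model: total cycles until every instruction stream has completed.

    Each stream is a string of 'c' (compute: issues and completes in 1 cycle)
    and 'm' (load that misses in cache: issues in 1 cycle, then its thread
    cannot issue again for `miss_latency` more cycles). At most one
    instruction issues per cycle.
    """
    if not multithreaded:
        # Streams run back to back, so every miss stalls the whole processor.
        return sum(len(s) + s.count("m") * miss_latency for s in streams)

    n = len(streams)
    pcs = [0] * n        # index of each thread's next instruction
    ready_at = [0] * n   # first cycle at which each thread may issue again
    cycle = 0
    next_thread = 0      # round-robin starting point
    finished = 0         # cycle by which all issued work has completed
    while any(pcs[i] < len(streams[i]) for i in range(n)):
        for k in range(n):
            i = (next_thread + k) % n
            if pcs[i] < len(streams[i]) and ready_at[i] <= cycle:
                op = streams[i][pcs[i]]
                pcs[i] += 1
                ready_at[i] = cycle + 1 + (miss_latency if op == "m" else 0)
                finished = max(finished, ready_at[i])
                next_thread = (i + 1) % n
                break
        # If no thread was ready, this cycle was an idle (stall) cycle.
        cycle += 1
    return finished
```

For two `"cmc"` streams with a 4-cycle miss penalty, the interleaved run finishes in 9 cycles versus 14 when the streams run back to back, because each thread's miss overlaps the other thread's work.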
Benefits for a wide range of applications
In theory, the hardware realization of any of these multiprocessing methods is relatively straightforward; the real art lies in making multiprocessing transparent to the system designer. In other words, the ultimate goal is to let system designers implement multiprocessing applications with no extra effort, or at least with minimal effort. The burden of partitioning the work therefore falls on the vendors of operating systems, compilers, and other software tools. A big challenge for these vendors is the wide variety of applications to which multiprocessing techniques can be applied. In the consumer market, for example, multiprocessing can be applied to set-top boxes, telematics, smart phones, and gaming platforms. In the networking market, multiprocessing is useful for symmetrical packet processing, TCP termination offload, security processing, and Ethernet drivers (MACs).
This diversity of applications implies an equally diverse set of hardware and software multiprocessing techniques. It also makes deriving industry-standard benchmarks that measure and compare the capabilities of different processor and system solutions a huge challenge.
A simple (albeit ineffective) approach to benchmarking would be to measure the total throughput obtained by running multiple instances of the same or different applications, with all inter-application dependencies eliminated. To create a realistic scenario of multiple independent applications, however, a benchmark should run different applications, stressing the platform’s ability to support the multiple cache contexts those applications create rather than its ability to cache a single application across multiple processors.
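A minimal Python harness for this throughput-style measurement might look like the following. The two kernels (`crc_like`, `sort_kernel`) are hypothetical stand-ins for real applications; each instance works only on private data, so there are no inter-application dependencies, and the harness reports aggregate iterations per second. The instance list lets you compare a mix of identical instances with a mix of different ones. Note that CPython threads do not execute CPU-bound Python code in parallel, so to actually load several cores you would pass `concurrent.futures.ProcessPoolExecutor` as `executor_cls` and call the harness from under an `if __name__ == "__main__":` guard.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def crc_like(size):
    """Hypothetical integer kernel: rolling hash over a private buffer."""
    h = 0
    for b in bytes(i % 256 for i in range(size)):
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def sort_kernel(size):
    """Hypothetical memory-oriented kernel: sort a private list of random ints."""
    rng = random.Random(size)  # private, seeded generator: nothing is shared
    values = [rng.randrange(1 << 30) for _ in range(size)]
    values.sort()
    return len(values)


def run_instance(kernel, size, iterations):
    """One independent application instance: repeats its kernel, shares nothing."""
    for _ in range(iterations):
        kernel(size)
    return iterations


def aggregate_throughput(instances, iterations, executor_cls=ThreadPoolExecutor):
    """Run every (kernel, size) instance concurrently.

    Returns (total iterations completed, aggregate iterations per second).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=len(instances)) as pool:
        futures = [pool.submit(run_instance, k, s, iterations) for k, s in instances]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total, total / elapsed


if __name__ == "__main__":
    same = [(crc_like, 4096)] * 4                        # four copies of one app
    mixed = [(crc_like, 4096), (sort_kernel, 4096)] * 2  # two different apps
    for label, mix in (("same", same), ("mixed", mixed)):
        total, rate = aggregate_throughput(mix, iterations=10)
        print(f"{label}: {total} iterations, {rate:.1f} iterations/s")
```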
A good multiprocessing benchmark could use a single application whose multiple tasks are spread across the available processing resources. These tasks can be identical, as in networking, where the application handles multiple identical streams.
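To spread one application's identical streams across cores, a common technique (hash-based flow steering, used here as an assumption rather than anything EEMBC has specified) assigns each stream to a worker by hashing a flow identifier, so every stream stays on one core and its packets stay in order. The `Packet` type, the `flow_id` format, and the checksum workload below are hypothetical.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow_id: str    # hypothetical stream identifier, e.g. a formatted 5-tuple
    seq: int        # position within its stream
    payload: bytes


def worker_for(flow_id, n_workers):
    """Map a stream to a worker with a stable hash so it always lands on the same core."""
    return zlib.crc32(flow_id.encode()) % n_workers


def partition(packets, n_workers):
    """Split one application's traffic into per-worker queues.

    Arrival order is preserved within each queue, so each stream's packets stay in order.
    """
    queues = defaultdict(list)
    for pkt in packets:
        queues[worker_for(pkt.flow_id, n_workers)].append(pkt)
    return queues


def process_queue(pkts):
    """The identical per-stream work every worker runs (a payload checksum as a placeholder)."""
    return [(p.flow_id, p.seq, zlib.crc32(p.payload)) for p in pkts]
```

Each per-worker queue would then be handed to its own core; because every worker runs the same `process_queue` code, the load stays symmetric as long as the flows hash evenly.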
Alternatively, in a consumer device, the application could be processing several different streams (audio and video encode/decode) while also processing TCP/IP network packets and controlling a user interface. These two models place very different loads on the hardware, which must maintain a consistent memory image of the application across all of the processors.
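The consumer-device model can be sketched as several dissimilar tasks inside one application that all touch a single shared state object. In this hypothetical Python sketch, a network task feeds packets through a bounded queue to a decode task while a UI task periodically reads the shared counters. The lock stands in for the consistency that, on real multiprocessor hardware, the cache-coherence machinery must provide across cores; the 188-byte packet size and the placeholder decode work are illustrative only.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """The application's single memory image, touched by every task."""
    lock: threading.Lock = field(default_factory=threading.Lock)
    packets_received: int = 0
    frames_decoded: int = 0
    ui_refreshes: int = 0


def network_task(state, out_q, n_packets):
    """Receive packets (simulated) and hand them to the decoder."""
    for i in range(n_packets):
        out_q.put(bytes([i % 256]) * 188)  # hypothetical 188-byte packet
        with state.lock:
            state.packets_received += 1
    out_q.put(None)  # end-of-stream marker


def decode_task(state, in_q):
    """Consume packets and 'decode' them (placeholder arithmetic)."""
    while (pkt := in_q.get()) is not None:
        sum(pkt)  # stand-in for audio/video decode work
        with state.lock:
            state.frames_decoded += 1


def ui_task(state, stop, snapshots):
    """Periodically read the shared counters, as a status display would."""
    while not stop.is_set():
        with state.lock:
            snapshots.append((state.packets_received, state.frames_decoded))
            state.ui_refreshes += 1
        stop.wait(0.001)


def run_device(n_packets=100):
    state = SharedState()
    packets = queue.Queue(maxsize=8)  # bounded, so the producer cannot run far ahead
    stop = threading.Event()
    snapshots = []
    ui = threading.Thread(target=ui_task, args=(state, stop, snapshots))
    pipeline = [
        threading.Thread(target=network_task, args=(state, packets, n_packets)),
        threading.Thread(target=decode_task, args=(state, packets)),
    ]
    ui.start()
    for t in pipeline:
        t.start()
    for t in pipeline:
        t.join()
    stop.set()
    ui.join()
    return state, snapshots
```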
Another good multiprocessing benchmark can be derived from a single task that is parallelized to scale across multiple instruction contexts. This type of benchmark must stress the system’s support for fine-grained, synchronized access to shared resources.
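A small example of a single parallelized task with fine-grained synchronization is a shared histogram: several threads split the input, and every update to a bin is a synchronized access to shared data. The Python sketch below is illustrative only; the hypothetical `fine_grained` flag switches between one lock per bin and a single global lock, the kind of contrast such a benchmark might probe. CPython's global interpreter lock limits true parallelism, so treat it as a model of the access pattern rather than a performance test.

```python
import threading


def parallel_histogram(data, n_buckets, n_threads, fine_grained=True):
    """Histogram `data` into `n_buckets` with `n_threads` threads sharing one bin array.

    fine_grained=True guards each bin with its own lock; False uses one global lock.
    Every increment is a synchronized access to shared state.
    """
    bins = [0] * n_buckets
    n_locks = n_buckets if fine_grained else 1
    locks = [threading.Lock() for _ in range(n_locks)]
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_buckets or 1.0  # avoid dividing by zero when all values match

    def bucket(x):
        # Clamp so the maximum value falls into the last bin.
        return min(int((x - lo) / width), n_buckets - 1)

    def worker(chunk):
        for x in chunk:
            b = bucket(x)
            with locks[b % n_locks]:
                bins[b] += 1

    # Interleaved split: thread i handles elements i, i + n_threads, ...
    chunks = [data[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bins
```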
Multiprocessing is a hot topic and represents a significant growth area for the embedded industry. EEMBC has embarked on a mission to develop industry-standard benchmarks that address the various methods of multiprocessing in the embedded market. In a departure from the consortium’s standard procedure, these benchmarks will need to run on top of a common operating system API. As with the consortium’s existing benchmarks, the new ones will follow an application-centric approach, although the specific applications have not yet been chosen. Stay tuned!
Markus Levy is founder and President of EEMBC. He is also Technical Editorial Director and Analyst at ConVergence Promotions. Mr. Levy received several patents while at Intel for flash memory architecture and for flash memory drives.
EEMBC, the Embedded Microprocessor Benchmark Consortium, was formed in 1997 to develop meaningful performance benchmarks for embedded system hardware and software. Contact EEMBC directly for membership and certification information.