Warning: Thrills ahead with NextGen processor technologies

December 15th, 2009

You can ignore it, but it won’t go away: More cores – lots more cores – are coming to a processor near you. Here’s an analysis from the Linley Group of what’s happening now in “many-core” trends.

If mastering multicore processors has given you the same heart-in-the-throat feeling as a roller coaster ride, be warned: There are more twists and turns ahead. Two trends are emerging now – processors with more than 32 cores and processors implementing an approximation of a data-flow architecture.

Within several years, most processor suppliers will need to offer chips with 32 or more CPUs if they want to increase their products’ performance. The factors that prompted the shift from single CPU to dual core and then to multicore processors still hold true and will lead to ever more CPUs per chip and per system.

In some respects, the expertise gained from converting serial software originally written for a single CPU into a parallelized program running on, say, eight CPUs will apply when porting it to 32, 64, or 100 CPUs. However, software developers will have to learn new skills, and achieving greater degrees of parallelism will become progressively harder.

Moreover, CPU organization on a chip will become apparent to programmers, particularly those who are developing performance-sensitive or low-level code. A shared bus is too slow for a processor with many cores because bandwidth does not scale with the number of cores, as Figure 1 shows. A switch uses too much on-chip real estate.

Figure 1: Bus bandwidth remains constant as CPUs are added, limiting scaling.

A mesh, however, provides scalable bandwidth at modest die-area cost. In a mesh, each CPU has a connection to its north, south, east, and west neighbors. Bandwidth scaling is achieved because each time a new row or column is added to the array of CPUs, a new set of connections is added as well.
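The scaling argument can be illustrated with a little arithmetic. The following is a minimal sketch, not a model of any real chip: it assumes a bus whose total bandwidth is fixed (in arbitrary, hypothetical units) and a rectangular mesh without wraparound links, and it uses the number of neighbor-to-neighbor links as a crude proxy for aggregate mesh bandwidth. The function names are illustrative only.

```python
def bus_bandwidth_per_cpu(num_cpus: int, bus_bandwidth: float) -> float:
    """Shared bus: total bandwidth is fixed, so each CPU's share shrinks."""
    return bus_bandwidth / num_cpus


def mesh_link_count(rows: int, cols: int) -> int:
    """Count neighbor-to-neighbor links in a rows x cols mesh (no wraparound)."""
    horizontal = rows * (cols - 1)  # east-west links, row by row
    vertical = cols * (rows - 1)    # north-south links, column by column
    return horizontal + vertical


if __name__ == "__main__":
    BUS_BANDWIDTH = 64.0  # hypothetical units, chosen only for illustration
    for side in (2, 4, 6, 8):
        cpus = side * side
        share = bus_bandwidth_per_cpu(cpus, BUS_BANDWIDTH)
        links = mesh_link_count(side, side)
        print(f"{cpus:3d} CPUs: bus share per CPU = {share:5.2f}, mesh links = {links}")
```

Under these assumptions, the per-CPU share of the bus falls as CPUs are added, while every added row or column brings new links with it. Real aggregate bandwidth also depends on traffic patterns, which this sketch ignores.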

In a mesh-based processor like the one shown in Figure 2, data sent from one CPU node to another travels hop by hop via intermediate nodes. Low-level software such as the OS kernel or lightweight executives must be conscious of the CPUs’ row and column addresses to direct the data. Transfers to far CPUs will take more hops than transfers to near CPUs. Programmers must therefore map their code to CPUs so that code blocks communicating with each other run on neighboring CPUs, minimizing both latency and mesh bandwidth consumption.

Figure 2: Mesh bandwidth increases as a new row or column of CPUs is added.
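To make the cost of distance concrete, here is a minimal sketch of hop counting and code placement on a mesh. It rests on assumptions the article does not state: nodes are addressed as (row, column) pairs, the mesh has no wraparound links, and routing is dimension-ordered (along the row first, then along the column), which is one common scheme rather than the method of any particular vendor. The task names and traffic volumes in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row, column) address of a CPU in the mesh


def hop_count(src: Node, dst: Node) -> int:
    """Minimum number of hops between two nodes: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Assumed dimension-ordered route: change column first, then row."""
    row, col = src
    path = [src]
    step = 1 if dst[1] > col else -1
    while col != dst[1]:
        col += step
        path.append((row, col))
    step = 1 if dst[0] > row else -1
    while row != dst[0]:
        row += step
        path.append((row, col))
    return path


def placement_cost(placement: Dict[str, Node],
                   traffic: Dict[Tuple[str, str], int]) -> int:
    """Sum of (traffic volume x hops) over all communicating task pairs."""
    return sum(volume * hop_count(placement[a], placement[b])
               for (a, b), volume in traffic.items())
```

For example, with traffic of 10 units between two hypothetical tasks, `placement_cost({"parse": (0, 0), "crypto": (0, 1)}, {("parse", "crypto"): 10})` gives 10, whereas moving `crypto` to the far corner `(7, 7)` of an 8x8 mesh gives 140. A mapping tool or a programmer would prefer the first placement.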

Additionally, processors that implement aspects of data-flow design are emerging. In the traditional CPU-centric approach to an application such as a VPN server, a program implements the application, calling different functions at different stages. These functions might in turn call a hardware engine, but the application remains in control of operations and data flow.

By contrast, in a data-flow architecture, characteristics of the data determine its flow. Whether implemented by a hardware engine or a software routine, each stage processes a block of data if it matches certain criteria. Based on these criteria, the stage then sends the data to the next stage. Different data can pass through the engines in a different sequence. Data moves from engine to engine without intervention by a CPU-based application. A central control program’s main task is to teach each engine how to process data, which is a one-time job.

This data-driven approach to processing is efficient but requires programmers to take an inside-out way of viewing an application. Instead of thinking about the program flow, programmers must think about the data flow.
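The inside-out view can be sketched in a few lines of software. The example below is an illustration under stated assumptions, not a description of any vendor's hardware: the `Engine` class, the engine names (`classify`, `decrypt`, `forward`), and the `encrypted` packet field are all hypothetical, loosely modeled on the VPN example above. A control routine configures each engine once; after that, each packet's own contents decide which engines it visits.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Packet = Dict[str, Any]
Rule = Tuple[Callable[[Packet], bool], str]  # (match criterion, next engine)


class Engine:
    """One stage: handles a packet, then picks the next stage from the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rules: List[Rule] = []

    def configure(self, predicate: Callable[[Packet], bool], next_engine: str) -> None:
        """Called by the control program during one-time setup."""
        self.rules.append((predicate, next_engine))

    def process(self, packet: Packet) -> Optional[str]:
        # A real engine would also transform the data here;
        # this sketch only records the visit.
        packet.setdefault("trace", []).append(self.name)
        for predicate, next_engine in self.rules:
            if predicate(packet):
                return next_engine  # first matching rule wins
        return None  # no rule matched: the packet leaves the pipeline


def build_engines() -> Dict[str, Engine]:
    """One-time setup by the central control program."""
    engines = {name: Engine(name) for name in ("classify", "decrypt", "forward")}
    engines["classify"].configure(lambda p: p.get("encrypted", False), "decrypt")
    engines["classify"].configure(lambda p: True, "forward")
    engines["decrypt"].configure(lambda p: True, "forward")
    return engines


def run(engines: Dict[str, Engine], start: str, packet: Packet) -> List[str]:
    """Move a packet from engine to engine; no application logic steers it."""
    current: Optional[str] = start
    while current is not None:
        current = engines[current].process(packet)
    return packet["trace"]
```

With this setup, `run(build_engines(), "classify", {"encrypted": True})` visits `classify`, `decrypt`, and `forward`, while an unencrypted packet skips `decrypt`. Note that `run` contains no application-specific branching: the sequence comes entirely from the rules installed during setup and from the data itself.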

Tilera is one of the few companies that make high-speed processors with more than 32 CPUs. Offering 36- and 64-core processors, the company has a roadmap to a 100-core version. A mesh connects the CPU cores.

Data-flow processors are equally rare. Network processors from Xelerated implement a type of data-flow processor. A forthcoming multicore processor from LSI will also optionally support data-flow operation, combining multiple PowerPC CPUs and hardware engines derived from the company’s network processor.

The roller coaster track that took the industry from single-core to multicore programming and chip design is headed for “many-core” designs. Whether the track moves toward data-flow design or veers in another direction is uncertain. Either way, software, system, and chip developers must gird themselves for more thrills before the coaster returns to a plateau or stops.

Joseph Byrne is a senior analyst at The Linley Group, based in Mountain View, California. With more than 15 years of experience, he is one of the industry’s leading analysts covering the semiconductor market. His prior experience includes positions at Gartner, Deloitte, and SMOS Systems. He holds a BS in Electrical Engineering and Computer Science from Duke University and an MBA from the University of Michigan.

The Linley Group

408-281-1947

joeb@linleygroup.com

www.linleygroup.com

© MMXII Embedded Computing Design.
An OpenSystems Media publication.