Next-generation multicore SoC architectures for tomorrow's communications networks

December 1, 2012 OpenSystems Media

IT managers are under increasing pressure to boost network capacity and performance to cope with the data deluge. Networking systems are under a similar form of stress, with their performance degrading as new capabilities are added in software. The solution to both needs is next-generation System-on-Chip (SoC) communications processors that combine multiple cores with multiple hardware acceleration engines.

The data deluge, with its massive growth in both mobile and enterprise network traffic, is driving substantial changes in the architectures of base stations, routers, gateways, and other networking systems. To maintain high performance as traffic volume and velocity continue to grow, next-generation communications processors combine multicore processors with specialized hardware acceleration engines in SoC ICs.

The following discussion examines the role of the SoC in today’s network infrastructures, as well as how the SoC will evolve in coming years. Before doing so, it is instructive to consider some of the trends driving this need.

Networks under increasing stress

In mobile networks, per-user access bandwidth is increasing by more than an order of magnitude from 200-300 Mbps in 3G networks to 3-5 Gbps in 4G Long-Term Evolution (LTE) networks. Advanced LTE technology will double bandwidth again to 5-10 Gbps. Higher-speed access networks will need more and smaller cells to deliver these data rates reliably to a growing number of mobile devices.

In response to these and other trends, mobile base station features are changing significantly. Multiple radios are being used in cloud-like distributed antenna systems. Network topologies are flattening. Operators are offering advanced Quality of Service (QoS) and location-based services and moving to application-aware billing. The increased volume of traffic will begin to place considerable stress on both the access and backhaul portions of the network.

Traffic is similarly exploding within data center networks. Organizations are pursuing limitless-scale computing workloads on virtual machines, which is breaking many of the traditional networking protocols and procedures. The network itself is also becoming virtual and shifting to a Network-as-a-Service (NaaS) paradigm, which is driving organizations to a more flexible Software-Defined Networking (SDN) architecture.

These trends will transform the data center into a private cloud with a service-oriented network. This private cloud will need to interact more seamlessly and securely with public cloud offerings in hybrid arrangements. The result will be the need for greater intelligence, scalability, and flexibility throughout the network.

Moore’s Law not keeping pace

Once upon a time, Moore’s Law, popularly summarized as the doubling of processor performance every 18 months or so, was sufficient to keep pace with computing and networking requirements. Hardware and software advanced in lockstep in both computers and networking equipment. As software added more features with greater sophistication, advances in processors maintained satisfactory levels of performance. But then along came the data deluge.

In mobile networks, for example, traffic volume is growing by some 78 percent per year, owing mostly to the increase in video traffic. This is already causing considerable congestion, and the problem will only get worse when an estimated 50 billion mobile devices are in use by 2016 and the total volume of traffic grows by a factor of 50 in the coming decade.

In data centers, data volume and velocity are also growing exponentially. According to IDC, digital data creation is rising 60 percent per year. The research firm’s Digital Universe Study predicts that annual data creation will grow 44-fold between 2009 and 2020 to 35 zettabytes (35 trillion gigabytes). All of this data must be moved, stored, and analyzed, making Big Data a big problem for most organizations today.

With the data deluge demanding more from network infrastructures, vendors have applied a Band-Aid to the problem by adding new software-based features and functions in networking equipment. Software has now grown so complex that hardware has fallen behind. One way for hardware to catch up is to use processors with multiple cores. If one general-purpose processor is not enough, try two, four, 16, or more.

Another way to improve hardware performance is to combine something new, multiple cores, with something old: Reduced Instruction Set Computing (RISC) technology. With RISC, less is more: a uniform register file, a load/store architecture, and simple addressing modes keep each core simple. ARM, for example, has made some enhancements to the basic RISC architecture to achieve a better balance of high performance, small code size, low power consumption, and small silicon area, with the last two factors being important to increasing the core count.

Hardware acceleration necessary, but …

General-purpose processors, regardless of the number of cores, are simply too slow for functions that must operate deep inside every packet, such as packet classification, cryptographic security, and traffic management, which is needed for intelligent QoS. Because these functions must often be performed in serial fashion, there is limited opportunity to process them simultaneously in multiple cores. For these reasons, such functions have long been performed in hardware, and it is increasingly common to have these hardware accelerators integrated with multicore processors in specialized SoC communications processors.
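The limit that serial work places on multicore scaling can be illustrated with Amdahl's law, which the article does not name but which captures the same argument. The sketch below is only an illustration: the function name and the assumption that 40 percent of per-packet work is serial are hypothetical, not figures from any measured system.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 60% of per-packet work parallelizes, 40% is serial.
PARALLEL_FRACTION = 0.6

for cores in (2, 4, 16, 64):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>2} cores: at most {speedup:.2f}x faster than one core")
```

Under this assumption the speedup can never reach 2.5x, no matter how many cores are added, which is why serial per-packet functions are better served by dedicated hardware than by a higher core count.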

The number of function-specific acceleration engines available also continues to grow, and more engines (along with more cores) can now be placed on a single SoC. Examples of acceleration engines include packet classification, deep packet inspection, encryption/decryption, digital signal processing, transcoding, and traffic management. It is even possible now to integrate a system vendor’s unique intellectual property into a custom acceleration engine within an SoC. Taken together, these advances make it possible to replace multiple SoCs with a single SoC in many networking systems (see Figure 1).

Figure 1: SoC communications processors combine multiple general-purpose processor cores with multiple task-specific acceleration engines to deliver higher performance with a lower component count and lower power consumption.

In addition to delivering higher throughput, SoCs reduce the cost of equipment, resulting in a significant price/performance improvement. Furthermore, the ability to tightly couple multiple acceleration engines makes it easier to satisfy end-to-end QoS and service-level agreement requirements. The SoC also offers a distinct advantage when it comes to power consumption, which is an increasingly important consideration in network infrastructures, by providing the ability to replace multiple discrete components with a single energy-efficient IC.

The powerful capabilities of today’s SoCs make it possible to offload packet processing entirely to the line cards of a system such as a router or switch. In distributed architectures like the IP Multimedia Subsystem (IMS) and SDN, the offload can similarly be distributed among multiple systems, including servers.

Although hardware acceleration is necessary, the way it is implemented in some SoCs today may no longer be sufficient in applications requiring deterministic performance. The problem is caused by the workflow within the SoC itself when packets must pass through several hardware accelerators, which is increasingly the case for systems tasked with inspecting, transforming, securing, and otherwise manipulating traffic.

If traffic must be handled by a general-purpose processor each time it passes through a different acceleration engine, latency can increase dramatically, and deterministic performance cannot be guaranteed under all circumstances. This problem will get worse as data rates increase in Ethernet networks from 1 Gbps to 10 Gbps, and in mobile networks from 300 Mbps in 3G networks to 5 Gbps in 4G networks.
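A rough way to see the effect is to add up the time a packet spends in each engine and then add a processor handoff between every pair of engines. The following is a minimal sketch with made-up numbers: the per-engine times, the 10-microsecond handoff cost, and the function name are all assumptions chosen for illustration, not measurements from any SoC.

```python
from typing import Sequence


def path_latency_us(engine_us: Sequence[float],
                    cpu_handoff_us: float,
                    cpu_between_engines: bool) -> float:
    """Total latency for a packet that visits each engine once, in order."""
    total = sum(engine_us)
    if cpu_between_engines:
        # One trip through a general-purpose core between consecutive engines.
        total += cpu_handoff_us * (len(engine_us) - 1)
    return total


# Hypothetical per-engine times: classification, decryption, traffic management.
ENGINE_TIMES_US = [2.0, 5.0, 3.0]
CPU_HANDOFF_US = 10.0  # assumed cost of each detour through a CPU core

via_cpu = path_latency_us(ENGINE_TIMES_US, CPU_HANDOFF_US, cpu_between_engines=True)
direct = path_latency_us(ENGINE_TIMES_US, CPU_HANDOFF_US, cpu_between_engines=False)
print(f"via CPU: {via_cpu} us, engine to engine: {direct} us")
```

In a real system the handoff cost also varies with processor load, so the detour adds jitter as well as delay, which is what undermines deterministic performance.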

Next-generation multicore SoCs

LSI addresses the data path problem in its Axxia SoCs with Virtual Pipeline technology. The Virtual Pipeline creates a message-passing control path that enables system designers to dynamically specify different packet-processing flows that require different combinations of multiple acceleration engines. Each traffic flow is then processed directly through any engine in any desired sequence without intervention from a general-purpose processor (see Figure 2). This design natively supports connecting different heterogeneous cores together, enabling more flexibility and better power optimization.

Figure 2: To maximize performance, next-generation SoC communications processors process packets directly and sequentially in multiple acceleration engines without intermediate intervention from the CPU cores.
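The idea of configurable, engine-to-engine packet flows can be sketched in software. The example below is a conceptual simulation only and is not LSI's API or the Axxia programming model: the engine functions, the `FLOW_TABLE` structure, the flow names, and the use of port 5060 to mark voice traffic are all hypothetical stand-ins for hardware blocks and their configuration.

```python
from typing import Callable, Dict, List

Packet = Dict[str, object]
Engine = Callable[[Packet], Packet]


# Hypothetical stand-ins for hardware acceleration engines.
def classify(pkt: Packet) -> Packet:
    traffic_class = "voice" if pkt.get("dst_port") == 5060 else "bulk"
    return {**pkt, "class": traffic_class}


def decrypt(pkt: Packet) -> Packet:
    return {**pkt, "encrypted": False}


def inspect(pkt: Packet) -> Packet:
    return {**pkt, "inspected": True}


def shape(pkt: Packet) -> Packet:
    queue = "high" if pkt.get("class") == "voice" else "low"
    return {**pkt, "queue": queue}


ENGINES: Dict[str, Engine] = {
    "classify": classify,
    "decrypt": decrypt,
    "dpi": inspect,
    "traffic_mgr": shape,
}

# Each flow names its own ordered sequence of engines; the table can be
# changed at run time to define new processing paths.
FLOW_TABLE: Dict[str, List[str]] = {
    "secure_voice": ["decrypt", "classify", "traffic_mgr"],
    "web": ["classify", "dpi", "traffic_mgr"],
}


def run_flow(flow: str, pkt: Packet) -> Packet:
    """Pass the packet from engine to engine with no CPU step in between."""
    trace: List[str] = []
    for engine_name in FLOW_TABLE[flow]:
        pkt = ENGINES[engine_name](pkt)
        trace.append(engine_name)
    return {**pkt, "trace": trace}


result = run_flow("secure_voice", {"dst_port": 5060, "encrypted": True})
print(result["trace"], result["queue"])
```

The point of the sketch is the separation between the flow table, which describes each path, and the engines, which do the work: adding or reordering a path means editing the table rather than routing every packet back through a general-purpose core.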

In addition to faster, more efficient packet processing, next-generation SoCs also include more general-purpose processor cores (to 32, 64, and beyond), highly scalable and lower-latency interconnects, nonblocking switching, a wider choice of standard interfaces (Serial RapidIO, PCI Express, USB, I2C, and SATA), and higher-speed Ethernet interfaces (1G, 2.5G, 10G, and 40G+). To ease the integration of these increasingly sophisticated capabilities into a system’s design, software development kits are being enhanced with tools that simplify development, testing, debugging, and optimization tasks.

Next-generation SoC ICs accelerate time to market for new products while lowering both manufacturing costs and power consumption. With deterministic performance for data rates in excess of 40 Gbps, embedded hardware is once again poised to accommodate any additional capabilities required by the data deluge for another three to four years.

David Sonnier is a technical fellow in system architecture for the Networking Solutions Group of LSI Corporation.
