PCI Express assuming a widening role in the rack

April 01, 2014

PCI Express: No longer just chip-to-chip

 

Can PCIe really compete and win against Ethernet and IB? Engineers will benefit from understanding where PCIe, Ethernet, and IB coexist today, and why PCIe is poised to make inroads into territory the other two have long held.

InfiniBand (IB) was originally envisioned as a unified fabric to replace most of the other datacenter interconnects. While it did not achieve that goal, it has become popular as a high-speed clustering interconnect, replacing proprietary solutions that were in use until then.

Much like PCI Express (PCIe), IB has gone through a number of speed increases since its introduction. The initial speed, Single Data Rate (SDR), delivers the same effective per-lane data rate as PCIe Gen1, about 2 Gigabits per second (Gbps) after encoding overhead. It has since been enhanced to Double Data Rate (DDR) at 4 Gbps, Quad Data Rate (QDR) at 8 Gbps, and now to 13.64 Gbps with the Fourteen Data Rate (FDR) enhancement.
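
These per-lane figures follow directly from each generation's raw signaling rate and line encoding. The short Python sketch below reproduces them from the published signaling rates, with the PCIe generations included for comparison; it is an illustration of the arithmetic, not a measurement.

```python
# Effective per-lane data rate = raw signaling rate x encoding efficiency.
# Raw rates and encodings are the publicly specified values for IB and PCIe.
links = {
    # name:       (raw Gbps per lane, payload bits, total bits)
    "IB SDR":     (2.5,     8,   10),   # 8b/10b encoding
    "IB DDR":     (5.0,     8,   10),
    "IB QDR":     (10.0,    8,   10),
    "IB FDR":     (14.0625, 64,  66),   # 64b/66b encoding
    "PCIe Gen1":  (2.5,     8,   10),
    "PCIe Gen2":  (5.0,     8,   10),
    "PCIe Gen3":  (8.0,     128, 130),  # 128b/130b encoding
}

for name, (raw, payload, total) in links.items():
    effective = raw * payload / total
    print(f"{name:<10} {effective:6.2f} Gbps effective per lane")
```

The roughly 7.9 Gbps that Gen3 delivers per lane is what makes QDR its closest IB counterpart.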

QDR is the closest match to PCIe Gen3 in per-lane data rate, and with similar bandwidth and latency, a fabric based on PCIe can deliver performance comparable to an IB solution at the same data rate. Beyond matching that performance, PCIe offers I/O device sharing using standard Single-Root I/O Virtualization (SR-IOV) hardware and software drivers, which IB does not. Because IB is primarily a high-speed clustering technology, a PCIe-based fabric can achieve QDR-like performance while reducing the system cost and power of an equivalent IB solution.

Current architecture

Traditional systems currently being deployed in volume have several interconnect technologies that need to be supported. As Figure 1 shows, IB and Ethernet can serve as interconnects in a single system, in addition to other fabrics such as Fibre Channel (FC).

 

Figure 1: Shown here is a traditional system with PCI Express, InfiniBand (IB), and Ethernet interconnect technologies.


This architecture has several limitations:

  • Existence of multiple I/O interconnect technologies
  • Low utilization rates of I/O endpoints
  • High power and system cost due to the need for multiple I/O endpoints
  • I/O is fixed at build time with no flexibility to change later
  • Management software must handle multiple I/O protocols, adding overhead

Using multiple I/O interconnect technologies increases latency, cost, board space, and power. This architecture might be justified if every endpoint were used 100 percent of the time; in practice, however, the endpoints are often underutilized, so the system carries a costly overhead for limited utilization. The added latency arises because the processors' native PCIe interface must be converted to multiple protocols; designers can reduce it by leveraging that same native PCIe interface to converge all endpoints.

Clearly, sharing I/O endpoints is the solution to these limitations (Figure 2). This concept appeals to system designers because it lowers cost and power, improves performance and utilization, and simplifies design. Additional advantages of shared I/O are:

  • As I/O speeds increase, the only additional investment needed is to change I/O adapter cards. In earlier deployments when multiple I/O technologies existed on the same card, designers would have to redesign the entire system, whereas in a shared-I/O model they can simply replace an existing card with a new one when an upgrade is needed for one particular I/O technology.
  • Since multiple I/O endpoints don’t need to exist on the same cards, designers can either manufacture smaller cards to further reduce cost and power, or choose to retain the existing form factor and differentiate their products by adding multiple CPUs, memory, and/or other endpoints in the space saved by eliminating multiple I/O endpoints.
  • Designers can reduce the number of cables that criss-cross a system. Multiple interconnect technologies require different (and multiple) cables, each with its own bandwidth and protocol overhead. Converging on a single I/O interconnect reduces the number of cables needed for the system to function properly, cutting design complexity and delivering cost savings (a rough sizing sketch follows this list).
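
To make the consolidation argument concrete, the Python sketch below counts adapters and cables for a hypothetical rack under the two models of Figures 1 and 2. The server count, the per-server adapter mix, and the sharing ratio are assumptions chosen purely for illustration; they are not the figures behind Tables 1 and 2.

```python
import math

servers = 32                 # hypothetical rack of 32 servers
adapters_per_server = 3      # assumed Figure 1 mix: Ethernet NIC, FC HBA, IB HCA
sharing_ratio = 4            # assumed servers per shared adapter once I/O is pooled

# Traditional model (Figure 1): every server carries its own endpoints,
# and each endpoint needs its own cable to a protocol-specific switch.
traditional_adapters = servers * adapters_per_server
traditional_cables = servers * adapters_per_server

# Shared-I/O model (Figure 2): one PCIe link per server into the fabric,
# with pooled adapters sized for aggregate demand rather than per server.
shared_adapters = adapters_per_server * math.ceil(servers / sharing_ratio)
shared_cables = servers

print(f"Traditional: {traditional_adapters} adapters, {traditional_cables} cables")
print(f"Shared I/O:  {shared_adapters} adapters, {shared_cables} cables")
```

With these placeholder numbers the adapter count drops by three quarters and the cable count by two thirds; actual savings depend on how heavily each class of I/O is utilized.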

 

Figure 2: An I/O system using PCI Express (PCIe) for shared I/O reduces cost, improves performance, and simplifies design.


Implementing shared I/O in a PCIe switch is the key enabler for the architecture depicted in Figure 2. SR-IOV technology implements I/O virtualization in hardware for improved performance and makes use of hardware-based security and Quality-of-Service (QoS) features in a single physical server. SR-IOV also allows I/O sharing by multiple guest Operating Systems (OSs) running on the same server.
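
As a concrete example of how SR-IOV looks to software, the Python sketch below enables virtual functions (VFs) on a Linux host through the standard sysfs interface. The PCI address and the VF count are placeholders, and the snippet assumes the physical function's driver supports VF enablement through sriov_numvfs; it must be run as root.

```python
from pathlib import Path

# Placeholder PCI address of an SR-IOV-capable physical function (PF).
pf = Path("/sys/bus/pci/devices/0000:03:00.0")

# Ask the device how many virtual functions (VFs) it supports.
total_vfs = int((pf / "sriov_totalvfs").read_text())
print(f"device supports up to {total_vfs} VFs")

# Enable four VFs (if VFs are already enabled, write 0 first, then the new count).
requested = min(4, total_vfs)
(pf / "sriov_numvfs").write_text("0")
(pf / "sriov_numvfs").write_text(str(requested))

# Each VF now appears as its own PCI function, linked from the PF's directory,
# and can be handed to a guest OS or virtual machine (for example via VFIO).
for vf in sorted(pf.glob("virtfn*")):
    print(vf.name, "->", vf.resolve().name)
```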

PCIe offers a simplified way of achieving this by allowing all I/O adapters – based, for example, on 10 Gb Ethernet (GbE), FC, or IB – to be moved outside the server. With a PCIe switch fabric providing virtualization support, each adapter can be shared across multiple servers while still presenting each server with its own logical adapter. The servers, or the Virtual Machines (VMs) on each server, continue to have direct access to their own set of hardware resources on the shared adapter. The resulting virtualization allows for better scalability, as the I/O and the servers can be scaled independently of each other, and it reduces cost and power by avoiding over-provisioning of server or I/O resources.

Adding to the shared I/O implementation, a PCIe-based fabric can extend the basic PCIe capability with Remote DMA (RDMA), delivering very-low-latency host-to-host transfers that move data directly from host application memory without involving the main CPU, thereby freeing that CPU for more essential processing.
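
Conceptually, the application's role in such a transfer is limited to describing it; the fabric's DMA engine does the copying. The Python sketch below illustrates that idea with a hypothetical work descriptor and submission routine; the field names and the queue are purely illustrative and do not correspond to any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class RdmaWriteDescriptor:
    """Hypothetical work descriptor for a PCIe-fabric DMA engine."""
    local_addr: int     # DMA address of the local application buffer
    remote_host: int    # fabric ID of the destination host
    remote_addr: int    # address within the destination host's registered buffer
    length: int         # bytes to transfer
    signal_completion: bool = True  # post a completion entry when the copy finishes

def post_rdma_write(submission_queue: list, desc: RdmaWriteDescriptor) -> None:
    """Hand a descriptor to the DMA engine (stub).

    In a real driver this would write the descriptor into a hardware ring and
    ring a doorbell register; the payload itself never passes through the CPU.
    """
    submission_queue.append(desc)

# Example: queue a 4 KB write to host 2 (all addresses are placeholders).
sq = []
post_rdma_write(sq, RdmaWriteDescriptor(local_addr=0x1000_0000,
                                        remote_host=2,
                                        remote_addr=0x2000_0000,
                                        length=4096))
```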

Table 1 provides a high-level overview of the cost comparison of PCIe, 10 GbE, and QDR IB, while Table 2 gives power comparisons of the three interconnect technologies.

 

Table 1: PCI Express (PCIe) I/O sharing interconnect architectures enable cost savings of more than 50 percent in comparison with 10 Gigabit Ethernet (GbE) and Quad Data Rate (QDR) InfiniBand (IB) alternatives.


Table 2: PCI Express (PCIe) I/O sharing interconnect architectures enable power savings of more than 50 percent in comparison with 10 Gigabit Ethernet (GbE) and Quad Data Rate (QDR) InfiniBand (IB) alternatives.


Price estimates in Table 1 are based on a broad industry survey; actual pricing for top-of-rack switches and adapters will vary with volume, availability, and vendor relationships. Together, the two tables provide a framework for understanding the cost and power savings of using PCIe for I/O sharing, achieved principally through the elimination of adapters.

Making the switch to PCIe

PCIe has come to dominate the mainstream interconnect market for a range of reasons: versatile scalability, high throughput, low overhead, and widespread deployment. PCIe scales linearly across bandwidth requirements, from x1 connections on server motherboards, to x2 for high-speed storage, to x4 and x8 for backplanes, and up to x16 for graphics applications. PCIe Gen3's 8 Gbps per lane in each direction is more than capable of supporting shared I/O and clustering, and its simple, low-overhead protocol gives system designers a powerful tool for optimizing design efficiency. Finally, PCIe is a truly ubiquitous technology, with virtually every device in a system having at least one PCIe connection.
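
To put those lane widths in perspective, the short Python calculation below applies the same encoding-adjusted per-lane rate used earlier to show how a Gen3 link's one-direction throughput scales with width; it is simple arithmetic, not a benchmark.

```python
# One-direction throughput of a PCIe Gen3 link by lane count.
# Gen3 signals at 8 GT/s per lane with 128b/130b encoding.
per_lane_gbps = 8.0 * 128 / 130   # roughly 7.88 Gbps effective per lane

for lanes in (1, 2, 4, 8, 16):
    print(f"x{lanes:<2}: {lanes * per_lane_gbps:6.1f} Gbps per direction")
```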

This article has focused on comparisons between PCIe, 10 GbE, and IB QDR, notably in cost and power requirements, though designers should also weigh the other technical distinctions between these three industry standards. Nonetheless, with PCIe native on practically all processors, designers benefit from the lower latency realized by eliminating additional components between a CPU and a PCIe switch; with the new generation of CPUs, a PCIe switch can be placed directly off the CPU, reducing both latency and component cost.

Krishna Mallampati is Senior Director of Product Marketing, PCIe Switches at PLX Technology in Sunnyvale, CA.

PLX Technology

www.plxtech.com

[email protected]

LinkedIn

@PLX_Technology

YouTube

 
