PCI Express (PCIe), Ethernet, and InfiniBand (IB) have long coexisted with clearly delineated roles: PCIe mainly as a chip-to-chip interconnect, with Ethernet and IB serving as system-to-system interfaces, especially in high-performance computing. There is sound logic behind those boundaries, and the delineation is expected to hold for some time. However, PCIe is now ready for prime time in applications traditionally in the domain of Ethernet and IB - notably, within the rack.
Can PCIe really compete and win against Ethernet and IB? Engineers will benefit from understanding where the three technologies coexist today, and why PCIe is poised to make inroads into the other two's territory.
IB was originally envisioned as a unified fabric that would replace most other datacenter interconnects. While it did not achieve that goal, it has become popular as a high-speed clustering interconnect, replacing the proprietary solutions that preceded it.
Much like PCIe, IB has progressed through a number of speed grades since its introduction. The initial speed, Single Data Rate (SDR), offers the same effective per-lane data rate as PCIe Gen1: about 2 Gigabits per second (Gbps) after 8b/10b encoding overhead. It has since been enhanced to Double Data Rate (DDR) at 4 Gbps, Quad Data Rate (QDR) at 8 Gbps, and now to 13.64 Gbps with the Fourteen Data Rate (FDR) enhancement, which moves to the more efficient 64b/66b encoding.
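The progression of effective rates above follows directly from each generation's signaling rate and line encoding. A minimal sketch of that arithmetic (signaling rates taken from the published IB speed grades):

```python
# Effective per-lane IB data rates derived from signaling rate and
# line encoding. SDR/DDR/QDR use 8b/10b encoding; FDR moves to 64b/66b.
ENCODINGS = {"8b/10b": 8 / 10, "64b/66b": 64 / 66}

# (name, signaling rate in GT/s, encoding)
IB_RATES = [
    ("SDR", 2.5,     "8b/10b"),   # ~2 Gbps effective, like PCIe Gen1
    ("DDR", 5.0,     "8b/10b"),   # ~4 Gbps
    ("QDR", 10.0,    "8b/10b"),   # ~8 Gbps
    ("FDR", 14.0625, "64b/66b"),  # ~13.64 Gbps
]

def effective_gbps(signaling_gt_s: float, encoding: str) -> float:
    """Payload bandwidth after subtracting line-encoding overhead."""
    return signaling_gt_s * ENCODINGS[encoding]

for name, rate, enc in IB_RATES:
    print(f"{name}: {effective_gbps(rate, enc):.2f} Gbps per lane")
```

Note how FDR's jump comes from both a faster signaling rate and the lower overhead of 64b/66b encoding (about 3 percent versus 20 percent for 8b/10b).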
QDR is closest to PCIe Gen3 in per-lane data rate, and with similar bandwidth and latency, a fabric based on PCIe can deliver performance comparable to an IB solution at the same data rate. Beyond matching that performance, PCIe offers I/O device sharing using standard Single-Root I/O Virtualization (SR-IOV) hardware and software drivers, which IB does not. Because IB is primarily a high-speed clustering technology, a PCIe-based fabric can achieve QDR-like performance while reducing the system cost and power of an equivalent IB solution.
Traditional systems currently being deployed in volume have several interconnect technologies that need to be supported. As Figure 1 shows, IB and Ethernet can serve as interconnects in a single system, in addition to other fabrics such as Fibre Channel (FC).
This architecture has several limitations:
- Existence of multiple I/O interconnect technologies
- Low utilization rates of I/O endpoints
- High power and system cost due to the need for multiple I/O endpoints
- I/O is fixed at build time with no flexibility to change later
- Management software must handle multiple I/O protocols with overhead
Using multiple I/O interconnect technologies increases latency, cost, board space, and power. The architecture would be easier to justify if all the endpoints were used 100 percent of the time; in practice they are often underutilized, so that limited utilization carries a costly overhead. The added latency arises because the processors' native PCIe interface must be converted to multiple protocols. Designers can avoid those conversions - and the latency they add - by leveraging that same native PCIe interface to converge all endpoints.
Clearly, sharing I/O endpoints is the solution to these limitations (Figure 2). This concept appeals to system designers because it lowers cost and power, improves performance and utilization, and simplifies design. Additional advantages of shared I/O are:
- As I/O speeds increase, the only additional investment needed is new I/O adapter cards. In earlier deployments, where multiple I/O technologies resided on the same card, designers had to redesign the entire system; in a shared-I/O model they simply replace an existing card with a new one when a single I/O technology needs an upgrade.
- Since multiple I/O endpoints don’t need to exist on the same cards, designers can either manufacture smaller cards to further reduce cost and power, or choose to retain the existing form factor and differentiate their products by adding multiple CPUs, memory, and/or other endpoints in the space saved by eliminating multiple I/O endpoints.
- Designers can reduce the number of cables that criss-cross a system. Multiple interconnect technologies require different (and multiple) cables to carry their bandwidth and protocol overhead. With a simplified design and a consolidated set of I/O interconnect technologies, fewer cables are needed for proper functioning of the system, reducing design complexity and delivering cost savings.
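The utilization argument behind shared I/O can be sketched numerically. The model below is purely illustrative - the server count, adapter counts, and utilization figures are assumptions for the sketch, not data from the article's tables:

```python
# Toy model of adapter consolidation under shared I/O. All figures
# (server count, adapters per server, utilization) are illustrative
# assumptions, not the article's Table 1 / Table 2 data.
import math

def adapters_needed(servers: int, avg_utilization: float,
                    target_utilization: float = 0.8) -> int:
    """Shared adapters required to carry the same aggregate traffic
    while running each shared adapter at target_utilization."""
    aggregate = servers * avg_utilization  # traffic in adapter-equivalents
    return max(1, math.ceil(aggregate / target_utilization))

servers = 16            # hypothetical rack of 16 server cards
dedicated = servers * 3  # e.g., one Ethernet + one FC + one IB adapter each
# Assumed average utilization per fabric: 20%, 15%, 10%
shared = sum(adapters_needed(servers, u) for u in (0.20, 0.15, 0.10))

print(f"dedicated adapters: {dedicated}")  # 48
print(f"shared adapters:    {shared}")     # 9
```

Even under these rough assumptions, pooling lightly loaded adapters behind a switch fabric cuts the adapter count by roughly a factor of five, which is where the cost and power savings originate.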
Implementing shared I/O in a PCIe switch is the key enabler for the architecture depicted in Figure 2. SR-IOV technology implements I/O virtualization in the hardware for improved performance, and makes use of hardware-based security and Quality-of-Service (QoS) features in a single physical server. SR-IOV also allows I/O-sharing by multiple guest Operating Systems (OSs) running on the same server.
PCIe offers a simplified way of achieving this by allowing all I/O adapters - based, for example, on 10 Gigabit Ethernet (10 GbE), FC, or IB - to be moved outside the server. With a PCIe switch fabric providing virtualization support, each adapter can be shared across multiple servers while presenting each server with a logical adapter. The servers, or the Virtual Machines (VMs) on each server, continue to have direct access to their own set of hardware resources on the shared adapter. The resulting virtualization allows for better scalability, as the I/O and the servers can be scaled independently of each other, and reduces cost and power demands by avoiding over-provisioning of server or I/O resources.
Building on the shared-I/O implementation, PCIe-based fabrics have extended basic PCIe capability with Remote DMA (RDMA), delivering very-low-latency host-to-host transfers by moving data directly between the hosts' application memories without involving the main CPU, thereby freeing that CPU for more essential processing.
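The zero-copy idea at the heart of RDMA can be illustrated with a rough software analogy (this is only an analogy in standard Python, not the fabric's actual transfer API): a `memoryview` exposes an existing buffer without copying it, the way RDMA hardware reads application memory in place, whereas `bytes()` performs the extra CPU-driven copy that RDMA avoids.

```python
# Software analogy for zero-copy transfers: a memoryview references
# the application buffer in place; bytes() makes a separate CPU copy.
buf = bytearray(b"payload-to-transfer")

view = memoryview(buf)  # zero-copy reference to the same memory
copy = bytes(buf)       # explicit copy, as a CPU-mediated path would make

buf[0:7] = b"PAYLOAD"        # the application updates its buffer in place
print(view[0:7].tobytes())   # b'PAYLOAD' -> the view sees the live data
print(copy[0:7])             # b'payload' -> the copy is stale
```

In the same way, an RDMA-capable fabric reads the live application buffer directly, so no CPU cycles are spent staging data through intermediate copies.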
Table 1 provides a high-level overview of the cost comparison of PCIe, 10 GbE, and QDR IB, while Table 2 gives power comparisons of the three interconnect technologies.
Price estimates in Table 1 are based on a broad industry survey and assume that pricing varies with volume, availability, and vendor relationships for top-of-rack switches and adapters. Together, the two tables provide a framework for understanding the cost and power savings of using PCIe for I/O sharing, achieved principally by eliminating adapters.
Making the switch to PCIe
PCIe has come to dominate the mainstream interconnect market for a range of reasons: versatile scalability, high throughput, low overhead, and widespread deployment. PCIe scales linearly across bandwidth requirements, from x1 connections on server motherboards, to x2 for high-speed storage, x4 and x8 for backplanes, and up to x16 for graphics applications. PCIe Gen3's 8 Gigatransfers per second (GT/s) per lane, in each direction, is more than capable of supporting shared I/O and clustering, and its simple, low-overhead protocol gives system designers an unparalleled tool for optimizing design efficiency. Finally, PCIe is a truly ubiquitous technology, with virtually every device in a system having at least one PCIe connection.
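The linear scaling described above follows from per-lane bandwidth: Gen3 signals at 8 GT/s per lane with 128b/130b encoding, so a wider link simply multiplies the effective per-lane rate. A short sketch of the per-direction numbers:

```python
# Per-direction throughput scaling with PCIe Gen3 link width.
# Gen3: 8 GT/s per lane with 128b/130b encoding (~1.5% overhead).
GEN3_GT_S = 8.0
ENCODING = 128 / 130

def gen3_throughput_gbps(lanes: int) -> float:
    """Effective bandwidth of a Gen3 xN link, one direction, in Gbps."""
    return lanes * GEN3_GT_S * ENCODING

for lanes in (1, 2, 4, 8, 16):
    gbps = gen3_throughput_gbps(lanes)
    print(f"x{lanes:<2}: {gbps:6.2f} Gbps ({gbps / 8:5.2f} GB/s)")
```

A single lane thus delivers just under 8 Gbps of payload bandwidth, and an x16 link roughly 126 Gbps (about 15.75 GB/s) per direction.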
This article has focused on comparisons between PCIe, 10 GbE, and IB QDR, notably in cost and power requirements, though designers should also weigh the other technical distinctions among these three industry standards. Nonetheless, with PCIe native on practically all processors, designers benefit from the lower latency gained by eliminating the components otherwise needed between a CPU and a PCIe switch; with the new generation of CPUs, a PCIe switch can be placed directly off the CPU, reducing both latency and component cost.