|
In this article, Peter Carlston will discuss the existing issues surrounding signaling plane performance and will describe a sample measurement methodology based on Intel’s approach.
Peter will also provide results from particular studies, and will explain how developers can choose the appropriate signaling plane processor for their next-generation RNC project.
Increasing Radio Network Controllers’ (RNCs) overall performance poses a difficult challenge because RNCs must perform many different tasks under severe time constraints. They control radio equipment at scores or even hundreds of antenna sites (Node Bs). They set up links between individual User Equipment (UE) devices and the voice and/or data portions of networks and then route the high-speed, high-bandwidth traffic. They must do this for several hundred thousand active users at once while also ensuring extremely high levels of reliability.
RNCs are built to proprietary or standards-based form factors. Equipment manufacturers are clearly moving toward common form factors. Some are extending their purpose-built building practices across multiple segments of their product lines. Others are using the PICMG 2.16 standard. Intel, however, is seeing a marked shift toward the new PICMG Advanced Telecommunications Computing Architecture (AdvancedTCA) standard. Several important manufacturers have publicly endorsed AdvancedTCA, and others will make more announcements during 2004.
Whatever the form factor, RNCs have three basic types of blades that communicate with each other over a switched ATM or Gigabit Ethernet backplane:
One basic blade type, the control plane processing card, sets up and tears down connections between end points, manages the radio resources from Node B to the UE, and performs other control functions within the RNC. Control plane cards typically utilize general-purpose processors.
Line cards, a second blade type, exchange voice, data, and control information with equipment at the antenna sites and with various parts of the core network. Line cards are classified as user plane elements.
Radio network layer (RNL) cards fall into the third basic blade type category and process the voice/data traffic once the end-to-end connection has been established. These user plane processing cards do not terminate external interfaces as the line cards do. As we shall see, network processors offer significant performance advantages for both line card and RNL traffic processing.
Control plane processing encompasses three distinct functions, as Table 1 shows.
Although this article does not address shelf and systems management functionality, I do want to note that industry standards groups such as the Service Availability Forum and the Network Processor Forum have made impressive headway defining standard interfaces that will bring the benefits of modular hardware and software to RNC systems management.
Figure 1 shows a schematic of the signaling and user plane software stacks between the Node Bs and the Serving GPRS Support Node (SGSN) in the Packet-Switched Data Network (PSDN).
Table 1 defines Radio Resource Management (RRM) as an RNC Application. RRM allocates channels between Node Bs and UEs, determines when handoff between antennas or RNCs occurs, controls individual channels’ power levels, and handles link performance.
Carrier-grade Linux
Classified as a soft real-time operating system, Linux achieves response times of around 10 milliseconds or less. Linux capabilities are proving to be a good fit for signaling software. And of course Linux is a natural choice for RNC applications and services, which typically take advantage of multi-processor, SMP architectures.
Software developers and engineering managers should be aware of two Linux tools from Intel that can significantly increase performance. Intel has recently released the Intel C++ Compiler (icc) 8.0 for Linux. This compiler provides optimized features for Intel Xeon and Intel Pentium M processors, including support for Hyper-Threading Technology on Intel Xeon processors. It is source- and object-code compatible with GNU C (gcc) version 3.2.
The Intel VTune Performance Analyzer allows developers to focus their optimization efforts by quickly homing in on bottlenecks and hotspots.
Proof-of-concept RNC: Phase one
Developing a proof-of-concept RNC using our own and third-party building blocks has enabled Intel to measure the component performance in a functioning RNC system. The following paragraphs summarize the results of these studies and show how they can help architects choose the appropriate processor types for the various subsystems that make up next-generation RNC designs. Although this article focuses on RNCs, Base Station Controllers (BSCs) perform similar functions and differ mostly in software. So, much of the following content applies to BSC design as well.
Performance tests and ratings are measured using specific computer systems and/or components, and the results reflect the approximate performance as measured by those tests. Any difference in system hardware or software design or configuration might affect actual performance. Buyers should therefore consult additional information sources for further evaluation as needed.
Intel’s proof-of-concept RNC project aims to demonstrate how manufacturers can use modular products to build RNCs that support up to 500,000 subscribers. To do this, we first developed a traffic model. Then we simulated user plane performance on the Intel IXP2400 and IXP2800 network processors. RNC proof-of-concept project engineers then used this information to architect, design, and build user plane hardware and software products. While this work was going on, we built the first phase proof-of-concept RNC to characterize the ATM transport and signaling plane hardware and software.
AdvancedTCA hardware was not available when we began integration, so we built phase one using Intel PICMG 2.16 chassis and blades, specifically the ZT5504 Low Voltage (LV) Intel Pentium III SBCs for signaling and application processing. Table 2 summarizes phase one hardware and software. ATM/AAL2/AAL5 line card software ran on an Intel IXP2400 network processor development system. RNL network processor microcode was also not available at the time, so we ran its RLC/MAC and FP protocols on another LV Pentium III SBC. To measure performance and show meaningful functionality, we ran complete UE and Node B stacks on a Linux PC.

We simulated the SGSN and the rest of the PSDN equipment using a Tektronix K1297-G20 3G protocol analyzer. The Tektronix analyzer also bridged the ATM transport surrounding the RNC with the Ethernet transport used beyond it. Trillium supplied all software except the line card microcode. We launched phase one at the CTIA trade show in March 2003. At the show we demonstrated how the UE GUI, simulated by a laptop running Microsoft Media Player, could request streaming video from a media server located beyond the simulated SGSN. The RNC’s signaling software set up the radio access bearer channel, and then the video was streamed down to the player.
Phase one signaling plane performance
We simplified the demo setup for the phase one signaling plane performance measurements. We replaced the K1297 with a Linux PC running the Trillium SGSN and related core network stacks. We also changed the transport into and out of the RNC from ATM to Ethernet. The rest of the hardware and software was as shown in Table 2. We compiled the single-threaded stacks with the gcc options -g -c -o -Wall -Wno-comment -fpic -DANSI -DSS_LINUX.
We took measurements at seven representative points during the single-call setup process and extrapolated the results assuming the processors engaged in no other activity. Results were extremely encouraging, but it would be relatively meaningless to report them here since the stacks were not optimized or even compiled with icc. Correct compiler options and C-level optimizations typically increase signaling performance by a large factor. Intel will investigate these compiler and optimized code factors under load conditions during phase two.
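The extrapolation step can be illustrated with a minimal sketch. It assumes one plausible reading of the method: timestamps are captured at the seven measurement points of a single call setup, and the processor is assumed to do nothing else, so the time from the first point to the last is the full per-call cost. The function names and the timestamp values are hypothetical placeholders, not measured results from this project.

```python
def stage_durations(timestamps_ms):
    """Elapsed time between consecutive measurement points, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def extrapolated_setups_per_second(timestamps_ms):
    """Upper bound on call setups per second for one processor.

    Assumes the processor handles one call setup at a time and performs
    no other work, so the rate is the reciprocal of the per-call time.
    """
    total_ms = timestamps_ms[-1] - timestamps_ms[0]
    if total_ms <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return 1000.0 / total_ms


# Placeholder timestamps (ms) at seven measurement points; NOT measured data.
EXAMPLE_TIMESTAMPS_MS = [0.0, 1.0, 2.5, 4.0, 6.0, 8.0, 10.0]

if __name__ == "__main__":
    print("stage durations (ms):", stage_durations(EXAMPLE_TIMESTAMPS_MS))
    print("setups per second:",
          extrapolated_setups_per_second(EXAMPLE_TIMESTAMPS_MS))
```

An extrapolation of this kind gives a best-case ceiling only, which is one reason load testing is still needed.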
We next measured signaling stack performance on LV Pentium M and LV Xeon processors to discover the approximate performance gains we could expect when moving to these newer processors. For these tests we replaced the two PICMG 2.16 blades first with two Pentium M processor-based systems and then with two Xeon processor-based systems. The LV Pentium M processors ran at 1.6 GHz and had 1 Mbyte of L2 cache. The Xeon processors ran at 2.4 GHz and had 512 Kbytes of L2 cache. All systems had 1 Gbyte of PC 1600 RAM and used the Intel E7501 chipset with a 400-MHz processor side bus.
We found that at all seven measurement points, signaling performance was 40 to 55 percent faster on the Intel Pentium M processor than on the Intel Xeon processor. This result might seem surprising given the Xeon processor’s faster clock speed, but the two architectures were optimized for different processing characteristics. Optimized for server and workstation designs, the Intel Xeon micro-architecture has excellent floating-point capabilities, an instruction pipeline optimized for repetitive operations, and memory pre-fetch algorithms that optimize sequential memory access. Outstanding signaling plane performance, however, requires a different set of characteristics. The processor must be able to randomly access small amounts of data within large in-memory databases. Processing is often exclusively integer-based and nondeterministic. Furthermore, most signaling software is not multi-threaded and thus cannot capitalize on the performance advantages of a dual-processor Intel Xeon SMP environment. Based on our own and other measurements done within Intel, we concluded that dual Intel Xeon processors will maximize performance for multi-threaded, higher-level RNC applications and services, but single-threaded signaling software should be run on Intel Pentium M processors.
Our RNC signaling results dovetail with another study that shows Intel Pentium M architecture advantages for communications-type processing. Intel performed MEGACO parsing studies comparing the same Pentium III and Pentium M processors (also running Linux) with 1 Gbyte RAM. Messages were in compact text mode, UDP, one message per transaction, and were measured over a ten-minute interval. We normalized the results to show transactions parsed per second per MHz. We found an approximate 43 percent performance gain from the Intel Pentium III processor to the Intel Pentium M processor on a megahertz-for-megahertz basis.
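The per-MHz normalization described above is simple arithmetic, sketched here for readers who want to apply it to their own measurements. The function names are hypothetical, and the throughput and clock figures are round placeholders chosen so the arithmetic is easy to follow. They are not the values from the MEGACO study and do not reproduce its 43 percent result.

```python
def transactions_per_second_per_mhz(transactions, interval_s, clock_mhz):
    """Normalize parsing throughput by clock speed."""
    return (transactions / interval_s) / clock_mhz


def relative_gain_percent(baseline, candidate):
    """Percentage by which candidate exceeds baseline."""
    return (candidate / baseline - 1.0) * 100.0


# Placeholder inputs over a ten-minute (600 s) interval; NOT measured data.
baseline = transactions_per_second_per_mhz(600_000, 600, 1000)   # 1.0
candidate = transactions_per_second_per_mhz(1_440_000, 600, 1600)  # 1.5

if __name__ == "__main__":
    print(f"baseline:  {baseline:.2f} transactions/s/MHz")
    print(f"candidate: {candidate:.2f} transactions/s/MHz")
    print(f"gain:      {relative_gain_percent(baseline, candidate):.0f} percent")
```

Dividing by clock speed removes the raw frequency advantage, so the remaining difference reflects how much work each architecture does per cycle on this workload.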
Table 3 summarizes the processing characteristics and recommended Intel embedded processor for RNC control plane signaling and applications/services subsystems.
Phase two performance and architecture
Intel’s traffic model is based on UMTS Forum Report Number 6, which predicts multimedia and voice ratios and services in 2005. We have also added specific values based on several major equipment manufacturers’ input. A few of the traffic model’s more important parameters are:
- Packet-Switched (PS) traffic: BHCA = 0.30, 190-second session duration, 20 percent activity factor, 384 Kbits/sec DL data rate during activity (349 Kbits/sec user net bit rate), 10 Kbits/sec UL data rate during activity (0.09 Kbits/sec user net bit rate)
- Circuit-Switched (CS) traffic: BHCA = 1.00, 120-second session duration, 6.1 Kbits/sec effective DL/UL data rate (50 percent of 12.2 Kbits/sec)
- Control traffic: Transport SAP consisting of NBAP, RRC, RANAP, and ALCAP (Iub and Iu-CS) stacks; Control SAP
- CS/PS active sessions ratio = 10.53, calculated as (CS BHCA x CS session duration) / (PS BHCA x activity factor x PS session duration)
- Soft/softer handovers per connection = 10
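As a check on the list above, the CS/PS active sessions ratio can be recomputed from the quoted parameters. The following is a minimal sketch, and the constant and function names are our own labels for illustration rather than part of the traffic model itself.

```python
# Traffic-model parameters quoted in the list above.
CS_BHCA = 1.00        # circuit-switched busy-hour call attempts
CS_SESSION_S = 120    # CS session duration, seconds
PS_BHCA = 0.30        # packet-switched busy-hour call attempts
PS_SESSION_S = 190    # PS session duration, seconds
PS_ACTIVITY = 0.20    # fraction of a PS session spent actively transferring


def active_seconds(bhca, session_s, activity=1.0):
    """Seconds of active traffic per busy hour for one traffic class."""
    return bhca * session_s * activity


def cs_ps_active_sessions_ratio():
    """Ratio of CS active time to PS active time in the busy hour."""
    cs = active_seconds(CS_BHCA, CS_SESSION_S)               # 120 s
    ps = active_seconds(PS_BHCA, PS_SESSION_S, PS_ACTIVITY)  # about 11.4 s
    return cs / ps


if __name__ == "__main__":
    print(f"CS/PS active sessions ratio = {cs_ps_active_sessions_ratio():.2f}")
```

The result rounds to 10.53, which agrees with the figure in the traffic model.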
We combined these assumptions with modeling results from the Intel Developers’ Workbench, a third-generation software development framework and instruction cycle modeling tool for our network processors. Our testing team simulated all line card and radio network layer card protocols. The radio network layer’s RLC and MAC protocols, along with the Kasumi ciphering algorithm, are by far the most processor-intensive user plane protocols. We found that Kasumi alone required five micro-engines, so we moved Kasumi processing to an FPGA mezzanine card. This enabled us to use all 16 micro-engines in the Intel IXP2800 to perform RLC, MAC, and FP processing and to service the FPGA. The eight micro-engines in the IXP2400 proved more than adequate for the user plane line cards.
We de-rated overall results by 30 percent to allow instruction cycles for code fixes, upgrades, and high availability support. The final results indicated that it would require only 12 user plane blades, many times fewer than alternate designs require, to serve 500,000 subscribers.
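The de-rating arithmetic can be sketched as follows. The per-blade modeled capacity is a placeholder, since the article does not state it, and the real blade count also depends on the mix of line cards and RNL cards. The sketch therefore does not reproduce the 12-blade result; it only shows how a 30 percent de-rating feeds into a blade count. All names are hypothetical.

```python
import math


def usable_capacity(modeled_capacity, derate_percent):
    """Capacity left after reserving headroom for fixes, upgrades, and HA."""
    return modeled_capacity * (100 - derate_percent) / 100


def blades_needed(subscribers, modeled_capacity_per_blade, derate_percent):
    """Whole blades required to serve the subscribers at de-rated capacity."""
    per_blade = usable_capacity(modeled_capacity_per_blade, derate_percent)
    if per_blade <= 0:
        raise ValueError("de-rated capacity per blade must be positive")
    return math.ceil(subscribers / per_blade)


if __name__ == "__main__":
    # 100,000 subscribers per blade is a placeholder, NOT a modeled result.
    print(usable_capacity(100_000, 30))          # subscribers per blade
    print(blades_needed(500_000, 100_000, 30))   # whole blades
```

Rounding up with a ceiling matters here, because a fractional blade cannot be deployed.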
The signaling and user plane studies enabled us to finalize the design of the phase two proof-of-concept as shown in Table 4.
Note that all user plane cards have Intel-developed and integrated software. The Intel XScale core on these cards uses the Linux operating system to handle such things as memory access, slow path, and exception processing.
Phase two will launch with Intel Xeon processor-based signaling blade SBCs. However, we plan to move to AdvancedTCA SBCs based on Intel Pentium M processors from Intel Communications Alliance members as soon as they are available.
TietoEnator’s Signaling and Media Group will provide phase two signaling and application software stacks. Signaling software will interface with the Intel data plane stacks using a set of Intel Framework Application Program Interfaces (FAPIs). The Intel FAPIs will conform to the Network Processor Forum’s RNC FAPIs and transport mechanism when those standards are ratified. Figure 2 illustrates the phase two user and signal plane software architecture.
We have now been able to run performance tests on working phase two user plane hardware. The performance numbers have completely validated the modeling studies. In some cases performance is turning out to be even greater than the modeling tool predicted.
Follow-on studies
Intel plans a number of detailed studies to characterize the performance of the phase two system. The first study will give guidelines about the number of signaling plane blades needed to support the 500,000-subscriber user plane shelf. Other work will include using the icc version 8 compiler and the Intel VTune tool to optimize signaling plane performance. We will also measure the signaling/apps performance under load conditions. We also plan studies to determine a high performance distributed control plane architecture, where some signaling software modules reside on the line cards’ mezzanine cards. Finally, we will measure the performance of the complete system under load.
Later phases of the project will implement standards-based, high availability interfaces as defined by the relevant AdvancedTCA and Service Availability Forum specifications. We will then demonstrate and characterize third-party management middleware performance and the overall platform’s manageability.
Peter Carlston is the wireless platform architect and a staff technical marketing engineer in the Embedded Intel Architecture Division. He has been the technical lead on the Intel proof-of-concept RNC project since its inception. He has held a wide variety of systems and software engineering positions at Intel and Unisys.
Disclaimer: Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Source: Intel Corporation. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm.
|