Embedded virtualization enables scalability of real-time applications on multicore

August 1, 2011 OpenSystems Media

The advent of multicore processor technology has the potential to revolutionize the way embedded systems are designed. While different technologies are being developed to solve the problem of distributing application functionality among the processors on a multicore chip, the most promising emerging technology from an embedded systems perspective is embedded virtualization. Using global object networking, embedded designers can scale applications with a software platform that maintains determinism, enables upgradability, and reduces development costs.

Embedded virtualization has several positive implications for OEMs. For example, once there is a means for splitting up applications to run on multiple cores while maintaining determinism, the solution can subsequently enable real-time applications to scale the number of cores they use, upward or downward. With scalability, OEMs can offer a range of price/performance options for their products without requiring changes to the software.

Virtualization is not a new concept in computer science, but it has gathered new interest with the advent of multicore processors. Though virtualization is recognized as a way to keep multiple processor cores busy, it’s important to note that most types of server or client virtualization are not designed to meet the needs of time-critical embedded processing. These approaches to virtualization most often treat all processors on a multicore chip the same way. In these systems, a single Operating System (OS) assigns tasks to processors as they become available in an attempt to keep all processors as heavily loaded as possible with processing tasks.

Server and client virtualization typically virtualize all hardware including the I/O interfaces. When an I/O interface needs servicing, the Virtual Machine Monitor (VMM) fields the request and passes the results to the OS clients it is supporting. There is no way to ensure that a particular OS client is loaded when an I/O that belongs to that client requires service, nor is there a global way of associating a particular I/O with a client OS and the application running on it. Consequently, there is no way to guarantee exactly how much time will be required to handle an I/O event; hence, this approach is not appropriate for handling real-time processing in an embedded system.

Embedded system designers want direct control of the system to obtain determinism and consistent performance. While it is desirable to balance overall processor utilization and keep multicore processors as busy as possible, this is not the top priority. First and foremost, embedded designers are looking for software technology that helps them maintain determinism while adding features and/or reducing the cost of their OEM products.

Embedded designers are looking for a software platform that enables them to combine OSs of different types so that processing can be optimized for the tasks at hand – for example, real-time OSs to handle critical I/O timing requirements and General-Purpose OSs (GPOSs) to leverage COTS graphics-rich applications that run human-directed functions. They are also looking for solutions to scale applications so they can provide different products using the same application code base. This reduces engineering development cost, improves time to market, and more importantly, enables new products to be based on tried and proven software that can be continuously upgraded in performance and reliability.

Embedded virtualization preserves determinism

For an embedded application to be deterministic, it must be engineered as such from the inception of the development project. Determinism is not something that can be added at the end. Special considerations must be made to ensure that application threads have direct control of the I/O interfaces on which they depend.

Figure 1 shows a pick-and-place assembly system in which TenAsys’ INtime for Windows Real-Time Operating Systems (RTOSs) are hosted on three cores of a quad-core processor and the Human Machine Interface (HMI) is hosted on the fourth core running Microsoft Windows. The real-time tasks running on different CPUs communicate when required via the global object network. This automated assembly system includes three real-time subsystems: a vision system that guides an assembly robot, the multi-axis robot, and the material transport system that indexes components into place for assembly and then carries off assembled units. The ideal way to develop and debug this type of application to ensure that each component performs as reliably as required is to split the application into separate components.

Figure 1: Embedded systems can save cost and preserve real-time responsiveness while adding features by hosting multiple OSs on a multicore processor.

For example, the HMI would be a separate application module that might run in a non-real-time environment such as Windows. This requires an OS environment that can partition the platform resources – I/O, memory, interrupts, and CPU core (in the case of a multicore processor platform) – to allow the application modules to run totally independently from each other. An embedded virtualization environment supports this, allowing different real-time tasks to run on specific processor cores. The operating software partitions physical I/O interfaces so that an interrupt coming from one of the devices only interrupts the processor handling that device. This ensures predictable response times for handling real-time events.
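The partitioning rule described above, in which each device interrupts only the core that handles it, can be pictured as an ownership table. The following is a minimal sketch, not an INtime or Windows API: the class, the device names, and the assignment of subsystems to particular cores are all assumptions made for illustration.

```python
# Toy model of platform partitioning: each device is owned by exactly one core,
# so an interrupt from that device is delivered only to its owning core.
# All names are hypothetical; this is not an INtime or Windows API.

class PartitionTable:
    def __init__(self):
        self._owner = {}  # device name -> core id

    def assign(self, device, core):
        """Give `core` exclusive ownership of `device`."""
        if device in self._owner:
            raise ValueError(
                f"{device} is already owned by core {self._owner[device]}"
            )
        self._owner[device] = core

    def route_interrupt(self, device):
        """Return the single core that services interrupts from `device`."""
        return self._owner[device]


table = PartitionTable()
table.assign("camera", 0)       # vision system (assumed on core 0)
table.assign("servo_drive", 1)  # multi-axis robot (assumed on core 1)
table.assign("conveyor_io", 2)  # material transport (assumed on core 2)
table.assign("display", 3)      # Windows HMI (assumed on core 3)

interrupted_core = table.route_interrupt("servo_drive")
```

Because ownership is exclusive, a second `assign` for the same device is rejected, which mirrors the idea that no other core is ever disturbed by that device's interrupts.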

The hosting of multiple OS environments on a multicore chip is managed either by software called an embedded virtualization manager or by special embedded virtualization functionality contained in the RTOS.

Embedded virtualization can be implemented in different ways, depending on the amount of hardware virtualization support provided by the processor. Paravirtualization solutions use software techniques to modify the guest OSs, allowing them to work side by side without affecting each other or compromising the system’s real-time responsiveness. Implementations provide varying degrees of platform partitioning and have typically been limited to running two OSs at a time on a platform – an RTOS and a GPOS. Some implementations have evolved to the point where the GPOS doesn’t require any modification and the latest version of the GPOS is readily supported. This is a real plus when the object of coupling the GPOS to the RTOS is to make use of legacy RTOS application software as is, without any modification, while adding an HMI based on an OS like Windows that leverages the latest HMI development tools.

Over the years some of these implementations have been optimized to provide the best performance for a particular combination of OSs. The downside to this is that each implementation is specific to the particular combination of OSs, and it is an impractical approach to providing a generic virtualization solution to support multiple OS combinations.

Hardware-assisted virtualization features like VT (supported by Intel processors) that have recently become available eliminate some of the software complexity of paravirtualization by providing hardware assistance built into the processor. By using the hardware virtualization support, a VMM can be constructed to function without knowledge of the guest OS. As a result, the VMM can support any OS targeted for that platform.

Intel processor features like VT-x (a subset of the VT features) ensure that any memory address issued by a guest OS is automatically mapped to the appropriate address location in physical memory. Likewise, the hardware-assisted virtualization feature called VT-d automatically maps I/O memory accesses for bus-master DMA devices, enabling native I/O drivers that are part of the guest RTOS application to be used without modification in the virtualized environment. These hardware-assist features substantially reduce the complexity of a VMM and make embedded virtualization a more viable solution.
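The address remapping described above can be pictured as a per-guest lookup table from guest addresses to physical addresses. The sketch below is a software toy only: on real processors the translation is done by the hardware, and the class name, page size constant, and page numbers are assumptions chosen for illustration.

```python
# Toy model of per-guest address remapping. Real processors perform this
# translation in hardware; every name and number here is hypothetical.

PAGE_SIZE = 4096


class GuestMemoryMap:
    def __init__(self):
        self._pages = {}  # guest page number -> physical page number

    def map_page(self, guest_page, physical_page):
        """Record where one guest page lives in physical memory."""
        self._pages[guest_page] = physical_page

    def translate(self, guest_addr):
        """Translate a guest address to a physical address."""
        page, offset = divmod(guest_addr, PAGE_SIZE)
        if page not in self._pages:
            raise LookupError(f"guest page {page} is not mapped")
        return self._pages[page] * PAGE_SIZE + offset


rtos_map = GuestMemoryMap()
rtos_map.map_page(0, 256)  # guest page 0 lives in physical page 256
physical_addr = rtos_map.translate(0x10)
```

The guest keeps using its own addresses unchanged, which is why unmodified drivers can continue to work once the mapping is applied on their behalf.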

Making scalability work in real-time applications

While embedded virtualization provides the ideal environment for enabling real-time applications to be split into individual independently operating components, breaking up applications creates the need for a mechanism to support Inter-Process Communications (IPC).

In the past, designers often set up Ethernet links between application subsystems and used TCP/IP stacks to communicate between the subsystems, but this method is cumbersome, slow, sometimes unreliable, and adds uncertainty to the system’s behavior, affecting determinism.

A better IPC approach is to use a concept called global object networking. A global object network provides a managed communication environment with built-in initiation and discovery services, enabling an application to be dynamically distributed across one or several CPUs at load time. Processes requiring services of other processes are found automatically, and a local manager records their location to keep track of established IPC links. If a communications link or targeted process fails, the manager notifies an initiating process. In addition, the local manager keeps the system clean by clearing up all records when the IPC links are no longer required by the initiating process. Because the global object network is integrated with the OS, its overheads are low and it does not require the application developer to create any custom software. It is deterministic and substantially more efficient than traditional IPC interfaces.
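The local manager's bookkeeping described above (discover a service, record the link, notify the initiator on failure, clean up on release) can be sketched as follows. This is an illustrative model under stated assumptions, not the GOBSnet interface: `LocalManager`, its methods, and the dictionary used as a discovery directory are all hypothetical.

```python
# Toy model of a per-node manager that tracks IPC links.
# Hypothetical names throughout; this is not the GOBSnet API.

class LocalManager:
    def __init__(self, directory):
        self._directory = directory  # service name -> node hosting it
        self._links = {}             # (initiator, service) -> hosting node
        self.notifications = []      # (initiator, service) told of a failure

    def connect(self, initiator, service):
        """Discover `service` and record the link for `initiator`."""
        node = self._directory[service]  # KeyError if never published
        self._links[(initiator, service)] = node
        return node

    def report_failure(self, service):
        """Notify every initiator linked to a failed service, drop the links."""
        for initiator, name in list(self._links):
            if name == service:
                self.notifications.append((initiator, name))
                del self._links[(initiator, name)]

    def release(self, initiator):
        """Clear all records for an initiator that no longer needs its links."""
        for key in [k for k in self._links if k[0] == initiator]:
            del self._links[key]

    def link_count(self):
        return len(self._links)


directory = {"vision": "core0", "robot": "core1"}
manager = LocalManager(directory)
vision_node = manager.connect("transport", "vision")
manager.connect("transport", "robot")
manager.report_failure("vision")
links_after_failure = manager.link_count()
manager.release("transport")
```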

The OS manages the location and existence of the global objects through which processes pass information, thereby ensuring the system’s integrity. For instance, objects used by processes on several processors are “kept alive” and removed only when all of those processes have terminated. This requires an underlying management infrastructure to ensure that an object is neither removed prematurely nor left behind after it is no longer needed, which would leak memory. Likewise, in the event that a processing node goes down before its processes are terminated, the manager needs to inform the global object managers on all the other processing nodes that they should clean up any local references to objects on the down node.
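The node-down case can be sketched as a broadcast that asks each surviving node's manager to purge references to objects owned by the failed node. The sketch below is a simplified model; `NodeManager`, `announce_node_down`, and the object names are assumptions for illustration rather than anything defined by the RTOS.

```python
# Toy model of cleaning up references when a processing node goes down.
# All names are hypothetical.

class NodeManager:
    def __init__(self, node):
        self.node = node
        self.references = {}  # handle -> (owner node, object name)
        self._next_handle = 1

    def add_reference(self, owner_node, object_name):
        """Record a local reference to an object owned by another node."""
        handle = self._next_handle
        self._next_handle += 1
        self.references[handle] = (owner_node, object_name)
        return handle

    def purge_node(self, dead_node):
        """Drop every local reference to objects on `dead_node`."""
        stale = [h for h, (owner, _) in self.references.items()
                 if owner == dead_node]
        for handle in stale:
            del self.references[handle]
        return len(stale)


def announce_node_down(managers, dead_node):
    """Tell every surviving manager to clean up; return purge counts per node."""
    return {m.node: m.purge_node(dead_node)
            for m in managers if m.node != dead_node}


core1 = NodeManager("core1")
core2 = NodeManager("core2")
core1.add_reference("core0", "shared_buf")
core1.add_reference("core2", "status_mailbox")
core2.add_reference("core0", "shared_buf")

purged = announce_node_down([core1, core2], "core0")
```

References to objects on nodes that are still running, such as the mailbox on core2 here, are left untouched.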

An example of GOBSnet communications

Figure 2 shows a highly simplified view of the software architecture of the system depicted in Figure 1. In addition to the RTOS and the real-time process software, Cores 0-2 also run the INtime GOBS manager software, whose function is to manage global object communications. (GOBSnet is TenAsys’ global object network supported by the company’s INtime Distributed RTOS and INtime for Windows RTOS.)

Figure 2: GOBSnet facilitates communication between real-time processes running on different processor cores.

With GOBSnet, a process on one core can communicate with a process on another core using a global memory object. The initiating process, “Process 1” in Figure 2, creates the memory object and catalogs it in that core’s root process. After this is done, processes running on other processing nodes can find the memory object, resulting in an efficient shared memory interface for all processes to use. The same applies to other types of objects used for IPC, including semaphores and mailboxes.

The second step is for other processes to locate the memory object and obtain its memory location. This is done by specifying the name of the processor at which to start the search. When the process finds the object, it stores the object’s location, type, and parameters in a reference object in its own processor node’s GOBS manager and keeps the handle of that reference object. From then on, when a remote process (Process 2 or Process 3 in this example) wants to read from or write to the memory object, it uses the reference object’s handle to retrieve the information needed to access it.
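The two steps above (create and catalog, then look up and access through a reference handle) can be sketched as a small model. This is not the GOBSnet API: the `Node` class, its method names, the object name `shared_buf`, and the use of a Python `bytearray` as the memory object are all assumptions made for illustration.

```python
# Toy model of cataloging a memory object and reaching it via a reference
# handle. Hypothetical names throughout; this is not the GOBSnet API.

class Node:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # node name -> Node, shared by all nodes
        self.catalog = {}       # stands in for the root process catalog
        self.references = {}    # handle -> reference record
        self._next_handle = 1
        network[name] = self

    def create_memory_object(self, obj_name, size):
        """Step 1: create the object and catalog it on this node."""
        self.catalog[obj_name] = bytearray(size)

    def lookup(self, node_name, obj_name):
        """Step 2: find the object by node name, store a reference record."""
        buf = self.network[node_name].catalog[obj_name]
        handle = self._next_handle
        self._next_handle += 1
        self.references[handle] = {
            "node": node_name, "name": obj_name,
            "type": "memory", "size": len(buf),
        }
        return handle

    def _resolve(self, handle):
        ref = self.references[handle]
        return self.network[ref["node"]].catalog[ref["name"]]

    def write(self, handle, offset, data):
        buf = self._resolve(handle)
        if offset + len(data) > len(buf):
            raise ValueError("write past end of memory object")
        buf[offset:offset + len(data)] = data

    def read(self, handle, offset, length):
        return bytes(self._resolve(handle)[offset:offset + length])


network = {}
core0 = Node("core0", network)  # hosts Process 1
core1 = Node("core1", network)  # hosts Process 2

core0.create_memory_object("shared_buf", 16)
handle = core1.lookup("core0", "shared_buf")
core1.write(handle, 0, b"pick")
seen_by_owner = bytes(core0.catalog["shared_buf"][:4])
```

After the lookup, the remote side only ever uses the handle; the owning node and object name stay inside the reference record.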

When Process 2 or Process 3 terminates, its node’s GOBS manager clears all reference object information about the remote memory object. When Process 1 terminates, it removes all objects it has created, including the memory object. Where that is not the desired behavior, the memory object can instead be created with a counter. The counter is incremented every time another process connects to the memory object and is decremented every time one of those associated processes is terminated. The result is that the memory object is removed only when all the processes connected to it are terminated. This allows the remote processes, Process 2 and Process 3, to continue passing information via the memory object even though the initiating process, Process 1, has terminated.
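The counter option can be illustrated with a small reference-counting model. It is a sketch under one explicit assumption that the text does not state: the creating process is counted as a connection too. The class and method names are hypothetical.

```python
# Toy model of a memory object that is removed only when its last connected
# process terminates. Assumption: the creator counts as one connection.
# Hypothetical names; this is not the GOBSnet API.

class CountedMemoryObject:
    def __init__(self, size):
        self.data = bytearray(size)
        self.count = 1        # the creating process (assumed to be counted)
        self.removed = False

    def connect(self):
        """Another process connects: increment the counter."""
        if self.removed:
            raise RuntimeError("memory object has already been removed")
        self.count += 1

    def process_terminated(self):
        """A connected process terminates: decrement, remove at zero."""
        self.count -= 1
        if self.count == 0:
            self.removed = True
            self.data = None


shared = CountedMemoryObject(16)  # created by Process 1
shared.connect()                  # Process 2 connects
shared.connect()                  # Process 3 connects

shared.process_terminated()       # Process 1 terminates
alive_after_creator_exit = not shared.removed

shared.process_terminated()       # Process 2 terminates
shared.process_terminated()       # Process 3 terminates
```

The object outlives its creator and disappears only after the last connected process has gone, which is the behavior the counter is meant to provide.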

GOBSnet communications can be used whether the real-time application is spread among CPUs on the same multicore chip or among separate CPU cores on different microprocessor components that are networked together (Figure 3). By designing systems around this flexible system software architecture, embedded system developers have headroom to grow their products’ processing power or shrink it accordingly to meet the challenges of the future.

Figure 3: GOBSnet provides a scalable means of IPC whether the processes are located on different cores of a multicore chip or on entirely different processor platforms.
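The scaling idea can be sketched as a placement step that maps the same list of application components onto however many processing nodes a given product provides. The round-robin policy, the function name, and the node names below are assumptions for illustration; the text does not say how placement is actually decided.

```python
# Toy model of distributing an unchanged set of components across a varying
# number of nodes. Round-robin placement is an assumption for illustration.

def place_components(components, nodes):
    """Map each component to a node, cycling through the available nodes."""
    if not nodes:
        raise ValueError("at least one processing node is required")
    return {c: nodes[i % len(nodes)] for i, c in enumerate(components)}


COMPONENTS = ["vision", "robot", "transport"]

# Three real-time cores on one chip: one component per core.
high_end = place_components(COMPONENTS, ["core0", "core1", "core2"])

# A lower-cost product with a single real-time core: same components.
low_end = place_components(COMPONENTS, ["core0"])

# Components spread over cores on two separate networked platforms.
two_boards = place_components(COMPONENTS, ["boardA.core0", "boardB.core0"])
```

Only the node list changes between product variants; the component list, standing in for the application code base, stays the same.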

Kim Hartman is VP of TenAsys Corporation. He has worked in the embedded market focusing on hardware analysis tools and RTOS products for 26 years, first at Tektronix and then at RadiSys before cofounding TenAsys in 2000. He is a Computer Engineering graduate of the University of Illinois, Urbana-Champaign, and received his MBA from Northern Illinois University.

TenAsys Corporation 503-748-4720 Kim.Hartman@tenasys.com www.tenasys.com

Kim Hartman (TenAsys Corporation)