Reconfigurability – the ability to change a system's functionality after it has been deployed – not only helps designers react to last-minute design changes but also enables them to prototype ideas before implementation and upgrade designs in the field. In today's environment of rapid change, this capability offers a tremendous competitive advantage and often gets more computation done for each watt of system power. Tom explores how current trends in FPGA designs are providing greater flexibility and fulfilling the requirements of even the most demanding applications.
As applications become more complex, reconfigurable computing must evolve to address the industry's shifting needs. Flexibility is becoming all the more important as customer requirements change, and systems must be more powerful than ever before. Signal processing applications, for example, must progress to track and interpret signals from much longer distances and merge data from multiple types of sensors, such as infrared and ultraviolet.
These applications will require ever-increasing levels of image compression and compute power as well as enhanced intelligence for evaluating data. Of course, traditional performance computing applications such as weather modeling and computational chemistry also demand more computational power. And, with power and cooling becoming greater concerns, the 100 W or more required for a Graphics Processing Unit (GPU) can be problematic in traditional computing centers and prohibitive in other applications, including satellites and unmanned aircraft.
FPGAs directly address reconfigurable computing requirements by offering a flexible platform that can keep pace with emerging standards. Intellectual Property (IP) functions and configurable processors speed development while new, powerful software tools decrease latency, increase bandwidth, and reduce gate usage. FPGAs accomplish all this with a significant advantage over their alternatives in terms of low-power operation and heat dissipation.
IP cores boosting performance
Today, one of the easiest paths to reconfigurable computing is using IP cores such as configurable processors, which can offer adjustable cache size, multipliers, dividers, hard logic, and custom instructions. Some configurable processors also support accelerators that can be automatically converted to hardware, thus improving productivity and substantially increasing embedded software performance. Designers can simply add peripherals or processors to create the exact design that fits their needs.
These processors have the added benefit of being obsolescence-proof; the design can be ported to new silicon even if a device becomes obsolete, protecting the designer's investment in the software. However, these configurable processors were never meant to handle massive computations. They typically add value to reconfigurable computing applications as controllers, coordinating interactions between specialized blocks with dedicated computing functions.
Development is under way for other IP such as scalable, configurable processing and high-performance computing architectures that address the needs of customized data paths, protocol processing, digital signal processing, and image processing (see Figure 1). These functions allow engineers to replace sequential computations with customized pipelines and parallel data paths for higher performance and efficiency. Soft vector processors are also currently in development.
Acceleration via parallelism
FPGAs now offer more compute power while consuming significantly less power than alternative solutions. Key to reconfigurable computing is the move away from the traditional model of computing algorithms one at a time in sequence toward distributing algorithms spatially across the configurable computing fabric. Speed comes not from performing many operations in rapid succession, but from performing operations in parallel using pipelining, broadside parallelism, or a combination of both (see Figure 2). FPGAs also allow designers to customize pipelines and memory access models, capabilities not available with GPUs. The resulting higher bandwidth is ideal for streaming data in communications applications.
Applications in FPGA accelerators typically run near 100 MHz, but higher clock speeds can be reached through more design effort. With optimization, FPGAs can achieve impressive speedups for applications that take advantage of their strengths, including:
- Fine-grained parallelism with 1,000-plus independent hardware multipliers and arithmetic units, all of which can run concurrently
- Low computation overhead, where indexing and fetches can be pipelined, operands can be stored in independent memory banks, and termination testing can occur in parallel with arithmetic functions
- Memory concurrency with 1,000-plus independently addressable RAM buffers
- Fast, fine-grained communication with on-chip communication running at full chip speed and typically with latencies of just a few cycles
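The payoff of spatial parallelism can be sketched with a toy cycle-count model. The operation counts, pipeline depth, and lane count below are illustrative assumptions, not measured FPGA figures; the point is only how pipelining and broadside parallelism change the cycle budget for a simple data path such as y[i] = a[i]*b[i] + c[i]:

```python
# Toy cycle-count model contrasting sequential execution with a
# pipelined and a broadside-parallel datapath. All parameters are
# illustrative assumptions, not vendor or device figures.

def sequential_cycles(n_items, ops_per_item=2):
    # A scalar processor issues roughly one operation per cycle.
    return n_items * ops_per_item

def pipelined_cycles(n_items, pipeline_depth=2):
    # A pipeline of depth D accepts one new item per cycle:
    # first result after D cycles, one more every cycle after.
    return pipeline_depth + n_items - 1

def broadside_cycles(n_items, lanes=4, pipeline_depth=2):
    # 'lanes' parallel copies of the pipeline accept 'lanes'
    # items per cycle (broadside parallelism plus pipelining).
    batches = -(-n_items // lanes)  # ceiling division
    return pipeline_depth + batches - 1

n = 1024
print(sequential_cycles(n))  # 2048
print(pipelined_cycles(n))   # 1025
print(broadside_cycles(n))   # 257
```

Even this crude model shows why the gains come from structure rather than clock rate: a modest four-lane pipelined fabric finishes in a small fraction of the sequential cycle count, despite FPGAs running at lower clock speeds than processors.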
Floating-point compilers increasing efficiency
In addition to hardware, new tools now resolve some of the challenges of using programmable logic in designs. FPGAs have always offered almost unlimited flexibility in data flow architectures and therefore provide an ideal method for implementing arithmetic functions or accelerating a system by offloading a data path that cannot be implemented optimally in a processor. However, FPGAs have previously had difficulty achieving the complexity and precision of floating-point operations, especially for double-precision applications.
A new floating-point compiler has been developed to efficiently map floating-point data paths to generic FPGA architectures. This floating-point compiler achieves efficiency gains by fusing together large subsections of a data path, clustering similar operations, and optimizing the interface between clusters of successive operators.
This allows multiple precisions - integer, single, and double - to exist within a single data path, giving generic FPGAs a significant efficiency advantage over simple component-based systems. With typical savings of 50 percent in logic utilization and a similar reduction in latency, generic FPGAs can easily support floating-point capability with the flexibility to implement a wider range of operator mixes (such as a larger ratio of adders/subtractors to multipliers), while maintaining the processing power to support an application using a data path.
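The effect of fusing can be sketched with a toy area model. The LUT counts and cluster size below are hypothetical placeholders (the article does not give per-operator figures); the sketch only shows why normalizing once per cluster of fused operators, instead of once per operator, reduces logic:

```python
# Toy area model of floating-point datapath fusion. An unfused
# datapath normalizes and rounds after every operator; a fusing
# compiler keeps intermediates in a wider internal format and
# normalizes only at cluster boundaries. All LUT counts are
# hypothetical placeholders, not real device figures.

CORE_LUTS = 100       # assumed cost of the arithmetic core itself
NORMALIZE_LUTS = 100  # assumed cost of normalize/round hardware

def unfused_luts(n_ops):
    # Every operator pays for its own normalization stage.
    return n_ops * (CORE_LUTS + NORMALIZE_LUTS)

def fused_luts(n_ops, cluster_size=8):
    # One normalization stage per cluster of fused operators.
    clusters = -(-n_ops // cluster_size)  # ceiling division
    return n_ops * CORE_LUTS + clusters * NORMALIZE_LUTS

n = 16
print(unfused_luts(n))  # 3200
print(fused_luts(n))    # 1800
print(1 - fused_luts(n) / unfused_luts(n))  # fraction of logic saved
```

Under these assumed numbers, fusing 16 operators into two clusters saves a bit over 40 percent of the logic; the actual savings in a real compiler depend on the operator mix and how aggressively clusters can be formed.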
The reconfigurable advantage
In today's world of rapidly changing technology and customer requirements, the ability to enhance functionality after a design has been deployed in the field is critical. The latest advances - more compute power, higher bandwidth, decreased latency, and reduced gate usage, plus sustained double-precision performance of up to 1.5 GFLOPS per watt available now (with 2.0 GFLOPS per watt expected in the next year) - mean that reconfigurable computing with FPGAs can make the difference in getting to market before the competition and, ultimately, in ensuring a product's success.
Tom VanCourt is a senior member of the technical staff at San Jose-based Altera Corporation, where he develops system-building tools and champions performance computing on FPGAs. Tom has spent more than 25 years in the industry with DEC, HP, and other companies, and taught at Boston University, where he earned his PhD in Computer Systems Engineering. His interests include FPGA-based computing in finance, life science, medical imaging, and other application areas.