Evolution of Embedded FPGA From Aerospace, Networking and Communications to Artificial Intelligence, and More

By Geoff Tate

CEO, Co-Founder & Board Member

Flex Logix Inc.

June 27, 2019


While FPGA chips have been available since the 1980s, embedded FPGA (eFPGA) only became available about five years ago. Thanks to its flexibility and feature set, however, it has taken hold in the market very quickly. It is now commercially available from multiple vendors, at a wide range of foundries (TSMC, GlobalFoundries, SMIC and Samsung) and on process nodes from 180nm down to 7nm (including 40, 28, 22, 16, 14 and 12nm). eFPGA is also being used in a wide range of applications such as aerospace, communications, networking and, most recently, artificial intelligence (AI). A sample of the publicly announced customers includes Sandia National Labs, DARPA, Boeing, Harvard and MorningCore (the chip subsidiary of Datang Telecom of China).

In the few years that eFPGA has been available, the industry has already seen several generations of products. With each new generation, eFPGA became more flexible and more usable for new applications, all driven by customer demand. In fact, the applications for eFPGA seem to be endless, and in the future this technology should be as pervasive as Arm processors are today. This article will review the various generations of eFPGA, ending with the features available today.

First Generation

The first generation of eFPGA was relatively simple and typically used 4-input LUTs. To provide a range of array sizes, some suppliers designed their eFPGA as arrays of replicated tiles, with top-level interconnect automatically connecting the tiles into an array-wide mesh. Programming was done in Verilog through command-line interfaces.

Second Generation

The second generation of eFPGA added a range of features based on customer feedback from evaluating the first generation. These new features included:

  • 6-input LUTs, just as in today’s state-of-the-art FPGAs. 6-input LUTs increased the density and performance of eFPGA.
  • The ability to integrate RAM of any kind and size between the rows of an array, providing more localized memory for distributed computing. This was a very important addition, since one-third of customers evaluating eFPGA were requesting it.
  • A GUI for programming, with various graphical tools to speed evaluation and development.
  • Timing extracted directly from the GDS database, allowing performance to be evaluated at any process/voltage/temperature (PVT) combination.
  • The ability to read back configuration bits (desirable in high-reliability environments) and to rewrite configuration bits during operation.
  • DFT (design-for-test) coverage of more than 99 percent.
  • A new configuration-load mode that reduced test times by approximately 100x.
  • Evaluation boards, with PC interfaces, for every new process.

Today, customers are working on second-generation designs that use Flex Logix’s EFLX eFPGA in multiple arrays per chip, with array sizes of up to hundreds of thousands of LUTs.

Third Generation

As the descriptions above show, the first two generations of eFPGA performed functions similar to those of FPGA chips. In the third generation, however, eFPGA started to do things that were not done in FPGA chips. An example of a third-generation eFPGA is Flex Logix’s nnMAX, an eFPGA optimized for inference with the following novel features:

  • It is programmed in high-level neural network model formats such as TensorFlow Lite and ONNX. The nnMAX compiler does the lower-level Verilog programming automatically, freeing the customer from that work and achieving higher performance (a model-export sketch appears at the end of this section).

  • Runs neural network layers of about a billion MACs (multiply-accumulate operations) per layer. nnMAX reconfigures the eFPGA interconnect and “soft logic” (control state machines) for the next layer, then starts running again. In earlier eFPGA, reconfiguration was done serially, similar to FPGA chips; nnMAX instead configures in a highly parallel mode, so that the full array is reconfigured in about 1,000 cycles, roughly one microsecond at 1GHz operation.

  • Organizes AI-optimized MACs (8x8 integer, 16x16 integer and/or 16x16 floating point) into clusters of 64 each. In traditional eFPGA, each MAC is separately connected into the interconnect network. In AI workloads, matrix multiplies are very large, so clustering is a logical way to achieve higher density and use scarce interconnect resources more efficiently.

  • Can run in a Winograd acceleration mode that speeds up 3x3 convolutions with a stride of 1 by 2.25x for 8-bit integer operations (see the sketch after this list).
  • Capable of mixing precision between layers: the hardware can convert from 8/16-bit integer to floating point and back as needed, allowing the model designer to maximize both throughput and prediction accuracy.
  • The interconnect now has pipeline flops, enabling 1GHz throughput while adding only a few cycles of latency per layer.
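
The 2.25x Winograd figure above comes directly from the arithmetic of the transform: a direct 3x3 convolution of a 2x2 output tile takes 36 multiplies, while the 2D Winograd transform F(2x2,3x3) needs only 16, and 36/16 = 2.25. Below is a minimal Python sketch of the 1D building block F(2,3), which produces two convolution outputs with four multiplies instead of six. This illustrates the general technique only; it is not Flex Logix code, and the nnMAX hardware implementation is not shown.

```python
# Winograd F(2,3): two outputs of a 3-tap, stride-1 convolution using
# 4 multiplies instead of 6. Nesting it in 2D gives F(2x2,3x3): 16
# multiplies versus 36 direct, which is the 2.25x factor quoted above.
def winograd_f23(d, g):
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The four multiplies (filter-side sums are precomputed once per filter).
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    # Reference: the same two outputs computed with six multiplies.
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25]
assert winograd_f23(d, g) == direct_conv(d, g)
```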

To date, the new features described above are available only in the nnMAX inference eFPGA. However, depending on customer interest, some of these features may appear in a future third generation of general-purpose eFPGA as well.
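
To make the model-programming bullet above concrete, here is a rough sketch of the model designer’s side of that workflow, using only the standard, public TensorFlow export API. The network, file name and quantization settings are placeholder assumptions, and the nnMAX tooling that would consume the resulting file is not shown.

```python
# Hypothetical model-side flow: train a network, export a quantized
# TensorFlow Lite file, then hand the .tflite (or .onnx) model to an
# inference compiler such as nnMAX's.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
# ... compile and train the model here ...

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```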

The Special Sauce is in the Interconnect

While eFPGA was originally developed for applications such as communications and networking, over time companies such as Flex Logix figured out how to leverage the same core interconnect technology to address the neural inferencing portion of the explosive AI market. In neural inferencing, the computation is primarily trillions of operations (multiplies and accumulates), typically using 8-bit integer inputs and weights, and sometimes 16-bit integer. The technology originally developed for eFPGA is ideally suited to inferencing because it provides reconfigurable, fast control logic for each network stage. Likewise, SRAM in eFPGA can be reconfigured as needed for neural networks, where each layer can require different data sizes. As an example, Flex Logix’s interconnect allows reconfigurable connections at each stage from SRAM input banks, through MAC clusters and activation, to SRAM output banks.
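
As a rough mental model of that stage-by-stage dataflow, the toy Python sketch below moves int8 data from an input SRAM bank through clusters of 64 MACs that accumulate at 32-bit precision, applies an activation, and requantizes the result into an output bank that feeds the next layer. All names, sizes and the requantization scheme are illustrative assumptions, not the actual nnMAX microarchitecture.

```python
import numpy as np

CLUSTER = 64  # MACs per cluster, mirroring the clustering described earlier

def layer(sram_in, weights):
    """One layer: int8 inputs x int8 weights, 32-bit accumulate, activation."""
    acc = np.zeros(weights.shape[1], dtype=np.int32)
    # Each cluster of 64 MACs consumes a 64-wide slice of the input vector;
    # the interconnect decides which bank slice feeds which cluster.
    for i in range(0, sram_in.size, CLUSTER):
        d = sram_in[i:i + CLUSTER].astype(np.int32)
        w = weights[i:i + CLUSTER].astype(np.int32)
        acc += d @ w                              # 8x8 multiplies, 32-bit adds
    act = np.maximum(acc, 0)                      # activation stage (ReLU)
    return np.clip(act >> 8, 0, 127).astype(np.int8)  # requantize to int8

rng = np.random.default_rng(0)
bank = rng.integers(-128, 128, size=256, dtype=np.int8)  # SRAM input bank
for shape in [(256, 128), (128, 64)]:  # two layers; interconnect re-routes between them
    bank = layer(bank, rng.integers(-128, 128, size=shape, dtype=np.int8))
```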

The Future

eFPGA is a powerful and flexible technology, applicable to a wide range of markets and applications. It will continue to evolve as customers come up the learning curve of how to use eFPGA and ask suppliers to support new features and capabilities that improve its value proposition. Real customer needs will be the driving force behind the future evolution of eFPGA, and it will be exciting to see how this technology evolves over time.
