Flex Logix announced availability of its InferX X1, an AI inference chip for edge systems. The InferX X1 delivers high performance on neural network models such as object detection and recognition for robotics, industrial automation, medical imaging, and more.
Customers can use YOLOv3 in robotics, bank security, and retail analytics products because it offers high accuracy for object detection and recognition. Customers can also run custom models they have developed for applications that need more throughput at lower cost. Flex Logix has benchmarked models for these applications and demonstrated to these customers that InferX X1 provides the needed throughput at lower cost.
The InferX X1 die measures 54 mm², about one-fifth the area of a US penny. Its high-volume price enables high-quality, high-performance AI inference in mass-market products.
InferX X1’s software stack makes it easy to adopt: the InferX Compiler takes models in TensorFlow Lite or ONNX and programs the InferX X1.
Based on multiple Flex Logix proprietary technologies, the InferX X1 features a new architecture that, per the company, achieves more throughput from less silicon area. Flex Logix’s XFLX double-density programmable interconnect is already used in the eFPGA (embedded FPGA) that Flex Logix has supplied to multiple customers including Dialog, Boeing, Sandia National Labs, and Datang Telecommunications. This interconnect is combined with a reconfigurable Tensor Processor consisting of 64 1-Dimensional Tensor Processors that can be reconfigured to implement a wide range of neural network operations. Because reconfiguration takes only microseconds, each layer of a neural network model can be optimized with a full-speed data path.
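The series/parallel grouping of the 64 1D Tensor Processors can be sketched abstractly. This is a toy model of per-layer reconfiguration, not the actual compiler's data model; everything beyond the 64-TPU count (the function name, the group/depth fields, the example layer shapes) is an illustrative assumption.

```python
# Toy sketch of per-layer TPU reconfiguration (hypothetical model).
# The X1 has 64 1D TPUs that can be grouped in series (deeper pipelines)
# or in parallel (more simultaneous channels) for each layer.

TOTAL_TPUS = 64

def configure_layer(parallel_groups):
    """Split the 64 TPUs into `parallel_groups` chains of equal depth."""
    assert TOTAL_TPUS % parallel_groups == 0
    depth = TOTAL_TPUS // parallel_groups
    return [{"group": g, "series_depth": depth} for g in range(parallel_groups)]

# Hypothetical layer shapes: a wide layer favors many shallow parallel
# chains; a deep operation favors a few long series chains.
layer1 = configure_layer(parallel_groups=16)  # 16 groups of 4 TPUs in series
layer2 = configure_layer(parallel_groups=4)   # 4 groups of 16 TPUs in series
print(len(layer1), layer1[0]["series_depth"])
```

Because reconfiguration takes only microseconds, a distinct grouping like this can be applied for every layer of the model.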
InferX X1 mass-production chips and software will be available in Q2 2021. Customer samples and advance Compiler and Software Tools will be available in Q1 2021. Customers with neural network models in TensorFlow Lite or ONNX and volume applications in 2021 may contact Flex Logix now for performance benchmarking, early sampling and tool access, and detailed specifications and pricing.
Technology Details and Specifications:
- High MAC utilization, up to 70% for large models/images, translates into less silicon area and cost
- 1-Dimensional Tensor Processors (1D TPUs) are a 1D systolic array
- 64 byte input tensor
- 64 INT8 MACs
- 32 BF16 MACs
- 64 byte x 256 byte weight matrix
- The 1D systolic array produces an output tensor every 64 cycles using 4096 MAC operations
- Reconfigurable Tensor Processor made up of 64 1D TPUs per X1
- TPUs can be configured in series or in parallel to implement a wide range of tensor operations; this flexibility enables high performance implementation of new operations such as 3D convolution
- Programmable interconnect provides a full speed, non-contention data path from SRAM through the TPUs to SRAM
- eFPGA programmable logic implements high speed state machines that control the TPUs and implement the control algorithms for the operators
- Each layer of a model is configured exactly as needed; reconfiguration for a new layer takes just microseconds
- DRAM traffic bringing in the weights and configuration for the next layer occurs in the background during compute of the current layer; this minimizes compute stalls
- Combining two layers in one configuration (layer fusion) minimizes DRAM traffic delays.
- Minimal memory keeps cost down: LPDDR4x DRAM, 14 MB total SRAM
- x4 PCIe Gen 3 or Gen 4 provides rapid communication with the host
- 54 mm² die size in 16 nm process
- 21 x 21 mm flip-chip Ball Grid Array package
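The 1D TPU arithmetic in the specs above can be checked with a short sketch: a 64-byte input tensor against a 64 × 256 weight matrix, with 64 INT8 MACs firing per cycle, yields a 64-wide output tensor after 64 cycles, i.e. 64 × 64 = 4096 MAC operations. This is a behavioral illustration of the numbers in the spec list, not the actual hardware's dataflow; the function name and cycle-per-column assumption are mine.

```python
# Behavioral sketch of one 1D Tensor Processor (1D TPU), using the
# dimensions from the spec list: 64-byte input tensor, 64 x 256 weight
# matrix, 64 INT8 MACs per cycle. One output column per cycle is an
# illustrative assumption, not RTL.
import random

IN_LEN = 64          # 64-byte input tensor
OUT_COLS = 256       # 64 x 256 weight matrix (256 columns available)

def tpu_1d(inputs, weights):
    """Matrix-vector product as a 1D systolic array would compute it:
    each cycle, all 64 MACs fire on one output element."""
    assert len(inputs) == IN_LEN
    outputs = []
    macs = 0
    for j in range(64):          # 64 cycles -> one output tensor
        acc = 0
        for i in range(IN_LEN):  # 64 MACs this cycle
            acc += inputs[i] * weights[i][j]
            macs += 1
        outputs.append(acc)
    return outputs, macs

inputs = [random.randint(-128, 127) for _ in range(IN_LEN)]  # INT8 range
weights = [[1] * OUT_COLS for _ in range(IN_LEN)]            # toy weights
out, macs = tpu_1d(inputs, weights)
print(macs)  # 64 outputs x 64 MACs each = 4096 MAC operations
```

With all-ones toy weights, each output element is simply the sum of the 64 inputs, which makes the MAC count easy to verify by hand.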
The InferX X1 is sampling soon to selected customers, and production is expected in the second quarter of 2021. Pricing ranges from $34 to $199 depending on configuration and volume.
For more information, visit: https://flex-logix.com
About the Author: Tiera Oliver