COMPUTEX. Arm has released a suite of IPs that includes the Arm Cortex-A77 CPU, the Arm Mali-G77 GPU, and the Arm Machine Learning (ML) processor.
The Cortex-A77 offers a 20 percent instructions-per-clock (IPC) performance improvement over its predecessor, the Cortex-A76, and 35x the machine learning performance of the Cortex-A75. It also delivers a 20 percent improvement in integer performance, 35 percent better floating-point performance, and 15 percent higher memory bandwidth.
The IP is slated for 7 nm process technology and introduces several microarchitecture enhancements:
- Branch Prediction: Double the branch prediction bandwidth, 4x the L1 branch target buffer (BTB) capacity, and 33 percent more L2 BTB capacity
- Memory: High-bandwidth, low-latency fetch operations and dynamic code optimization through a macro-op (MOP) cache; dynamic data prefetching based on the memory subsystem configuration; and twice the dedicated load-store issue bandwidth
- Execution: A 50 percent increase in integer execution bandwidth, enabling up to six instructions per cycle; a 25 percent increase in out-of-order window size to 160 instructions; and a second AES encryption pipeline
The Arm Machine Learning (ML) processor is a neural processing unit (NPU) that provides up to 5 tera operations per second (TOPS) per watt. Its Winograd-based architecture consists of fixed-function engines that execute convolutional layers and programmable layer engines that handle non-convolutional layers, and it delivers 225 percent more performance on common filters than competing NPUs.
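To see why Winograd-style convolution is attractive for fixed-function convolution engines, here is a minimal sketch of the classic F(2,3) one-dimensional case: the same two outputs of a 3-tap filter are computed with four multiplications instead of six. This is only an illustration of the general technique, not Arm's implementation.

```python
import numpy as np

def direct_conv(d, g):
    # Standard 1-D convolution F(2,3): 2 outputs, 3-tap filter, 6 multiplies.
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

def winograd_f23(d, g):
    # Winograd minimal filtering F(2,3): same outputs, only 4 multiplies.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter
print(direct_conv(d, g), winograd_f23(d, g))  # both give the same result
```

In practice the filter-side transforms are precomputed once per filter, so the multiplication savings apply to every tile of input data, which is where the per-filter performance gains on common convolution shapes come from.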
Key features of the Arm ML processor include:
- Network Types: Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are supported for classification, object detection, speech recognition, natural language processing, and other edge artificial intelligence (AI) applications.
- Heterogeneous Compute: With optimizations for use with Cortex-A CPUs and Mali GPUs
- Multicore Scalability: Up to eight NPUs and 32 TOPS in a cluster or 64 NPUs in a mesh configuration
- Software and Framework Support: The Arm ML processor integrates with TensorFlow, TensorFlow Lite, Caffe, Caffe2, and other frameworks via the ONNX ecosystem. It is also compatible with the Arm NN software development kit (SDK).
The new Mali-G77 GPUs are based on the Valhall architecture, which offers an enhanced execution engine, load-store caches, and texture pipes. Arm expects these upgrades to deliver a 40 percent improvement in peak graphics performance, 30 percent better performance density, a 30 percent increase in energy efficiency, and 60 percent greater machine learning inference performance over the previous generation.
Highlights of the Valhall microarchitecture include:
- Wider Execution Engines: Two 16-wide clusters of FMA units per execution engine, delivering 32 fused multiply-adds (FMA) per core
- Quad Texture Mapper: With four texels per cycle, providing double the throughput of the Mali-G76
- Dynamic Instruction Scheduling: The scheduler decides which instructions to execute from which warps. This is handled entirely in hardware, and the instructions are then issued to independent parallel arithmetic logic units (ALUs)
- Arm Frame Buffer Compression 1.3: AFBC 1.3 supports 2-plane YUV, improved front-buffer rendering, and separate depth/stencil encoding for better compatibility with APIs like Vulkan
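The per-core FMA figure above translates directly into a peak arithmetic throughput estimate. The sketch below shows the arithmetic; only the 32 FMA/core/cycle value comes from the article, while the clock speed and core count are hypothetical placeholders, since Arm's announcement does not specify shipping configurations.

```python
# Rough peak-FP32 estimate for a Valhall-style GPU configuration.
FMA_PER_CORE_PER_CYCLE = 32   # from the Mali-G77 description above
FLOPS_PER_FMA = 2             # a fused multiply-add counts as 2 FLOPs

def peak_gflops(cores: int, clock_ghz: float) -> float:
    """Peak single-precision throughput in GFLOPS."""
    return cores * FMA_PER_CORE_PER_CYCLE * FLOPS_PER_FMA * clock_ghz

# Hypothetical example: a 9-core part clocked at 850 MHz.
print(peak_gflops(9, 0.85))
```

Real-world throughput also depends on shader occupancy, memory bandwidth, and thermal limits, so a figure like this is an upper bound rather than an expected benchmark result.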
For more information on the new processor cores, visit developer.arm.com.
By Brandon Lewis