Arm Releases Cortex-A77 CPU, Machine Learning Processor, and Mali-G77 GPU

June 4, 2019 Brandon Lewis

COMPUTEX. Arm has released a suite of IP that includes the Arm Cortex-A77 CPU, the Arm Mali-G77 GPU, and the Arm Machine Learning (ML) processor.

The Cortex-A77 offers a 20 percent instructions-per-clock (IPC) improvement over its predecessor, the Cortex-A76, and 35x the machine learning performance of the Cortex-A75. It also delivers 20 percent higher integer performance, 35 percent better floating-point performance, and 15 percent higher memory bandwidth.
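
A common first-order model, assumed here rather than stated by Arm, is that a core's throughput scales with IPC multiplied by clock frequency. Under that model, a 20 percent IPC gain translates into roughly 1.2x performance only if the clock stays the same. The sketch below makes that arithmetic explicit. The `relative_performance` helper and the 0.9 clock ratio are hypothetical illustrations, not Cortex-A77 figures.

```python
def relative_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """First-order throughput multiplier: (1 + IPC gain) * (new clock / old clock).

    Simplified model (an assumption): ignores memory stalls, thermal limits,
    and differences in workload mix.
    """
    return (1.0 + ipc_gain) * clock_ratio


# Same clock: the 20 percent IPC gain maps directly to about 1.2x throughput.
print(relative_performance(0.20))
# Hypothetical case: at 90 percent of the old clock, part of the gain is lost.
print(relative_performance(0.20, clock_ratio=0.9))
```

Real IPC gains vary by workload, which is why Arm quotes separate integer and floating-point figures.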

The IP is slated for 7 nm process technology and includes several microarchitectural enhancements:

  • Branch Prediction: Double the branch prediction bandwidth, 4x the L1 branch target buffer (BTB) capacity, and 33 percent more L2 BTB capacity
  • Memory: High-bandwidth, low-latency instruction fetch and dynamic code optimization through a macro-op (Mop) cache; dynamic data prefetching tuned to the memory subsystem configuration; and twice the dedicated load-store issue bandwidth
  • Execution: A 50 percent increase in integer execution bandwidth, enabling up to six instructions per cycle; a 25 percent larger out-of-order window of 160 instructions; and a second AES encryption pipe
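
A branch target buffer caches the destinations of recently taken branches so the front end can redirect fetch before a branch is decoded. The toy, direct-mapped BTB below is a generic textbook structure, not Arm's design; the article does not describe how the Cortex-A77 organizes its BTBs. It shows why capacity matters: a loop with more hot branches than BTB entries keeps evicting its own predictions. The class, `count_hits`, and the branch addresses are all hypothetical.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB that remembers the last target seen for a branch PC."""

    def __init__(self, entries: int) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index_and_tag(self, pc: int) -> tuple:
        word = pc >> 2  # AArch64 instructions are 4-byte aligned
        return word % self.entries, word // self.entries

    def predict(self, pc: int):
        """Return the predicted target, or None on a BTB miss."""
        index, tag = self._index_and_tag(pc)
        return self.targets[index] if self.tags[index] == tag else None

    def update(self, pc: int, target: int) -> None:
        index, tag = self._index_and_tag(pc)
        self.tags[index] = tag
        self.targets[index] = target


def count_hits(btb: BranchTargetBuffer, branch_pcs, iterations: int) -> int:
    """Replay a loop body of taken branches and count correct target predictions."""
    hits = 0
    for _ in range(iterations):
        for pc in branch_pcs:
            target = pc + 0x100  # hypothetical taken-branch target
            if btb.predict(pc) == target:
                hits += 1
            btb.update(pc, target)
    return hits


# Eight hot branches thrash a 4-entry BTB but fit in one that is 4x larger.
loop_branches = [0x1000 + 4 * i for i in range(8)]
print(count_hits(BranchTargetBuffer(4), loop_branches, iterations=3))   # 0
print(count_hits(BranchTargetBuffer(16), loop_branches, iterations=3))  # 16
```

In the small BTB, each branch shares a slot with the branch four instructions away, so every lookup finds the other branch's entry. In the larger one, every pass after the first predicts all eight targets.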

The Arm Machine Learning (ML) processor is a neural processing unit (NPU) that delivers up to 5 tera operations per second (TOPS) per watt. It pairs fixed-function engines that execute convolutional layers with programmable layer engines that handle non-convolutional layers. Arm says its use of Winograd convolution gives the processor 225 percent more performance on common filters than competing NPUs.
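
Winograd is a fast convolution algorithm rather than a processor architecture. It trades multiplications, which are expensive in hardware, for cheaper additions. The article does not describe how the Arm ML processor applies it, so the sketch below shows only the textbook 1-D case F(2,3). That case produces two outputs of a 3-tap filter with four multiplications instead of six. The function names are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter slid over 4 inputs, computed directly (6 multiplies)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def transform_filter(g):
    """Winograd F(2,3) filter transform; computed once per filter and then reused."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def winograd_f23(d, u):
    """The same two outputs using only 4 multiplies, given a pre-transformed filter u."""
    m1 = (d[0] - d[2]) * u[0]
    m2 = (d[1] + d[2]) * u[1]
    m3 = (d[2] - d[1]) * u[2]
    m4 = (d[1] - d[3]) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1, 2, 3, 4]
g = [1, 0, -1]
print(direct_f23(d, g))                      # [-2, -2]
print(winograd_f23(d, transform_filter(g)))  # [-2.0, -2.0]
```

Nesting the same idea in two dimensions, F(2x2, 3x3) cuts the multiplications for a tile of four outputs of a 3x3 convolution from 36 to 16. Savings of that kind make Winograd attractive for common 3x3 filters.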

Key features of the Arm ML processor include:

  • Network Types: Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are supported for classification, object detection, speech recognition, natural language processing, and other edge artificial intelligence (AI) applications.
  • Heterogeneous Compute: Optimized to work alongside Cortex-A CPUs and Mali GPUs
  • Multicore Scalability: Up to eight NPUs and 32 TOPS in a cluster or 64 NPUs in a mesh configuration
  • Software and Framework Support: The Arm ML processor integrates with TensorFlow, TensorFlow Lite, Caffe, Caffe2, and other frameworks via the ONNX ecosystem, and is compatible with the Arm NN software development kit (SDK).
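
The cluster figures imply a per-NPU peak, and a naive estimate for the 64-NPU mesh follows from that peak. The arithmetic below is only a back-of-envelope sketch. The article gives no throughput figure for the mesh, and two quantities are assumptions rather than Arm specifications: the linear-scaling ceiling, and the power figure, which assumes the quoted 5 TOPS/W peak efficiency holds at full load. The function names are hypothetical.

```python
# Figures quoted in the article.
CLUSTER_TOPS = 32.0    # peak TOPS for one cluster
NPUS_PER_CLUSTER = 8   # NPUs in that cluster
MESH_NPUS = 64         # NPUs in the largest mesh configuration
TOPS_PER_WATT = 5.0    # quoted peak efficiency


def per_npu_tops(cluster_tops: float, npus: int) -> float:
    """Peak throughput implied for a single NPU."""
    return cluster_tops / npus


def linear_scaling_tops(single_npu_tops: float, npus: int) -> float:
    """Ceiling assuming perfectly linear scaling (an assumption: real meshes
    lose some throughput to interconnect and memory traffic)."""
    return single_npu_tops * npus


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Power implied if the quoted efficiency held at this throughput (an assumption)."""
    return tops / tops_per_watt


one_npu = per_npu_tops(CLUSTER_TOPS, NPUS_PER_CLUSTER)
print(one_npu)                                   # 4.0
print(linear_scaling_tops(one_npu, MESH_NPUS))   # 256.0
print(implied_power_watts(CLUSTER_TOPS, TOPS_PER_WATT))
```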

The new Mali-G77 GPU is based on the Valhall architecture, which introduces an enhanced execution engine, load/store caches, and texture pipes. Compared with the previous-generation Mali-G76, Arm cites a 40 percent improvement in peak graphics performance, 30 percent better performance density, 30 percent higher energy efficiency, and 60 percent greater machine learning inference performance.

Highlights of the Valhall microarchitecture include:

  • Wider Execution Engines: Two 16-wide FMA clusters per execution engine, delivering 32 fused multiply-adds (FMAs) per core per cycle
  • Quad Texture Mapper: Processes four texels per cycle, double the texture throughput of the Mali-G76
  • Dynamic Instruction Scheduling: The scheduler selects which warps' instructions execute next. Scheduling is handled entirely in hardware, and the selected instructions are dispatched to independent parallel arithmetic logic units (ALUs)
  • Arm Frame Buffer Compression 1.3: AFBC 1.3 supports 2-plane YUV, improved front-buffer rendering, and separate depth/stencil encoding for better compatibility with APIs like Vulkan  
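
The sketch below illustrates two items from the list above under simplifying assumptions. The scheduler is a generic round-robin model of hardware warp scheduling, not Arm's actual selection policy, which the article does not describe. It shows how issuing from whichever warp is ready hides the latency that a single warp would stall on. The throughput helper converts the 32 FMAs per core into peak FLOPs, assuming FP32 lanes and counting each FMA as two operations. The core count and clock are placeholders, not Mali-G77 specifications, and all names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Warp:
    wid: int
    instructions: deque = field(default_factory=deque)  # pending instruction names
    ready_at: int = 0  # cycle at which this warp's previous result is available


def schedule(warps, latency: int, max_cycles: int = 1000):
    """Toy dynamic scheduler: each cycle, issue one instruction from the next
    ready warp in round-robin order. Returns (cycle, warp id, instruction) tuples."""
    order = deque(warps)
    issued = []
    cycle = 0
    while any(w.instructions for w in warps) and cycle < max_cycles:
        for _ in range(len(order)):
            warp = order[0]
            order.rotate(-1)  # move this warp to the back of the queue
            if warp.instructions and warp.ready_at <= cycle:
                issued.append((cycle, warp.wid, warp.instructions.popleft()))
                warp.ready_at = cycle + latency  # hypothetical fixed ALU latency
                break
        cycle += 1
    return issued


def peak_fp32_gflops(cores: int, clock_ghz: float, fma_per_core: int = 32) -> float:
    """Peak GFLOPS, counting each FMA as two floating-point operations per cycle."""
    return cores * fma_per_core * 2 * clock_ghz


# One warp alone stalls on its own latency; two warps fill each other's bubbles.
solo = schedule([Warp(0, deque(["a0", "a1"]))], latency=2)
pair = schedule([Warp(0, deque(["a0", "a1"])), Warp(1, deque(["b0", "b1"]))], latency=2)
print([c for c, _, _ in solo])                   # [0, 2]
print([c for c, _, _ in pair])                   # [0, 1, 2, 3]
print(peak_fp32_gflops(cores=1, clock_ghz=1.0))  # 64.0
```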



About the Author

Brandon Lewis

Brandon Lewis, Editor-in-Chief of Embedded Computing Design, is responsible for guiding the property's content strategy, editorial direction, and engineering community engagement, which includes IoT Design, Automotive Embedded Systems, the Power Page, Industrial AI & Machine Learning, and other publications. As an experienced technical journalist, editor, and reporter with an aptitude for identifying key technologies, products, and market trends in the embedded technology sector, he enjoys covering topics that range from development kits and tools to cyber security and technology business models. Brandon received a BA in English Literature from Arizona State University, where he graduated cum laude.
