Cortex-A77 Improves IPC

May 28, 2019

Author: Linley Gwennap

After a big revision in Cortex-A76, Arm’s newest high-end CPU takes a more incremental approach but still delivers a sizable performance gain. The new Cortex-A77 borrows the pipeline and basic microarchitecture from its predecessor while adding a few features. Using the new level-zero (L0) cache, the A77 can issue six instructions per cycle, although it’s limited to four decoders. To support the faster front end, it adds a second branch unit and a fourth integer ALU. The architects also expanded the reordering capacity, improved the branch prediction, and added other enhancements based on their experience with the A76 design.

As a result, Arm expects Cortex-A77 to deliver about 20% better integer performance than Cortex-A76 at the same clock speed. Since the pipeline is the same, the clock speed should remain constant in the same IC process. This gain comes atop a 35% per-clock improvement for the A76. Initial customers received production RTL for the new core, code-named Deimos, in 4Q18; the first phones using A77-based processors are due early next year. Arm withheld licensee names, but we expect Huawei, which was first to market with the A76 in its Kirin 980 smartphone processor, to be the lead A77 customer. Qualcomm is likely to deploy a modified A77 core in its next-generation Snapdragon 865.
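
The two per-clock figures compound rather than add. As a rough illustration, the sketch below multiplies the gains quoted above to estimate the cumulative per-clock improvement across both generations. It assumes that both percentages describe the same kind of integer workload and that each is measured against the immediately preceding core. The function and variable names are hypothetical, and the result is back-of-the-envelope arithmetic, not an Arm figure.

```python
def compound_gain(*gains):
    """Combine successive fractional gains (0.20 means +20%) multiplicatively."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Per-clock gains quoted in the article (assumed to be directly comparable).
a76_gain = 0.35  # A76 over its predecessor
a77_gain = 0.20  # A77 over A76, at the same clock speed

cumulative = compound_gain(a76_gain, a77_gain)
print(f"Cumulative per-clock gain over two generations: {cumulative:.0%}")
```

Under those assumptions, the two generations together work out to roughly a 62% per-clock gain, somewhat more than the 55% a simple sum would suggest.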

The performance boost brings Cortex-A77 closer to custom Arm-compatible CPUs from Samsung and Apple in single-core performance, but we still expect those companies to maintain a lead, particularly once they debut their own next-generation designs. Arm continues to balance performance against power and die area, but it’s shifting more weight to performance. The company must also balance the needs of its smartphone customers against those of its server customers; we expect its next-generation Neoverse CPU, called Zeus, to be based on the Deimos design.

