Samsung M3 Flexes Arm Muscles

August 16, 2018

Author: Linley Gwennap

Having already shipped millions of phones that use its M3 CPU, Samsung showed off that CPU’s powerful microarchitecture at the Hot Chips conference today. The third-generation design (code-named Meerkat) from the company’s Austin, Texas, group is considerably beefed up from its predecessor and outperforms any shipping Cortex CPU. In fact, it compares well with Intel’s top-of-the-line Skylake on some performance metrics.

To deliver big performance, Samsung made big changes to the microarchitecture. The M3 can decode and dispatch six Arm instructions per cycle, 50% more than its predecessor. To keep up with the front end, the new design includes an extra integer ALU, another FP/Neon unit, and a second load unit, doubling the load bandwidth. It also doubles many of the branch-prediction resources, the instruction-TLB size, the instruction-cache bandwidth, and the data-cache size and bandwidth while more than doubling the reorder window as well as the number of scheduler entries. Samsung even reduced some instruction latencies to boost performance.
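
The growth figures above can be inverted to see what they imply about the previous-generation core. The following is only a rough sketch: the dictionary keys and the function name are illustrative labels chosen here, not Samsung's terminology, and it covers only the two resources for which the text gives both an M3 value and a ratio.

```python
# M3 values stated in the article, each paired with the stated growth
# factor over the previous-generation core. Key names are illustrative.
M3_RESOURCES = {
    "decode_width": (6, 1.5),  # six instructions per cycle, "50% more"
    "load_units": (2, 2.0),    # a second load unit, "doubling" bandwidth
}


def implied_predecessor(m3_value, growth_factor):
    """Invert a 'grew by this factor' claim to recover the earlier value."""
    return m3_value / growth_factor


predecessor = {
    name: implied_predecessor(value, factor)
    for name, (value, factor) in M3_RESOURCES.items()
}
print(predecessor)
```

Under these figures, the earlier design works out to a four-wide front end with a single load unit.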

Using a 17-stage pipeline, the CPU can achieve speeds of up to 2.7GHz in the company’s Exynos 9810 processor, which appears exclusively in the Galaxy S9 and Note9 smartphones. This chip, built in Samsung’s 10nm LPP process, features four M3 CPUs and four Cortex-A55 CPUs in a big.LITTLE configuration. Each M3 has its own 512KB level-two (L2) cache, and all four share a 4MB level-three (L3) cache.
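
As a quick check on the cache figures, the raw capacity serving the M3 cluster can be added up. This sketch counts only the caches named above; the Cortex-A55 caches are not described in the text and are left out, and the constant names are chosen here for illustration. The sum is raw capacity, since the text does not say whether the L3 duplicates L2 contents.

```python
KB_PER_MB = 1024

M3_CORES = 4          # four M3 CPUs in the Exynos 9810
L2_PER_CORE_KB = 512  # private L2 per M3
SHARED_L3_MB = 4      # L3 shared by all four M3 CPUs

# Private L2 summed across the big cores, then added to the shared L3.
total_l2_kb = M3_CORES * L2_PER_CORE_KB
total_kb = total_l2_kb + SHARED_L3_MB * KB_PER_MB

print(f"L2 total: {total_l2_kb / KB_PER_MB:.0f}MB")
print(f"L2 + L3:  {total_kb / KB_PER_MB:.0f}MB")
```

That comes to 2MB of private L2 plus the 4MB L3, or 6MB in total for the big cluster.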

On Geekbench 4.0, a popular mobile benchmark, the 9810 delivers 81% greater single-core performance than the Exynos 8895, which uses the previous-generation M2 CPU in a similar 10nm process. On the same benchmark, the M3 outperforms the top-of-the-line Arm Cortex-A75 by 51%. It even tops the massive Skylake in performance per clock (IPC), although that Intel CPU scales to higher clock speeds.
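
A benchmark speedup mixes clock-speed and per-clock gains, so dividing out the frequency ratio gives a rough view of the per-clock improvement. The sketch below is a back-of-the-envelope estimate, not a measurement: the 2.7GHz figure is the M3's stated peak speed, but the 2.3GHz clock for the M2 in the Exynos 8895 is an assumption that does not come from this article, and the function name is hypothetical.

```python
def per_clock_gain(speedup, new_ghz, old_ghz):
    """Divide a total speedup by the clock ratio to isolate per-clock gain."""
    return speedup / (new_ghz / old_ghz)


M3_GHZ = 2.7          # peak M3 clock in the Exynos 9810 (from the article)
ASSUMED_M2_GHZ = 2.3  # ASSUMPTION: M2 clock in the Exynos 8895, not stated here
SPEEDUP_VS_M2 = 1.81  # "81% greater single-core performance"

gain = per_clock_gain(SPEEDUP_VS_M2, M3_GHZ, ASSUMED_M2_GHZ)
print(f"Estimated per-clock gain over M2: {gain:.2f}x")
```

With the assumed M2 clock, about 1.54x of the 1.81x speedup would come from per-clock improvements and the remainder from the higher frequency; a different M2 clock would shift that split.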

