
Linley Newsletter

Alibaba Uses Convolution Architecture

March 3, 2020

Author: Linley Gwennap

At the recent ISSCC, Alibaba disclosed more details of its HanGuang 800 ASIC, which holds the record for best ResNet-50 performance of any AI inference accelerator. The presentation describes an architecture optimized for 3D-convolution operations rather than general matrix multiplication (GEMM). This architecture employs dot-product units rather than systolic arrays, increasing MAC utilization relative to competing accelerators and reducing memory requirements. The smaller memory footprint allows the chip to operate entirely without external DRAM, packing neural networks into its capacious 192MB of SRAM. The design also compresses weights to further reduce memory usage.
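
To make the convolution-versus-GEMM distinction concrete, the sketch below computes the same convolution two ways. It is a generic illustration, not Alibaba's datapath: the function names are hypothetical, and it assumes stride 1 with no padding. In the convolution-native form, each output value is one dot product over a 3D window (kernel height x kernel width x input channels) read in place. The GEMM form first copies every window into an im2col patch matrix, which duplicates overlapping input values.

```python
# Generic illustration (not Alibaba's actual design): each convolution output
# is a single dot product over a 3D input window. Stride 1, no padding assumed.


def dot(a, b):
    """Dot product of two equal-length sequences (what a dot-product unit computes)."""
    return sum(x * y for x, y in zip(a, b))


def flatten_kernel(k):
    """Flatten a KH x KW x C filter into one vector (row, column, channel order)."""
    return [k[i][j][c] for i in range(len(k))
            for j in range(len(k[0])) for c in range(len(k[0][0]))]


def window(inp, y, x, kh, kw):
    """Flatten the kh x kw x C input window whose top-left corner is (y, x)."""
    channels = len(inp[0][0])
    return [inp[y + i][x + j][c]
            for i in range(kh) for j in range(kw) for c in range(channels)]


def conv_direct(inp, kernels):
    """Convolution-native form: one dot product per (output pixel, output channel).

    inp: H x W x C nested lists; kernels: list of K filters, each KH x KW x C.
    Returns an (H-KH+1) x (W-KW+1) x K nested list.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    flat_k = [flatten_kernel(k) for k in kernels]
    out_h, out_w = len(inp) - kh + 1, len(inp[0]) - kw + 1
    return [[[dot(window(inp, y, x, kh, kw), fk) for fk in flat_k]
             for x in range(out_w)] for y in range(out_h)]


def conv_gemm(inp, kernels):
    """GEMM form: materialize an im2col patch matrix, then multiply it by the
    filter matrix. Same result, but overlapping windows are copied repeatedly."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    flat_k = [flatten_kernel(k) for k in kernels]
    out_h, out_w = len(inp) - kh + 1, len(inp[0]) - kw + 1
    patches = [window(inp, y, x, kh, kw)
               for y in range(out_h) for x in range(out_w)]
    rows = [[dot(p, fk) for fk in flat_k] for p in patches]  # the GEMM itself
    return [rows[y * out_w:(y + 1) * out_w] for y in range(out_h)]


ones_in = [[[1, 1] for _ in range(4)] for _ in range(4)]    # 4 x 4 x 2 input
ones_k = [[[[1, 1] for _ in range(3)] for _ in range(3)]]   # one 3 x 3 x 2 filter
print(conv_direct(ones_in, ones_k))  # prints [[[18], [18]], [[18], [18]]]
```

Both functions return identical results. In this toy case, however, the im2col matrix holds 72 values copied from a 32-value input. That duplication is one plausible reason a convolution-native design can get by with less memory, though the article does not describe HanGuang's dataflow at this level.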

Operating at 700MHz, the chip achieves an impressive 825 trillion operations per second (TOPS), far more than Nvidia’s best GPUs. This rating, combined with the higher MAC utilization, enables its record-setting performance. The HanGuang 800 has a large die and requires up to 280W, but other data-center accelerators have similar requirements. The design has only four cores and can divide models among them, delivering peak throughput at any batch size. This capability makes Alibaba’s accelerator particularly well suited to real-time workloads that require minimum latency.
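
The peak rating and clock speed together imply how much arithmetic the chip completes every cycle. The back-of-envelope calculation below assumes that one MAC counts as two operations (a multiply and an add), the usual convention for TOPS ratings. The article does not state this convention, so the MAC counts are implied estimates, not disclosed figures.

```python
# Back-of-envelope arithmetic on the quoted figures (825 TOPS, 700MHz, four cores).
# Assumption: one MAC = two operations (multiply + add); not stated in the article.

PEAK_TOPS = 825
CLOCK_HZ = 700e6
CORES = 4


def ops_per_cycle(peak_tops, clock_hz):
    """Operations the chip must complete every clock cycle to reach its peak."""
    return peak_tops * 1e12 / clock_hz


def macs_per_cycle(peak_tops, clock_hz, ops_per_mac=2):
    """MAC units implied by the peak rating, under the ops_per_mac assumption."""
    return ops_per_cycle(peak_tops, clock_hz) / ops_per_mac


chip_macs = macs_per_cycle(PEAK_TOPS, CLOCK_HZ)
print(f"{ops_per_cycle(PEAK_TOPS, CLOCK_HZ):,.0f} ops per cycle")  # prints "1,178,571 ops per cycle"
print(f"{chip_macs:,.0f} MACs per cycle")                          # prints "589,286 MACs per cycle"
print(f"{chip_macs / CORES:,.0f} MACs per cycle per core")         # prints "147,321 MACs per cycle per core"
```

Under that assumption, the chip needs roughly 590,000 MACs busy every cycle, or about 147,000 per core. This gives a sense of why keeping MAC utilization high matters so much to the design.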

The chip delivers considerable advantages over Nvidia’s most powerful inference product, the Titan RTX. Using its Turing architecture, the RTX can generate a maximum of 261 TOPS, less than one-third of Alibaba’s peak rating. It has an even greater deficit on ResNet-50 owing to its lower MAC utilization. HanGuang can achieve its peak performance even at a batch size of one, whereas the Titan RTX is much slower for real-time inference and has considerably worse latency. The GPU requires expensive high-bandwidth DRAM to compensate for its small on-chip memory, but this arrangement provides more storage for large models. The two chips have similar die area and power.
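
The comparison above can be checked with simple arithmetic, sketched below under several stated assumptions. First, "192MB" is taken to mean 192 x 2**20 bytes. Second, only weights are counted, so activations and HanGuang's weight compression are ignored. Third, weights are assumed to be 8 bits. The roughly 25.6 million parameters used for ResNet-50 is the commonly cited count for the standard model. The 1-billion-weight model is purely hypothetical. The utilization arguments to `effective_ratio` are placeholders, because the article gives no utilization figures.

```python
# Peak-rating comparison plus a weights-only SRAM fit check.
# Assumptions: MB = 2**20 bytes; weights only (activations and weight
# compression ignored); utilization values are placeholders, not article data.

HANGUANG_PEAK_TOPS = 825
TITAN_RTX_PEAK_TOPS = 261
SRAM_BYTES = 192 * 2**20


def peak_ratio(gpu_tops, asic_tops):
    """GPU peak throughput as a fraction of the ASIC's peak."""
    return gpu_tops / asic_tops


def effective_ratio(gpu_tops, gpu_util, asic_tops, asic_util):
    """Delivered-throughput ratio when each chip keeps only a fraction of its
    MACs busy; lower GPU utilization widens the gap beyond the peak ratio."""
    return (gpu_tops * gpu_util) / (asic_tops * asic_util)


def weights_fit(num_params, bytes_per_weight, sram_bytes=SRAM_BYTES):
    """True if the model's raw weights fit in on-chip SRAM."""
    return num_params * bytes_per_weight <= sram_bytes


print(f"{peak_ratio(TITAN_RTX_PEAK_TOPS, HANGUANG_PEAK_TOPS):.3f}")  # prints 0.316
print(weights_fit(25.6e6, 1))  # ResNet-50 (~25.6M weights) at 8 bits: prints True
print(weights_fit(1e9, 1))     # hypothetical 1B-weight model at 8 bits: prints False
```

The ratio of 0.316 confirms the "less than one-third" figure. The fit check shows why a ResNet-50-class model can live entirely in SRAM, while a much larger model would need the external DRAM that the GPU provides.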

Subscribers can view the full article in the Microprocessor Report.