Linley Newsletter

Cornami Right-Sizes Systolic Arrays

November 5, 2019

Author: Mike Demler

Since its founding in 2014, Cornami has disclosed few details of its plan to build a configurable AI processor, but at the recent Linley Fall Processor Conference, it presented performance results based on an FPGA prototype. Running ResNet-50 inference on 224x224 images with half-precision (FP16) floating-point operations, the development system predicts an incredible 105,000 images per second for the initial product, which the company expects to consume only 30W. Although that result comes from cycle-accurate measurements rather than an actual chip, it's equivalent to 16x the performance of Nvidia's V100 GPU. Cornami based its estimate on inference with a batch size of one, which makes the result even more impressive, because the V100 figure used a more favorable batch size of 128.
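The headline figures can be cross-checked with simple arithmetic. The short Python sketch below takes the article's numbers at face value (105,000 images per second, 30W, and the 16x claim) and derives the V100 throughput that the ratio implies, along with Cornami's projected images per second per watt. The variable names are illustrative, and the derived values are only as reliable as the unverified estimate they come from.

```python
# Figures quoted in the article, taken at face value (not independently verified).
cornami_ips = 105_000    # predicted ResNet-50 images/s at batch size 1
cornami_watts = 30       # expected power for the initial product
speedup_vs_v100 = 16     # claimed ratio vs. Nvidia V100 at batch size 128

# V100 throughput implied by the claimed ratio
implied_v100_ips = cornami_ips / speedup_vs_v100

# Projected energy efficiency of the Cornami chip
cornami_ips_per_watt = cornami_ips / cornami_watts

print(f"Implied V100 throughput: {implied_v100_ips:,.1f} images/s")   # 6,562.5
print(f"Projected efficiency: {cornami_ips_per_watt:,.1f} images/s/W")  # 3,500.0
```

The implied V100 number is simply the ratio run backward; it is not a separately measured V100 result.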

Although the company withheld most architectural details, the new processor will integrate an array of programmable cores connected by a proprietary network-on-a-chip (NoC) that allows dynamic reconfiguration to meet the compute requirements of individual neural-network layers. Cornami doesn't describe its product as an FPGA, but we expect it's applying some of the programmable-logic technology its founders developed at Quicksilver Technology more than 15 years ago.
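Cornami hasn't described how its NoC assigns cores to layers, but the general idea of sizing each layer's share of the array to its workload can be illustrated. The sketch below is a hypothetical example, not Cornami's method: the function allocate_cores and the per-layer MAC counts are invented for illustration. It gives each layer at least one core and splits the rest in proportion to the layer's multiply-accumulate (MAC) count using largest-remainder rounding, so that pipelined layers finish in roughly equal time.

```python
# Hypothetical illustration only: this is NOT Cornami's published algorithm.
# It shows one simple way to "right-size" per-layer compute in a pipeline.


def allocate_cores(layer_macs: list[int], total_cores: int) -> list[int]:
    """Give each layer one core, then split the remaining cores in
    proportion to each layer's MAC count (largest-remainder rounding)."""
    n = len(layer_macs)
    if n == 0 or total_cores < n:
        raise ValueError("need at least one core per layer")
    total_macs = sum(layer_macs)
    if total_macs <= 0:
        raise ValueError("layer MAC counts must sum to a positive value")
    spare = total_cores - n
    # Integer floor of each layer's proportional share, plus its remainder.
    shares = [divmod(spare * m, total_macs) for m in layer_macs]
    alloc = [1 + q for q, _ in shares]
    # Hand out the cores lost to rounding, largest remainder first.
    leftover = total_cores - sum(alloc)
    by_remainder = sorted(range(n), key=lambda i: shares[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    macs = [1, 2, 2, 1]  # hypothetical relative MAC counts for four layers
    cores = allocate_cores(macs, total_cores=12)
    print(cores)                                   # [2, 4, 4, 2]
    print([m / c for m, c in zip(macs, cores)])    # [0.5, 0.5, 0.5, 0.5]
```

For the toy input, every layer ends up with the same MAC work per core, so no pipeline stage becomes a bottleneck. A real allocator would also have to account for data movement across the NoC and for on-chip memory capacity, which this sketch ignores.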

Despite the company's impressive performance estimate, throughput in a real system will be much lower. Cornami hasn't announced plans to target neural-network training, but its processor uses floating-point MAC units rather than the more power-efficient INT8 MAC units typical of inference accelerators. Judging from the FPGA prototype, the new chip will be a much better fit for data centers than for edge devices. The company is still working on its first tapeout, however, so it could change the architecture before going to production.

