
Linley Newsletter

Microsoft Brainwave Uses FPGAs

November 28, 2017

Author: Linley Gwennap

Microsoft’s Brainwave project demonstrates the advantages of the FPGA approach to deep learning. The company uses deep neural networks (DNNs) in many of its cloud services, including Bing web search, Cortana voice assistant, and Skype Translator. Microsoft wanted a flexible design that could easily adapt as its workloads evolve. Having direct access to production DNNs and large data sets, the company could experiment with alternative hardware designs and data formats to optimize performance. This experimentation led it to develop a new 8-bit floating-point format (FP8) to represent the neural-network weight and activation values.
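
For readers who want a feel for how such a narrow format behaves, the Python sketch below converts a standard float to an 8-bit minifloat and back. The 1-sign/5-exponent/2-mantissa split is an assumption for illustration only; the article does not disclose Brainwave's exact FP8 bit layout, and zero, subnormals, and NaN handling are omitted for brevity.

```python
import struct

# Assumed bit split: the article does not give Brainwave's exact FP8 layout,
# so this sketch uses 1 sign, 5 exponent, and 2 mantissa bits for illustration.
SIGN_BITS, EXP_BITS, MAN_BITS = 1, 5, 2
FP8_BIAS = (1 << (EXP_BITS - 1)) - 1  # 15

def fp32_to_fp8(x: float) -> int:
    """Truncate a float to the assumed 8-bit encoding (round toward zero).
    Zero, subnormals, and NaN are ignored for brevity."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF                      # FP32 exponent, bias 127
    man = (bits >> (23 - MAN_BITS)) & ((1 << MAN_BITS) - 1)
    new_exp = max(0, min((1 << EXP_BITS) - 1, exp - 127 + FP8_BIAS))
    return (sign << (EXP_BITS + MAN_BITS)) | (new_exp << MAN_BITS) | man

def fp8_to_fp32(b: int) -> float:
    """Expand the assumed 8-bit encoding back to a Python float."""
    sign = -1.0 if b >> (EXP_BITS + MAN_BITS) else 1.0
    exp = (b >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = b & ((1 << MAN_BITS) - 1)
    return sign * (1.0 + man / (1 << MAN_BITS)) * 2.0 ** (exp - FP8_BIAS)

print(fp8_to_fp32(fp32_to_fp8(0.8125)))  # 0.75: nearest grid value below
```

With only two mantissa bits, values snap to four steps per power of two, which is why a format like this must be validated against production networks before deployment.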

To maximize flexibility, Brainwave is a “soft” vector processor with a custom instruction-set architecture (ISA) for DNN acceleration. Like Google’s TPU, Brainwave doesn’t run general-purpose code, so the ISA needs only a few special-purpose instructions for matrix multiplication, vector operations, convolution, and nonlinear activation. Each one can operate on hundreds or thousands of data units.
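
To illustrate how coarse-grained such an ISA can be, the sketch below interprets a two-instruction program in which each opcode consumes entire matrices rather than scalar operands. The opcode names, operand layout, and register model are hypothetical, not Brainwave's actual encoding.

```python
# Hypothetical instruction stream for a coarse-grained DNN ISA. Opcode names,
# operand layout, and the register model are illustrative, not Brainwave's.
def matmul(a, b):
    """One 'instruction' consumes entire matrices, not scalar operands."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m, _unused):
    """Nonlinear activation applied elementwise across a whole matrix."""
    return [[max(0.0, x) for x in row] for row in m]

OPS = {"MATRIX_MULTIPLY": matmul, "ACTIVATION_RELU": relu}

PROGRAM = [
    # (opcode, src1, src2, dst): each step fires hundreds of MACs at once
    ("MATRIX_MULTIPLY", "acts", "weights", "tmp"),
    ("ACTIVATION_RELU", "tmp", None, "out"),
]

def run(program, regs):
    for opcode, src1, src2, dst in program:
        regs[dst] = OPS[opcode](regs[src1], regs.get(src2))
    return regs

regs = {"acts": [[1.0, -2.0]], "weights": [[0.5, 1.0], [0.25, -1.0]]}
print(run(PROGRAM, regs)["out"])  # [[0.0, 3.0]]
```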

Because it’s instantiated in FPGAs, the instruction set is easy to change. Any changes are rolled into a compiler the team created, which maintains a consistent interface to software. The compiler takes pretrained DNN models developed in multiple frameworks, including TensorFlow, Caffe, and Microsoft’s Cognitive Toolkit, and converts them to run on the Brainwave ISA.
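
A hedged sketch of what such a lowering step might look like appears below: it maps framework-level graph nodes onto coarse-grained ISA tuples like those in the previous sketch. The node names and the one-to-one mapping are assumptions; the excerpt does not describe the real compiler's internals.

```python
# Illustrative lowering pass. FRAMEWORK_TO_ISA and the graph format are
# assumptions for this sketch, not Microsoft's actual compiler design.
FRAMEWORK_TO_ISA = {"MatMul": "MATRIX_MULTIPLY", "Relu": "ACTIVATION_RELU"}

def lower(graph):
    """Translate (op, inputs, output) nodes into (opcode, src1, src2, dst)."""
    program = []
    for op, inputs, output in graph:
        src1 = inputs[0]
        src2 = inputs[1] if len(inputs) > 1 else None
        program.append((FRAMEWORK_TO_ISA[op], src1, src2, output))
    return program

# A pretrained model exported from any of the frameworks might flatten to:
graph = [("MatMul", ["acts", "weights"], "tmp"), ("Relu", ["tmp"], "out")]
print(lower(graph))
# [('MATRIX_MULTIPLY', 'acts', 'weights', 'tmp'),
#  ('ACTIVATION_RELU', 'tmp', None, 'out')]
```

Because software targets the compiler rather than the hardware directly, the team can revise the underlying FPGA design without breaking deployed models.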

The microarchitecture is based on the company’s FP8 format, which halves the size of the compute units and register files compared with standard FP16. The design reads weights from DRAM in FP16 format and converts them to FP8 before storing them in the vector register file. The multiply units then apply the weights to incoming activation data; the results are accumulated and converted back to FP16. Two multifunction units can optionally perform additional multiply-add operations before looping the data back or storing the result in memory. These units can perform activation, normalization, and pooling.
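
This datapath can be modeled in a few lines of Python. The sketch below approximates FP8 quantization by rounding to a 2-bit mantissa (an assumption, since the exact format is unspecified), accumulates in full precision to stand in for FP16, and applies an illustrative multiply-add plus ReLU in place of the multifunction units.

```python
import math

def quantize_fp8(x: float) -> float:
    """Snap x to a coarse grid standing in for the FP8 weight format
    (2 mantissa bits assumed, i.e., four steps per power of two)."""
    if x == 0.0:
        return 0.0
    step = 2.0 ** (math.floor(math.log2(abs(x))) - 2)
    return round(x / step) * step

def matrix_vector(weights_dram, activations):
    """Model the datapath: FP16 weights in DRAM are narrowed to FP8 in the
    register file, multiplied against activations, and accumulated wide."""
    regfile = [[quantize_fp8(w) for w in row] for row in weights_dram]
    acc = [sum(w * a for w, a in zip(row, activations)) for row in regfile]
    # Multifunction units: optional extra multiply-add, then activation.
    scale, bias = 1.0, 0.0          # illustrative normalization parameters
    return [max(0.0, scale * v + bias) for v in acc]  # ReLU stands in here

print(matrix_vector([[0.30, 0.70]], [1.0, 2.0]))  # [1.8125] after quantization
```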

Subscribers can view the full article in the Microprocessor Report.

