Linley Newsletter

Nvidia Shares Its Deep Learning

March 27, 2018

Author: Mike Demler

Designers no longer need to worry about the costs of deep-learning acceleration: Nvidia is making the technology available for free. The company has extracted the deep-learning accelerator (NVDLA) from its Xavier autonomous-driving processor and is offering it for use under a royalty-free open-source license. It’s managing the NVDLA project as a directed community, which it supports with comprehensive documentation and instructions. Users can also download NVDLA hardware and software components from GitHub. Nvidia delivers the NVDLA core as synthesizable Verilog RTL code, along with a step-by-step SoC-integrator manual, a run-time engine, and a software manual.

The company’s strategy in creating the open-source project is to foster more-widespread adoption of neural-network inference engines. It expects to thereby benefit from greater demand for its expensive GPU-based training platforms. Most neural-network developers train their models on Nvidia GPUs, and many use the Cuda deep-neural-network (cuDNN) library and software-development kit (SDK) to run models built in Caffe2, Pytorch, TensorFlow, and other popular frameworks.

The NVDLA is configurable for uses ranging from tiny IoT devices to image-processing inference engines in self-driving cars, but Nvidia’s first RTL release is the “full” model, which is similar to the unit in Xavier. It includes 2,048 INT8 multiply-accumulators (MACs), but they’re configurable at run time as 1,024 INT16 or FP16 units. In a 16nm design optimized to run the ResNet-50 neural network, the full model processes 269 frames per second (fps) and consumes 291mW on average.

Next quarter, the company plans to offer early access to a small NVDLA version that integrates 64 fixed-configuration INT8 MACs. This design can process 7fps on ResNet-50, but it consumes just 17mW (average). These full and small models are just two end-point examples of the accelerator’s configurability, and designers are free to fine-tune the architecture.
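The quoted figures for the two end points can be combined into derived efficiency metrics. The sketch below is our own back-of-the-envelope arithmetic using only the numbers in this article (16nm, ResNet-50); the derived values are not Nvidia-published data.

```python
# Compare the "full" and "small" NVDLA configurations using the article's
# quoted figures. Derived metrics (energy per frame, throughput per MAC)
# are simple arithmetic on those numbers, not vendor-published results.

configs = {
    "full":  {"int8_macs": 2048, "fps": 269, "avg_power_mw": 291},
    "small": {"int8_macs": 64,   "fps": 7,   "avg_power_mw": 17},
}

for name, c in configs.items():
    # mW divided by frames/s yields mJ per frame
    energy_mj_per_frame = c["avg_power_mw"] / c["fps"]
    fps_per_mac = c["fps"] / c["int8_macs"]
    print(f"{name:5s}: {energy_mj_per_frame:.2f} mJ/frame, "
          f"{fps_per_mac:.3f} fps per MAC")
```

The arithmetic shows the expected trade-off: the small model draws far less power but spends roughly twice the energy per frame, while the full model's larger MAC array also delivers somewhat more throughput per MAC on this workload.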

Subscribers can view the full article in the Microprocessor Report.

Subscribe to the Microprocessor Report and always get the full story!
