
A Guide to Processors for Deep Learning

Third Edition

Published February 2020

Authors: Linley Gwennap and Mike Demler

Corporate License: $5,995

Ordering Information



Take a Deep Dive into Deep Learning

Deep learning, the leading form of artificial intelligence (AI), has seen rapid changes and improvements over the past few years and is now being applied to a wide variety of applications. Typically implemented using neural networks, deep learning powers image recognition, voice processing, language translation, and many other web services in large data centers. It is an essential technology in self-driving cars, providing both object recognition and decision making. It is even moving into client devices such as smartphones and embedded (IoT) systems.

Even the fastest CPUs are inadequate to efficiently execute the highly complex neural networks these advanced problems demand. Boosting performance requires more-specialized hardware architectures. Graphics chips (GPUs) have become popular, particularly for the initial training function. Many other hardware approaches have recently emerged, including DSPs, FPGAs, and dedicated ASICs. Although these alternatives promise order-of-magnitude improvements, GPU vendors are tuning their designs to better support deep learning as well.
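
The gap is easy to see with a back-of-the-envelope calculation. The short Python sketch below uses our own illustrative figures (a commonly cited ~4 GFLOPs per ResNet-50 inference, plus assumed CPU and accelerator throughput numbers), not data from the report:

    # Rough feasibility arithmetic with illustrative, assumed figures.
    RESNET50_FLOPS = 4e9           # ~4 GFLOPs per 224x224 image (commonly cited)

    cpu_sustained_flops = 100e9    # assumed sustained CPU throughput: 100 GFLOP/s
    dla_peak_flops = 100e12        # assumed accelerator peak: 100 TOPS (INT8)

    print(f"CPU: ~{cpu_sustained_flops / RESNET50_FLOPS:.0f} inferences/s")
    print(f"DLA: ~{dla_peak_flops / RESNET50_FLOPS:,.0f} inferences/s at peak")

Under these assumptions, the CPU manages only about 25 inferences per second, while a dedicated accelerator can in principle deliver thousands.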

Autonomous vehicles are an important application for deep learning. Vehicles don't implement training but instead focus on the simpler inference tasks. Even so, they require very powerful processors, yet they are more constrained in cost and power than data-center servers, forcing different tradeoffs. Several chip vendors are delivering products specifically for this application; some automakers are developing their own ASICs instead.

Large chip vendors such as Intel and Nvidia currently generate the most revenue from deep-learning processors. But many startups, some well funded, have emerged to develop new, more-customized architectures for deep learning; Cerebras, Graphcore, GreenWaves, Groq, Gyrfalcon, Habana (now part of Intel), and Horizon Robotics are among the first to deliver products. Eschewing these options, leading data-center operators such as Alibaba, Amazon, Google, and Microsoft have developed their own hardware accelerators.

We Sort Out the Market and the Products

A Guide to Processors for Deep Learning covers hardware technologies and products. The report provides deep technology analysis and head-to-head product comparisons, as well as analysis of company prospects in this rapidly developing market segment: which products will win designs, and why. The Linley Group’s unique technology analysis provides a forward-looking view, helping sort through competing claims and products.

The guide begins with a detailed overview of the market. We explain the basics of deep learning, the types of hardware acceleration, and the end markets, including a forecast for both automotive and data-center adoption. The heart of the report provides detailed technical coverage of announced chip products from AMD, Cerebras, Graphcore, Groq, Gyrfalcon, Horizon Robotics, Intel (including former Altera, Habana, Mobileye, and Movidius), Mythic, Nvidia (including Tegra and Tesla), Wave Computing, and Xilinx. Other chapters cover Google’s TPU family of ASICs and Tesla’s autonomous-driving ASIC. We also include shorter profiles of numerous other companies developing AI chips of all sorts, including Alibaba, Blaize, BrainChip, Huawei, Lattice, and Syntiant. Finally, we bring it all together with technical comparisons in each product category and our analysis and conclusions about this emerging market.

Make Informed Decisions 

As the leading vendor of technology analysis for processors, The Linley Group has the expertise to deliver a comprehensive look at the full range of chips designed for a broad range of deep-learning applications. Principal analyst Linley Gwennap and senior analyst Mike Demler use their experience to deliver the deep technical analysis and strategic information you need to make informed business decisions. 

Whether you are looking for the right processor for an automotive application, an IoT device, or a data-center accelerator, or seeking to partner with or invest in one of these vendors, this report will cut your research time and save you money. Make the smart decision: order A Guide to Processors for Deep Learning today. 

This report is written for:

  • Engineers designing chips or systems for deep learning or autonomous vehicles
  • Marketing and engineering staff at related-chip vendors who need more information on processors for deep learning or autonomous vehicles
  • Technology professionals who want an introduction to deep learning, vision processing, or autonomous-driving systems
  • Financial analysts who desire a hype-free analysis of deep-learning processors and of which chip suppliers are most likely to succeed
  • Press and public-relations professionals who need to get up to speed on this emerging technology

This market is developing rapidly — don't be left behind!

The third edition of A Guide to Processors for Deep Learning covers dozens of new products and technologies announced in the past year, including: 

  • The TSP from startup Groq, the first chip to reach 1,000 TOPS
  • Huawei’s Ascend family of accelerators, which ranges from the edge to the data center
  • Google’s first chip product, the tiny Edge TPU
  • The massive Wafer-Scale Engine (WSE) from Cerebras
  • Syntiant’s NDP10x chip, which performs voice recognition using less than one milliwatt
  • The HanGuang ASIC from Alibaba, which achieved a record ResNet-50 score
  • GreenWaves’ second-generation IoT processor, the GAP9
  • Tesla’s ASIC, which ships in all of its new cars and sets an automotive TOPS record
  • Habana’s Gaudi accelerator for AI training
  • Horizon Robotics’ vision accelerators for automotive designs
  • Lattice’s sensAI technology, which turns its low-power FPGAs into AI accelerators
  • Intel’s Cascade Lake server processors with AI-inference extensions
  • The Kunlun ASIC from Chinese cloud giant Baidu
  • GrAI Matter Labs’s GrAI One, a low-power neuromorphic processor
  • Startup Esperanto’s Maxion architecture
  • Habana’s acquisition by Intel for $2 billion
  • A first look at Qualcomm’s Cloud AI program
  • Other new AI vendors such as Achronix, Cornami, Flex Logix, Hailo, Knowles, and Synaptics
  • New product roadmaps and other updates on all vendors
  • Updated market size and forecast to include the 2019 slowdown in cloud spending and slower progress in autonomous driving

Deep-learning technology is being deployed or evaluated in nearly every industry in the world. This report focuses on the hardware that supports this AI revolution. As demand for the technology grows rapidly, we see opportunities for deep-learning accelerators (DLAs) in three general areas: the data center, automobiles, and embedded (edge) devices.

Large cloud-service providers (CSPs) can apply deep learning to improve web search, language translation, email filtering, product recommendations, and voice assistants such as Alexa, Cortana, and Siri. Data-center DLAs generated more than $3 billion in revenue in 2019 and will approach $10 billion within five years. By 2024, we expect nearly half of all new servers (and most cloud servers) to include a DLA.
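
For context, that forecast implies a compound annual growth rate of roughly 27%. A quick sketch of the arithmetic (the report's underlying model is, of course, more detailed than a single growth figure):

    # Implied growth rate behind the data-center DLA forecast.
    start_revenue = 3e9   # 2019 data-center DLA revenue (just over $3 billion)
    end_revenue = 10e9    # 2024 forecast (approaching $10 billion)
    years = 5

    cagr = (end_revenue / start_revenue) ** (1 / years) - 1
    print(f"Implied CAGR: {cagr:.1%}")   # about 27% per year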

Deep learning is critical to the development of self-driving cars. Level 2 ADAS functions such as autonomous emergency braking already appear in more than half of new cars in the US and Europe; we forecast 70% worldwide adoption by 2024. Level 3–4 autonomy is taking longer than anticipated to reach the market, but we expect robotaxis and other commercial vehicles to deploy in the next few years, generating more than $2 billion in processor revenue in 2024.

To improve latency and reliability for voice and other cloud services, edge products such as smartphones, drones, smart speakers, security cameras, and Internet of Things (IoT) devices are implementing neural networks. We expect 1.6 billion edge devices to ship with DLAs in 2024.

This rapid market growth has spurred many new companies to develop chips with DLAs. This report covers more than 40 vendors, including those developing chips only for internal use; we’re aware of many others that have disclosed too little information to cover. Although this situation is reminiscent of previous booms in graphics accelerators and network processors, we expect a greater number of winners in this competition. Given the widely differing product requirements in the data center, automotive, and various embedded market segments, different companies are likely to take the lead in each.

Nvidia dominates the data-center market with its Volta and Turing GPUs, which include “tensor cores” that greatly improve their performance for neural-network training and inference. Customers prefer the company’s broad and reliable software stack. But Nvidia released no new GPU in 2019, leaving it more vulnerable to competition. It also leads the push to develop autonomous vehicles; its Xavier processor is the industry’s first single-chip solution for Level 3 autonomy, and its Drive AGX cards deliver even greater capabilities.
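
For readers unfamiliar with the feature: a tensor core performs a fused multiply-accumulate across a small matrix tile, using FP16 inputs with FP32 accumulation. The NumPy sketch below models that operation conceptually; it illustrates the math, not Nvidia's hardware pipeline.

    import numpy as np

    # Conceptual model of one tensor-core operation: D = A x B + C on 4x4
    # tiles, with low-precision inputs and higher-precision accumulation.
    A = np.random.rand(4, 4).astype(np.float16)   # weight tile (FP16)
    B = np.random.rand(4, 4).astype(np.float16)   # activation tile (FP16)
    C = np.random.rand(4, 4).astype(np.float32)   # accumulator tile (FP32)

    D = A.astype(np.float32) @ B.astype(np.float32) + C  # fused in hardware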

Intel offers several DLA architectures. Its standard Xeon CPUs are often used for inference, and the new Cascade Lake models triple inference throughput. After spending $2 billion to acquire Habana, whose high-end DLA chips handle both training and inference, the company terminated its similar Nervana products. Intel also sells a range of FPGAs for customers that wish to design their own DLA architecture. Its low-power Myriad chips target drones and other camera-based devices. In addition, its Mobileye subsidiary is a leader in ADAS and is moving up to Level 3 and above.

Several others offer data-center DLAs. Cerebras, Graphcore, Groq, and other startups are sampling or shipping chips that use new architectures to outperform Nvidia’s GPUs on at least some workloads. AMD supplies Radeon Instinct GPUs, but they lack AI-specific features and fall well behind for neural networks. Xilinx added AI cores to its new Versal FPGAs but hasn’t disclosed any neural-network benchmarks for the pre-production chips. These vendors must also compete against in-house inference ASICs at Alibaba, Amazon, Baidu, Google, Microsoft, and other top CSPs, all of which are now deploying these devices. Huawei disclosed its own AI architecture that it sells in servers and as a cloud service. Each of these companies offers a limited software stack and thus can target only a few workloads.

Established automotive suppliers such as NXP, Renesas, and Toshiba compete against Mobileye in the ADAS market. Well-funded Chinese startups such as Black Sesame and Horizon Robotics have released ADAS processors and are developing more-powerful chips for autonomous driving. Other startups also target this market, including Blaize (formerly ThinCI), Hailo, and Kalray. The slowdown in autonomous deployment, as well as the need to meet stringent safety standards, means these companies will take years to generate significant automotive revenue. Some automakers are developing their own chips for autonomous vehicles, but only Tesla has disclosed any details on its ASIC, which is already in production.

The embedded market has attracted the most startups, as the cost of both hardware and software development is relatively low, and design wins can quickly generate revenue. This market encompasses multiple end applications. Gyrfalcon and NovuMind offer high-performance DLA chips for consumer video applications. Smart speakers and other voice-activated devices require low power and only modest performance; Syntiant supplies the lowest-power chip for keyword spotting, but it competes against Ambient, Knowles, and Synaptics. For ultra-low-power sensors, BrainChip and GrAI Matter use neuromorphic technology to reduce power, while Eta Compute, GreenWaves, and Lattice provide alternatives.

Comparing the capabilities of such products is complicated; much depends on the needs of the end application. This report provides the data necessary to evaluate these companies and their products, along with our analysis of how well they address market requirements.

List of Figures
List of Tables
About the Authors
About the Publisher
Preface
Executive Summary
1 Deep-Learning Applications
What Is Deep Learning?
Cloud-Based Deep Learning
Advanced Driver-Assistance Systems
Autonomous Vehicles
Voice Assistants
Smart Cameras
Manufacturing
Robotics
Financial Technology
Health Care and Medicine
2 Deep-Learning Technology
Artificial Neurons
Deep Neural Networks
Spiking Neural Networks
Neural-Network Training
Training Spiking Neural Networks
Pruning and Compression
Neural-Network Inference
Quantization
Neural-Network Development
Neural-Network Models
Natural-Language Models
3 Deep-Learning Accelerators
Accelerator Design
Data Formats
Computation Units
Dot Products
Systolic Arrays
Handling Sparsity
Other Common Functions
Processor Architectures
CPUs
GPUs
DSPs
Custom Architectures
FPGAs
Performance Measurement
Peak Operations
Neural-Network Performance
MLPerf Benchmarks
AI-Benchmark
4 Market Forecast
Market Overview
Data Center and HPC
Market Size
Market Forecast
Automotive
Market Size
Market Forecast
Autonomous Forecast
Client and IoT
Market Size
Market Forecast
5 AMD
Company Background
Key Features and Performance
Conclusions
6 Cerebras
Company Background
Key Features and Performance
Conclusions
7 Google
Company Background
Key Features and Performance
Data-Center TPUs
Edge TPU
Conclusions
8 Graphcore
Company Background
Key Features and Performance
Product Roadmap
Conclusions
9 Groq
Company Background
Key Features and Performance
Conclusions
10 Gyrfalcon
Company Background
Key Features and Performance
Product Roadmap
Conclusions
11 Habana (Intel)
Company Background
Key Features and Performance
Conclusions
12 Horizon Robotics
Company Background
Key Features and Performance
Product Roadmap
Conclusions
13 Intel
Company Background
Xeon Processors
Key Features and Performance
Product Roadmap
Nervana Accelerators
Key Features and Performance
Stratix and Agilex FPGAs
Key Features and Performance
Product Roadmap
Movidius Myriad
Key Features and Performance
Product Roadmap
Conclusions
14 Mobileye (Intel)
Company Background
Key Features and Performance
Product Roadmap
Conclusions
15 Mythic
Company Background
Key Features and Performance
Product Roadmap
Conclusions
16 Nvidia Tegra
Company Background
Key Features and Performance
Software Development
Product Roadmap
Conclusions
17 Nvidia Tesla
Company Background
Key Features and Performance
Product Roadmap
Conclusions
18 Tesla (Motors)
Company Background
Key Features and Performance
Product Roadmap
Conclusions
19 Wave Computing
Company Background
Key Features and Performance
Product Roadmap
Conclusions
20 Xilinx
Company Background
Key Features and Performance
UltraScale+
Alveo
Versal
Conclusions
21 Other Automotive Vendors
Black Sesame
Blaize
Fabu
Hailo
Company Background
Key Features and Performance
Conclusions
Kalray
Company Background
Key Features and Performance
Conclusions
NXP
Company Background
Key Features and Performance
Conclusions
Renesas
Company Background
Key Features and Performance
Conclusions
Toshiba
Company Background
Key Features and Performance
Conclusions
22 Other Data-Center Vendors
Achronix
Company Background
Key Features and Performance
Conclusions
Alibaba
Key Features and Performance
Conclusions
Amazon
Baidu
Centaur (Via)
Company Background
Key Features and Performance
Conclusions
Esperanto
Furiosa
Huawei
Key Features and Performance
Conclusions
Marvell
Microsoft
Company Background
Key Features and Performance
Conclusions
Qualcomm
Company Background
Key Features and Performance
Conclusions
SambaNova
23 Other Embedded Vendors
Ambient
BrainChip
Cornami
Eta Compute
Company Background
Key Features and Performance
Conclusions
Flex Logix
Company Background
Key Features and Performance
Conclusions
GrAI Matter
Company Background
Key Features and Performance
Conclusions
GreenWaves
Company Background
Key Features and Performance
Conclusions
Knowles
Company Background
Key Features and Performance
Conclusions
Lattice
Company Background
Key Features and Performance
Conclusions
NovuMind
Company Background
Key Features and Performance
Conclusions
Synaptics
Company Background
Key Features and Performance
Conclusions
Syntiant
Company Background
Key Features and Performance
Conclusions
24 Processor Comparisons
How to Read the Tables
Data-Center Training
Architecture
Interfaces
Performance
Summary
Data-Center Inference
Architecture
Interfaces
Performance
Summary
Automotive Processors
CPU Subsystem
Vision Processing
Interfaces
Summary
Embedded Processors
Performance
Memory and Interfaces
Summary
Embedded Coprocessors
Performance
Memory and Interfaces
Summary
Ultra-Low-Power Processors
Performance and Power
Memory and Interfaces
Summary
25 Conclusions
Market Summary
Data Center
Automotive
Embedded
Technology Trends
Neural Networks
Hardware Options
Performance Metrics
Vendor Summary
Data Center
Automotive
Embedded
Closing Thoughts
Appendix: Further Reading
Index
Figure 1‑1. SAE autonomous-driving levels
Figure 1‑2. Waymo autonomous test vehicle
Figure 1‑3. GM’s autonomous-vehicle prototype
Figure 1‑4. Various smart speakers
Figure 1‑5. A smart surveillance camera
Figure 1‑6. Processing steps in a computer-vision neural network
Figure 1‑7. Robotic arms use deep learning
Figure 1‑8. Example biopsy images used to diagnose breast cancer
Figure 2‑1. Neuron connections in a biological brain
Figure 2‑2. Model of a neural-network processing node
Figure 2‑3. Common activation functions
Figure 2‑4. Model of a four-layer neural network
Figure 2‑5. Spiking effect in biological neurons
Figure 2‑6. Spiking-neural-network pattern
Figure 2‑7. Pruning a neural network
Figure 2‑8. Mapping from floating-point format to integer format
Figure 3‑1. Common AI data types and approximate data ranges
Figure 3‑2. Arm dot-product operation
Figure 3‑3. A systolic array
Figure 3‑4. Performance versus batch size
Figure 4‑1. Revenue forecast for deep-learning chips, 2016–2024
Figure 4‑2. Unit forecast for deep-learning chips, 2016–2024
Figure 4‑3. Unit forecast for ADAS-equipped vehicles, 2016–2024
Figure 4‑4. Revenue forecast for ADAS processors, 2016–2024
Figure 4‑5. Unit forecast for client deep-learning chips, 2016–2024
Figure 6‑1. Cerebras wafer-scale engine (WSE)
Figure 7‑1. Google TPUv2 board
Figure 8‑1. Graphcore C2 card
Figure 9‑1. TSP conceptual diagram
Figure 11‑1. Functional diagram of Habana HLS-1 system
Figure 13‑1. Intel NNP-I chip in an M.2 card
Figure 15‑1. Mythic’s flash-based neural-network tile
Figure 21‑1. Hailo-8 heterogeneous-resource map
Figure 21‑2. Block diagram of NXP S32V234
Figure 21‑3. Block diagram of Renesas R-Car V3H
Figure 21‑4. Block diagram of Toshiba TMPV770 ADAS processor
Figure 22‑1. Block diagram of DSP core in Achronix Speedster7t
Figure 22‑2. Block diagram of Alibaba HanGuang 800
Figure 22‑3. Block diagram of Centaur CHA processor
Figure 23‑1. Block diagram of Eta Compute ECM3531
Figure 23‑2. Block diagram of Flex Logix InferX X1
Figure 23‑3. Block diagram of GreenWaves GAP8 processor
Figure 23‑4. Block diagram of Knowles AISonic processor
Figure 23‑5. Block diagram of Lattice sensAI architecture for the ECP5
Figure 23‑6. Block diagram of Synaptics AudioSmart AS-371
Figure 23‑7. Block diagram of Syntiant NDP101 speech processor
Figure 24‑1. ResNet-50 v1.0 training throughput
Figure 24‑2. ResNet-50 v1.0 inference throughput
Figure 24‑3. ResNet-50 v1.0 inferences per watt
Figure 24‑4. ResNet-50 v1.0 inference latency
Figure 25‑1. Model size trend, 2012–2019
Figure 25‑2. Deep-learning accelerators
Table 2‑1. Size and compute requirement of popular DNNs
Table 4‑1. Data-center DLA units and revenue, 2018–2024
Table 5‑1. Key parameters for AMD Radeon Instinct accelerators
Table 7‑1. Key parameters for Google TPU accelerators
Table 7‑2. Key parameters for Google Edge TPU
Table 8‑1. Key parameters for Graphcore GC2 processor
Table 9‑1. Key parameters for Groq TSP architecture
Table 10‑1. Key parameters for Gyrfalcon Lightspeeur coprocessors
Table 11‑1. Key parameters for Habana Goya accelerator card
Table 12‑1. Key parameters for Horizon Robotics processors
Table 13‑1. Key parameters for selected Intel Cascade Lake processors
Table 13‑2. Key parameters for Intel Nervana processors
Table 13‑3. Key parameters for selected Intel Stratix 10 GX FPGAs
Table 13‑4. Key parameters for Intel Movidius processors
Table 14‑1. Key parameters for Mobileye EyeQ processors
Table 16‑1. Key parameters for Nvidia automotive processors
Table 17‑1. Key parameters for Nvidia deep-learning GPUs
Table 18‑1. Key parameters for Tesla FSD ASIC
Table 19‑1. Key parameters for Wave DPU accelerators
Table 20‑1. Key parameters for selected Xilinx FPGAs
Table 21‑1. AI-chip companies targeting automotive applications
Table 21‑2. Key parameters for Kalray MPPA3 processor
Table 22‑1. AI-chip companies targeting data-center applications
Table 22‑2. Key parameters for Huawei Ascend 910 accelerator card
Table 22‑3. Key parameters for Microsoft Brainwave accelerator
Table 23‑1. AI-chip companies targeting embedded applications
Table 24‑1. Comparison of CPUs and GPUs for AI training
Table 24‑2. Comparison of accelerators for AI training
Table 24‑3. Comparison of high-end DLAs for AI inference
Table 24‑4. Comparison of midrange DLAs for AI inference
Table 24‑5. Comparison of ADAS processors
Table 24‑6. Comparison of autonomous-vehicle processors
Table 24‑7. Comparison of embedded DLA SoCs
Table 24‑8. Comparison of embedded DLA coprocessors
Table 24‑9. Comparison of low-power processors for deep learning
