
A Guide to Processors for Deep Learning

Fourth Edition

Published February 2021

Authors: Linley Gwennap and Mike Demler

Corporate License: $5,995

Ordering Information



Take a Deep Dive into Deep Learning

Deep learning, the leading form of artificial intelligence (AI), has seen rapid changes and improvements over the past few years and is now being applied to a wide variety of applications. Typically implemented using neural networks, deep learning powers image recognition, voice processing, language translation, and many other web services in large data centers. It is an essential technology in self-driving cars, providing both object recognition and decision making. It is even used in smartphones, PCs, and embedded (IoT) systems.
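For readers who want to see what such a workload looks like in practice, the following minimal sketch classifies an image with a pretrained ResNet-50 network. It assumes a standard Python environment with PyTorch and torchvision installed; the input file name is hypothetical.

    # Minimal image-classification inference sketch (PyTorch/torchvision assumed).
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    model = models.resnet50(pretrained=True)   # download ImageNet weights
    model.eval()                               # inference mode, not training

    # Standard ImageNet preprocessing: resize, center-crop, normalize.
    preprocess = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("example.jpg")            # hypothetical input image
    batch = preprocess(img).unsqueeze(0)       # shape [1, 3, 224, 224]

    with torch.no_grad():                      # no gradient bookkeeping needed
        scores = model(batch)                  # 1,000 ImageNet class scores
    print(int(scores.argmax()))                # index of the predicted class

Networks like this one are exactly the workloads that the accelerators covered in this guide are designed to speed up.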

Even the fastest CPUs are inadequate to efficiently execute the highly complex neural networks needed to address these advanced problems. Boosting performance requires more specialized hardware architectures. Graphics chips (GPUs) have become popular, particularly for the initial training function. Many other hardware approaches have recently emerged, including DSPs, FPGAs, and dedicated ASICs. Although these solutions promise order-of-magnitude improvements, GPU vendors are tuning their designs to better support deep learning.
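A rough back-of-envelope calculation, using our own illustrative figures rather than numbers from the report, shows why: a single ResNet-50 inference costs roughly 4 GFLOPs, so even a CPU sustaining hundreds of GFLOPS cannot keep up with a multicamera real-time workload that a dedicated accelerator handles easily.

    # Back-of-envelope: why CPUs struggle with real-time vision inference.
    # All figures are rough public estimates, not measurements from this report.
    flops_per_image = 4e9       # ~4 GFLOPs per ResNet-50 inference (commonly cited)
    cpu_sustained   = 200e9     # assume ~200 GFLOPS sustained on a server CPU
    dla_sustained   = 100e12    # assume a ~100-TOPS dedicated accelerator

    cpu_fps = cpu_sustained / flops_per_image   # = 50 frames/s
    dla_fps = dla_sustained / flops_per_image   # = 25,000 frames/s

    cameras, rate = 8, 30                       # e.g., eight 30fps camera streams
    required_fps = cameras * rate               # = 240 frames/s
    print(cpu_fps >= required_fps, dla_fps >= required_fps)  # False, True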

Autonomous vehicles are an important application for deep learning. Vehicles don't implement training but instead focus on the simpler inference tasks. Even so, they require very powerful processors; because vehicles are more constrained in cost and power than data-center servers, they demand different tradeoffs. Several chip vendors are delivering products specifically for this application; some automakers are developing their own ASICs instead.

Large chip vendors such as Intel and Nvidia currently generate the most revenue from deep-learning processors. But many startups, some well funded, have emerged to develop new, more customized architectures for deep learning; Cerebras, Graphcore, GreenWaves, Groq, Gyrfalcon, Horizon Robotics, Tenstorrent, and Untether are among the first to deliver products. Eschewing these options, leading data-center operators such as Alibaba, Amazon, and Google have developed their own hardware accelerators.

We Sort Out the Market and the Products

A Guide to Processors for Deep Learning covers hardware technologies and products from more than 55 companies. The 300+ page report provides deep technology analysis and head-to-head product comparisons, as well as analysis of company prospects in this rapidly developing market segment. We explain which products will win designs, and why. The Linley Group’s unique technology analysis provides a forward-looking view, helping sort through competing claims and products.

The guide begins with a detailed overview of the market. We explain the basics of deep learning, the types of hardware acceleration, and the end markets, including a forecast for both automotive and data-center adoption. The heart of the report provides detailed technical coverage of announced chip products from AMD, Cambricon, Cerebras, Graphcore, Groq, Intel (including former Altera, Habana, Mobileye, and Movidius), Mythic, Nvidia (including Tegra and Tesla), NXP, and Xilinx. Other chapters cover Google’s TPU family of ASICs and Tesla’s autonomous-driving ASIC. We also include shorter profiles of numerous other companies developing AI chips of all sorts, including Amazon, Brainchip, Gyrfalcon, Hailo, Huawei, Lattice, Qualcomm, Synaptics, and Texas Instruments. Finally, we bring it all together with technical comparisons in each product category and our analysis and conclusions about this emerging market.

Make Informed Decisions

As the leading vendor of technology analysis for processors, The Linley Group has the expertise to deliver a comprehensive look at the full range of chips designed for a broad range of deep-learning applications. Principal analyst Linley Gwennap and senior analyst Mike Demler use their experience to deliver the deep technical analysis and strategic information you need to make informed business decisions.

Whether you are looking for the right processor for an automotive application, an IoT device, or a data-center accelerator, or seeking to partner with or invest in one of these vendors, this report will cut your research time and save you money. Make the smart decision: order A Guide to Processors for Deep Learning today.

This report is written for:

  • Engineers designing chips or systems for deep learning or autonomous vehicles
  • Marketing and engineering staff at companies that sell related chips who need more information on processors for deep learning or autonomous vehicles
  • Technology professionals who want an introduction to deep learning, vision processing, or autonomous-driving systems
  • Financial analysts who desire a hype-free analysis of deep-learning processors and of which chip suppliers are most likely to succeed
  • Press and public-relations professionals who need to get up to speed on this emerging technology

This market is developing rapidly — don't be left behind!

The fourth edition of A Guide to Processors for Deep Learning covers dozens of new products and technologies announced in the past year, including:

  • The innovation behind Nvidia’s Ampere A100, the industry-leading GPU
  • The first products in Qualcomm’s power-efficient Cloud AI 100 line
  • Graphcore’s second-generation accelerator, the GC200
  • Tenstorrent’s initial Grayskull product, which outperforms Nvidia’s T4 at the same 75W
  • Intel’s new Stratix 10 NX, its first AI-optimized FPGA
  • AMD’s powerful new MI100 (CDNA) accelerator for supercomputers
  • NXP’s i.MX 8M Plus, the company’s first applications processor with AI acceleration
  • Untether’s TsunAImi, which packs an industry-leading 2,000 TOPS into a single card
  • The evolution of Google’s TPU family, including its next-generation TPUv4
  • Intel’s new Xe GPU initiative
  • Esperanto’s first product, which features more than 1,000 Minion cores at only 20W
  • New card and system products from Groq
  • A preview of the second-generation Wafer-Scale Engine (WSE2) from Cerebras
  • Updated roadmaps for Intel’s Habana accelerators and Agilex FPGAs
  • The Jacinto processor from Texas Instruments for Level 3 ADAS
  • Synaptics’ VS680, a low-cost SoC for AI-enabled consumer devices
  • Other new AI vendors such as Ambient, Aspinity, Coherent Logix, Kneron, Perceive, SambaNova, Sima.ai, and XMOS
  • New product roadmaps and other updates on all vendors
  • Updated market size and forecast, including the economic effects of the 2020 pandemic

Deep-learning technology is being deployed or evaluated in nearly every industry in the world. This report focuses on the hardware that supports the AI revolution. As demand for the technology grows rapidly, we see opportunities for deep-learning accelerators (DLAs) in three general areas: the data center, automobiles, and embedded (edge) devices.


Large cloud-service providers (CSPs) can apply deep learning to improve web searches, language translation, email filtering, product recommendations, and voice assistants such as Alexa, Cortana, and Siri. Data-center DLAs exceeded $5 billion in 2020 revenue and in five years will approach $12 billion. By 2025, we expect nearly half of all new servers (and most cloud servers) to include a DLA.
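As a quick check of the arithmetic behind those endpoints (not an additional forecast), growing from roughly $5 billion in 2020 to roughly $12 billion in 2025 implies a compound annual growth rate just under 20%:

    # Implied CAGR from ~$5B (2020) to ~$12B (2025), per the figures above.
    start, end, years = 5e9, 12e9, 5
    cagr = (end / start) ** (1 / years) - 1
    print(f"{cagr:.1%}")   # -> 19.1%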


Deep learning is critical to the development of self-driving cars. ADAS functions such as autonomous emergency braking already appear in more than half of new cars worldwide; we forecast nearly 80% adoption by 2025. Several automakers are shipping Level 3 autonomous vehicles. Over the next few years, we expect rapid growth in robotaxis and other commercial vehicles, generating more than $2.5 billion in processor revenue in 2025.


To improve latency and reliability for voice and other cloud services, edge products such as drones, security cameras, and Internet of Things (IoT) devices are implementing neural networks. Most premium smartphones now include a DLA, which is also becoming common in smart speakers. We expect 1.9 billion edge devices to ship with DLAs in 2025.


The rapid market growth has spurred many new companies to develop chips with DLAs. This report covers more than 55 vendors, including those developing chips only for internal use; we’re aware of many others that have disclosed too little information to cover. Although the situation is reminiscent of previous booms in graphics accelerators and network processors, we expect more winners in this competition. Given the widely differing product requirements in the data center, automotive, and various embedded market segments, different companies are likely to take the lead in each.


Nvidia dominates the data-center market with its new Ampere GPU, which offers excellent performance for both neural-network training and inference. Customers prefer the company’s broad and reliable software stack. Nvidia also leads the push to develop autonomous vehicles; its Xavier processor is the industry’s first single-chip solution for Level 3 autonomy, and its Drive AGX cards deliver even greater capabilities.


Intel offers several DLA architectures. Its standard Xeon CPUs are often used for lightweight inference, although their performance is far less than Ampere’s. In late 2019, it acquired Habana, whose high-end DLA chips handle training and inference, but product development has stalled since then. Intel also sells a range of FPGAs for customers that wish to design their own DLA architecture. Its low-power Myriad chips target drones and other camera-based devices. In addition, its Mobileye subsidiary leads in ADAS and competes against Nvidia at Level 3 and above.


Several others offer data-center DLAs. Cerebras, Graphcore, Groq, SambaNova, and other startups are sampling or shipping chips that use new architectures to outperform Nvidia’s GPUs on at least some workloads. AMD’s new MI100 GPU challenges the A100 for HPC but still lags on AI applications. Xilinx added AI cores to its new Versal FPGAs but hasn’t disclosed any neural-network benchmarks. These vendors must also compete against in-house inference ASICs at Alibaba, Amazon, Baidu, Google, Microsoft, and other top CSPs, all of which are now deploying these devices. Huawei disclosed its own AI architecture that it sells in servers and as a cloud service. Each of these companies offers a limited software stack and thus can target only a few workloads.


Established automotive suppliers such as NXP, Renesas, Texas Instruments, and Toshiba compete against Mobileye in the ADAS market. Well-funded Chinese startups Black Sesame and Horizon Robotics have released ADAS processors and are developing more-powerful chips for autonomous driving. Other startups also target this market, including Blaize (formerly ThinCI) and Hailo. Some automakers are developing their own chips for autonomous vehicles, but only Tesla has disclosed any details about its ASIC, which is already in production. Pilot deployments of Level 4 robotaxis have already started, and we expect mass production of these vehicles in 2022.


The embedded market has attracted the most startups, as the cost of both hardware and software development is relatively low, and design wins can quickly generate revenue. This market encompasses multiple end applications. Flex Logix, Gyrfalcon, and Mythic offer high-end DLA chips for high-resolution and multicamera applications. BrainChip, Google, Intel, NXP, and Synaptics target drones and other consumer video devices. Smart speakers and other voice-activated devices require less performance; Syntiant supplies the lowest-power chip for keyword spotting, but it competes against Ambient, Kneron, and Knowles. For ultra-low-power sensors, GreenWaves and Lattice provide alternatives.


Comparing the capabilities of such products is complicated; much depends on the needs of the end application. This report provides the data necessary to evaluate these companies and their products, along with our analysis of how well they address market requirements.

List of Figures
List of Tables
About the Authors
About the Publisher
Preface
Executive Summary
1 Deep-Learning Applications
What Is Deep Learning?
Cloud-Based Deep Learning
Advanced Driver-Assistance Systems
Autonomous Vehicles
Voice Assistants
Smart Cameras
Manufacturing
Robotics
Financial Technology
Health Care and Medicine
2 Deep-Learning Technology
Artificial Neurons
Deep Neural Networks
Spiking Neural Networks
Neural-Network Training
Training Spiking Neural Networks
Pruning and Compression
Neural-Network Inference
Quantization
Neural-Network Software
Popular Frameworks
Other Open Software
Neural-Network Models
Image-Classification Models
Natural-Language and Recommender Models
3 Deep-Learning Accelerators
Accelerator Design
Data Formats
Computation Units
Dot Products
Systolic Arrays
Handling Sparsity
Other Common Functions
Processor Architectures
CPUs
GPUs
DSPs
Custom Architectures
FPGAs
Performance Measurement
Peak Operations
Neural-Network Performance
MLPerf Benchmarks
AI-Benchmark
4 Market Forecast
Market Overview
Data Center and HPC
Market Size
Market Forecast
Automotive
Market Size
Market Forecast
Autonomous Forecast
Client and IoT
Market Size
Market Forecast
5 AMD
Company Background
Key Features and Performance
Product Roadmap
Conclusions
6 Cambricon
Company Background
Key Features and Performance
Product Roadmap
Conclusions
7 Cerebras
Company Background
Key Features and Performance
Product Roadmap
Conclusions
8 Google
Company Background
Key Features and Performance
Data-Center TPUs
Edge TPU
Conclusions
9 Graphcore
Company Background
Key Features and Performance
Graphcore GC2
Graphcore GC200
Graphcore M2000
Product Roadmap
Conclusions
10 Groq
Company Background
Key Features and Performance
Product Roadmap
Conclusions
11 Huawei
Company Background
Key Features and Performance
Conclusions
12 Intel
Company Background
Xeon Processors
Key Features and Performance
Product Roadmap
Habana Accelerators
Key Features and Performance
Product Roadmap
Stratix and Agilex FPGAs
Key Features and Performance
Product Roadmap
Movidius Myriad SoCs
Key Features and Performance
Product Roadmap
Conclusions
Data Center
Client and IoT
Strategy Summary
13 Mobileye (Intel)
Company Background
Key Features and Performance
Product Roadmap
Conclusions
14 Mythic
Company Background
Key Features and Performance
Product Roadmap
Conclusions
15 Nvidia Tegra
Company Background
Key Features and Performance
Software Development
Product Roadmap
Conclusions
16 Nvidia Tesla
Company Background
Key Features and Performance
Product Roadmap
Conclusions
17 NXP
Company Background
Key Features and Performance
Product Roadmap
Conclusions
18 Tesla (Motors)
Company Background
Key Features and Performance
Product Roadmap
Conclusions
19 Xilinx
Company Background
Key Features and Performance
UltraScale+
Alveo
Versal
Product Roadmap
Conclusions
20 Other Automotive Vendors
Black Sesame
Company Background
Key Features and Performance
Conclusions
Blaize
Company Background
Key Features and Performance
Conclusions
Hailo
Company Background
Key Features and Performance
Conclusions
Horizon Robotics
Company Background
Key Features and Performance
Conclusions
Renesas
Company Background
Key Features and Performance
Conclusions
Texas Instruments
Company Background
Key Features and Performance
Conclusions
Toshiba
Company Background
Key Features and Performance
Conclusions
21 Other Data-Center Vendors
Achronix
Company Background
Key Features and Performance
Conclusions
Alibaba
Company Background
Key Features and Performance
Conclusions
Amazon
Baidu
Company Background
Key Features and Performance
Conclusions
Centaur (Via)
Company Background
Key Features and Performance
Conclusions
Enflame
Company Background
Key Features and Performance
Conclusions
Esperanto
Company Background
Key Features and Performance
Conclusions
Furiosa
Marvell
Microsoft
Company Background
Key Features and Performance
Conclusions
Qualcomm
Company Background
Key Features and Performance
Conclusions
SambaNova
Company Background
Key Features and Performance
Conclusions
SimpleMachines
Company Background
Key Features and Performance
Conclusions
Tenstorrent
Company Background
Key Features and Performance
Conclusions
Tianshu Zhixin
Untether
Company Background
Key Features and Performance
Conclusions
Wave Computing
22 Other Embedded Vendors
Ambient
Company Background
Key Features and Performance
Conclusions
Aspinity
Company Background
Key Features and Performance
Conclusions
BrainChip
Company Background
Key Features and Performance
Conclusions
Coherent Logix
Company Background
Key Features and Performance
Conclusions
Cornami
Eta Compute
Company Background
Key Features and Performance
Conclusions
Flex Logix
Company Background
Key Features and Performance
Conclusions
Grai Matter
Company Background
Key Features and Performance
Conclusions
GreenWaves
Company Background
Key Features and Performance
Conclusions
Gyrfalcon
Company Background
Key Features and Performance
Conclusions
Kneron
Company Background
Key Features and Performance
Conclusions
Knowles
Company Background
Key Features and Performance
Conclusions
Lattice
Company Background
Key Features and Performance
Conclusions
NovuMind
Company Background
Key Features and Performance
Conclusions
Perceive
Company Background
Key Features and Performance
Conclusions
Sima.ai
Company Background
Key Features and Performance
Conclusions
Synaptics
Company Background
Key Features and Performance
Conclusions
Syntiant
Company Background
Key Features and Performance
Conclusions
XMOS
Company Background
Key Features and Performance
Conclusions
23 Processor Comparisons
How to Read the Tables
Data-Center Training
Architecture
Interfaces
Performance and Power
Summary
Data-Center Inference
Architecture
Interfaces
Performance and Power
Summary
Power-Efficient Data-Center Inference
Architecture
Interfaces
Performance and Power
Summary
Automotive Processors
CPU Subsystem
Vision Processing
Interfaces
Summary
Embedded Processors
Performance and Power
Memory and Interfaces
Summary
Embedded Coprocessors
Performance and Power
Memory and Interfaces
Summary
Ultra-Low-Power Processors
Performance and Power
Memory and Interfaces
Summary
24 Conclusions
Market Summary
Data Center
Automotive
Embedded
Technology Trends
Neural Networks
Hardware Options
Performance Metrics
Vendor Summary
Data Center
Automotive
Embedded
Closing Thoughts
Index
Figure 1‑1. SAE autonomous-driving levels
Figure 1‑2. Waymo autonomous test vehicle
Figure 1‑3. GM’s autonomous-vehicle prototype
Figure 1‑4. Various smart speakers
Figure 1‑5. A smart surveillance camera
Figure 1‑6. Processing steps in a computer-vision neural network
Figure 1‑7. Robotic arms use deep learning
Figure 1‑8. Comparisons of lung-disease severity (PXS)
Figure 2‑1. Neuron connections in a biological brain
Figure 2‑2. Model of a neural-network processing node
Figure 2‑3. Common activation functions
Figure 2‑4. Model of a four-layer neural network
Figure 2‑5. Three-dimensional neural network
Figure 2‑6. Spiking effect in biological neurons
Figure 2‑7. Spiking-neural-network pattern
Figure 2‑8. Pruning a neural network
Figure 2‑9. Mapping from floating-point format to integer format
Figure 3‑1. Common AI data types and approximate data ranges
Figure 3‑2. Arm dot-product operation
Figure 3‑3. A systolic array
Figure 3‑4. Performance versus batch size
Figure 4‑1. Revenue forecast for deep-learning chips, 2017–2025
Figure 4‑2. Unit forecast for deep-learning chips, 2017–2025
Figure 4‑3. Unit forecast for deep-learning chips by technology, 2018–2025
Figure 4‑4. Unit forecast for ADAS-equipped vehicles, 2017–2025
Figure 4‑5. Revenue forecast for ADAS processors, 2017–2025
Figure 4‑6. Unit forecast for client deep-learning chips, 2017–2025
Figure 5‑1. AMD CDNA architecture
Figure 7‑1. Cerebras wafer-scale engine (WSE)
Figure 8‑1. Google TPUv3 board
Figure 9‑1. Block diagram of Graphcore M2000 accelerator
Figure 10‑1. TSP conceptual diagram
Figure 12‑1. Functional diagram of Habana HLS-1 system
Figure 14‑1. Mythic's flash-based neural-network tile
Figure 20‑1. Blaize Pathfinder system-on-module
Figure 20‑2. Hailo-8 heterogeneous-resource map
Figure 20‑3. Block diagram of Renesas R-Car V3H
Figure 20‑4. TI Jacinto 7 TDA4VM primary compute domain
Figure 20‑5. Block diagram of Toshiba TMPV770 ADAS processor
Figure 21‑1. Block diagram of Centaur CHA processor
Figure 21‑2. Esperanto ET-SoC-1 die plot
Figure 22‑1. Block diagram of Aspinity AnalogML processor
Figure 22‑2. Block diagram of Eta Compute ECM3531
Figure 22‑3. Block diagram of Flex Logix InferX X1
Figure 22‑4. Block diagram of GreenWaves GAP8 processor
Figure 22‑5. Block diagram of Knowles AISonic processor
Figure 22‑6. Block diagram of Lattice SensAI architecture
Figure 22‑7. Block diagram of Synaptics AudioSmart AS-371
Figure 22‑8. Block diagram of Syntiant NDP101 speech processor
Figure 22‑9. Xcore.ai processor architecture
Figure 23‑1. ResNet-50 v1.0 training throughput
Figure 23‑2. ResNet-50 v1.0 inference throughput (high end)
Figure 23‑3. ResNet-50 v1.0 inference throughput (low end)
Figure 23‑4. ResNet-50 v1.0 inference latency
Figure 24‑1. Model-size trends, 2014–2020
Table 2‑1. Size and compute requirement of popular DNNs
Table 3‑1. AI-Benchmark smartphone tests
Table 4‑1. Data-center DLA units and revenue, 2019–2025
Table 5‑1. Key parameters for AMD Instinct accelerators
Table 6‑1. Key parameters for Cambricon edge- and cloud-AI processors
Table 8‑1. Key parameters for Google TPU accelerators
Table 8‑2. Key parameters for Google Edge TPU
Table 9‑1. Key parameters for Graphcore processors
Table 10‑1. Key parameters for Groq TSP architecture
Table 11‑1. Key parameters for Huawei Ascend devices
Table 12‑1. Key parameters for selected Intel Xeon Scalable processors
Table 12‑2. Key parameters for Intel Habana Goya accelerator card
Table 12‑3. Key parameters for selected Intel FPGAs
Table 12‑4. Key parameters for Intel Movidius processors
Table 13‑1. Key parameters for Mobileye EyeQ processors
Table 14‑1. Key parameters for Mythic M1108 DLA
Table 15‑1. Key parameters for Nvidia automotive processors
Table 16‑1. Key parameters for Nvidia deep-learning GPUs
Table 17‑1. Key parameters for NXP AI-enabled processors
Table 18‑1. Key parameters for Tesla FSD ASIC
Table 19‑1. Key parameters for selected Xilinx FPGAs
Table 20‑1. AI-chip companies targeting automotive applications
Table 21‑1. AI-chip companies targeting data-center applications
Table 21‑2. Key parameters for Achronix Speedster7t FPGAs
Table 21‑3. Key parameters for Alibaba HanGuang 800 accelerator card
Table 21‑4. Key parameters for Baidu Kunlun K200 accelerator card
Table 21‑5. Key parameters for Enflame CloudBlazer products
Table 21‑6. Key parameters for Qualcomm Cloud AI 100 products
Table 21‑7. Key parameters for SimpleMachines Accelerando card
Table 21‑8. Key parameters for Tenstorrent Grayskull accelerator card
Table 21‑9. Key parameters for Untether TsunAImi accelerator card
Table 22‑1. AI-chip companies targeting embedded applications
Table 22‑2. Key parameters for Gyrfalcon Lightspeeur coprocessors
Table 22‑3. Key parameters for Kneron edge-AI processors
Table 22‑4. Key parameters for Perceive Ergo processor
Table 23‑1. Comparison of leading DLAs for AI training
Table 23‑2. Comparison of other DLAs for AI training
Table 23‑3. Comparison of high-end DLAs for AI inference
Table 23‑4. Comparison of midrange DLAs for AI inference
Table 23‑5. Comparison of 75W DLAs for AI inference
Table 23‑6. Comparison of M.2 DLAs for AI inference
Table 23‑7. Comparison of ADAS processors
Table 23‑8. Comparison of autonomous-vehicle processors
Table 23‑9. Comparison of automotive DLAs
Table 23‑10. Comparison of low-power embedded DLA SoCs
Table 23‑11. Comparison of high-performance embedded DLA SoCs
Table 23‑12. Comparison of embedded DLA coprocessors
Table 23‑13. Comparison of ultra-low-power DLAs below 5 GOPS
Table 23‑14. Comparison of ultra-low-power DLAs above 50 GOPS
