White Papers

We periodically publish white papers providing our analysis and insight on a particular processor or technology. The creation of these white papers is sponsored by other companies, but the opinions and analysis are those of The Linley Group.

The following white papers are currently available:

Movellus Maestro: An Intelligent Clock Network

Movellus is improving silicon clock performance, power, and reliability through its Maestro technology, which uses Optimized IP to mitigate on-chip variation, skew, and voltage droop. It’s well suited to high-performance systolic arrays, low-power edge IoT devices, and similar designs.

Democratizing Chiplet-Based Processor Design

Chiplet-based designs promise reduced development costs and faster time to market, but they’ve been exclusive to large chip vendors. Now, the industry is building an ecosystem intended to enable designs combining third-party chiplets that employ different process nodes. At the same time, RISC-V is enabling greater CPU innovation through its open-source model. These trends create an opportunity for a RISC-V chiplet vendor.

Bit-Accurate CD Audio From aptX Lossless

Thanks to the development of Bluetooth audio technology, wireless streaming has largely replaced physical media, and smartphone users no longer need to struggle with tangled earphone wires. But to accommodate Bluetooth’s limited bandwidth, the industry adopted a variety of lossy compression techniques that sacrifice audio fidelity. This white paper describes the benefits of the aptX Lossless codec, which enables wireless Bluetooth earbuds and speakers to stream audio that’s bit accurate to the original CD recording.

Software Is Critical to Edge-AI Deployment

This white paper describes the importance of a flexible and robust software stack to edge-AI deployment. Processor vendors and intellectual-property licensers often tout the theoretical performance of their designs, but the neural-network compiler, run-time engine, and scheduler are just as critical to realizing that potential in production systems.

Qualcomm Defines Premium-Smartphone Cameras

The Snapdragon 888 and 888+ are the latest flagship processors from Qualcomm, offering a collection of industry-leading camera technologies that provide a lead over competitors on DXOmark’s camera and video benchmark. These processors enable feature-rich 4K HDR video and photography, AI-driven “3A” technology, and industry-leading hardware that supports one-inch camera sensors.

DPU-Based Hardware Acceleration: A Software Perspective

Data-processing units (DPUs) promise greater data-center efficiency, but low-level-programming requirements have hindered broad adoption. Nvidia aims to remove this obstacle using its DOCA framework, which abstracts the programming of BlueField DPUs. Furthermore, customers will be able to program future converged DPU+GPU hardware using the combination of DOCA and CUDA.

CertusPro-NX Reinvigorates General-Purpose FPGAs

CertusPro-NX, the fourth product developed using Lattice Semiconductor’s Nexus platform in the last 18 months, delivers class-leading power, performance, and size for diverse applications. These general-purpose FPGAs offer low power, small packages, and high-bandwidth I/Os, such as PCIe Gen3 and Gigabit Ethernet. They’re well suited to edge AI, industrial IoT, 5G control planes, and other tasks.

Chiplets Gain Rapid Adoption: Why Big Chips Are Getting Small

Leading chip vendors such as AMD and Intel have adopted chiplet technology for several products. This technology can reduce cost for large 7nm designs by as much as 25%, according to our analysis; the savings are even greater at 5nm and beyond. We expect chiplets will be widely used for data-center processors and networking chips in these advanced nodes.

Expedera Redefines AI Acceleration for the Edge

Expedera is a small company with big ideas. Rather than optimizing the usual AI techniques, the company rethought neural-network acceleration from the ground up, creating a unique approach that greatly improves performance while maintaining consistent power and die area. This design is well suited to many consumer and automotive applications, enabling customers to increase the intelligence of their devices and add new capabilities to benefit their end users.

Growing AI Diversity and Complexity Demands Flexible Data-Center Accelerators

AI applications are becoming more diverse, even as models for specific applications rapidly advance. CPUs and GPUs offer the flexibility to handle new models, but they deliver poor throughput or efficiency for real-time inferencing. Purpose-built deep-learning accelerators excel for CNNs but often fare poorly on other model types. SimpleMachines developed a unique “composable computing” architecture that provides both programmability and efficiency.

Mach-NX: The Root of Trusted Systems

Robust system security requires a layered approach, and the root of trust must begin with a secure boot process. Building on its leadership position, Lattice advanced its secure-control platform by introducing the next-generation Mach-NX family. These new devices keep platform security one step ahead of emerging threats while easing customer designs.

Building Better AI Chips

As progressing to 7nm and beyond becomes ever more complex and expensive, GlobalFoundries is taking a different approach to improving performance by enhancing its 12nm node with lower operating voltages and new IP blocks. The changes are particularly effective for AI (neural-network) accelerators. The new 12LP+ technology builds on the success that the foundry’s customers have already achieved in AI acceleration.

Unified Inference and Training at the Edge

As more edge devices add AI capabilities, some applications are becoming increasingly complex. Wearables and other IoT devices often have multiple sensors, requiring different neural networks for each sensor, or they may use a single complex network to combine all the input data, a technique called sensor fusion. Others implement on-device training to customize the application. The GPX-10 processor can handle these advanced AI applications while keeping power to a minimum.

Deterministic Processing for Mission-Critical Applications

Time-sensitive applications such as automotive and robotics require fast and consistent response times. Features such as caches and branch prediction hamper responsiveness. SiFive CPUs can disable these features to deliver deterministic responses. They also combine Linux and RTOS CPUs in the same cluster to enhance responsiveness while offering smaller die area than competitors.

C-V2X Drives Intelligent Transportation

This white paper describes the benefits that cellular vehicle-to-everything (C-V2X) technology will provide by enabling vehicles to communicate directly with each other (V2V), transportation infrastructure (V2I), and network-connected service providers (V2N).

In-Memory Acceleration for Big Data

Standard CPUs can easily perform complex calculations on small datasets, but they struggle to handle larger datasets, particularly for simple tasks such as search. GSI Technology has created a new type of processor that combines memory and compute units on a single chip.

Certus-NX Innovates General-Purpose FPGAs

Certus-NX is the second product in Lattice's Nexus family, bringing the benefits of FD-SOI to a broader range of applications. It is well suited for many applications including smart home, IoT, consumer networking, and motor control.

Machine Learning Moves to the Edge

Embedded systems at the edge of the network will increasingly use machine learning (ML) to deliver new capabilities. Even as ML becomes more important, these new capabilities must be added without breaking the BOM cost and power budgets of existing products. SiMa.ai is a new company developing a product to fit these requirements.

Accelerating AI Performance in an Open Software Ecosystem

Nvidia's software stack, based on its proprietary CUDA, creates a high barrier to entry for challengers. SYCL, the only open alternative to CUDA that has multivendor support, enables customers to move to higher-performance hardware while retaining software flexibility. A commercial version of SYCL, branded ComputeCpp, is offered by Codeplay.

Arm Ecosystem Reduces SoC Design Cost and Time to Market

This paper discusses three critical facets of the Arm ecosystem: design verification, physical design, and software development. Each one directly affects the time required to develop a complete SoC and software stack.

AI Requires Many Approaches

Artificial intelligence (AI) is being applied to a wide range of problems, so no single processor can support them all. This paper describes and compares the various approaches needed for data center, autonomous driving, and consumer/IoT applications. 

Universal RDMA: A Unique Approach for Heterogeneous Data-Center Acceleration

Now available in 10G, 25G, 50G, and 100G Ethernet server adapters, RDMA reduces network processing overhead and minimizes latency. Cavium's FastLinQ stands out by handling both RoCE and iWARP, which are alternative protocols for RDMA over Ethernet.

EPYC Offers x86 Compatibility

AMD's new EPYC processor family delivers backward compatibility with a large installed base of server applications and third-party peripherals and adapters. In addition to being fully compatible with the x86 register set, EPYC supports all existing Broadwell instructions.

EPYC: Designed for Effective Performance

Measuring server-processor performance using clock speed (GHz) or even the traditional SPECint test can be misleading. AMD's new EPYC processor is designed to deliver strong performance across a wide range of server applications, meeting the needs of modern data centers and enterprises.

ARC HS4x and HS4xD CPUs: New Dual-Issue Architecture Boosts Embedded Processor Performance

This white paper describes the Synopsys DesignWare® ARC® HS4x and HS4xD series of licensable CPU cores. These are the company’s newest CPUs for embedded applications requiring 32-bit RISC performance in a small silicon footprint with minimal power consumption. 

Performance Arms X-Gene 3 for Cloud

Cloud data centers run many different workloads with different performance requirements. Today's ARM-based processors address only a few of these workloads, but the new X-Gene 3 offers a new level of performance that satisfies most cloud applications. The 16nm chip is the industry's first ARM-compatible processor that matches Xeon E5 in CPU throughput, per-thread performance, and power efficiency. It offers significant advantages in memory bandwidth and cost of ownership. AppliedMicro has already validated performance on first silicon and is now sampling X-Gene 3.

Multi-Gigabit Ethernet Controllers for Enterprise Networks and Gaming Systems

This paper describes the markets and applications for Ethernet speeds beyond 1Gbps in client systems. It discusses NBase-T technology, which forms the basis for the 2.5GBase-T and 5GBase-T standards under 802.3bz. It then describes client implementations, including Aquantia’s new AQtion controller chip.

Easing Heterogeneous Cache Coherent SoC Design using Arteris' Ncore Interconnect IP

Heterogeneous processing has become a hallmark of mobile SoCs, but designing cache coherency across these diverse processing elements can be difficult. Standard on-chip interfaces and network-on-a-chip (NoC) technology are the first step, giving architects IP to efficiently connect compute processing elements as different as CPUs, GPUs, and DSPs. Hardware IP to enable coherent communication between different types of compute engines is the next step. This white paper describes how Arteris’ Ncore IP can help architects design processors fully supporting coherency between heterogeneous elements.

X-Gene 3 Challenges Xeon E5

AppliedMicro’s new X‑Gene 3 processor design combines 32 ARMv8-compatible CPU cores to deliver an estimated 550 SPECint_rate, competitive with the performance of today’s fastest Xeon E5 processors. With eight DDR4 memory channels, X‑Gene 3 even outclasses Xeon E5 in memory bandwidth, making it well suited to memory-intensive applications. AppliedMicro expects to sample the new design in 2H16, leading to production shipments in 2H17, about the same schedule as for Intel’s Skylake server products.

Beyond the Data Center: How Network-Function Virtualization Enables New Customer-Premise Services

This paper describes how network-function virtualization (NFV) and software-defined networking (SDN) will help network operators profit from greater flexibility and the faster rollout of new revenue-generating services. Important building blocks in this transformation are embedded processors optimized for networking and communications. NXP’s QorIQ processors are well positioned to meet the requirements of virtualized network services.

Low-Power Design Using NoC Technology

Network-on-a-chip (NoC) technology is not just for high-performance SoC designs. The size and power of the NoC can scale down to accommodate even very small and low-power processors. Furthermore, the NoC helps automate the chip's power management. The NoC can also simplify designing a single die that produces multiple end products. This white paper describes how a NoC can achieve these advantages, using TI's CC26xx microcontroller as a case study.

Carrier Aggregation Turbocharges Mobile Apps

This white paper describes the benefits that carrier aggregation will provide for operators and users of LTE-Advanced wireless networks. Carrier aggregation enables downlink and uplink in multiple frequency bands to increase connection speed and network capacity over single-carrier LTE systems. These improvements enable higher performance in mobile devices, but require SoC designs that take full advantage of the faster data rates. Users will benefit from new and enhanced applications that exploit the capabilities that carrier aggregation will provide.

Automating Front-End SoC Design With NetSpeed's On-Chip Network IP

This white paper describes NetSpeed's NocStudio design tool and the Orion and Gemini licensable network-on-chip (NoC) products. Orion is a configurable on-chip interconnect fabric, and Gemini adds cache coherence for processor cores, acceleration engines, and other components. NocStudio is a pre-RTL design tool that enables architects to customize these NoCs and compare alternative topologies before committing the design to a C-level simulation or RTL. NocStudio is also capable of automatically generating a network topology that connects all the IP blocks in a preliminary layout that optimizes the design for performance, power efficiency, die area, low latency, and deterministic quality of service (QoS).

Achieving Energy Efficiency with EFM32 Gecko Microcontrollers

This paper describes Silicon Labs' EFM32 Gecko microcontroller family. This large product portfolio of 32-bit MCUs includes more than 240 devices based on the ARM Cortex-M0+, Cortex-M3, and Cortex-M4 processor cores. Despite this variety, all EFM32 MCUs are software compatible with each other, and EFM32 chips with the same package configuration are pin compatible. They also have many common peripherals and other features.

ARC HS38: Single- and Multicore CPUs for High-Speed Linux Processing on an Embedded Budget

Designers of performance-intensive, embedded SoCs running Linux or other virtual-memory operating systems must address increasing performance requirements with power budgets that are often constant or shrinking. Available processors that offer the needed performance often draw too much power, while processors that fit within the power budget lack the necessary performance. The DesignWare® ARC® HS38 multicore processor is designed specifically for embedded Linux applications and enables chip designers to create single-, dual- or quad-core configurations with an MMU and fast L2 cache. Read this white paper to learn about the ARC HS processor architecture features and new features in the ARC HS38 processor.

Always Listening, Always On: Advances in Sensory Processing

Smartphones use sensors and voice input to become aware of their surroundings and to interact more naturally with their users. Smartphone makers continue to develop more advanced capabilities, adding new types of sensors and more sophisticated voice functions. If implemented improperly, these changes can greatly reduce battery life. This paper explores the latest advances in sensory processing and how new semiconductor products can implement these features using minimal power.

Xilinx SDNet: A New Way to Specify Network Hardware

This paper examines Xilinx's SDNet specification environment and its role both in defining elements in software-defined networks and in implementing reconfigurable network elements in the control and data planes.

Qualcomm Pushes Mobile to UltraHD

Qualcomm's new Snapdragon 805 processor redefines the mobile experience to focus on video and imaging capabilities. It is the first mobile processor with end-to-end 4K support, enabling users to capture, decode, and display video in UltraHD resolution. When paired with Qualcomm's next-generation Gobi 9x35 LTE Advanced modem chip, the platform offers the fastest cellular downloads as well. This white paper describes the momentum behind 4K video and explains how the Snapdragon 805 platform offers the best support for this emerging standard.

Analyzing The Power of Mobile SoCs: Pitfalls and Best Practices

Power is a critical part of the user experience, but assessing the power of a mobile SoC is challenging. We recommend measuring power for the entire chipset rather than the SoC alone, as this method captures system-level optimizations that the vendor has made. A full assessment of the chipset's power characteristics requires testing across a variety of use cases. A chipset that excels in one area may fare poorly in others, depending on the design choices the chip vendor has made. Many use cases measure power for a fixed task, but some cases require measuring both power and performance for a fair comparison.

Synopsys ARC HS Processors: High-Speed Licensable CPU Cores for Embedded Applications

This white paper describes the Synopsys DesignWare® ARC® HS (High Speed) processor family. ARC HS34 and HS36 are the first members of the company's newest family of licensable CPU cores for embedded applications that need 32-bit RISC performance in a small silicon footprint with minimal power consumption. The Linley Group prepared this report after evaluating ARC HS performance data and technical features.

A New Era of Network Processing

This paper examines traditional network-processor (NPU) architectures, technology trends driving new requirements, limitations of NPUs and CPUs, and new architectures that overcome these challenges.

Vectoring and Bonding Renews DSL

DSL data rates have failed to keep up with those of cable and fiber networks. The maximum throughput in DSL connections is limited by the crosstalk between adjacent wire pairs bundled together in a cable or binder. Vectored-DSL technology mitigates and even eliminates crosstalk to deliver a big boost in throughput, allowing telcos to provide bandwidth similar to that offered by cable operators. Vectoring enables the VDSL2 downstream data rate to reach around 150Mbps. Using two-pair bonding, telcos can further increase the data rate to 200Mbps or greater.

Synopsys DesignWare ARC EM Family: Efficient CPU Cores for Embedded Applications

This paper describes Synopsys’s DesignWare® ARC™ EM Processor Family, the company’s newest licensable CPU cores for embedded applications that benefit from 32-bit RISC performance with a tiny silicon footprint and minimal power consumption. According to vendor testing with EEMBC, SPEC, and other benchmarks, the newest ARC EM CPUs have excellent code density while delivering high performance using less power in a small silicon-area footprint. The Linley Group prepared this report after evaluating performance data and technical features for the recently upgraded EM4 and EM6 CPU cores.

AppliedMicro’s X-Gene: Minimizing Power in Data-Center Servers

In cloud data centers, power-hungry server processors drive up operating costs both directly and indirectly. As these data centers grow, cloud server providers are seeking to reduce costs by using lower-power processors. ARM technology is one approach that can provide significant power savings. AppliedMicro is developing an ARM-compatible 64-bit server processor called X-Gene that will deliver a leap forward in power efficiency.

The Precision32 Family of Mixed-Signal MCUs

This paper describes Silicon Labs’ new Precision32™ microcontrollers, the company’s first 32-bit MCUs. In addition to the ARM-compatible CPU, the chips integrate USB and a set of analog components along with the usual flash memory, SRAM, timers, and serial interfaces.

QLogic's 3GCNA: SAN Leadership Meets LAN Convergence

This white paper examines QLogic's third-generation converged network adapter (CNA) technology, which combines elements from the company's Fibre Channel HBA, iSCSI HBA, and intelligent NIC product lines.

Suite B: Classified Network Security Goes Commercial

This white paper examines the need for Suite B security, the underlying technology, chip-level requirements, and real-world implementations.

Marvell Processor Development

This paper describes the lengthy effort behind Marvell's launch of its Sheeva processor line. We examine these new processors and their applicability to communications, printers, storage, consumer, and mobile applications, and provide a peek at some next-generation CPUs.

Single-Chip Control/Data-Plane Processors

This paper examines the trend toward combining control-plane and data-plane processing on a single chip. It discusses the technologies driving this trend, the common features of these chips, their advantages and disadvantages, and how they are being deployed today and into the future.
