A Guide to Server Processors
Second Edition
Published November 2012
Authors: Jag Bolaria and Bob Wheeler
Single License: $3,495 (single copy, one user)
Examining the Processors Powering Scalable Computing
The market for server processors is changing, creating openings for new vendors. With the emergence of mega data centers and cloud computing, server economics no longer focus on capital expenses alone. Demand for ultimate performance from a single processor has been replaced by a balanced view of capital and operating costs. Performance per watt and performance per watt per dollar are the new metrics driving purchasing decisions in large data centers. Physical density is also growing in importance, driving greater scalability and new form factors such as microservers that pack more nodes into precious rack space.
The market is moving to a new era where backward compatibility is less important than before and innovation takes the front seat. Intel and AMD, the incumbent vendors, continue to innovate and advance their Xeon and Opteron designs, respectively. Integration, microarchitecture advances, and process technology are the primary factors when evolving these x86 processors. But new entrants are eyeing cloud-computing environments as an opening for radically different architectures or more power-efficient CPU architectures.
Product Information Tempered By In-Depth Analysis
This report covers processors designed specifically for servers. We provide detailed coverage of Intel’s Xeon product lines across Westmere, Sandy Bridge, and Ivy Bridge generations. We cover AMD’s Opteron family and details of the company’s Bulldozer microarchitecture. The new vendors include Tilera, which is shipping 64-bit processors today. Because other vendors entering this market are using ARM-based designs, we provide coverage of ARM’s intellectual-property cores including the Cortex-A15. Vendors of ARM-based server processors include AppliedMicro, Calxeda, Cavium, and Marvell. Startup Calxeda and larger Marvell are shipping 32-bit ARM processors, whereas AppliedMicro and Cavium are developing 64-bit ARMv8 chips as their first server offerings.
New to this edition is coverage of coprocessors (or accelerators) for high-performance computing (HPC). Our initial coverage includes Intel’s new Xeon Phi (Knights Corner) and Nvidia’s Tesla GK110 (Kepler). These board-level products promise to break the double-precision teraflops barrier for floating-point performance.
This report analyzes each vendor and each product, probing their strengths and weaknesses and presenting key details in a consistent, easy-to-compare fashion. We examine processor performance, integration, power dissipation, and overall system design. Where possible, we also look at each vendor's roadmap.
Make Informed Decisions
As the leading vendor of technology analysis for microprocessors, The Linley Group has the expertise to deliver a comprehensive look at these technologies. Authors Jag Bolaria and Bob Wheeler use their broad experience to deliver the technical and strategic information you need to make informed business decisions. And in case you are not familiar with all of the concepts involved in processor and server designs, the report includes several introductory chapters that define and describe terms such as superscalar, multithreading, pipelines, and virtualization.
This report is written for:
- OEMs that need to make strategic vendor selections
- ODMs supplying cloud-computing and HPC customers
- Data-center architects looking at alternative platforms
- Marketing and engineering staff at companies that sell other server components
- Financial analysts who desire a detailed analysis and comparison of both incumbent and new vendors
What's New in This Edition
“A Guide to Server Processors” has been extensively updated to include the latest vendor disclosures.
Here are some of the many changes you will find:
- Coverage of many new products from Intel, including Xeon E5 (Sandy Bridge), Xeon E3-1200 v2 (Ivy Bridge), and Xeon Phi (Knights Corner)
- Coverage of AMD’s Bulldozer-based Opteron 4200 (Valencia) and 6200 (Interlagos) processors
- Coverage of Calxeda’s first ARM-based server SoC, the ECX-1000
- Coverage of AppliedMicro’s X-Gene processor, which should be the industry’s first 64-bit (ARMv8) product
- Coverage of Cavium’s Project Thunder, a multicore ARMv8 design built on the company’s successful Octeon architecture
- Coverage of ARM’s new ARMv8 cores and fabric IP
- New coverage of Nvidia’s Tesla accelerators for high-performance computing, focusing on the new Kepler generation
- Extensive updates to company-background information, roadmaps, and analysis
- Forecast for merchant server processors through 2016
- Revised and updated tutorials
The market for server processors is changing, creating openings for new vendors. With the emergence of mega data centers and cloud computing, server economics no longer focuses on capital expenses alone. Demand for ultimate performance from a single processor has been replaced by a balanced view of capital and operating costs. Performance per watt and performance per watt per dollar are the new metrics driving purchasing decisions in large data centers. Physical density is also growing in importance, driving greater scalability and new form factors such as microservers that pack more nodes into precious rack space.
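The two metrics named above can be illustrated with a minimal sketch. It assumes that "performance" is any throughput score (a benchmark result or requests per second) and that price is the purchase cost of the node. The `ServerNode` class, the node names, and every number below are hypothetical placeholders chosen for easy arithmetic, not figures from the report.

```python
from dataclasses import dataclass


@dataclass
class ServerNode:
    """Hypothetical description of one server node (illustrative only)."""
    name: str
    throughput: float   # any performance score, e.g. benchmark result
    power_watts: float  # power drawn while delivering that throughput
    price_usd: float    # purchase price of the node

    def perf_per_watt(self) -> float:
        # Performance divided by power: higher means more work per watt.
        return self.throughput / self.power_watts

    def perf_per_watt_per_dollar(self) -> float:
        # Normalizes perf/W by purchase price as well.
        return self.perf_per_watt() / self.price_usd


# Made-up numbers: a fast, power-hungry node versus a slower, frugal one.
nodes = [
    ServerNode("big-core", throughput=1000.0, power_watts=100.0, price_usd=1000.0),
    ServerNode("small-core", throughput=300.0, power_watts=20.0, price_usd=200.0),
]

# Rank the nodes by the combined metric rather than by raw throughput.
best = max(nodes, key=ServerNode.perf_per_watt_per_dollar)
```

With these placeholder values the slower node ranks first on both metrics even though its raw throughput is lower, which is the shift in purchasing criteria the paragraph describes.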
The market is moving to a new era where backward compatibility is less important than before and innovation takes the front seat. Intel and AMD—the incumbent vendors—continue to innovate and advance their Xeon and Opteron designs, respectively. Integration, microarchitecture advances, and process technology are the primary factors when evolving these x86 processors. But new entrants are eyeing cloud-computing environments as an opening for radically different architectures and more-power-efficient CPU designs. With the merchant server-processor market exceeding $7 billion, success requires taking only a few percentage points of share from Intel.
Having reached practical power limits, server-processor designers are increasing performance primarily by adding cores rather than increasing clock speeds. Mainstream x86 processors now offer 16 cores per chip, while startup Tilera is already shipping 36-core processors. Mainstream server processors are currently using 32nm technology, although Intel is shipping its first processors using the 22nm node. By moving to a finer geometry process, vendors get more transistors in the same die area and power envelope. This additional transistor count can be used to add CPUs or to increase cache sizes. Larger caches increase performance by absorbing DRAM latency, which is not decreasing as rapidly as processor performance is growing.
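The effect of a larger cache on memory latency can be sketched with the textbook average-memory-access-time formula (hit time plus miss rate times miss penalty). This is a simplified single-level model, not the report's methodology. The function name and all latency and miss-rate values are hypothetical.

```python
def average_access_time_ns(hit_time_ns: float,
                           miss_rate: float,
                           dram_latency_ns: float) -> float:
    """Textbook single-level model: every access pays the cache hit time,
    and the fraction that misses also pays the DRAM latency."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * dram_latency_ns


# Hypothetical values: same hit time and DRAM latency, but the larger
# cache is assumed to halve the miss rate.
smaller_cache = average_access_time_ns(10.0, miss_rate=0.10, dram_latency_ns=100.0)
larger_cache = average_access_time_ns(10.0, miss_rate=0.05, dram_latency_ns=100.0)
```

Under these assumptions the larger cache lowers the average access time without any change in DRAM latency, because fewer accesses reach main memory.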
Following the earlier integration of memory controllers, processors are now integrating PCI Express (PCIe) controllers. This step eliminates one system-logic component, the “north bridge,” reducing the chipset to a single “south bridge” chip. For desktop and workstation applications, processors are also integrating the graphics controller. New entrants like Calxeda, Marvell, and Tilera offer system-on-a-chip (SoC) designs that integrate Ethernet controllers in addition to memory and PCIe controllers. For high-density server form factors, this additional level of integration provides differentiation.
Intel offers the broadest line of server processors, which are built using its industry-leading process technology. The company’s current two-socket design is the Xeon E5-2600 platform (Romley) based on the 32nm Sandy Bridge architecture. New to Intel’s Xeon lineup is a four-socket version of this platform, the E5-4600. This new platform offers a lower-cost alternative to the Westmere-based Xeon E7 line, which is designed for scale-up servers with four or more sockets (4P and above). By adding reliability, availability, and serviceability (RAS) features to the E7 line, Xeon processors now serve mission-critical designs that formerly required Itanium (IA-64) processors.
Using its tick-tock development approach, Intel shrank Sandy Bridge to produce the 22nm Ivy Bridge architecture. The first server processors to use Ivy Bridge comprise the Xeon E3-1200 v2 line, which serves uniprocessor designs. By the end of 2012, Intel plans to introduce a server processor based on its low-power Atom architecture. At six watts, this chip will compete with new entrants fielding ARM-based designs.
AMD’s Opteron line of server processors offers a good alternative to Intel for high-volume two- and four-socket platforms. With its 16-core Opteron 6200 (Interlagos), AMD continues to lead in core count for x86 processors. The 32nm Opteron 6200 processors are based on a novel microarchitecture called Bulldozer that improves density. The company also offers Opteron 4200 and 3200 processors based on this design. In 4Q12, AMD introduced the Opteron 6300 Series (Abu Dhabi), which uses an update to Bulldozer called Piledriver. AMD has also licensed ARM’s Cortex-A57 CPU and plans to introduce processors using this core in 2014.
New entrants fielding non-x86 server processors are targeting cloud computing, where compatibility requirements are more manageable than in traditional enterprise applications. Most of these new vendors implement the ARM instruction set using either licensed or custom CPU designs. Marvell was first to market with a four-core ARM processor suitable for servers. Startup Calxeda added unique server features to its quad-core ARM processor. Other vendors are skipping 32-bit ARM designs and instead using the new 64-bit ARMv8 instruction set. AppliedMicro and Cavium are designing custom 64-bit CPUs for their respective server processors, whereas ARM is licensing its new Cortex-A57 (Atlas) core to vendors that prefer an off-the-shelf CPU design.
One vendor, Tilera, relies solely on open-source Linux distributions and open-source or customer-ported applications. The company is shipping 64-bit server processors with up to 36 proprietary CPUs connected using a unique on-chip mesh network. The startup’s chips deliver a several-fold improvement in performance per watt compared with x86 processors.
| Table of Contents |
| List of Figures |
| List of Tables |
| About the Authors |
| About the Publisher |
| Preface |
| Executive Summary |
| 1 Processor Technology |
| Processor Basics |
| Central Processing Unit (CPU) |
| Caches |
| MMUs and TLBs |
| Bus Bandwidth |
| CPU Microarchitecture |
| RISC Versus CISC |
| Endianness |
| Scalar and Superscalar |
| Instruction Reordering |
| Pipelining and Penalties |
| Branch Prediction |
| Server Processors and Technologies |
| What Is a Server Processor? |
| Multicore |
| Multithreading |
| System Buses |
| Memory Subsystem |
| PCI Express |
| Server Benchmarks |
| SPEC Benchmarks |
| TPC Benchmarks |
| VMmark |
| HPL |
| ApacheBench |
| 2 Instruction Sets |
| x86 Instruction Set |
| Background |
| Initial Instruction Set |
| ISA Extensions |
| ARM Instruction Set |
| Background |
| Initial Instruction Set |
| ARMv7 |
| ARMv8 |
| 3 Server System Technology |
| Basic Server Architecture |
| Main Memory |
| System-Logic Chipset |
| Baseboard-Management Controller |
| Storage |
| RAID |
| Storage Interfaces |
| High-Performance Computing |
| InfiniBand |
| RDMA Over Ethernet |
| MPI and OFED |
| Networking |
| Storage Networking |
| Form Factors |
| Operating Systems |
| Windows Server |
| Linux Server |
| Virtualization |
| Hypervisor Software |
| 4 Technology and Market Trends |
| Technology Trends |
| x86 Versus ARM |
| SoC Integration |
| The Main-Memory Bottleneck |
| Microservers |
| Cloud-Computing Workloads |
| High-Performance Computing |
| Market Outlook |
| Cloud Computing |
| Open Compute |
| Market Forecast and Segmentation |
| Market Share |
| 5 Intel |
| Company Background |
| Key Features and Performance |
| Sandy Bridge-Based Xeon Processors |
| Ivy Bridge-Based Xeon Processors |
| Westmere-Based Xeon Processors |
| Itanium Processors |
| Internal Architecture |
| System Design |
| Product Roadmap |
| Ivy Bridge |
| Haswell |
| Itanium |
| Atom-Based Processors: Centerton |
| Conclusions |
| 6 AMD |
| Company Background |
| Key Features and Performance |
| Internal Architecture |
| System Design |
| Product Roadmap |
| Conclusions |
| 7 Tilera |
| Company Background |
| Key Features and Performance |
| Internal Architecture |
| System Design |
| Development Tools |
| Product Roadmap |
| Conclusions |
| 8 ARM |
| Company Background |
| Key Features and Performance |
| Internal Architecture |
| System-on-a-Chip Design |
| Development Tools |
| Product Roadmap |
| Conclusions |
| 9 AppliedMicro |
| Company Background |
| Key Features and Performance |
| Design Details |
| Conclusions |
| 10 Calxeda |
| Company Background |
| Key Features and Performance |
| Design Details |
| Product Roadmap |
| Conclusions |
| 11 Cavium |
| Company Background |
| Key Features and Performance |
| Conclusions |
| 12 Marvell |
| Company Background |
| Key Features and Performance |
| Internal Architecture |
| System Design |
| Development Tools |
| Product Roadmap |
| Conclusions |
| 13 HPC Coprocessor Vendors |
| Intel Xeon Phi |
| Company Background |
| Key Features and Performance |
| Internal Architecture |
| Programming Model and Tools |
| Conclusions |
| Nvidia Tesla |
| Company Background |
| Key Features and Performance |
| Design Details |
| Product Roadmap |
| Conclusions |
| 14 Processor Comparisons |
| Microserver Processors |
| Performance |
| Integration |
| Uniprocessor Platforms |
| Performance |
| Integration |
| Two-Socket Platforms |
| Performance |
| Integration |
| Four-Socket Platforms |
| Performance |
| Integration |
| Conclusions |
| 15 Conclusions |
| Vendor Outlook |
| Intel |
| AMD |
| Tilera |
| ARM |
| Calxeda |
| Marvell |
| AppliedMicro and Cavium |
| HPC Coprocessors |
| Closing Thoughts |
| Appendix: Further Reading |
| Index |
| Figure 1‑1. Basic processor design. |
| Figure 1‑2. Simple superscalar processor design. |
| Figure 1‑3. CPU pipelining examples. |
| Figure 1‑4. Block diagram of a typical server processor. |
| Figure 1‑5. Interleaved tasks on a multithreaded CPU. |
| Figure 3‑1. Typical server architecture. |
| Figure 3‑2. Rack-mount servers and a standard-size rack. |
| Figure 3‑3. IBM's BladeCenter H. |
| Figure 3‑4. Typical blade-server architecture. |
| Figure 4‑1. Dell's PowerEdge C5000 12-bay microserver. |
| Figure 4‑2. Server-processor shipment forecast and segmentation. |
| Figure 4‑3. Server-processor unit share, 2009-2011. |
| Figure 5‑1. Intel server-processor roadmap. |
| Figure 5‑2. Block diagram of Intel Sandy Bridge microarchitecture. |
| Figure 5‑3. Block diagram of Intel Xeon E5-2600. |
| Figure 5‑4. Server design based on Intel Xeon E3-1200 v2. |
| Figure 5‑5. Dual-socket server design based on Intel Xeon E5-2600. |
| Figure 5‑6. Four-socket server design based on Intel Xeon E7. |
| Figure 6‑1. Diagram of Bulldozer CPU module. |
| Figure 6‑2. Block diagram of Bulldozer microarchitecture. |
| Figure 6‑3. Opteron 4200 system design. |
| Figure 6‑4. Opteron 6200 system design. |
| Figure 7‑1. Block diagram of Tilera Tile-Gx3036. |
| Figure 7‑2. Tilera Tile-Gx3036 server design. |
| Figure 8‑1. Block diagram of Cortex-A15 microarchitecture. |
| Figure 8‑2. Block diagram of ARM Cortex-A15 in an SoC. |
| Figure 9‑1. Block diagram of X-Gene CPU. |
| Figure 9‑2. Block diagram of X-Gene SoC. |
| Figure 10‑1. Block diagram of Calxeda ECX-1000. |
| Figure 10‑2. Block diagram of Calxeda four-node system. |
| Figure 11‑1. Conceptual block diagram of Cavium Thunder. |
| Figure 12‑1. Microarchitecture of Marvell PJ4B. |
| Figure 12‑2. Block diagram of Armada XP. |
| Figure 13‑1. Xeon Phi coprocessor card. |
| Figure 13‑2. Microarchitecture of Intel Xeon Phi core. |
| Figure 13‑3. Simplified block diagram of Xeon Phi coprocessor. |
| Figure 13‑4. Tesla GK110 SMX array. |
| Table 1‑1. Selected SPEC benchmarks. |
| Table 5‑1. Product lines and selected versions of Intel Xeon processors. |
| Table 5‑2. Key parameters for selected Intel Xeon E5-series processors. |
| Table 5‑3. Key parameters for selected Intel Itanium 9300-series processors. |
| Table 6‑1. Key parameters for selected Opteron processors. |
| Table 6‑2. Key parameters for Bulldozer-based Opteron processors. |
| Table 6‑3. Key parameters for AMD SR56x0 north-bridge chips. |
| Table 6‑4. Key parameters for AMD SP5100 south-bridge chip. |
| Table 7‑1. Key parameters for Tilera TilePro64 and Tile-Gx3036 processors. |
| Table 8‑1. Key parameters for ARM Cortex-A9 and Cortex-A15. |
| Table 10‑1. Key parameters for Calxeda ECX-1000 processor. |
| Table 12‑1. Key features for Armada XP processors. |
| Table 13‑1. Key parameters for Xeon Phi coprocessor cards. |
| Table 13‑2. Key parameters for Nvidia Tesla coprocessor cards. |
| Table 14‑1. Comparison of microserver processors. |
| Table 14‑2. Comparison of high-performance single-socket processors. |
| Table 14‑3. Comparison of processors for dual-socket servers. |
| Table 14‑4. Comparison of processors for four-socket servers. |






