FPGA Evolution

By Linley Gwennap


When the FPGA was invented 20 years ago, the idea that most customers would not use all of the gates on the chip was considered wasteful and radical. But as the number of transistors per chip skyrocketed, this concern was forgotten.

Today, we have reached the point where a chip can have hundreds of ALUs or CPUs, more than most customers can use. While some might consider this wasteful, others see the next step in the evolution of the FPGA.

For example, Cswitch is developing an FPGA that includes dedicated networking logic and interface controllers surrounded by a sea of traditional programmable gates. Rejecting gates altogether, Ambric is developing a design that includes an array of CPU and memory blocks that can be configured for a variety of tasks. Both companies plan to sample chips by the end of this year.

New Approaches

Cswitch’s architecture includes dedicated packet parsers that can be programmed to extract data from packet headers. Hardwired arithmetic units can perform packet editing or math operations. Built-in CAMs can be configured in binary or ternary modes. For storing tables and buffers, the architecture includes a sizable dual-port RAM as well as a larger amount of single-port RAM.
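The difference between the binary and ternary CAM modes can be illustrated with a minimal software model. This is only a sketch of the general technique, not Cswitch's hardware or programming interface; the names cam_lookup, binary_table, and ternary_table are hypothetical. A real CAM compares all entries in parallel, whereas the loop below simply models priority order.

```python
from typing import Optional, Sequence, Tuple

# Each entry is (value, mask). A mask bit of 1 means "this bit must match";
# a mask bit of 0 means "don't care". Binary mode is the special case in
# which every mask bit is 1. (Hypothetical model, not a vendor API.)
Entry = Tuple[int, int]

WIDTH = 32
FULL_MASK = (1 << WIDTH) - 1


def cam_lookup(table: Sequence[Entry], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None if none match."""
    for index, (value, mask) in enumerate(table):
        if (key & mask) == (value & mask):
            return index
    return None


# Binary mode: exact match on all 32 bits.
binary_table = [(0x0A000001, FULL_MASK), (0x0A000002, FULL_MASK)]

# Ternary mode: prefix-style entries, most specific first.
ternary_table = [
    (0x0A000100, 0xFFFFFF00),  # 10.0.1.0/24
    (0x0A000000, 0xFF000000),  # 10.0.0.0/8
]
```

In this model, a key of 0x0A000105 hits the first ternary entry because its low eight bits are ignored, while the same key misses the binary table, which requires every bit to match.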

All of this dedicated logic is wrapped within arrays of standard programmable gates and connected via a high-speed point-to-point interconnect. The chip also embeds DRAM controllers, Ethernet and Fibre Channel MACs, and serdes that operate at up to 6.4Gbps. These serdes can support PCI Express, XAUI, Gigabit Ethernet, Fibre Channel, and other high-speed interfaces.

Using these special function blocks, the Cswitch chip can implement common communications and packet-processing applications much more efficiently than a standard FPGA can. By programming the gates and configuring the dedicated logic, the chip can implement a wide variety of applications and protocols.

Ambric’s first chip, code-named Kestrel, contains 360 CPUs divided into 45 “brics,” each with 13KB of SRAM. Each CPU is a complete 32-bit processor; half are enhanced with DSP extensions to improve math throughput. Each CPU can easily communicate with nearby CPUs, allowing data to flow across the chip.

To program the Ambric chip, software is divided into a series of objects, each of which performs a simple task. Ambric’s compiler assigns the objects to physical CPUs. Each CPU can process incoming data, then pass it along to another CPU for further processing. In this way, each CPU is similar to a function block implemented as a group of FPGA gates; in Ambric’s design, the function is simply implemented in software rather than hardware.
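The general idea of chaining simple objects so that data flows from one to the next can be sketched in ordinary Python. This is an analogy only, not Ambric's language, compiler, or tools; the stage names scale and clamp and the helper pipeline are hypothetical. Each generator stands in for one object that would be assigned to its own CPU.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of samples and produces a new stream.
# (Hypothetical stand-in for one software object on one CPU.)
Stage = Callable[[Iterable[int]], Iterator[int]]


def scale(factor: int) -> Stage:
    """Build a stage that multiplies every sample by a constant."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for sample in stream:
            yield sample * factor
    return run


def clamp(low: int, high: int) -> Stage:
    """Build a stage that limits every sample to the range [low, high]."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for sample in stream:
            yield max(low, min(high, sample))
    return run


def pipeline(stages: List[Stage], source: Iterable[int]) -> Iterator[int]:
    """Connect the stages in order so each one feeds the next."""
    stream: Iterable[int] = source
    for stage in stages:
        stream = stage(stream)
    return iter(stream)


result = list(pipeline([scale(3), clamp(0, 10)], [1, 2, 5, -4]))
```

Each stage knows only about its input and output streams, which mirrors the way a function block in an FPGA is wired to its neighbors rather than to the whole design.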

At 333MHz, Kestrel can perform 1.08 trillion operations per second, including 60 billion multiply-accumulate operations per second (60 GMACS). To support this compute rate, the chip includes 3.2GB/s of DDR2 SDRAM bandwidth, PCI Express, and other high-speed I/O.
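These figures can be cross-checked with simple arithmetic. The sketch below assumes that each DSP-enhanced CPU completes one multiply-accumulate per clock cycle; that assumption is inferred from the numbers, not stated by Ambric. The article also does not say how the 1.08 trillion operations are counted, so the per-CPU figure is only the implied ratio.

```python
# Figures quoted in the article.
CPUS = 360
BRICS = 45
SRAM_PER_BRIC_KB = 13
CLOCK_HZ = 333_000_000
TOTAL_OPS_PER_SEC = 1.08e12

# Half of the CPUs carry DSP extensions.
DSP_CPUS = CPUS // 2

# Assumption (not from the source): one MAC per DSP CPU per cycle.
MACS_PER_DSP_CPU_PER_CYCLE = 1

cpus_per_bric = CPUS // BRICS
total_sram_kb = BRICS * SRAM_PER_BRIC_KB
gmacs = DSP_CPUS * CLOCK_HZ * MACS_PER_DSP_CPU_PER_CYCLE / 1e9

# Implied operations per CPU per cycle behind the headline figure.
ops_per_cpu_per_cycle = TOTAL_OPS_PER_SEC / (CPUS * CLOCK_HZ)

print(f"CPUs per bric: {cpus_per_bric}")
print(f"Total SRAM: {total_sram_kb} KB")
print(f"Peak MAC rate: {gmacs:.2f} GMACS")
print(f"Implied ops per CPU per cycle: {ops_per_cpu_per_cycle:.2f}")
```

Under that assumption, 180 DSP CPUs at 333MHz yield 59.94 GMACS, consistent with the quoted 60 GMACS, and the headline rate works out to roughly nine operations per CPU per cycle.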

Despite using inexpensive 130nm manufacturing, on many applications the initial Ambric chip delivers performance similar to or better than that of the leading 90nm FPGAs. Because each CPU operates independently, communicating only with its neighbors, the Ambric architecture will easily scale into next-generation manufacturing technology, simply by adding brics.

Improving Efficiency

Programmable gates are far less efficient than dedicated logic. A hardwired processor is faster and smaller, and it uses less power, than a processor implemented in programmable logic. With today’s advanced IC processes, a complete CPU is tiny, and a packet processor is even smaller. Thus, it makes sense to scatter these blocks around the chip while allowing customers to connect and configure these processors for their own specific application.

Xilinx, the leading FPGA vendor, is not ignoring this trend. The company’s Virtex 4 FX includes DSP blocks, Ethernet MACs, and one or two PowerPC CPUs. This approach satisfies the needs of many customers, particularly in cellular base stations and other DSP-intensive applications.

Cswitch, however, should have an advantage in networking and other applications that focus on protocol processing, not signal processing. Ambric’s architecture is well suited to computationally intensive applications such as video encoding, cellular protocols, and security processing.

One challenge for the newcomers is making it simple for engineers to use their architectures. Cswitch has partnered with Magma to fit its technology into standard tool flows. Ambric offers an integrated development environment as well as a library of precoded software. But designers must still learn the details of these new architectures to take full advantage of their performance.

For this reason alone, many designers will stick with the traditional FPGA and the familiar tools that go with it. But those who are pushing the limits of power or performance, particularly in communications or video, should consider these next-generation architectures.

 


Originally published in
Nikkei Electronics Asia, October 2006




© 2002-2006 The Linley Group