When the FPGA was invented 20 years ago, the idea that most customers
would not use all of the gates on the chip was considered wasteful
and radical. But in time, as the number of transistors per
chip skyrocketed, this concern faded.
Today, we have reached the point where a chip can have hundreds
of ALUs or CPUs, more than most customers can use. While some
might consider this wasteful, others see it as the next step
in the evolution of the FPGA.
For example, Cswitch is developing an FPGA that includes dedicated
networking logic and interface controllers surrounded by a sea
of traditional programmable gates. Rejecting gates altogether,
Ambric is developing a design that includes an array of CPU and
memory blocks that can be configured for a variety of tasks.
Both companies plan to sample chips by the end of this year.
New Approaches
Cswitch’s architecture includes dedicated packet parsers that can be programmed
to extract data from packet headers. Hardwired arithmetic units
can perform packet editing or math operations. Built-in CAMs
can be configured in binary or ternary modes. For storing tables
and buffers, the architecture includes a sizable pool of dual-port RAM
as well as a larger amount of single-port RAM.
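To make the parser and CAM descriptions concrete, here is a minimal Python model of the two functions. It is an illustration, not Cswitch's hardware or programming interface: the IPv4 header layout is standard, while `TernaryCam`, its first-match priority rule, and the sample forwarding table are hypothetical.

```python
import ipaddress
import struct
from dataclasses import dataclass, field

# Standard 20-byte IPv4 header without options, in network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBHII")


def parse_ipv4_header(packet: bytes) -> dict:
    """Pull selected fields out of an IPv4 header, the kind of extraction
    a programmable packet parser performs in hardware."""
    if len(packet) < IPV4_HEADER.size:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {"version": ver_ihl >> 4, "ihl": ver_ihl & 0x0F,
            "total_length": total_len, "ttl": ttl,
            "protocol": proto, "src": src, "dst": dst}


@dataclass
class TernaryCam:
    """Hypothetical software model of a ternary CAM. Each entry holds
    (value, mask, result); a key matches when it equals value on every bit
    whose mask bit is 1, and mask-0 bits are "don't care". An all-ones mask
    gives binary (exact-match) behavior. Hardware compares every entry in
    parallel; this model scans in insertion order and returns the first match."""
    entries: list = field(default_factory=list)

    def add(self, value: int, mask: int, result: str) -> None:
        self.entries.append((value & mask, mask, result))

    def lookup(self, key: int):
        for value, mask, result in self.entries:
            if (key & mask) == value:
                return result
        return None  # no entry matched


# Hypothetical forwarding table. Longer prefixes are added first, so
# first-match order yields longest-prefix-match routing.
cam = TernaryCam()
for cidr, port in [("10.1.2.0/24", "port3"),
                   ("10.1.0.0/16", "port2"),
                   ("0.0.0.0/0", "port1")]:
    net = ipaddress.IPv4Network(cidr)
    cam.add(int(net.network_address), int(net.netmask), port)

# Sample header: version 4, IHL 5, TTL 64, protocol 6 (TCP).
header = IPV4_HEADER.pack(0x45, 0, 20, 0, 0, 64, 6, 0,
                          int(ipaddress.IPv4Address("192.168.0.1")),
                          int(ipaddress.IPv4Address("10.1.2.7")))
parsed = parse_ipv4_header(header)
print(cam.lookup(parsed["dst"]))  # port3
```

Setting every mask bit to 1 turns the same table into a binary, exact-match CAM.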
All of this dedicated logic is wrapped within arrays of standard
programmable gates and connected via a high-speed point-to-point
interconnect. The chip also embeds DRAM controllers, Ethernet
and Fibre Channel MACs, and serdes that operate at up to
6.4Gbps. These
serdes can support PCI Express, XAUI, Gigabit Ethernet, Fibre
Channel, and other high-speed interfaces.
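The lane rates of these interfaces show why a serdes rated at up to 6.4Gbps can carry them. This Python sketch assumes the 6.4Gbps figure is a per-lane line rate and uses the standard signaling rates for first-generation PCI Express, XAUI, 1000BASE-X Gigabit Ethernet, and 4G Fibre Channel, all of which use 8b/10b line coding; the names `SERDES_MAX_GBAUD`, `LINKS`, and `payload_gbps` are hypothetical.

```python
from fractions import Fraction

# Assumption: the 6.4Gbps serdes rating is a per-lane line (baud) rate.
SERDES_MAX_GBAUD = Fraction("6.4")
# 8b/10b coding sends 10 line bits for every 8 data bits.
CODING_8B10B = Fraction(8, 10)

# (interface, lanes, per-lane line rate in Gbaud); all use 8b/10b coding.
LINKS = [
    ("PCI Express 1.x, x1", 1, Fraction("2.5")),
    ("XAUI", 4, Fraction("3.125")),
    ("Gigabit Ethernet (1000BASE-X)", 1, Fraction("1.25")),
    ("4G Fibre Channel", 1, Fraction("4.25")),
]


def payload_gbps(lanes: int, gbaud: Fraction) -> Fraction:
    """Data rate left after 8b/10b coding, summed over all lanes."""
    return lanes * gbaud * CODING_8B10B


for name, lanes, gbaud in LINKS:
    fits = gbaud <= SERDES_MAX_GBAUD  # compare each lane against the serdes limit
    print(f"{name}: {lanes} x {float(gbaud)} Gbaud -> "
          f"{float(payload_gbps(lanes, gbaud))} Gbps of data, within limit: {fits}")
```

XAUI reaches 10Gbps of data only by spreading traffic over four 3.125Gbaud lanes; each lane on its own is well within the serdes limit.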
Using these special function blocks, the Cswitch chip can implement
common communications and packet-processing applications much
more efficiently than a standard FPGA can. By programming the
gates and configuring the dedicated logic, designers can adapt
the chip to a wide variety of applications and protocols.
Ambric’s first chip, code-named Kestrel, contains 360 CPUs
divided into 45 “brics,” each with 13KB of SRAM. Each
CPU is a complete 32-bit processor; half are enhanced with DSP
extensions to improve math throughput. Each CPU can easily communicate
with nearby CPUs, allowing data to flow across the chip.
To program the Ambric chip, software is divided into a series
of objects, each of which performs a simple task. Ambric’s compiler
assigns the objects to physical CPUs. Each CPU can process incoming
data, then pass it along to another CPU for further processing.
In this way, each CPU is similar to a function block implemented
as a group of FPGA gates; in Ambric’s design, the function
is simply implemented in software rather than hardware.
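Ambric's model resembles a pipeline of communicating processes. The Python sketch below is an analogy only, not Ambric's language or tools: each hypothetical object is a function running on its own thread (standing in for a CPU), and bounded `queue.Queue` channels of depth 1 stand in for the links between neighboring CPUs. Here the operating system, not a compiler, decides where each thread runs.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed down the chain


def run_object(task, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One software object on its own 'CPU' (a thread in this analogy):
    take a word from upstream, apply a simple task, pass the result on."""
    while True:
        item = inbox.get()          # blocks until the upstream object sends data
        if item is _DONE:
            outbox.put(_DONE)       # propagate shutdown downstream
            return
        outbox.put(task(item))      # blocks while the downstream channel is full


def build_pipeline(tasks):
    """Chain the objects. Each channel holds a single word, so a fast object
    stalls until its neighbor catches up, a rough stand-in for handshaking
    between neighboring CPUs."""
    channels = [queue.Queue(maxsize=1) for _ in range(len(tasks) + 1)]
    workers = [threading.Thread(target=run_object,
                                args=(task, channels[i], channels[i + 1]),
                                daemon=True)
               for i, task in enumerate(tasks)]
    for w in workers:
        w.start()
    return channels[0], channels[-1], workers


def process(tasks, data):
    source, sink, workers = build_pipeline(tasks)

    def feed():
        # Feeding from its own thread keeps the bounded channels from deadlocking.
        for x in data:
            source.put(x)
        source.put(_DONE)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    results = []
    while (item := sink.get()) is not _DONE:
        results.append(item)
    feeder.join()
    for w in workers:
        w.join()
    return results


# Hypothetical three-object chain: scale, offset, clamp.
tasks = [lambda x: x * 3, lambda x: x + 1, lambda x: min(x, 20)]
print(process(tasks, range(8)))  # [1, 4, 7, 10, 13, 16, 19, 20]
```

Because each object talks only to its neighbors through a channel, adding a processing step in this sketch means adding one function to `tasks`; nothing else in the chain changes.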
At 333MHz, Kestrel can perform 1.08 trillion operations
per second, including 60 billion multiply-accumulate
operations per second (60 GMACS). To support this compute rate, the chip includes
3.2GB/s of DDR2 SDRAM bandwidth, PCI Express, and other
high-speed I/O.
Despite being built in an inexpensive 130nm process, the initial
Ambric chip delivers performance similar to or better than that of
the leading 90nm FPGAs on many applications. Because each CPU
operates independently, communicating only with its neighbors,
the Ambric architecture will scale easily to next-generation
manufacturing technology, simply by adding brics.
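The Kestrel figures hang together under a simple model. This Python sketch infers 8 CPUs per bric from 360 CPUs in 45 brics, assumes half of them are DSP-enhanced and that each DSP CPU completes one multiply-accumulate per clock, and then scales the bric count at the same 333MHz clock. `BricArray` is a hypothetical helper; the model does not attempt the 1.08 trillion operations figure, because the text does not say how those operations are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BricArray:
    """Back-of-envelope model of a bric-based chip (hypothetical helper).

    Assumptions inferred from the text's figures, not from a datasheet:
    8 CPUs per bric (360 / 45), half of them DSP-enhanced, and one
    multiply-accumulate per DSP CPU per clock cycle.
    """
    brics: int
    clock_mhz: float = 333.0
    cpus_per_bric: int = 8
    sram_kb_per_bric: int = 13

    @property
    def cpus(self) -> int:
        return self.brics * self.cpus_per_bric

    @property
    def dsp_cpus(self) -> int:
        return self.cpus // 2  # half the CPUs carry DSP extensions

    @property
    def sram_kb(self) -> int:
        return self.brics * self.sram_kb_per_bric

    @property
    def peak_gmacs(self) -> float:
        # One MAC per DSP CPU per cycle: CPUs x MHz gives millions of MACs/s.
        return self.dsp_cpus * self.clock_mhz / 1000.0


kestrel = BricArray(brics=45)
print(kestrel.cpus, kestrel.dsp_cpus, kestrel.sram_kb, round(kestrel.peak_gmacs, 2))
# 360 180 585 59.94

# Scaling by adding brics, assuming the clock rate stays at 333MHz.
doubled = BricArray(brics=90)
print(doubled.cpus, round(doubled.peak_gmacs, 2))
# 720 119.88
```

The result of 59.94 GMACS rounds to the quoted 60 GMACS, which suggests the one-MAC-per-cycle assumption is reasonable; the doubled configuration simply illustrates the linear scaling the text describes.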
Improving Efficiency
Programmable gates are far less efficient than dedicated logic. A hardwired
processor is faster, smaller, and uses less power than
a processor implemented in programmable logic. With today’s
advanced IC processes, a complete CPU is tiny, and a packet
processor is even smaller. Thus, it makes sense to scatter
these blocks around the chip while allowing customers to
connect and configure these processors for their own specific
applications.
Xilinx, the leading FPGA vendor, is not ignoring this
trend. The company’s Virtex-4 FX includes DSP blocks, Ethernet
MACs, and one or two PowerPC CPUs. This approach satisfies
the needs of many customers, particularly in cellular base
stations and other DSP-intensive applications.
Cswitch, however, should have an advantage in networking
and other applications that focus on protocol processing,
not signal processing. Ambric’s architecture is well
suited to computationally intensive applications such as
video encoding, cellular protocols, and security processing.
One challenge for the newcomers is making it simple for
engineers to use their architectures. Cswitch has partnered
with Magma
to fit its technology into standard tool flows. Ambric
offers an integrated development environment as well
as a library
of precoded software. But designers must still learn
the details of these new architectures to take full
advantage of their performance.
For this reason alone, many designers will stick with
the traditional FPGA and the familiar tools that
go with it.
But those who are pushing the limits of power or
performance, particularly in communications or video,
should consider
these next-generation architectures.
Originally published in Nikkei Electronics Asia, October 2006
© 2002-2006 The Linley Group