Multicore Design Issues

By Linley Gwennap

At a recent Linley Tech seminar in San Jose, a panel of IP experts discussed trends in system-on-a-chip (SoC) design using third-party CPU cores. The seminar highlighted some key trends, particularly in the use of multiple CPU cores. Speakers from Freescale, IBM, MIPS, Tensilica, and ARM participated.

More, Faster CPUs

The trend toward multicore designs was evident in results of a survey of the seminar attendees, who were mainly designers of networking and communications SoCs. About 60% of the attendees were designing a chip with more than one CPU core, and nearly half of those were using four or more cores. Multicore designs are well suited to packet processing, particularly in the data plane. And once software is ported to run on multiple cores, the step from two CPUs to four is relatively easy.

There was some debate among the panelists about the pace of this trend. IBM’s Harry Linzer reported that only a small percentage of his customers are involved in multicore designs. But Tensilica’s Sumit Gupta sees many customers doing multicore designs, particularly in the data plane. This difference may reflect the two companies’ products: IBM’s cores are more powerful, but Tensilica’s are small enough that a chip can easily include several of them.

The speed of licensed CPUs continues to increase as well. Whereas most designers were implementing 200–300MHz CPUs a couple of years ago, 77% of the attendees surveyed said their current designs use CPUs at 300MHz or above. In fact, some expect their licensed CPU to exceed 600MHz, a mark that few designs achieve today. But the newest CPUs, such as ARM’s Cortex, combined with increasing usage of 90nm and even 65nm technology, are driving clock speeds to new heights.

Multithreading vs. Multicore

Darren Jones of MIPS discussed the advantages of multithreading. The MIPS 34K is the only commercially available multithreaded CPU core, although other vendors are using this technology internally. Jones noted that multithreading adds only a small amount of die area to the 34K CPU while improving performance by 60% on certain EEMBC benchmarks.

Gupta pointed out that Tensilica’s Diamond 570T CPU core, at 0.5mm², is so small that four of these CPUs can fit in the same space as a single 34K. The 570T runs at about half the clock speed of the MIPS 34K, but on highly parallel applications, a four-core configuration could deliver better performance than a single 34K.

Most applications don’t scale linearly with multiple CPUs, however, so a four-core design won’t deliver four times the single-core performance. Jones also pointed out that a multithreaded design, such as the 34K, delivers better single-thread performance than a multicore design. For example, if a high-priority thread needs maximum performance, the 34K can devote all its CPU cycles to that thread, whereas the Tensilica design would be limited to the speed of a single, slower CPU.
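The tradeoff between four slower cores and one faster core can be roughed out with Amdahl's law, which is a standard model and not something the panelists presented. The sketch below assumes that both cores do the same work per clock, that the small core runs at half the clock rate, and that a fraction of the workload parallelizes perfectly; it ignores the 34K's multithreading gain, memory contention, and software overhead. The function names and parameters are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float,
                        cores: int = 4,
                        clock_ratio: float = 0.5) -> float:
    """Throughput of `cores` slower CPUs relative to one faster CPU.

    clock_ratio is the slow core's clock divided by the fast core's clock.
    Assumes equal work per clock on both cores (an illustrative assumption).
    """
    return clock_ratio * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    for p in (0.5, 2 / 3, 0.9, 1.0):
        print(f"parallel fraction {p:.2f}: "
              f"4 half-speed cores = {relative_throughput(p):.2f}x one fast core")
```

Under these assumptions, the four-core design pulls ahead only when more than about two-thirds of the work runs in parallel; below that point, the single faster CPU wins.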

According to Jones, the 34K supports up to nine thread contexts to allow programmers to “park” critical routines in the CPU, avoiding the need to fetch thread state. For example, placing an interrupt handler in one of the thread contexts will greatly reduce interrupt response time.

Dealing With Complexity

Moving from a single-core to a multicore model requires major software work. The hardware impact is debatable. Toby Foster of Freescale noted that simply combining two CPUs that have already been validated together is not a difficult challenge, particularly since modern SoCs have extensive custom logic outside the CPU that is usually the design bottleneck. In fact, Gupta argued that a multicore design is simpler than a design containing several special-purpose hardware engines; these custom engines can be replaced with off-the-shelf CPUs plus software.

Another complexity of multicore designs is the need for a high-bandwidth interconnect for the CPUs. Even an SoC with only one CPU may have high-speed memory and I/O controllers, or fixed-function accelerators, that must be efficiently interconnected. ARM’s Dave Steer noted a trend from buses to fabrics in order to achieve the necessary bandwidth while supporting multiple, nonblocking transfers. Many large vendors have an in-house fabric for this purpose, but attaching a licensed CPU to this custom interconnect often requires significant design effort.
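A toy model can show why nonblocking transfers matter. The sketch below is an illustration only, not any vendor's interconnect: it assumes every transfer takes one cycle, a shared bus carries one transfer per cycle, and a fabric can carry several transfers in the same cycle as long as no source or destination is used twice. The port names and function names are hypothetical.

```python
def bus_cycles(transfers):
    """A shared bus serializes traffic: one transfer per cycle."""
    return len(transfers)


def fabric_cycles(transfers):
    """Cycles needed on an idealized nonblocking fabric.

    Each cycle, transfers are granted in order as long as their source
    and destination ports are both free; the rest wait for a later cycle.
    """
    pending = list(transfers)
    cycles = 0
    while pending:
        busy_sources, busy_destinations = set(), set()
        deferred = []
        for source, destination in pending:
            if source in busy_sources or destination in busy_destinations:
                deferred.append((source, destination))
            else:
                busy_sources.add(source)
                busy_destinations.add(destination)
        pending = deferred
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical traffic: four CPUs, two of them targeting the same memory port.
    traffic = [("cpu0", "mem"), ("cpu1", "io"), ("cpu2", "accel"), ("cpu3", "mem")]
    print("bus cycles:   ", bus_cycles(traffic))
    print("fabric cycles:", fabric_cycles(traffic))
```

In this example the bus needs four cycles, while the fabric needs two, because only the two transfers that target the same memory port must wait for each other.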

Support from the CPU vendor is critical in a multicore design. Vendors such as ARM provide MP-validated CPUs with system-level simulation tools that can identify inter-CPU problems before fabrication. Freescale will instead design and validate a semicustom chip to meet a customer’s specifications, offloading validation. To achieve a successful design, any multicore designer must take these issues into account.


Originally published in Nikkei Electronics Asia, March 2007




© 2002-2007 The Linley Group