NEC has boosted the performance of its little-known V800 line of microprocessors with a pair of new chips that improve integer performance and address computationally intensive systems. The new chips, which are sampling now, have a slightly different instruction set that makes them more mainstream than their predecessors.

NEC's new V830 and V831 processors boost the clock rate to 100 MHz and add a number of new instructions aimed at improving numerical performance. The new processors support 32-bit integer arithmetic, including multiply-accumulate (MAC), and are aimed at popular high-volume applications such as printers, television decoders, and automobile accessories.

In price and performance, the new chips lie somewhere between NEC's existing V850 chips and its well-known MIPS processors. The V830s offer midrange integer performance, good code density, and low prices, while the MIPS chips continue to serve applications that require floating-point math or Windows CE compatibility.

NEC has its eye on newer consumer-electronics items, such as set-top boxes, DVD players, and Web terminals. In this market, the V830 competes with StrongARM, Motorola's ColdFire, and IBM's PowerPC chips, among others.

New Generation Extends Long V800 Line

NEC's V800 family has been lurking in the shadows since the original V810 rolled out in 1993 (see MPR 6/21/93, p. 9). The V810 and its siblings, the V805 and V820, were never (and still are not) marketed in North America despite moderate success in Japan. The V810 inaugurated the V800 architecture and instruction set, with its five-stage pipeline, 32 × 32-bit register file, and combination of 16-bit and 32-bit instructions. These first chips were heavy on bit-string manipulation and included rudimentary floating-point instructions.

The next generation, which includes the V851 through the V854, dropped the V810's string instructions, FP instructions, and separate I/O space while adding a 16 × 16-bit hardware multiplier. Otherwise, the first- and second-generation V810 and V850 parts are binary-compatible.

The third-generation V830 extends the V850's hardware multiplier to a full 32 bits, adds caches and some new instructions, and reinstates the V810's separate I/O and memory maps. The V830 also comes with an all-important multiply-accumulate instruction and supports nondestructive, three-operand addressing for the first time. The changes to the V830 were substantial enough that NEC had to alter the underlying instruction encoding--breaking binary compatibility with the V810 and V850 generations.

This break in compatibility prevents any easy upgrades from the V850 (or, for Japanese customers, the V810) to the V830. NEC offers the usual set of in-house development tools, including an assembler and a C compiler, but there is no binary--or even assembly-level--translator. In this sense, the V830 imitates Motorola's ColdFire: the architecture is familiar, but code is not transportable.

Enhancements Add Media-Handling Math

The instruction-level changes to the V830 aren't major, and at first they don't appear substantial enough to warrant a new binary encoding scheme. The V830 extends support for signed and unsigned multiplication and division from 16 to 32 bits and adds three-operand forms of multiplication and subtraction, saturating add and subtract, a pair of minimum and maximum instructions, and 64-bit left and right shifts. Most of the new instructions are encoded in 32 bits, eroding some of the V800's historical advantage in code density, at least for applications that use them.
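
For readers who think in C, the new arithmetic operations have simple reference semantics. The sketch below models signed saturating add and subtract and the min/max pair, plus the clamp idiom that media code typically builds from them. The function names are hypothetical (they are not NEC mnemonics or compiler intrinsics), and the signed-only treatment is an assumption; the article does not say whether unsigned saturating forms also exist.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate into the signed 32-bit range. */
static int32_t saturate32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Saturating add/subtract (hypothetical names): the 64-bit intermediate
   cannot overflow, so out-of-range results clamp instead of wrapping. */
static int32_t sat_add32(int32_t a, int32_t b) { return saturate32((int64_t)a + b); }
static int32_t sat_sub32(int32_t a, int32_t b) { return saturate32((int64_t)a - b); }

static int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Typical media use: force a filtered pixel value into [lo, hi]
   with one min and one max, and no branches in the source. */
static int32_t clamp32(int32_t x, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    return max32(lo, min32(x, hi));
}
```

Without saturating instructions, each of these operations costs a compare and a conditional branch or select per sample, which is why they matter in pixel and audio loops.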

The V830 also has new 64-bit multiply and multiply-accumulate instructions that can be used for pseudo-FP operations. For example, the multiply-truncate-and-accumulate instruction calculates a 64-bit product but drops the lower half of the result, accumulating only the upper 32 bits. This operation is useful for accumulating several fixed-point numbers that have all been normalized beforehand.
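
A minimal C model of multiply-truncate-and-accumulate makes the fixed-point use concrete. It assumes signed operands and assumes the dropped low half is simply discarded (floor, the same as keeping the upper word of a two's-complement product); the article does not specify the instruction's rounding or overflow behavior, so accumulator overflow is asserted against rather than modeled. With Q31 inputs, each Q31 × Q31 product is a Q62 value whose upper word is Q30, so a dot product accumulates in Q30. The names mac_hi and dot_q31 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a signed 64-bit product, computed as floor(p / 2^32)
   without relying on >> of negative values (implementation-defined in C). */
static int32_t upper_word(int64_t p)
{
    const int64_t two32 = (int64_t)1 << 32;
    int64_t q = p / two32;              /* C division truncates toward zero */
    if (p < 0 && p % two32 != 0)
        q -= 1;                         /* adjust truncation to floor */
    return (int32_t)q;
}

/* acc += upper 32 bits of (a * b); the low 32 bits are dropped. */
static int32_t mac_hi(int32_t acc, int32_t a, int32_t b)
{
    int64_t sum = (int64_t)acc + upper_word((int64_t)a * b);
    assert(sum >= INT32_MIN && sum <= INT32_MAX);  /* overflow not modeled */
    return (int32_t)sum;
}

/* Dot product of pre-normalized Q31 vectors; the result is in Q30. */
static int32_t dot_q31(const int32_t *x, const int32_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac_hi(acc, x[i], y[i]);
    return acc;
}
```

For example, 0.5 × 0.5 + 0.25 × 0.5 in Q31 (0x40000000, 0x20000000 times 0x40000000) accumulates to 0x18000000, which is 0.375 in Q30.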

Register conventions are the same as those for the V850 family, with the exception of register r30. The V830 uses r30 to hold the upper 32 bits during multiplication and the remainder during division; the V850, whose multiplier handles only 16-bit operands, uses r30 as an address pointer.
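
The r30 convention can be pictured as a two-output operation. The sketch below is a hypothetical register-file model, not NEC code: it assumes the low word of a product (or the quotient) lands in the instruction's destination register, and that division truncates toward zero as C's does. The article confirms only that r30 receives the upper word or the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model holding just the two registers involved. */
struct v830_regs {
    uint32_t rd;    /* destination: low word of a product, or the quotient */
    uint32_t r30;   /* upper word of a product, or the remainder */
};

/* Signed 32 x 32 -> 64-bit multiply: low word to the destination,
   upper word to r30. */
static void model_mul(struct v830_regs *r, int32_t a, int32_t b)
{
    uint64_t bits = (uint64_t)((int64_t)a * b);  /* two's-complement bit pattern */
    r->rd  = (uint32_t)bits;
    r->r30 = (uint32_t)(bits >> 32);
}

/* Signed divide: quotient to the destination, remainder to r30.
   Truncation toward zero is an assumption borrowed from C. */
static void model_div(struct v830_regs *r, int32_t a, int32_t b)
{
    assert(b != 0 && !(a == INT32_MIN && b == -1));  /* cases not modeled */
    r->rd  = (uint32_t)(a / b);
    r->r30 = (uint32_t)(a % b);
}
```

Porting V850 code therefore means finding any use of r30 as a pointer and moving it elsewhere, since every 32-bit multiply or divide on the V830 overwrites that register.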

Chips Include Both RAM and Cache

The V830 and V831 are the first V800 chips with cache and are among the few 32-bit microprocessors to include both cache and RAM. In fact, the parts have four separate banks of memory: a 4K instruction cache, a 4K data cache, a 4K instruction RAM, and a 4K data RAM, as Figure 1 shows.

Figure 1. NEC's V830 and V831 share the same CPU core, including dual caches and on-chip SRAM. The V831 adds a memory controller, DMA, serial port, and other peripherals.

Both caches are direct mapped; the data cache's write-through update policy causes more bus traffic than a write-back cache might, but it facilitates memory coherence in a multimaster system. For the closed embedded systems typical of the V800's historical customer base, a write-back cache might have been preferable. A four-word write buffer alleviates some of the bus stalls the write-through cache would otherwise cause.
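
Direct mapping means each address can live in exactly one cache line, so two addresses that share a line index but not a tag evict each other. The sketch splits an address for a 4K direct-mapped cache. The line size is not given in the article; the 16-byte line (256 lines per cache) is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BYTES 4096u   /* 4K per cache, per the article */
#define LINE_BYTES  16u     /* ASSUMPTION: line size is not stated */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

static_assert(CACHE_BYTES % LINE_BYTES == 0, "cache must hold whole lines");

struct addr_fields { uint32_t tag, index, offset; };

/* Split an address into byte offset within the line, line index, and tag. */
static struct addr_fields split_address(uint32_t addr)
{
    struct addr_fields f;
    f.offset = addr % LINE_BYTES;
    f.index  = (addr / LINE_BYTES) % NUM_LINES;
    f.tag    = addr / CACHE_BYTES;
    return f;
}

/* Two addresses evict each other when they map to the same line
   but carry different tags. */
static bool evict_each_other(uint32_t a, uint32_t b)
{
    struct addr_fields fa = split_address(a);
    struct addr_fields fb = split_address(b);
    return fa.index == fb.index && fa.tag != fb.tag;
}
```

Whatever the line size, any two addresses exactly 4K apart collide, so a loop that alternates between two such buffers can miss on every access. Because the data cache is write-through, each store also goes out over the bus regardless of whether it hits.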

The two 4K sections of on-chip memory are permanently mapped into the V830's address space. As their descriptions imply, the chip can't fetch instructions from its data RAM, nor can it read or write data in its instruction RAM. The contents of these RAM banks are managed with two pairs of special load and store instructions. Accesses to on-chip RAM are fast, completing in a single cycle, and bypass the caches.

As the memory map in Figure 2 shows, the V830 carves its 4G address space into cachable and uncachable areas, with four optional hardwired chip selects appearing twice, once in each space. (Like x86 chips, the V830 separates its I/O space from its memory space.) Although the V830's internal instruction and data RAMs are both mapped into cachable address space, their contents are never cached.

Figure 2. The V830 maps the 4G memory space into cachable and uncachable regions, with four optional chip selects and two built-in RAM areas.

The instruction RAM always holds the chip's exception vector table. The minimum vector table is 256 bytes long, leaving up to 3,840 bytes of instruction RAM for code. The instruction RAM is located just below the upper end of the address map, where the boot ROM is addressed. The V830's 26-bit maximum jump offset makes this code space easily reachable only from code within 32 Mbytes of the instruction RAM addresses (i.e., 0xFD000000 through 0xFF000000).
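
The 32-Mbyte figure follows from the displacement width: a signed 26-bit byte offset spans -2^25 to 2^25 - 1, or roughly ±32 Mbytes. The sketch checks whether a PC-relative jump can reach its target. It assumes the displacement is a byte offset measured from the jump's own address and that jumps do not wrap around the top of the 4G space; neither detail is stated in the article, and the addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DISP_BITS 26   /* maximum jump displacement width cited in the article */

static_assert(DISP_BITS > 1 && DISP_BITS < 32, "displacement must fit in 32 bits");

/* True if a jump located at 'pc' can reach 'target' with a signed
   DISP_BITS-bit byte displacement (no wraparound assumed). */
static bool jump_reachable(uint32_t pc, uint32_t target)
{
    const int64_t disp  = (int64_t)target - (int64_t)pc;
    const int64_t limit = (int64_t)1 << (DISP_BITS - 1);   /* 2^25 = 32M */
    return disp >= -limit && disp < limit;
}
```

Code in low memory, say at 0x00001000, cannot jump directly to a handler near 0xFE000000; it must load the full 32-bit address into a register and jump through it, which costs extra instructions and bytes.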

Bus Designed for Easy Integration

NEC wants the V830 family to make it big in high-volume media-processing systems such as cable-television decoders and WebTV-like devices. In these markets, easy integration is an important characteristic. The hardware is just a platform for delivering content, and as long as the CPU can handle the underlying processing tasks, it should be as unobtrusive as possible. To this end, the V830 has a flexible bus interface with a number of cost-saving options.

For its external interface, the part has demultiplexed 32-bit address and data buses. The bus interface runs at either one-half (50 MHz) or one-third (33 MHz) of the internal processor frequency, at the user's option. Also selectable at power-up are a 16-bit bus mode and an option to enable the four chip-select outputs. When enabled, these pins take the place of high-order address bits A28-A31. Using the chip selects eliminates the need for even simple decoding logic.

Unfortunately, the V830 can't do dynamic bus sizing; when 16-bit bus mode is enabled at power-up, its effects are permanent. In 16-bit mode, the chip's high-order data lines become useless appendages. This mode might be used in a lower-cost system design.

Regardless of bus width, the minimum bus transaction takes two cycles; both single-access and burst cycles are supported. Peak bandwidth ranges from 66 Mbytes/s (33-MHz bus) to 100 Mbytes/s (50-MHz bus) using single transactions, and reaches upwards of 133 Mbytes/s using bursts over a 50-MHz bus. Although not impressive by workstation standards, the V830's bus bandwidth is sufficient for feeding the caches and exceeds that of most comparably priced parts except for Digital's SA-110.
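
The bandwidth figures are bus clock × bytes per transaction ÷ bus cycles per transaction. The single-transaction numbers follow directly from the two-cycle minimum on a 32-bit bus. The article does not give burst timing, so the usage note assumes a four-word (16-byte) burst completing in six bus cycles, one timing that is consistent with the 133-Mbyte/s figure but not confirmed by NEC.

```c
#include <assert.h>

/* Peak bandwidth in Mbytes/s for a bus clock in MHz, moving 'bytes'
   per transaction in 'cycles' bus cycles. */
static double peak_mbytes_per_s(double bus_mhz, unsigned bytes, unsigned cycles)
{
    assert(cycles > 0);
    return bus_mhz * bytes / cycles;
}
```

With these inputs, peak_mbytes_per_s(50.0, 4, 2) gives 100 and peak_mbytes_per_s(100.0 / 3.0, 4, 2) gives about 66.7 for single transactions, while peak_mbytes_per_s(50.0, 16, 6) gives about 133 under the assumed burst timing. Passing 2 bytes per transaction models the 16-bit bus mode, which halves every figure.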

The bus is simple and straightforward to interface with, enabling inexpensive system designs. The chip drives four byte-enable outputs that indicate which byte lanes carry the operand; these are easier to decode than low-order address bits and size identifiers. In 32-bit mode, address pins A0 and A1 are commandeered for byte-enable lines; without the two LSBs, unaligned halfword and word transfers become a practical impossibility. The V830's support for unaligned data transfers or branches consists of ignoring the LSBs of the offending address pointer.
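
The byte-enable scheme and the "ignore the LSBs" treatment of unaligned pointers can be modeled in a few lines. The sketch assumes little-endian lane numbering (bit n of the mask enables byte lane n, the lane for address offset n) and reports the enables as an active-high mask; the physical pins may well be active-low, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address actually used for an access of 'size' bytes: the low bits
   below the operand size are ignored, as the article describes. */
static uint32_t effective_address(uint32_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    return addr & ~(uint32_t)(size - 1);
}

/* Four byte enables as a 4-bit mask, bit n = byte lane n (assumed
   little-endian lane numbering, active-high for readability). */
static unsigned byte_enables(uint32_t addr, unsigned size)
{
    uint32_t ea = effective_address(addr, size);
    return ((1u << size) - 1u) << (ea & 3u);
}
```

A word load from 0x1003 therefore silently becomes a word load from 0x1000 with all four lanes enabled, rather than trapping or splitting into two bus cycles.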

Burst transactions are used for filling the V830's on-chip instruction and data RAM and for refilling cache lines. External memory need not support bursts; the burst request can be ignored. Nonbursting memory and all I/O should be addressed in the uncachable region of the memory map.

Exception Processing Average

NEC makes hay of the V830 family's interrupt-processing capability, which is actually pretty mundane. Interrupts are signaled by asserting the INT pin; the chip jumps to one of 15 interrupt handlers based on the level of the four INTV0-3 pins. The V831 handles this automatically; the V830 requires external logic to prioritize and mask interrupts. This method is similar to the 68000's (albeit with more levels), prioritizing interrupts in hardware rather than requiring software to poll devices or read a status register.
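
On the V830, prioritizing and masking happen outside the chip. The sketch models that external logic as a function from pending and enabled request bitmasks to the INT pin and the four-bit INTV0-3 code. Several details are assumptions, not NEC documentation: requests are numbered 1 through 15 (bit n of each mask is request n), a higher number means higher priority, and code 0 is never presented. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outputs of the hypothetical external prioritizer. */
struct int_out {
    bool     int_asserted;  /* drive the INT pin */
    unsigned intv;          /* value for INTV0-3 */
};

static struct int_out prioritize(uint16_t pending, uint16_t enabled)
{
    struct int_out out = { false, 0 };
    uint16_t active = pending & enabled & 0xFFFEu;  /* requests 1..15 only */

    /* Scan from the highest-priority request down. */
    for (unsigned level = 15; level >= 1; level--) {
        if (active & (1u << level)) {
            out.int_asserted = true;
            out.intv = level;
            break;
        }
    }
    assert(out.intv < 16);  /* must fit on the four INTV0-3 pins */
    return out;
}
```

In a real V830 design this would be a small PLD or a few gates rather than software; the V831 folds the same function onto the chip.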

As in most CPUs, the programmer is still responsible for saving machine state before exception processing and for restoring context before returning to the interrupted task. Despite the built-in hardware prioritization, the V831 doesn't even mask lower-priority interrupts during exception processing. All in all, these chips' interrupt-handling capabilities are between those of most RISC chips and those of an average CISC processor.

V831 Adds DRAM Controller, DMA, Other I/O

The V831 is a superset of the V830 with additional integrated logic. For an extra $5, the V831 adds a memory controller, a four-channel DMA controller, a UART, interrupt-prioritization logic, some timers, and debug logic.

The V831's memory controller drives the RAS, CAS, and WE signals for EDO DRAMs, or the appropriate equivalent signals for SRAMs, page-mode ROMs, or peripherals. The chip can be programmed with RAS-precharge and RAS-to-CAS delay times and can generate CAS-before-RAS refresh cycles at selectable frequencies. If used with page-mode ROMs, the V831 can alter its access-time requirements to accommodate page hits and misses.
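
Choosing a refresh frequency is simple arithmetic: every row must be refreshed once within the DRAM's retention period. The sketch computes the maximum number of bus clocks between CAS-before-RAS refresh cycles. The 4,096-row, 64-ms DRAM in the usage note is an illustrative example rather than a figure from NEC, and the function does not correspond to any V831 register. Since the controller offers a set of selectable frequencies, a designer would pick the nearest one that refreshes at least this often.

```c
#include <assert.h>
#include <stdint.h>

/* Bus clocks allowed between refresh cycles so all 'rows' are refreshed
   within 'retention_ms'. Rounds down: refreshing early is safe, late is not. */
static uint32_t refresh_period_clocks(uint32_t rows, uint32_t retention_ms,
                                      uint32_t bus_khz)
{
    assert(rows > 0 && bus_khz > 0);
    uint64_t total_clocks = (uint64_t)retention_ms * bus_khz;  /* ms x kHz = clocks */
    return (uint32_t)(total_clocks / rows);
}
```

For the example part on a 50-MHz bus, refresh_period_clocks(4096, 64, 50000) returns 781, or one refresh roughly every 15.6 µs.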

The V830 is packaged in a 144-lead PQFP. To accommodate the additional I/O, the V831 comes in a larger, 160-lead PQFP package. Both parts run on 3.3 V; at speed, the V830 consumes 500 mW (typical), while the V831 draws about 550 mW. Sleep mode, which halts processing but keeps the PLL alive, drops power consumption to about 33 mW; for the truly power-conservative, stop mode reduces power to less than 0.2 mW. An NMI or reset can bring the chip back from stop mode--after two million clock cycles (20 ms).

The V830 die, pictured in Figure 3, measures just under 34 mm² in NEC's 0.35-micron three-layer-metal process. The MDR Cost Model projects an estimated manufacturing cost of $10 for this device, placing it between National's 486SXL ($8) and Digital's SA-110 ($12).

Figure 3. The V830 measures 5.63 × 5.97 mm in NEC's 0.35-micron CMOS process. The chip contains a total of 1.1 million transistors.

The V830's performance lies somewhere between those two competitors. NEC claims 118 Dhrystone MIPS at 100 MHz, roughly halfway between the 486's 12 MIPS and the 200+ rating for StrongARM. When both parts are running at 100 MHz, however, the V830 and the SA-110 deliver equivalent integer performance for about the same price.

Power consumption is the big differentiator in this case: 500 mW (typical) for the V830 vs. 110 mW (typical) for the SA-110. Of course, the Digital chip requires a separate 1.65-V supply to achieve its low wattage. Since NEC is not targeting portable devices, such power differences are probably not significant, so long as the V830 doesn't require a cooling fan.

V830 the Ace Up NEC's Sleeve

The V830 serves the same role the V800 always has in NEC's product line: a low-cost, royalty-free processor family for embedded applications where software compatibility and development tools are not primary concerns. As such, these chips are a good complement to NEC's MIPS line.

At 100+ Dhrystone MIPS, the V830's performance overlaps that of NEC's R4100 family but can't touch the faster R4300s. The V830 chips have an advantage in code density that the rigid, 32-bit instruction set of the MIPS chips can't match. On the other hand, MIPS processors have a clearer upgrade path, a better-developed tool chain, and some measure of vendor independence for both hardware and software.

Since the two families share substantially the same pipeline structure, it's not clear why V850 chips are limited to 33 MHz while the V830 parts run at 100 MHz. Certainly the move from 0.5-micron to 0.35-micron manufacturing helps, but threefold speed increases don't come from die shrinks alone. Despite initial promises to the contrary (see MPR 10/3/94, p. 16), the V850 family has never gone faster than 33 MHz.

It's also odd that the V830 and V831 would be offered at only one speed grade. The usual CMOS yield curve should produce a useful number of parts that are either faster or slower than the mean. From NEC's point of view, however, slower parts might compete with the V850 chips, while faster ones would infringe on MIPS territory. V800 speeds appear to be marketing driven and not technically motivated.

For customers deciding which V800 processor to pick, NEC keeps the division between the V850 and V830 pretty clear. As Table 1 shows, the former has less arithmetic precision, no cache, on-chip ROM, much lower clock rates, and usually a bit of integrated I/O. The V830 is more computer-like, with better media-processing capabilities (multiply, MAC, shift, etc.), faster clock speeds, and fewer peripherals.

Table 1. NEC's V800 family now includes three generations, based on the V810, V851, and V830. The amount of on-chip integration has diminished as caches and media-processing features have been added. *Based on Dhrystone 2.1. †Estimates based on MDR Cost Model.

Finally, for the truly high-volume customer, NEC offers both the V830 and the V831, almost intact, as ASIC cores. The company provides nearly the complete chip (CPU core, caches, RAM, and peripherals) as an ASIC macrocell, ready for customer designs. This unusual strategy is intended to help ASIC customers get their integrated designs up and running quickly without the headaches of debugging basic peripherals. NEC sees this "macro-core" as its competitive advantage over MIPS or ARM cores, for example, which have no standard peripherals.

V830 Outshines Others in Consumer Systems

Looking outside NEC's product portfolio, we see the V830's most obvious benefit is its price/performance ratio. Few chips run at 100 MHz and sell for only $25 in quantity. Hitachi's SH7708 comes close, at $20 for a 66-MHz version. The SA-110 is also in the neighborhood, at $29 for 100 MHz or $45 for 200 MHz. IBM's 401GF is on the right curve, but lower down, at only $13 for the 50-MHz chip.
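
The price points above reduce to a quick clock-per-dollar comparison. Clock rate is only a crude proxy for performance, since per-clock throughput differs across these architectures, so the figures below rank list prices against MHz and nothing more. The numbers are exactly those quoted in the paragraph; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Clock rate and price as quoted in the article. */
struct offer { const char *part; double mhz; double price_usd; };

static const struct offer OFFERS[] = {
    { "NEC V830",       100.0, 25.0 },
    { "Hitachi SH7708",  66.0, 20.0 },
    { "Digital SA-110", 100.0, 29.0 },
    { "Digital SA-110", 200.0, 45.0 },
    { "IBM 401GF",       50.0, 13.0 },
};
#define NUM_OFFERS (sizeof OFFERS / sizeof OFFERS[0])

static double mhz_per_dollar(const struct offer *o)
{
    assert(o->price_usd > 0.0);
    return o->mhz / o->price_usd;
}

/* Print one row per offer: part, clock, price, MHz per dollar. */
static void print_offers(void)
{
    for (size_t i = 0; i < NUM_OFFERS; i++)
        printf("%-16s %5.0f MHz  $%5.2f  %.2f MHz/$\n", OFFERS[i].part,
               OFFERS[i].mhz, OFFERS[i].price_usd, mhz_per_dollar(&OFFERS[i]));
}
```

The V830 works out to 4.0 MHz per dollar; the 50-MHz 401GF comes in at about 3.8, which is what "on the right curve, but lower down" means in numeric terms.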

The V830's mixed 16/32-bit instruction set offers better code density than any of these except the SuperH part, which sticks with 16-bit instructions only. All but the PowerPC have 32-bit multiply-accumulate capability; only the V830 has saturating arithmetic and min/max instructions. Table 2 compares the V830 and V831 with a few of their better-known competitors.

Table 2. The V830 and V831 compete with a number of moderately priced chips for consumer-electronics design wins. With its 100-MHz clock rate and integrated logic, the V830 is very competitively priced. *based on Dhrystone 2.1 (Source: vendors)

Within the consumer-electronics space, NEC has no shortage of technical competitors. Many vendors are extending their instruction sets with DSP-style extensions for media or signal processing. A few, such as IBM, have added moderate amounts of on-chip logic to ease integration, lower chip count, and reduce overall system cost. But none offers the same combination of performance, integration, and price as the V830, and this is NEC's strength.

NEC has been willing to retool its V800 family as often as it takes to create an economical solution for the year's hot applications. The original V810, the updated V850, and now the V830 all have similar but incompatible instruction sets designed for control, consumer products, and media processing, respectively. The older V805, V810, and V820 are now fading away; except for a few existing customers, NEC does not recommend these chips for new designs, preferring the newer V850 or V830 series.

The V830 family is unlikely to take the microprocessor world by storm and suddenly displace the 68K, SuperH, x86, or other general-purpose architectures in mainstream applications or broad market appeal. That field is already crowded, and without some compelling technical advantage and broad industry support, a new architecture has little hope of rising above the general fray. But that is not NEC's goal. The V830 is aimed at a fairly narrow market, and one that NEC understands well. Rather than trying to be all things, the V830 family is a tactical weapon in NEC's arsenal, and one that stands a good chance of making a dent in the consumer-electronics market.

Price & Availability

NEC's V830 (order number µPD705100) is available in production volume at 100 MHz. In 10,000-unit quantities, the part is priced at $24.92.

The V831 is sampling now, with production scheduled for 3Q97. The chip is priced at $29.95 in 10,000-unit quantities.

For more information contact NEC (Santa Clara, Calif.) at 800.366.9782 or visit www.nec.com.