Out in Front: November 9, 1995
MicroDesign Resources held its Eighth Annual Microprocessor Forum on Oct 10 and 11, 1995, in San Jose, CA. The forum comprised four sessions on x86 µPs, multimedia processors, embedded processors, and high-performance RISC µPs. Before the sessions started, however, Dr Craig Barrett, Intel's executive vice president and chief operating officer, took a look at the semiconductor industry's next 15 years.
Barrett started by reviewing an article Intel published in the October 1989 issue of IEEE Spectrum. That article predicted that a µP fabricated in 2000 would comprise 50 million transistors, run at 250 MHz, execute 2000 MIPS, and be compatible with existing (presumably x86) software. The predictions were mostly on target from today's perspective. Barrett showed that the transistor count on both DRAMs and µPs has grown at a nearly constant, exponential rate since the original 4004 µP appeared in 1971. At current growth rates, with µP transistor counts already nearing 10 million, 50 million appears to be a realistic number. Similarly, the MIPS-rating trend line has grown exponentially, and 2000 MIPS in 2000 seems realistic.
Where the 1989 prediction fell short was in the clock rate. Digital Equipment Corp's Alpha µP already exceeds the 250-MHz prediction, five years ahead of schedule. Barrett predicted that the semiconductor road map would lead to 1-GHz clock rates by 2000 and would reach 4 GHz by 2010. According to Barrett's crystal ball, a µP built in 2010 could comprise 1 billion transistors, run at 4 GHz, and execute 100,000 MIPS.
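The growth rates behind these transistor-count figures are easy to check. The short Python sketch below works out the constant yearly multiplier that each projection implies. It assumes a round 1995 baseline of 10 million transistors (the report says only "nearing 10 million"); the function names are illustrative and come from neither Barrett's talk nor the Spectrum article.

```python
import math


def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years)


def doubling_time_years(growth_factor):
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(growth_factor)


# Assumed baseline: 10 million transistors in 1995 (the article says "nearing").
to_2000 = annual_growth_factor(10e6, 50e6, 2000 - 1995)
# Barrett's 2010 figure, measured from the 50 million predicted for 2000.
to_2010 = annual_growth_factor(50e6, 1e9, 2010 - 2000)

for label, factor in (("1995 to 2000", to_2000), ("2000 to 2010", to_2010)):
    print(f"{label}: x{factor:.2f} per year, "
          f"doubling every {doubling_time_years(factor):.1f} years")
```

Under these assumptions, both projections work out to transistor counts growing by roughly 35 to 38% per year, which means doubling a little more often than every two and a half years.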
The forum's first session indirectly suggested that processors circa 2000 probably will still be running x86 code. Designers have become adept at decomposing x86 instructions into streams of RISC-like subinstructions. Intel calls these subinstructions µops, or micro-ops, in its Pentium. Advanced Micro Devices' K5 designers call them ROPs, and NexGen's Nx686 developers call them RISC86 instructions.
Whatever name the subinstructions go by, each processor's decoder converts x86 instructions into streams of RISC-like instructions, which then enter increasingly complex instruction queues, dispatch machines, and multiple execution units. The net result is the conversion of conventional x86 code into RISC instruction streams that are amenable to processing tricks such as speculative execution and out-of-order instruction completion.
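To make the decomposition concrete, here is a toy Python sketch of the idea. It is not any vendor's decoder: the operation names, the tuple layout, and the temporary register "tmp" are all hypothetical, and it handles only a two-operand instruction whose destination may be a memory reference.

```python
def decode_to_micro_ops(mnemonic, dest, src):
    """Split one x86-style instruction into RISC-like steps.

    An operand written as "[addr]" is a memory reference; anything else is
    treated as a register. Op names and the "tmp" register are hypothetical.
    """
    if not dest.startswith("["):
        # Register destination: the instruction is already RISC-like.
        return [(mnemonic, dest, src)]
    # Memory destination: separate the memory traffic from the arithmetic.
    return [
        ("load", "tmp", dest),    # read the memory operand into a temporary
        (mnemonic, "tmp", src),   # operate register to register
        ("store", dest, "tmp"),   # write the result back to memory
    ]


for micro_op in decode_to_micro_ops("add", "[ebx]", "eax"):
    print(micro_op)
```

Once the read-modify-write instruction has become three simple operations, a scheduler can treat the load, the add, and the store as separate items to queue and dispatch, which is what opens the door to the out-of-order tricks the session described.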
The forum's second session jumped from the world of legacy software to multimedia. John Moussouris, chairman and chief executive officer of Microunity, started by discussing his company's MediaProcessor, a µP for processing the oddly sized packets that make up various sound and video byte streams. Moussouris's key point was that processors designed for multimedia byte streams must handle more data types than just the integers and floating-point numbers that general-purpose processors handle well. Consequently, the MediaProcessor handles 1-, 2-, 4-, 8-, 16-, 32-, 64-, and 128-bit operands with parallel-execution units that can simultaneously operate on as many as four 128-bit operands or mixtures of operands that total 128 bits.
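The general technique of treating one wide register as several narrow operands can be sketched in a few lines of Python. This illustrates partitioned arithmetic in general, not the MediaProcessor's actual instruction set; the helper names and the wraparound-on-overflow behavior are assumptions made for the example.

```python
def pack(values, lane_bits):
    """Pack a list of small integers into one wide word, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    word = 0
    for index, value in enumerate(values):
        word |= (value & mask) << (index * lane_bits)
    return word


def unpack(word, lane_bits, total_bits=128):
    """Split a wide word back into its lanes, lane 0 first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def packed_add(a, b, lane_bits, total_bits=128):
    """Add two packed words lane by lane; a carry never crosses into the next lane."""
    if total_bits % lane_bits != 0:
        raise ValueError("lane width must divide the register width")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # wrap within the lane (assumed)
    return result


a = pack([250, 1, 2], lane_bits=8)
b = pack([10, 1, 3], lane_bits=8)
print(unpack(packed_add(a, b, lane_bits=8), lane_bits=8)[:3])  # [4, 2, 5]
```

With 8-bit lanes, a 128-bit word carries 16 samples, so a single operation does the work of 16 narrow additions. The first lane shows the price of that independence: 250 + 10 wraps to 4 rather than spilling into its neighbor.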
Very-long-instruction-word (VLIW) machines appear to have the lead in advanced media processing. Presenters from Chromatic, IBM, and Philips Semiconductors discussed designs for multimedia processing. The Chromatic Mpact µP works as a coprocessor for existing general-purpose µPs. This device handles media-processing requirements in five asymmetric execution units. Four of the execution units consume 50 ALUs, but the motion-estimation unit for MPEG compression and decompression requires 400 more.
Philips Semiconductors' TM-1 Trimedia processor similarly uses 27 functional units and a five-slot instruction-issue register to handle multiple simultaneous bit streams. It has input and output ports for video and audio bit streams. DMA controllers move the bit streams from these ports to and from the device.
Chromatic's Mpact is available for sampling, and Philips Semiconductors' TM-1 will be available for sampling in 1996. IBM's MFAST folded-array transform processor is only a disclosed technology for now, however. MFAST architect Dr Gerald Pechanek used some tricky mathematics to fold a 2-D multiprocessor mesh along the diagonal. The result is a mathematically identical processing mesh that is easier to fabricate because the folded mesh halves the number of interconnections the unfolded mesh needs. The VLIW instructions operate the processor mesh as a single-instruction, multiple-data or multiple-instruction, multiple-data machine.
Two of the multimedia processor presentations covered multimedia extensions to general-purpose µP architectures. Sparc Technology Business's VIS (visual-instruction set) is a 30-instruction extension of the 64-bit UltraSPARC RISC µP's instruction set. These instructions accelerate image processing, 2- and 3-D graphics, and video compression and decompression, making hardware accelerators unnecessary for many applications. Several of the VIS instructions operate on smaller data types (4-, 16-, and 32-bit) in a manner similar to that of Microunity's MediaProcessor.
VIS-enhanced UltraSPARC µPs are in developers' hands. Picker International, a vendor of computerized-axial-tomography and magnetic-resonance-imaging medical equipment, has realized a fourfold speed improvement in image processing using the VIS development tools. H&P Eurosoft, a European software vendor, has demonstrated a 25-frames/sec, software-only MPEG-2 audio and video decoder based on the VIS development tools. MicroDesign Resources President Michael Slater intimated that Intel's yet-to-be-introduced Pentium P55C would also incorporate multimedia instruction-set extensions.
Cyrix is also adding architectural enhancements for multimedia processing; however, the company is focusing on creating a multimedia x86 architecture instead of RISC. The 5GX86 processor starts with the company's 586 processor core and adds a DRAM interface, a graphics accelerator, a fast integer multiplier and multiply/accumulator, and virtual-system trapping to provide the multimedia performance boost.
The integrated DRAM controller is important in a low-cost system because it eliminates the need for level-2 cache and separate video memory. The graphics accelerator compresses the video frame-buffer data by as much as 12:1, so that display operations consume much less of the main memory's bandwidth in a unified memory architecture.
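A back-of-the-envelope calculation shows why that compression matters when the frame buffer shares main memory. The Python sketch below uses a hypothetical display mode (1024x768, 8 bits per pixel, 75-Hz refresh) that was not part of the Cyrix presentation, and it applies the best-case 12:1 ratio to every frame.

```python
def refresh_bandwidth(width, height, bytes_per_pixel, refresh_hz,
                      compression_ratio=1.0):
    """Bytes per second read from memory just to keep the display refreshed."""
    return width * height * bytes_per_pixel * refresh_hz / compression_ratio


# Hypothetical display mode, chosen only for illustration.
mode = dict(width=1024, height=768, bytes_per_pixel=1, refresh_hz=75)

raw = refresh_bandwidth(**mode)
best_case = refresh_bandwidth(**mode, compression_ratio=12.0)

print(f"uncompressed refresh: {raw / 1e6:.1f} Mbytes/sec")
print(f"at 12:1 compression:  {best_case / 1e6:.1f} Mbytes/sec")
```

For this assumed mode, refresh traffic falls from about 59 Mbytes/sec to about 5 Mbytes/sec in the best case. Because 12:1 is the upper bound, real savings would depend on how compressible the screen contents are.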
The 5GX86's integration of multimedia features was an indicator of things to come during the embedded-processor session. Intel and Motorola discussed the 80960RP and MPC860 embedded processors, respectively. Both carry features for specific embedded applications.
Intel's 80960RP intelligent I/O processor surrounds the company's existing 80960JF core with a two-port PCI bridge and a sophisticated messaging unit designed to reduce the communications overhead between the host CPU and the I/O processor. The 80960RP has an integrated memory controller that can operate DRAM, ROM, and flash EPROM. Motorola's MPC860 is based on the embedded PowerPC µP core developed for the MPC505 µP. The MPC860 also carries specialized serial communications hardware (a DSP) and has an integrated, programmable memory controller that adapts to almost any type of DRAM.
Three of the papers in the embedded-processor session discussed new RISC core processors for ASIC development. Digital Semiconductor revealed some details of the StrongARM processor core developed in cooperation with Advanced RISC Machines. The StrongARM core runs on 2V at 215 MHz, a speed that Chief Processor Architect Rich Witek calls "relaxing" compared with the current clock rates of 333 MHz for the company's Alpha µP. At 1.5V, the StrongARM core throttles down to a leisurely 160 MHz, consuming 120 mW at that speed.
LSI Logic's CW4020 MiniRISC superscalar core is based on the MIPS R4000 µP architecture. The core implements the full MIPS-II instruction set and adds some instructions for embedded applications, including bit-shift, multiply/accumulate, and wait for interrupt (for power management). The CW4020 core employs a two-instruction dispatch unit and executes an average of 1.3 instructions/clock.
IBM's PowerPC 401 core fits into 5.5 mm² of space, yet it can still execute all PowerPC application code. Extensions for embedded applications include hardware support for unaligned data accesses and big- and little-endian support. The processor runs on 2.4V, consuming 26 mW at that supply voltage.
These RISC core processor discussions led into the last session, on high-performance RISC processors. Hewlett-Packard introduced the PA-7300LC, a highly integrated version of the company's Precision Architecture RISC processors. With 9.2 million transistors, this processor will appear in HP workstations in 1996 and will double the performance of the low-end machines.
by Steven H Leibson
MicroDesign Resources, Sebastopol, CA. (707) 824-4004.