In the decades-long rivalry between Intel and AMD it’s easy to see AMD as the technical follower: first reverse engineering the Intel 8080, then acting as a second source for the Intel 8086 and, later, developing its own implementation of Intel’s 80386.
But even in the early days of the two firms’ rivalry AMD’s technology sometimes outdid Intel’s. The AM9511 Arithmetic Processing Unit in 1977 is one example.
Eight-bit microprocessors had no hardware floating point support built in. There simply wasn’t room on the die for the complex circuits needed. Software applications that needed floating point had to use assembly language routines. With the limited registers available, calculations were slow and cumbersome. It was only in 1978 that the 6809 introduced a simple integer hardware multiply instruction.
So if you need fast floating point then why not have a dedicated chip do this for you?
AMD was first to the market in 1977 with the AM9511. The AM9511 worked as a ‘coprocessor’ alongside the main processor to provide a wide range of both integer and single precision floating point functions, including trigonometric functions.
The interface with the main processor was through an 8-bit data bus. Eight-bit long instructions were transmitted on the bus from the main processor to the coprocessor.
The AM9511 could perform calculations whilst the microprocessor continued with its own separate routines. An interrupt was issued to the main processor when the calculations were finished.
It was followed in 1978 by the AM9512, which added double precision floating point and compliance with the then-emerging IEEE 754 standard, but removed the trigonometric functions.
Although nominally designed to work with the Intel 8080 or equivalents, it was possible to get the chips working with other microprocessors too. The AM9511/2 could be used with the Z80, 6800, 6809, and 6502, representing the large majority of 8-bit processor sales. It was even possible to use them with the next generation of 16-bit processors such as the 8086 and Z8000.
The AM9511/2 were so successful that Intel abandoned its own efforts to develop math coprocessors and instead second sourced AMD’s designs as the Intel 8231/2.
The internal organisation of the AM9511/2 is interesting. Numerical values were stored on a stack that was large enough to contain eight 2-byte integers or four 4-byte floating point numbers. An ‘Add’ instruction would, for example, add the top two numbers on the stack and replace them with the result.
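The stack arrangement described above can be sketched as a toy model. This is only an illustration: the class and method names are hypothetical, the overflow behaviour is an assumption, and Python's `struct` single precision encoding stands in for the chip's own 4-byte floating point format. It is not the real command set.

```python
import struct


class StackApuModel:
    """Toy model of a stack-based arithmetic unit (hypothetical, not the real chip)."""

    STACK_BYTES = 16  # room for eight 2-byte integers or four 4-byte floats

    def __init__(self):
        self.stack = bytearray()  # the top of the stack is the end of the array

    def push_byte(self, value):
        # Operands arrive one byte at a time, as they would over an 8-bit bus.
        if len(self.stack) >= self.STACK_BYTES:
            raise OverflowError("stack full")  # assumed behaviour for this model
        self.stack.append(value & 0xFF)

    def _push(self, fmt, value):
        for byte in struct.pack(fmt, value):
            self.push_byte(byte)

    def _pop(self, fmt):
        size = struct.calcsize(fmt)
        if len(self.stack) < size:
            raise IndexError("stack underflow")
        raw = bytes(self.stack[-size:])
        del self.stack[-size:]
        return struct.unpack(fmt, raw)[0]

    def push_int16(self, value):
        self._push("<h", value)

    def pop_int16(self):
        return self._pop("<h")

    def push_float32(self, value):
        self._push("<f", value)  # stand-in encoding, see the note above

    def pop_float32(self):
        return self._pop("<f")

    def add_int16(self):
        # Replace the top two 16-bit integers with their sum, wrapped to 16 bits.
        b = self.pop_int16()
        a = self.pop_int16()
        self.push_int16((a + b + 0x8000) % 0x10000 - 0x8000)

    def add_float32(self):
        # Replace the top two 4-byte floats with their sum.
        b = self.pop_float32()
        a = self.pop_float32()
        self.push_float32(a + b)


apu = StackApuModel()
apu.push_int16(1000)
apu.push_int16(234)
apu.add_int16()
print(apu.pop_int16())
```

The same 16 bytes serve both operand sizes, which is why the model tracks the stack as raw bytes rather than as a list of numbers.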
AMD helpfully provided a block diagram of the internal arrangement of the AM9512.
The chip has a 17-bit internal data bus and ten 17-bit registers. The operations of the chip are controlled by a program in a ROM that holds 768 16-bit instructions.
The 17-bit data width looks a little odd, but it may reflect a need for 16-bit accuracy plus a sign bit. The 32-bit or 64-bit integer arithmetic required for many operations would have been implemented in software.
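To give a feel for what building wider arithmetic out of narrower steps involves, here is a minimal sketch of a 32-bit addition done as two 16-bit additions with a carry. It illustrates the general technique only; the function name is hypothetical and nothing here is taken from the chip's actual ROM program.

```python
MASK16 = 0xFFFF


def add32_with_16bit_steps(a, b):
    """Add two unsigned 32-bit values using two 16-bit additions and a carry."""
    low = (a & MASK16) + (b & MASK16)    # low halves; the sum may not fit in 16 bits
    carry = low >> 16                    # 1 if the low-half addition overflowed
    high = (a >> 16) + (b >> 16) + carry  # high halves plus the carry
    return ((high & MASK16) << 16) | (low & MASK16)  # wraps modulo 2**32


print(hex(add32_with_16bit_steps(0x0001FFFF, 0x00000001)))
```

Multiplication and division need many more such steps, which is one reason a single floating point operation can take a large number of clock cycles.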
One can perhaps think of the AM9511/2 as a dedicated (roughly) 16-bit processor. It was able to run calculations much faster than an 8-bit microprocessor because of its 16-bit arithmetic and because, with all operations on the die, it was not constrained by the need to fetch and store data over an 8-bit bus.
A die shot of the AM9511 (from cpu-world) shows a large rectangular area at the bottom right that is likely to be the program ROM, occupying roughly 20% of the die area. The numeric constant ROM is likely to be at the top right, with working registers and the data stack to the right and below this area.
There is quite a lot of functionality packed into the AM9512. The working registers and data stack alone would need over 1,800 transistors and there are over 14,000 ROM bits. This compares, for example, with an 8080 which used only 4,500 transistors.
As a result, the AM9511 wasn’t cheap. The Intel second sourced version sold for $149 each in quantities of 100 or more, considerably more than one of the 8-bit microprocessors it would have accompanied.
So how fast was the AM9511/2? Variants of the AM9511 operated between 2 and 4 MHz. AMD helpfully supplied some data comparing performance with a software math library running on an 8080. For floating point addition or subtraction, the AM9511 was roughly 10 to 20 times faster than the software version. For multiplication and division, the advantage widened somewhat. The IEEE compliant AM9512 was slower than the AM9511, meaning that the latter chip remained an attractive option for some applications.
A single 32-bit floating point multiplication in the AM9511 took around 160 clock cycles. So a 2 MHz chip could do around 12,500 of these per second.
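The arithmetic behind that figure is simply the clock rate divided by the cycles per operation. The short sketch below repeats it for a 4 MHz part as well, on the assumption (not confirmed here) that the cycle count is the same at the higher clock speed. The function name is hypothetical.

```python
def operations_per_second(clock_hz, cycles_per_operation):
    """Throughput if operations run back to back with no other overhead."""
    return clock_hz / cycles_per_operation


CYCLES_PER_MULTIPLY = 160  # approximate figure quoted in the text

for mhz in (2, 4):
    rate = operations_per_second(mhz * 1_000_000, CYCLES_PER_MULTIPLY)
    print(f"{mhz} MHz: about {rate:,.0f} multiplications per second")
```

This ignores the time the main processor spends sending operands to the chip and reading results back over the 8-bit bus, so real-world rates would have been lower.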
In addition to price there were other obstacles to getting a chip like the AM9511/2 widely adopted. The computer motherboard (or a daughterboard) needed to have space for it. So it was hard to add to many existing systems. Without widespread adoption, there was little reason for software developers to support it, so the designs never really became popular.
It’s perhaps a bit surprising that AMD wasn’t able to make more of this early innovation. They had the first math coprocessor, the first IEEE 754 compliant hardware and the first double precision floating point hardware.
But by 1978 the 8086 had already been launched and in 1980 Intel announced the 8087 coprocessor. The 8086/7 took a different approach with dedicated floating point registers and instructions that slotted into the 8086 instruction stream. With an 8087 socket on the original IBM PC motherboard, the 8087 soon became much more popular. With the 80486 in 1989, floating point was eventually incorporated into the microprocessor die itself.
Today, fast floating point remains a key capability, underpinning the development of machine learning and much else. AMD’s graphics processing units can perform thousands of billions of floating point operations per second. Floating point hardware has come a long way since 1977 but AMD was there at the beginning.
Image Credits
By Wosch21149 - Own work
CC BY 3.0