The Strangeness Of The Intel 4004
The start of a detective story to understand a quirky little chip
We’ve covered the history and some of the limitations of the 4004. We’re now going to delve into its design in more detail. It has some surprising features and omissions.
Warning for the unprepared: this post includes some 4004 assembly language!
Harvard(ish) Architecture
The 4004 has distinct address spaces for programs and data. The program address space was 4KB (Kilobytes) and normally populated with 256-byte 4001 ROM chips.
The maximum space for data was 1280 4-bit ‘nibbles’ (or 640 bytes) in total. This would normally be populated by several 4002 RAM chips each containing 80 nibbles.
The 4004 isn’t a standard Harvard architecture processor, however, given that a single bus carries addresses and data for both the program and data address spaces.
Program address space can also be occupied by RAM using an adapter chip. A special instruction (WPM for ‘Write Program Memory’) allows the 4004 to write to program RAM in this configuration. [1]
Lots of Registers and On-Chip Stack
The 4004 has a surprising number of user-accessible registers: a four-bit accumulator, sixteen four-bit index registers, three twelve-bit stack registers and the twelve-bit program counter. In total that’s 116 bits. Contrast this with, for example, the later 8-bit 6502, which made do with only 48.
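As a quick sanity check on that total, here is a small Python tally. The dictionary keys are just my labels for the registers listed above, not names from the Intel manuals.

```python
# Register counts and widths (in bits) as listed in the paragraph above.
registers = {
    "accumulator": (1, 4),
    "index registers": (16, 4),
    "stack registers": (3, 12),
    "program counter": (1, 12),
}

# Multiply each count by its width and add everything up.
total_bits = sum(count * width for count, width in registers.values())
print(total_bits)  # 116
```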
The rationale behind this is clear though. The smallest possible 4004 system had just the 4004 itself and a single 256-byte 4001 ROM chip. In this configuration, the programmer would need to rely entirely on the 4004’s registers for temporary storage.
The same logic applies to the on-chip stack. In a system with no RAM, the stack has to be on the 4004. Lack of space though means that it is limited to only three entries.
One detail on the implementation of all these bits. To save space on the 4004 die, this memory is implemented as dynamic RAM. Presumably, the slow memory access meant that there were enough clock cycles to refresh the registers when they weren’t being used.
The sixteen index registers can also be grouped into eight 8-bit register pairs. These pairs are used to specify memory addresses; there is no 8-bit arithmetic available on the 4004.
Random Access Memories
There are effectively two address spaces for RAM. Each 4002 RAM chip had 80 4-bit nibbles of storage, arranged into 64 nibbles of main memory, plus 16 nibbles of ‘status’ memory.
The MCS-4 Assembly Language Programming Manual portrays this as follows, with a set of sixteen nibbles plus four status nibbles labelled as a ‘DATA RAM REGISTER’. These ‘REGISTERS’ have no connection with the registers in the 4004.
The RAM chips were arranged in up to eight ‘banks’ of up to four chips, with a maximum of 320 nibbles per memory bank. To access main memory, a ‘DCL’ (Designate Command Line) instruction first selects the RAM bank, using the value currently held in the accumulator. An ‘SRC’ (Send Register Control) instruction then selects the nibble within the bank, telling the 4004 to use the 8 bits in a specified index register pair as the address. The next instruction then tells the 4004 what to do with that nibble, for example, whether to read it into the accumulator or to write the accumulator to it.
So to read the single nibble at address 0x02 in RAM bank 1 one might use the following sequence (machine code bytes on the left, assembly in the middle).
D1      LDM 1      Load 1 into the accumulator
20 02   FIM 0,2    Load 2 into index register pair 0
FD      DCL        Select RAM bank 1
21      SRC 0      Select nibble 2 in the RAM bank, using index register pair 0
E9      RDM        Read memory into the accumulator
That’s six bytes to read one 4-bit memory location in a 1024-nibble address space! If there is only one RAM bank (as in the Busicom calculator) then the LDM and DCL instructions can be omitted.
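To make the sequence a little more concrete, here is a minimal Python sketch of the same five steps. The class `Tiny4004` and its method names are my own invention for illustration. It treats the accumulator value as the bank number, just as the listing above does, and it ignores the status nibbles entirely, so it is a simplification rather than a faithful emulator.

```python
class Tiny4004:
    """A toy model of RAM main-memory access (not a full 4004 emulator)."""

    def __init__(self, banks=8):
        self.acc = 0                      # 4-bit accumulator
        self.pairs = [0] * 8              # eight 8-bit index register pairs
        # 256 main-memory nibbles per bank (4 chips x 64 nibbles).
        self.ram = [[0] * 256 for _ in range(banks)]
        self.bank = 0                     # bank chosen by the last DCL
        self.addr = 0                     # address sent by the last SRC

    def ldm(self, value):
        self.acc = value & 0xF            # load a 4-bit constant

    def fim(self, pair, value):
        self.pairs[pair] = value & 0xFF   # load an 8-bit constant into a pair

    def dcl(self):
        self.bank = self.acc & 0x7        # assumption: accumulator = bank number

    def src(self, pair):
        self.addr = self.pairs[pair]      # latch the 8-bit address

    def rdm(self):
        self.acc = self.ram[self.bank][self.addr]


cpu = Tiny4004()
cpu.ram[1][0x02] = 0x9    # pretend something stored 9 at bank 1, nibble 2

cpu.ldm(1)                # D1
cpu.fim(0, 0x02)          # 20 02
cpu.dcl()                 # FD
cpu.src(0)                # 21
cpu.rdm()                 # E9
print(cpu.acc)            # 9
```

Note how the bank and the address are both ‘sticky’ state in this model: once DCL and SRC have run, any number of reads can follow without repeating them.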
Different instructions are used to access the status ‘characters’. Again ‘DCL’ and ‘SRC’ instructions specify the RAM bank and ‘DATA RAM REGISTER’. The ‘RD0’ instruction then, for example, specifies that status nibble number 0 is to be read into the accumulator.
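The two kinds of storage can be sketched in Python as follows. This is only a model under my own assumptions: the function and constant names are made up, and I am assuming that the status instructions use just the upper four bits of the SRC address (chip and register) and ignore the lower four.

```python
MAIN_PER_REGISTER = 16    # main-memory nibbles in one 'DATA RAM REGISTER'
STATUS_PER_REGISTER = 4   # status nibbles in one 'DATA RAM REGISTER'
REGISTERS_PER_CHIP = 4
CHIPS_PER_BANK = 4

NIBBLES_PER_CHIP = REGISTERS_PER_CHIP * (MAIN_PER_REGISTER + STATUS_PER_REGISTER)
NIBBLES_PER_BANK = CHIPS_PER_BANK * NIBBLES_PER_CHIP


def new_bank():
    """Create one empty RAM bank with separate main and status storage."""
    registers = CHIPS_PER_BANK * REGISTERS_PER_CHIP
    return {
        "main": [[0] * MAIN_PER_REGISTER for _ in range(registers)],
        "status": [[0] * STATUS_PER_REGISTER for _ in range(registers)],
    }


def read_main(bank, src_address):
    """Model RDM: all 8 SRC bits pick one main-memory nibble."""
    return bank["main"][src_address >> 4][src_address & 0xF]


def read_status(bank, src_address, n):
    """Model RD0..RD3: only the upper 4 SRC bits are used (assumption)."""
    return bank["status"][src_address >> 4][n]


bank = new_bank()
bank["status"][0x2][0] = 0x7                # status nibble 0 of register 2
print(NIBBLES_PER_CHIP, NIBBLES_PER_BANK)   # 80 320
print(read_status(bank, 0x2B, 0))           # 7
```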
This is all very odd and confusing. Others have found it puzzling too.
Missing Logic and Comparison Instructions
The 4004 instruction set omitted any logical operations. No “AND”, “OR” or “XOR”.
The assembly language manual does, helpfully, include subroutines for each of these operations. Here is the one for “AND”. This generates the logical “AND” of the two 4-bit values in index registers 0 and 1.
Phew! That’s 21 bytes to do a single 4-bit ‘AND’.
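To give a feel for why it takes so much code, here is a Python sketch of one way to build ‘AND’ when all you have are rotates, a carry flag to test and an add. To be clear, this is my own illustration of the general idea, not a transcription of the manual’s subroutine.

```python
def and4(a, b):
    """4-bit AND using only shifts, carry tests and increments."""
    result = 0
    for _ in range(4):
        # Rotate each operand left by one; the bit that falls off the top
        # plays the role of the carry flag.
        a <<= 1
        carry_a = a >> 4
        a &= 0xF

        b <<= 1
        carry_b = b >> 4
        b &= 0xF

        # Make room for the next result bit, then set it only if both
        # operands shifted out a 1 (two conditional jumps on real hardware).
        result = (result << 1) & 0xF
        if carry_a and carry_b:
            result += 1
    return result


print(and4(0b1100, 0b1010))  # 8, i.e. 0b1000
```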
There are also no instructions to compare the contents of two registers.
In contrast to these omissions there are, however, some quite specialised and somewhat surprising instructions.
Branch Back and Load (BBL)
This combines placing the last address saved on the stack into the program counter (so a return from subroutine) with loading a 4-bit value (encoded in the instruction) into the accumulator.
Perhaps even more odd is the fact that there is no instruction that returns from a subroutine without placing a constant value into the accumulator.
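Here is a small Python model of a subroutine call and return on the three-entry stack. The class name is hypothetical, JMS is the 4004’s ‘Jump to Subroutine’ instruction, and the way I handle a fourth nested call (silently dropping the oldest return address) is my simplification of what the hardware does.

```python
class CallStack4004:
    """Toy model of the 4004's on-chip return stack and the BBL instruction."""

    DEPTH = 3  # only three return addresses fit on the chip

    def __init__(self):
        self.pc = 0      # 12-bit program counter
        self.acc = 0     # 4-bit accumulator
        self.stack = []  # saved return addresses, newest last

    def jms(self, target):
        # JMS is a two-byte instruction, so execution resumes at pc + 2.
        self.stack.append((self.pc + 2) & 0xFFF)
        if len(self.stack) > self.DEPTH:
            self.stack.pop(0)  # assumption: the oldest entry is simply lost
        self.pc = target & 0xFFF

    def bbl(self, value):
        # Return to the most recent caller AND load a constant into the
        # accumulator; the two actions cannot be separated.
        self.pc = self.stack.pop()
        self.acc = value & 0xF


cpu = CallStack4004()
cpu.pc = 0x010
cpu.jms(0x100)           # call a subroutine at 0x100
cpu.bbl(5)               # return, with 5 in the accumulator
print(hex(cpu.pc), cpu.acc)  # 0x12 5
```

One plausible use of the loaded constant is as a status code for the caller, although that is my reading rather than something the manual spells out.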
Increment and Skip if Zero (ISZ)
This increments one of the index registers. If the result is non-zero then the second byte of the instruction replaces the lowest eight bits of the program counter. If the result is zero then the program continues execution at the next address.
This is designed to support looping. But … behaviour is different when the instruction is located at the end of a 256-byte block of memory. In this case, the top four bits of the program counter may also be increased by one.
The MCS-4 Assembly Language Programming Manual has a stern warning: “… this is dangerous programming practice and should be avoided whenever possible.” Oh no! This seems to me likely to be a bug in the implementation of this instruction.
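The page-boundary behaviour is easier to see in a few lines of Python. This sketch assumes one particular mechanism: that the upper four bits of the jump target are taken from the program counter after it has already stepped past the two-byte instruction. The function name and arguments are my own.

```python
def isz(pc, register_value, target_low8):
    """Return (new_pc, new_register_value) for an ISZ located at address pc."""
    next_pc = (pc + 2) & 0xFFF           # ISZ occupies two bytes
    value = (register_value + 1) & 0xF   # 4-bit increment, wraps 15 -> 0
    if value == 0:
        return next_pc, value            # result is zero: fall through
    # Non-zero: keep the upper 4 bits of the already advanced PC and
    # replace the lower 8 bits with the second byte of the instruction.
    return (next_pc & 0xF00) | (target_low8 & 0xFF), value


print([hex(x) for x in isz(0x120, 3, 0x10)])   # ['0x110', '0x4']
print([hex(x) for x in isz(0x120, 15, 0x10)])  # ['0x122', '0x0']
print([hex(x) for x in isz(0x1FE, 3, 0x10)])   # ['0x210', '0x4']
```

In the last call the instruction sits in the final two bytes of page 1, so the jump lands in page 2 rather than looping back within page 1.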
Jump Indirect (JIN)
This replaces the lowest eight bits of the program counter with the contents of a specified index register pair. It allows a jump to a program location that is calculated by the program.
An instruction like this usually helps to save space when a program has many branches to choose between. Its inclusion in a processor designed for very small systems seems surprising.
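As a rough illustration of a computed jump, here is a Python sketch of a dispatch through JIN. The handler names and addresses are entirely hypothetical, and I am assuming the upper four bits come from the program counter after it has stepped past the one-byte instruction.

```python
def jin(pc, pair_value):
    """Return the new program counter for a JIN located at address pc."""
    next_pc = (pc + 1) & 0xFFF                    # JIN is a single byte
    return (next_pc & 0xF00) | (pair_value & 0xFF)


# Hypothetical layout: three handlers living in the same 256-byte page.
DISPATCH_ADDRESS = 0x305
handler_offsets = {"add": 0x10, "sub": 0x40, "mul": 0x80}


def dispatch(operation):
    """Compute where execution continues for a given operation."""
    return jin(DISPATCH_ADDRESS, handler_offsets[operation])


print(hex(dispatch("sub")))  # 0x340
```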
Decimal Adjust Accumulator (DAA)
This instruction at least makes immediate sense in the context of the Busicom calculator. The calculator stored numbers as Binary Coded Decimal or BCD. In BCD each decimal digit is stored in a four-bit nibble.
To support this the 4004 had a ‘DAA’ or ‘Decimal Adjust Accumulator’ instruction. This adds six to the accumulator if it is more than 9 (or if the preceding addition has already set the carry flag). The idea is to make the addition of two BCD digits easier.
For example, if we add 5 to 6 in BCD we want to get 1 with a ‘carry’ set. In the 4004 we add 5 and 6 to get 11 (decimal) in the accumulator and then perform DAA which gives us 1 and the carry flag is set. The program can then retain the 1 as the least significant digit and use the carry to set the next digit to 1.
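The same steps can be sketched in Python for numbers with several digits. The function names are mine, and the rule used in `daa` (adjust when the accumulator is above 9 or the carry is already set, and only ever set the carry, never clear it) is my reading of the instruction’s description rather than something verified against hardware.

```python
def add_with_carry(acc, value, carry):
    """4-bit add of value and carry to the accumulator."""
    total = acc + value + carry
    return total & 0xF, total >> 4


def daa(acc, carry):
    """Decimal adjust: add 6 if the digit overflowed the range 0-9."""
    if acc > 9 or carry:
        total = acc + 6
        acc = total & 0xF
        if total > 0xF:
            carry = 1          # set on overflow, otherwise left alone
    return acc, carry


def bcd_add(a_digits, b_digits):
    """Add two equal-length BCD numbers, least significant digit first."""
    carry = 0
    result = []
    for a, b in zip(a_digits, b_digits):
        acc, carry = add_with_carry(a, b, carry)
        acc, carry = daa(acc, carry)
        result.append(acc)
    return result, carry


print(daa(*add_with_carry(5, 6, 0)))   # (1, 1)
print(bcd_add([5, 2], [6, 3]))         # ([1, 6], 0), i.e. 25 + 36 = 61
print(bcd_add([9, 9], [9, 9]))         # ([8, 9], 1), i.e. 99 + 99 = 198
```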
Four Bits of Confusion
If you’re used to later 8-bit microprocessors, then the 4004 looks decidedly ‘quirky’. If I’m feeling less charitable I might say it’s weird. Doing some simple operations is complex, some common instructions are missing and the use of two RAM address spaces is very strange indeed.
It also doesn’t look likely to be the most efficient way to implement a simple 4-bit microprocessor.
In the next post we’re going to look for some more clues as to why the 4004 is like this. And the obvious place to look is the Busicom calculator source code.
Further Reading
The best sources for understanding the 4004 are the original Intel manuals, especially the ‘MCS-4 Assembly Language Programming Manual’.
The shorter (9 page) data sheet is also good for an overview of the 4004.
If you want to try out the 4004 in your browser then there is an excellent emulator (and assembler) from Maciej Szyc at http://e4004.szyc.org/emu/. Have fun!
Image Credits
4004 Architecture
Appaloosa
CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/
via Wikimedia Commons
Other images from Intel manuals or the web included under ‘fair use’.
[1] This makes the 4004 a usable part of a truly programmable computer system, justifying - in my view at least - its description as a microprocessor rather than a fixed-function microcontroller.