The First RISC: John Cocke and the IBM 801
How a maverick genius at IBM helped to change the course of computing forever
This is quite a long post but covers an important series of events in the history of computing. It also includes secret meetings at a London hotel and a corporate maverick called a genius by his colleagues. I hope you enjoy.
A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
Antoine de Saint-Exupery
RISC-V is built on a number of ideas. That a single ISA can be used in machines with widely differing capabilities. That an ISA can be shared between many different companies. The idea of an ‘Open Source’ ISA. And finally the concept of the Reduced Instruction Set Computer (RISC) itself.
The RISC idea has now been around for almost fifty years and it’s been the subject of debate and a degree of controversy for most of that period. We can eavesdrop on the debate in 1986 via an episode of The Computer Chronicles. David Patterson, one of the developers of RISC-V, and already a professor at Berkeley, is quizzed by Gary Kildall of CP/M fame.
(As an aside, the idea that instruction set architectures could be the subject of informed debate on a TV show is just a little jaw dropping.)
Joel Birnbaum, then of HP, explains RISC to the audience.
… the notion is that computers spend most of their time … doing a few simple things and don’t do the complex things very often. So the notion is that if you do the simplest things as well as possible, and the complex things as infrequently as possible, then you might come out with a machine … that is more effective than those that pay the penalty of complexity across all the things that it does each of the times that it does it.
There is a degree of skepticism though from the panelists Jan Lewis and George Morrow:
The commercial applications have yet to prove that RISC is actually going to give us true performance increases.
On the negative side, we just seem to keep making new instruction sets. It’s like inventing a new typeface every time you want to say something.
The machines that are out there do need extensions… We’ve sort of evolved from RISC in the purest sense to RISC like architectures.
There is also a degree of confusion over what RISC actually means and whether RISC machines really are RISC in practice. For example, Jan Lewis comments that:
The concept is that you have very few instructions, I say fifty or less, George says 8 or less, each one using very few clock cycles … in fact the machines that are out there do need other things.
Some of the debate seems familiar even today, when the effectiveness of RISC and what it really means are still contested.
But where did the RISC idea come from? Joel Birnbaum provides the link as he’d previously managed the project to develop the first RISC machine. This machine would not only pioneer RISC but also some of the key ideas behind the development of both hardware and software in the years that followed. And that machine would come from an unexpected source - the maker of the most complex machines in the universe.
In the early 1970s IBM’s mainframes dominated the market for more powerful computers. IBM had invested $5 billion (more than twice its annual turnover at the time) in the development of System/360. 1970 saw the introduction of its largely compatible successor the System/370.
The S/370 had lots of instructions and these were often complex. Some instructions (such as the ‘Move Character’ instruction) would read multiple items of data from one portion of memory, manipulate it in some way and then write that data to another part of memory.
This complexity was due to the need to avoid slow memory accesses (to fetch instructions) slowing down program execution. A single instruction specifying a lot of activity would save the work needed to fetch multiple instructions.
Customers could choose from a range of S/360 or S/370 systems with varying price and performance characteristics. Cheaper machines were made possible, in part, by more extensive use of ‘microcode’ where instructions were broken down into smaller ‘microinstructions’. Decoding and executing these microinstructions took time and so slowed program execution. A typical S/370 instruction might be executed using 20 or 30 microinstructions. More expensive machines had more of the instruction set ‘hard coded’ into the circuitry and so were faster.
This approach worked well for IBM’s business. Customers with lower budgets could buy a cheaper machine and always had the option to upgrade to a more expensive, but still compatible, machine.
In 1971 IBM was approached by the Ericsson telephone company with a proposal for a joint venture to build telephony systems that would compete with IBM’s great technology rival AT&T.
IBM’s engineers realised that an architecture like that of the S/370 wouldn't be a good match for telephony. The performance of a microcoded system wouldn't be good enough and the cost of a ‘hard-wired’ machine would be too great.
So IBM’s engineers started to look at alternatives that would be a better fit for the telephony system. After three months of intensive work teams from the two companies met at Claridge’s Hotel in London. In a late night meeting IBM’s team presented their ideas. The go ahead was given. But the following morning Ericsson had second thoughts. If IBM’s ideas worked that spelt trouble for Ericsson’s core business. The joint venture was off.
But did IBM need Ericsson? The team felt that IBM could go it alone and take on AT&T single handedly. So work started to develop a telephony system.
The results of the project were again rejected, this time by IBM’s senior management.
Work Starts On The 801
But with the ideas so advanced and IBM’s engineers convinced of the benefits IBM Research started a project to turn the ideas into reality.
Working out of building 801 at the Thomas J. Watson Research Centre in Yorktown, a team of about twenty engineers (initially managed by Joel Birnbaum, as seen in the Computer Chronicles video) started work on building the machine. It had previously been known as ‘the telephone machine’ but this started to seem inappropriate, so the team settled on the name of the building where they were working.
The leading figure in the group was IBM Fellow John Cocke. Cocke was a veteran of the Korean War and had huge experience in developing computer systems. He’d been deeply involved in the development of IBM’s Stretch (the mainframe series before the S/360). He was also an expert on compilers and wrote a seminal textbook on compiler construction.
Quoting Joel Birnbaum:
John Cocke was the genius behind the machine. The smartest man I’ve ever known. The most creative man I’ve ever known … Really this was John’s project.
Cocke was a heavy smoker and, in the days when you could smoke in offices, would leave a trail of both ideas and ashes behind him:
He would sort of go from one room to another, and from one lab to another, you know, spreading ideas and trailing cigarette ashes as he went along.
Cocke’s involvement in earlier projects, and his observation of the S/360 development project, had started to sow seeds of doubt in his mind about the approach taken in the design of the S/360.
The project’s objective was to create a system that was significantly better than IBM’s uninspiring System/3 minicomputer, which was struggling against rival Digital Equipment Corporation’s commercially successful minicomputers.
The team had the benefit of having extensive details of the pattern of execution of programs on the S/360 (known as ‘instruction traces’). This showed which instructions were being executed most often and that a large proportion of those instructions were in fact very simple.
To quote one of the team:
It came as a surprise … that load, store, branch and a few simple register operations completely dominated the mix of instructions.
Furthermore, analysis of the more complex instructions, such as ‘Move Character’, showed that for typical instances of use of the instruction the amount of data being manipulated was so small that using a small number of simpler instructions would have been faster.
So armed with this data on the S/360 the team were able to start designing the new system.
Building The RISC Philosophy
To make their new design work, the team needed a number of features that would soon become part of the RISC philosophy. Not all of these ideas were new, but this was the first time they were brought together.
Taking an idea from the Control Data 6600 (the first ‘Supercomputer’ designed by Seymour Cray) the new machine would have a ‘load-store’ architecture where simple ‘load’ or ‘store’ instructions would be the only way of accessing or changing memory. Gone were the ‘memory to memory’ instructions of the S/360.
This in turn helped the machine to implement a simple form of pipelining, where one instruction can be executing at the same time that the next instruction is being loaded from memory.
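The benefit of this overlap can be put into rough numbers. The sketch below is only an illustration, assuming a hypothetical two-stage machine where fetching and executing each take one cycle; the function names and stage timings are mine and are not taken from the 801 design.

```python
def cycles_without_overlap(n, fetch=1, execute=1):
    """Each instruction is fetched and then executed before the next one starts."""
    return n * (fetch + execute)


def cycles_with_overlap(n, fetch=1, execute=1):
    """Two-stage pipeline: the next fetch overlaps the current execute."""
    if n == 0:
        return 0
    # The first instruction pays the full latency; after that one
    # instruction completes every 'slowest stage' cycles.
    return fetch + execute + (n - 1) * max(fetch, execute)


sequential = cycles_without_overlap(10)
pipelined = cycles_with_overlap(10)
print(sequential, pipelined)
```

With these assumed timings ten instructions need 20 cycles without overlap and 11 with it, so the longer the run of instructions, the closer the machine gets to one instruction per cycle.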
Then there was the introduction of an instruction cache, distinct from a data cache, which helped to remove the penalty incurred in other systems as a result of delays in accessing memory to read instructions. The team reasoned that if the S/360 had fast access to memory holding the microcode then they could provide fast access to 801 instructions via an instruction cache.
But the development of the 801 wasn't just about the hardware. The success of the 801 would depend on having compilers that would both avoid the need for most assembly language programming and optimise the generated code to make the most of the new hardware.
So the team also built a compiler for a new language called PL 0.8. The name was chosen because it was initially a large subset of IBM’s PL/1 language - PL/1 was considered to be too ‘rich’ for reimplementation for an experimental project like this. In due course the decimal point was removed and it became known as PL/8.
Key to making good use of the 801’s hardware was efficient use of the machine’s registers. Register allocation used an approach called ‘graph colouring’.
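For readers who haven't met graph colouring before, here is a minimal sketch of the idea. Variables that are in use at the same time are joined by an edge in an ‘interference graph’, and each variable is then given a register (a ‘colour’) that none of its neighbours has. The greedy heuristic, the function name and the example graph below are my own illustrative assumptions, not a description of what the 801 team’s compiler actually did.

```python
def colour_registers(interference, num_registers):
    """Greedily assign a register index to each variable.

    interference maps each variable to the set of variables that are
    live at the same time. A variable that cannot be given a register
    is mapped to None (it would have to be 'spilled' to memory).
    """
    assignment = {}
    # Handle the most constrained variables first (ties broken by name).
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {
            assignment[n]
            for n in interference[var]
            if assignment.get(n) is not None
        }
        free = [r for r in range(num_registers) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment


# Hypothetical example: a, b and c are all live together; d only overlaps c.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
allocation = colour_registers(interference, num_registers=3)
print(allocation)
```

In this example three registers are enough for four variables because d can share a register with a. With only two registers one of the variables would have to be spilled, which is one reason having more registers matters.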
The 801 Architecture and Performance
So what did the 801 look like?
The 801 was a minicomputer rather than a microprocessor. It was built using ‘off the shelf’ logic chips manufactured by Motorola mounted onto a number of circuit boards. The boards were arranged in a semi-circle with connections at the centre, reducing the distance between components. In the picture of John Cocke above, he can be seen leaning on some of the 801’s boards pointing outwards from the centre of the machine. The first version had 7600 logic gates.
There were actually two versions of the 801 architecture. Let’s initially look at the first version of the 801.
As planned the instructions were kept simple. There were no memory to memory instructions or instructions that combined loading or storing data from memory with another operation. All these simple instructions could be implemented without having to resort to microcode.
The original 801 had sixteen 24-bit registers, plus three special purpose registers including the program counter and the condition codes register.
Instructions were either 16 or 32 bits long. Longer instructions allowed for the inclusion of a 16-bit constant that could be added.
The simplicity of the instruction set meant that a simple pipelined operation could be implemented.
Consistent with the register length, memory addresses were 24 bits long, allowing the machine to access up to 16 megabytes (or 16,777,216 bytes) of memory.
Branches posed a problem for pipelining. So the 801 introduced ‘execute branch’ instructions (these would later become known as ‘delayed branches’). These allowed for a companion instruction to the branch instruction which would execute in parallel with the branch instruction whether or not the branch was taken.
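A toy simulation may make the behaviour easier to picture. The sketch below assumes a made-up three-instruction machine; the opcode names, the tuple encoding and the ‘branch if register is non-zero’ condition are all hypothetical and are not the 801’s real instruction set. The only point is that the instruction immediately after the branch always runs.

```python
def execute(instr, regs):
    """Run one simple (non-branch) instruction."""
    op, reg, value = instr
    if op == "set":
        regs[reg] = value
    elif op == "add":
        regs[reg] = regs.get(reg, 0) + value
    else:
        raise ValueError(f"unknown instruction: {op}")


def run(program):
    regs, pc = {}, 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "branch_execute":
            _, cond_reg, target = instr
            taken = regs.get(cond_reg, 0) != 0  # decided before the companion runs
            execute(program[pc + 1], regs)      # companion runs either way
            pc = target if taken else pc + 2
        else:
            execute(instr, regs)
            pc += 1
    return regs


def make_program(flag):
    return [
        ("set", "r0", flag),          # 0: condition for the branch
        ("set", "r1", 5),             # 1
        ("branch_execute", "r0", 5),  # 2: go to 5 if r0 is non-zero
        ("add", "r1", 10),            # 3: companion, always executed
        ("set", "r1", 0),             # 4: reached only if the branch is not taken
        ("add", "r1", 1),             # 5: branch target
    ]


taken_result = run(make_program(1))
not_taken_result = run(make_program(0))
print(taken_result, not_taken_result)
```

In both runs the companion instruction at position 3 is executed, so the slot after the branch does useful work instead of being wasted while the machine works out where to go next.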
Cache invalidation posed a potential problem, which the team solved by adding instructions that voided cache lines:
At that time it was widely accepted that a running program would not modify itself at execution time. Therefore, no mechanisms were added to ensure that stores into the instruction stream were immediately reflected in the instruction cache. Instead, the ability to void cache lines was added to the instruction set.
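The trade-off described in that quote can be sketched in a few lines. This is a toy model only: the class, the method names and the four-entry line size are assumptions made for illustration and do not reflect the real 801 cache organisation.

```python
LINE_SIZE = 4  # hypothetical number of instructions per cache line


class InstructionCache:
    """Toy instruction cache that is NOT kept in step with stores to memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def fetch(self, address):
        line = address // LINE_SIZE
        if line not in self.lines:
            # Miss: copy the whole line from memory into the cache.
            start = line * LINE_SIZE
            self.lines[line] = list(self.memory[start:start + LINE_SIZE])
        return self.lines[line][address % LINE_SIZE]

    def void_line(self, address):
        """Discard the cached line that holds this address, if any."""
        self.lines.pop(address // LINE_SIZE, None)


memory = ["nop"] * 8
icache = InstructionCache(memory)

first = icache.fetch(5)   # brings the line into the cache
memory[5] = "add"         # a store changes the instruction in memory
stale = icache.fetch(5)   # the cache still holds the old copy
icache.void_line(5)       # software explicitly voids the line
fresh = icache.fetch(5)   # the next fetch reloads from memory
print(first, stale, fresh)
```

The hardware stays simple because it never checks stores against the instruction cache; the rare program that does change its own instructions is responsible for voiding the affected line.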
Most 801 instructions could execute, as planned, in a single clock cycle. On average the machine could execute an instruction every 1.1 or 1.2 clock cycles.
The Revised 801
The second version of the 801, updated based on the team’s experience and their perception of users’ likely needs, made a number of significant changes. So much so that it almost seems odd to use the same name for the two machines.
Following work to investigate register allocation the revised version doubled the number of registers from 16 to 32. Registers were increased in size from 24 bits to 32 bits.
The second 801 had a uniform 32-bit instruction length, which simplified the instruction decode mechanism and meant that instructions could no longer straddle cache lines, again simplifying the design.
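The point about cache lines is easy to check with a little arithmetic. The sketch below assumes a hypothetical 32-byte cache line and instructions packed one after another from address zero; neither figure comes from the 801 documentation.

```python
LINE_BYTES = 32  # hypothetical cache line size


def straddles(address, length, line_bytes=LINE_BYTES):
    """True if an instruction's first and last bytes fall in different lines."""
    return address // line_bytes != (address + length - 1) // line_bytes


def layout(lengths, start=0):
    """Place instructions back to back and return (address, length) pairs."""
    placed, address = [], start
    for length in lengths:
        placed.append((address, length))
        address += length
    return placed


# Mixed 16-bit and 32-bit instructions, as in the first 801.
mixed = layout([2] + [4] * 8)
# Uniform 32-bit instructions, as in the revised 801.
uniform = layout([4] * 16)

mixed_straddlers = [(a, n) for a, n in mixed if straddles(a, n)]
uniform_straddlers = [(a, n) for a, n in uniform if straddles(a, n)]
print(mixed_straddlers, uniform_straddlers)
```

With mixed lengths, a single 2-byte instruction is enough to push a later 4-byte instruction across the boundary at byte 32. With uniform 4-byte instructions starting on a 4-byte boundary that can never happen, as long as the line size is a multiple of four.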
The 801 Delivers
As the team had predicted, the 801 offered significantly better performance than the more conventional and more complex machines. When running ‘real world’ tests and being compared against the IBM System/3 minicomputer, the experimental machine was found to be around three or four times faster.
Picking Up The RISC Baton
But once again IBM’s management passed on turning the 801 into a commercial project. The team seemed to sense that they were swimming against the tide at IBM. George Radin had commented in a paper earlier in the project:
In some sense the 801 appears to be rushing in the opposite direction to the conventional wisdom of this field. Namely, everyone else is busy moving software into hardware and we are clearly moving hardware into software. Rather than consuming the projected cheaper, faster hardware, we are engaged in an effort to save circuits, cut path lengths and reduce functions at every level of the normal system hierarchy.
The ideas were shelved and the team moved on to other things. As we will see, work on the RISC philosophy did continue within IBM, but it would be outside IBM that the concept would rise to prominence.
Neither John Cocke nor the IBM team coined the term ‘Reduced Instruction Set Computer’. In fact Joel Birnbaum and others would protest that the philosophy should actually be about a ‘Reduced Complexity Instruction Set Computer’. RISC could be read as implying that it was all about reducing the number of instructions, which might be a by-product but wasn't the underlying rationale. But the acronym has stuck, leading to substantial confusion over the years.
The term RISC was invented by David Patterson at Berkeley. Along with John Hennessy at Stanford University, Patterson and his students would pick up the RISC concept and build the first microprocessors using RISC principles. And that’s where our series goes next.
Epilogue : John Cocke
There was a puzzling conundrum I faced when writing about the 801. John Cocke and his ideas were clearly at the heart of the project, but his fingerprints are hard to see. He didn’t write papers or books about it, and there is no film of him presenting the ideas behind the machine.
That might be partly due to his character and role in the project. It may also be because he spent his whole career at IBM.
So it seemed appropriate to provide some evidence of Cocke’s talents and contribution. For that we can turn to a series of recollections about John Cocke by his former colleagues at IBM; there is a great video on YouTube. A small sample of the quotes:
I used to go around, sort of follow him around, taking notes and trying to capture his brilliant ideas as they were bubbling forth.
You can't predict what John will come up with. It will always be original, and it will often be brilliant.
Most of the modern concepts of compiler technology were concepts that John was responsible for seeding throughout universities and IBM.
Cocke was honoured extensively for his work. He won the ACM Turing Award in 1987, the National Medal of Technology in 1991 and the National Medal of Science in 1994, amongst many others.
John Cocke with the IBM 801
Published under fair use as per:
It’s hard to know precisely how many transistors 7600 logic gates would be, but I’d estimate between 30,000 and 45,000, which is similar to, for example, the Intel 8086 or the first ARM microprocessor. It seems reasonably clear that this could have been implemented as a microprocessor using the technology of the late 1970s.