RISC-V - Part 1: Origins and Architecture
The revolutionary instruction set with roots in the first Berkeley RISC design
What happens when you combine two of the most influential technology movements of the last few decades, RISC and Open Standards?
You get RISC-V.
RISC-V (pronounced risk-five) is a relatively new Instruction Set Architecture. Its supporters make ambitious claims for the architecture. From the RISC-V International website:
RISC-V is an open standard Instruction Set Architecture (ISA) enabling a new era of processor innovation through open collaboration
RISC-V enables the community to share technical investment, contribute to the strategic future, create more rapidly, enjoy unprecedented design freedom, and substantially reduce the cost of innovation.
By any standards, RISC-V is starting to make an impact. RISC-V International also says that as of the end of 2022, over 10 billion RISC-V cores had been shipped. RISC-V International corporate members include Intel, Google, Qualcomm, AMD, Nvidia and hundreds more.
So why is RISC-V attracting so much attention?
In this post, we’ll look at the early development of RISC-V and the architecture itself. In ‘RISC-V - Part 2’, we’ll look at the architecture’s later development and ambitions of its backers.
It’s only possible to give a brief overview of RISC-V in a couple of posts. There will be some areas where the description given here is somewhat simplified. So this week’s supplementary post, for paid subscribers, has links that provide lots more in-depth information on the architecture.
Pre-History
Let’s start with a very high-level overview of the precursors to RISC-V.
We’ve already seen how the concept of Reduced Instruction Set Computers originated at IBM with the work of John Cocke on the IBM 801. Although the 801 never became an IBM product, word of the ideas behind the machine soon spread, including to a group at the University of California, Berkeley led by David Patterson, who coined the term ‘Reduced Instruction Set Computer’ or ‘RISC’.
The 1980 paper, “The Case for a Reduced Instruction Set Computer”, by Patterson and David Ditzel, argued the case for RISC machines, setting out their key advantages, including better performance than their more complex competitors.
Alongside the RISC paper, the Berkeley team also developed their own RISC microprocessor, which became known as the Berkeley RISC-I. The RISC-I design had just 31 instructions but a large set of registers, with 78 32-bit registers arranged into six ‘register windows’.
As work on RISC-I progressed, a second RISC design, ‘RISC-II’ also known as ‘Blue’, was developed at Berkeley.
Meanwhile, spurred on in large part by the work at Berkeley, many other RISC designs started to emerge. At Stanford, a separate research project under John Hennessy led to the development of the MIPS architecture. Many large computer companies started to develop their own RISC architectures including PA-RISC from HP, SPARC from Sun (influenced by the RISC-II designs), PC/RT from IBM and, across the Atlantic in Cambridge in the UK, ARM from Acorn Computers.
At UC Berkeley, two further RISC projects, SOAR and SPUR, followed RISC-II in the late 1980s.
Meanwhile, even more new RISC architectures emerged during the late 1980s and early 1990s as firms looked to take advantage of RISC techniques.
One by one, though, each of these architectures later fell by the wayside. Today, only Arm remains as a viable business.
For lots more on the early days of RISC see RISC on a Chip: David Patterson and Berkeley RISC-I and The RISC Wars Part 1 and Part 2.
RISC-V
The story of RISC-V takes us back to UC Berkeley in February 2005.
A group of researchers, including David Patterson and Krste Asanović, started meeting to discuss issues around parallelism in computing systems. This led to the formation of a ‘Parallel Computing Laboratory’, also known as ‘Par Lab’, at Berkeley.
This in turn led to the formation of a research program to investigate the issues around building systems with large numbers (from 64 to over a thousand) of processor cores. The program became known as ‘Research Accelerator for Multiple Processors’ or RAMP.
Over the course of the next few years, they built a series of color-coded systems: RAMP Red, RAMP Blue, RAMP White and RAMP Gold. These systems used a range of different processor cores, including PowerPC 405, Xilinx MicroBlaze cores running on Field Programmable Gate Arrays, and Sun’s SPARC.
At around the same time Intel and Microsoft, seeing the increasing importance of multicore and parallel computing, ran a competition for university projects to do research in the area. The competition, and $10m of funding, was jointly won by Berkeley’s Par Lab. A photo from 2008 shows Pat Gelsinger, then and now again of Intel, cutting a large red ribbon to open the new, partly Intel-funded, offices of Par Lab.
By 2008, members of the RAMP team were getting frustrated with the cores they were using, commenting that “Most pre-existing IP cores are inappropriate without significant work.”
As RAMP was coming to an end in 2010, Asanović, together with his graduate students Yunsup Lee and Andrew Waterman who had been working together on RAMP Gold, needed a processor core for use in what was intended to be a ‘three month’ research project.
Their first thoughts were to use the Arm ISA for the new processor. Born in the UK, Asanović had owned a 6502-based Atom home computer from Arm developer Acorn. He’d then studied in Cambridge at the time that the Arm processor was being developed. He’d left the UK in 1989 to study under David Patterson at Berkeley, going on to gain a PhD at Berkeley in 1998, with a thesis on the subject of vector microprocessors.
Asanović, Waterman and Lee soon realised, though, that using Arm would mean buying a licence. Even worse, they would probably not be able to modify the core as they wanted and would then struggle to distribute their designs to researchers at other universities to study and build on.
They looked at existing ISAs with fewer restrictions, such as SPARC, OpenRISC and DLX, another ISA designed by David Patterson together with John Hennessy at Stanford in the 1990s. In each case though the ISA didn’t meet their needs.
With no obvious alternatives available, in May 2010 they started work on the development of their own ISA.
By August 2010 the new processor was already taking shape and had a name, ‘RISC-V’, with V chosen as ‘V for Five, V for Vector, V for Variants’.
The designation as the ‘fifth RISC’ was seen as appropriate as Patterson’s later 1980s RISC projects at Berkeley, SOAR and SPUR, had been retrospectively designated as RISC-III and RISC-IV respectively.
The Berkeley team asked the question ‘Are we crazy not to use a standard ISA?’ before noting that existing standard ISAs (x86, ARM and GPUs) would probably be too complex for a university project anyway.
Asanović would later credit Lee and Waterman for pushing the idea, reflecting that:
One of the greatest powers in the universe is grad-student naivety. When you don’t realise something is impossible, you try it anyway.
They were soon joined by David Patterson, who had developed RISC-I at Berkeley almost thirty years before.
The team iterated on the new ISA. In order to make implementation as simple as possible, they gradually reduced the number of instructions down to the absolute minimum needed.
By May 2011, they were ready to publish the first version of the ‘RISC-V Instruction Set Manual’. This short document (at just 32 pages) outlined several goals for the RISC-V ISA, including to:
Provide a realistic but open ISA that captures important details of commercial general-purpose ISA designs and that is suitable for direct hardware implementation.
Provide a small but complete base ISA that avoids “over-architecting” for a particular microarchitecture style.
Be simple to subset for educational purposes and to reduce the complexity of bringing up new implementations.
The new ISA was designed to support 32 and 64-bit variants (designated RV32 and RV64), multi-core implementations and floating point operations. Crucially, it was also designed to allow processor designers to add their own extensions to the architecture.
And the ISA was very simple indeed. The minimum implementation of RV32 described in the first manual had just 47 instructions.
The first RISC-V Instruction Set Manual acknowledges the influence of previous research architectures, including Torrent-0, The Scale (Software-Controlled Architecture for Low Energy) project at MIT and Maven (Malleable Array of Vector-thread ENgines).
However, RISC-V represented a clean break from these predecessors, which had typically been based on Stanford’s MIPS RISC project.
Learning From The Past
The Berkeley team would later describe how a new ‘clean slate’ RISC ISA can learn from the mistakes of previous designs:
… a new RISC ISA can be better than its predecessors by learning from their mistakes:
- Leaving out too much: No load/store byte or load/store half word in the initial Alpha ISA, and no floating-point load/store double in MIPS I.
- Including too much: The shift option for ARM instructions and register windows in SPARC.
- Allowing current micro-architectural designs to affect the ISA: Delayed branch in MIPS and SPARC, and floating-point trap barriers in Alpha.
If RISC-V set out to learn from mistakes in previous ISAs, it also re-used a lot from those ISAs. David Patterson and Tony Chen published a paper entitled ‘RISC-V Geneology’ (sic) in 2016, which, on an instruction by instruction basis, sets out the historical antecedents of RISC-V:
This report discusses the historical precedents of RV32G. We look at 18 prior instruction set architectures, chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets.
The ISAs ranged from the mainframe CDC6600 of 1964, followed by RISC-I in 1981 and then leading to ARMv6 in 2002 and Cray X1 in 2003.
There are 122 instructions in RV32G (a version of RISC-V that includes single and double precision floating point instructions). Patterson and Chen found that 98 of the instructions appear in at least three historical ISAs. Only 6 were not found in the 18 ISAs examined.
The paper includes a detailed comparison of the instruction sets of RISC-V and 18 earlier architectures.
David Patterson later did a more specific comparison with an earlier Berkeley design of his own, highlighting how much RISC-I and RISC-V have in common, in his 2017 blog post “How close is RISC-V to RISC-I?”
By far the biggest surprise was how close is the original instruction set of the RISC-I to the base instruction set of RISC-V (RV32I). The figure below compares the two instruction sets. In fact, RISC-I may be the closest instruction set to RISC-V of any era; it is certainly much closer than the original Stanford MIPS and IBM 801 instruction sets.
So RISC-V really was a return to the roots of RISC.
Chisel and Rocket
An instruction set is of limited use without working silicon. Over the next three years, Asanović and his colleagues developed a series of processor cores using the new architecture. To support this work, they created a number of tools to enable the development of hardware using the RISC-V architecture.
Chisel
The most important is probably ‘Chisel’, a tool, based on the Scala programming language, that takes the place of traditional ‘hardware description languages’ like Verilog and VHDL. These languages enable designers to specify the structure of the digital logic in an integrated circuit in a text file.
Verilog and VHDL are relatively ‘low level’ in how they describe the hardware. Scala on the other hand is a ‘high-level’ language that allows for the use of object oriented and functional approaches to programming.
Chisel enables designers to use a higher level of abstraction, which should make re-use of components easier and so support rapid iteration of designs.
Here is a brief extract from the definition of the decode module of ‘SonicBoom’ (one of the RISC-V cores developed by the Berkeley team), specified using Chisel. This is part of the logic that decodes ‘branch’ instructions.
Rocket Chip Generator
Alongside Chisel they created the Rocket Chip generator, a tool to enable researchers to quickly build their own System-on-Chip (SoC) designs.
Rocket Chip is an open-source System-on-Chip design generator that emits synthesizable RTL. It leverages the Chisel hardware construction language to compose a library of sophisticated generators for cores, caches, and interconnects into an integrated SoC. Rocket Chip generates general-purpose processor cores that use the open RISC-V ISA, and provides both an in-order core generator (Rocket) and an out-of-order core generator (BOOM). For SoC designers interested in utilizing heterogeneous specialization for added efficiency gains, Rocket Chip supports the integration of custom accelerators in the form of instruction set extensions, coprocessors, or fully independent novel cores. Rocket Chip has been taped out (manufactured) eleven times, and yielded functional silicon prototypes capable of booting Linux.
Early RISC-V Hardware
Using these new tools, the Berkeley team started to build real hardware using the RISC-V ISA.
The first core, Raven-1, was developed in May 2011 and built using an STMicroelectronics 28nm process. Raven-1 was followed by Raven-2 in August 2012 and Raven-3 in September 2013.
By 2014 the Berkeley team were able to demonstrate significant progress with their new ISA. The paper “A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators” claimed:
A 64-bit dual-core RISC-V processor with vector accelerators has been fabricated in a 45nm SOI process. This is the first dual-core processor to implement the open-source RISC-V ISA designed at the University of California, Berkeley.
To demonstrate the extensibility of the RISC-V ISA, we integrate a custom vector accelerator alongside each single-issue in-order scalar core.
The paper then goes on to compare this RISC-V core, favourably, against a commercial Arm core, the Cortex-A5:
In a standard 40 nm process, the RISC-V scalar core scores 10% higher in DMIPS/MHz than the Cortex-A5, ARM’s comparable single-issue in-order scalar core, and is 49% more area-efficient.
To compare against published numbers for a 32-bit ARM Cortex-A5, we implemented a similarly configured 64-bit Rocket core and evaluated it with the same Dhrystone benchmark and the same TSMC40GPLUS process corner that ARM used to evaluate the Cortex-A5.
Notably, the Berkeley team were already able to use standard open source software tools with the new ISA:
The open-source RISC-V software toolchain includes a GCC cross-compiler, an LLVM cross-compiler, a software ISA simulator, an ISA verification suite, a Linux port, and additional documentation, and is available at www.riscv.org.
From the very earliest days of the architecture, its developers were adding specialised extensions to the instruction set. The SoC in this paper also included ‘Hwacha’ floating-point vector accelerators developed at Berkeley alongside the RISC-V cores:
Hwacha is a single-lane decoupled vector pipeline optimized for an ASIC process. Hwacha more closely resembles traditional Cray vector pipelines than the SIMD units in SSE, AVX, or NEON.
By 2015, the team was working on the development of more complex cores with the development of the BOOM ‘Berkeley Out of Order Machine’ core:
BOOM is an out-of-order, superscalar RV64G core generator. The goal of BOOM is to serve as a baseline implementation for education, research, and industry and to enable in-depth exploration of out-of-order micro-architecture.
BOOM is written in 10k lines of Chisel code. BOOM is able to accomplish this low line count in part by instantiating many parts from the greater Rocket Chip repository; the front-end, functional units, page table walkers, caches, and floating point units are all instantiated from the Rocket and hardfloat repositories.
By 2016 the Berkeley team had used their tools and the RISC-V ISA to design and build 11 chips on either 28nm or 45nm processes from STMicroelectronics, IBM or TSMC.
Instruction Sets Should Be Free
There was more to RISC-V than some tools and a few cores though. There was a philosophy.
In 2014, the team returned to the original motivation behind the development of RISC-V. The paper ‘Instruction Sets Should Be Free: The Case For RISC-V’ by Asanović and Patterson sets out the case for an open and free ISA. The paper states that: “While instruction set architectures (ISAs) may be proprietary for historical or business reasons, there is no good technical reason for the lack of free, open ISAs”.
It’s not an error of omission. Companies with successful ISAs like ARM, IBM, and Intel have patents on quirks of their ISAs, which prevent others from using them without licenses. Negotiations take 6-24 months and they can cost $1M-$10M, which rules out academia and others with small volumes. An ARM license doesn’t even let you design an ARM core; you just get to use their designs. (Only ≈15 big companies have licenses that allow new ARM cores.)
Asanović would later turn this into a series of questions, considering a long list of historic, and mostly defunct, ISAs.
Do we need all these different ISAs?
Must they be proprietary?
Must they keep disappearing?
What if there was one stable free and open ISA everyone could use for everything?
So RISC-V would be a free and open ISA. According to RISC-V International:
The RISC-V ISA is free and open with a permissive license for use by anyone in all types of implementations. Designers are free to develop proprietary or open source implementations for commercial or other exploitations as they see fit.
RISC-V is sometimes compared with ‘Open Source’ projects such as Linux. However, it’s closer to open standards such as Ethernet. The RISC-V ISA defines an open standard for the interface between processor hardware and software just as the Ethernet standard defines a set of open networking standards.
However, the open standard does not mean that any particular RISC-V core implementation will be either free or open. Users of the RISC-V standards are free to make their designs open or closed depending on their preference and purposes.
There are, however, a number of ‘open source’ RISC-V cores available, including the original Rocket and BOOM designs from Berkeley and cores from Western Digital and others.
Hot Chips
To have an impact, RISC-V needed to ‘escape’ from Berkeley, so the RISC-V team started to ‘spread the word’ about RISC-V.
In September 2014, they descended on the ‘Hot Chips 26’ conference in Cupertino. A dozen members of the RISC-V team with blue t-shirts, decorated with a blue and yellow RISC-V logo, manned a sponsor booth and started to proclaim the benefits of RISC-V.
Badges were distributed with the slogan ‘Instruction sets want to be free!’ and a sign compared the Berkeley ‘RISC-V Rocket’ design with the Arm Cortex-A5.
The Berkeley team found a receptive audience for their message. They have recalled that attendees were frustrated less with the cost of licensing cores than with how long it took to get started using commercial cores.
‘Hot Chips 26’ was only the start of the RISC-V team’s efforts to build an ecosystem around RISC-V. As early as 2014, the team already had huge ambitions for their new architecture. Returning to the 2014 paper ‘Instruction Sets Should Be Free: The Case For RISC-V’ by Asanović and Patterson, the paper ends by stating a goal for the RISC-V architecture:
Although it’s hard to set aside biases, we believe that RISC-V is the best and safest choice for a free, open RISC ISA.
… our goal is grander: just as Linux has become the standard OS for most computing devices, we envision RISC-V becoming the standard ISA for all computing devices.
We’ll consider this goal and find out how the backers of RISC-V have set out to achieve it in the next post in this series, ‘RISC-V – Part 2 : Aims and Ambitions’. We’ll look at the formation and the work of ‘RISC-V International’, at the later development of RISC-V and at a number of the efforts to commercialise RISC-V.
The RISC-V Instruction Set : A Brief Introduction
Let’s have a very high level look at the basics of the instruction set.
Integer Registers
A RISC-V CPU typically has 32 general purpose registers, designated x0 to x31.
The register x0 is hard-coded to be zero. The program counter is distinct from these ‘general purpose’ registers.
There is also no separate ‘flags’ (condition code) register. Conditional branch instructions, for example, specify the comparison to be undertaken to determine whether the branch is taken, rather than relying on a ‘flag’ set by a previous arithmetic or logic instruction.
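To make the two points above concrete, here is a minimal Python sketch of a 32-bit integer register file in which x0 always reads as zero, together with a helper that evaluates a conditional branch directly from two register values. The names RegisterFile, to_signed and branch_taken are invented for this illustration; this is a toy model under those assumptions, not how any real RISC-V core or simulator is written.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32 bits wide (RV32)


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


class RegisterFile:
    """Toy model of the integer registers x0 to x31 (hypothetical helper)."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        # Writes to x0 are silently discarded, so x0 always reads as zero.
        if index != 0:
            self._regs[index] = value & MASK32


def branch_taken(mnemonic: str, rs1_value: int, rs2_value: int) -> bool:
    """Decide a conditional branch from the two register values alone.

    No flags register is consulted: the comparison is part of the instruction.
    """
    u1, u2 = rs1_value & MASK32, rs2_value & MASK32
    s1, s2 = to_signed(u1), to_signed(u2)
    comparisons = {
        "BEQ": u1 == u2,
        "BNE": u1 != u2,
        "BLT": s1 < s2,    # signed less-than
        "BGE": s1 >= s2,   # signed greater-or-equal
        "BLTU": u1 < u2,   # unsigned less-than
        "BGEU": u1 >= u2,  # unsigned greater-or-equal
    }
    return comparisons[mnemonic]
```

With the bit pattern 0xFFFFFFFF in one register and zero in the other, BLT treats the first value as -1 and is taken, while BLTU treats it as a large unsigned number and is not.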
The width of these registers depends on the ‘Base ISA’ used. The 32-bit and 64-bit versions, which define both the size of the integer registers and the size of the address space, are the ones in common use today. There is also a 128-bit version, which started as a joke, but has now become part of the standard.
Base ISA
Although it is convenient to speak of the RISC-V ISA, RISC-V is actually a family of related ISAs which build on what are known as ‘base’ ISAs. There are currently four base ISAs.
The original integer ‘bases’ (known as I) provide basic arithmetic and logic and load / store instructions. The simplest 32-bit and 64-bit RISC-V implementations are known as RV32I and RV64I, respectively.
Base Integer Instruction Set
So let’s have a look at the instructions in RV32I. The objectives of RV32I are set out in the ‘RISC-V Instruction Set Manual’:
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation.
Consistent with the RISC ‘load-store’ philosophy, all arithmetic and logic instructions use values in registers or immediate values from the instruction itself.
The instructions in RV32I can be grouped as follows (with the assembler mnemonics for the instructions in each group):
8 that load or store bytes, half-words or words to or from memory. [LB, LH, LW, LBU, LHU, SB, SH, SW].
6 that perform conditional branches, depending on the results of a specified comparison. [BEQ, BNE, BLT, BGE, BLTU, BGEU].
6 that shift registers by a specified number of bits. [SLL, SLLI, SRL, SRLI, SRA, SRAI].
3 arithmetic instructions. [ADD, ADDI, SUB]
6 logical instructions. [XOR, XORI, OR, ORI, AND, ANDI]
4 that load either 0 or 1 into a register, depending on the result of a comparison. [SLT, SLTI, SLTU, SLTIU].
2 jump and link instructions that store ‘program counter+4’ in a register and then jump to another address. [JAL, JALR].
2 that either load a 20-bit immediate value into the upper 20 bits of a register or add such a value to the program counter, placing the result in a register. [LUI, AUIPC]
2 that transfer control either to the operating system or to a debugger. [ECALL, EBREAK, originally named SCALL and SBREAK]
1 relating to memory ordering. [FENCE].
And that’s it! Just 40 instructions.
(Note that since the first version of the ISA manual was published in 2011, with 47 instructions, seven instructions, relating to memory ordering and control and status registers, have been moved out from the base RV32I ISA.)
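One consequence of such a small instruction set is that a full 32-bit constant is usually built with a pair of instructions: LUI to set the upper 20 bits, then ADDI to add a signed 12-bit immediate. Because ADDI sign-extends its immediate, the upper part sometimes has to be rounded up by one. The Python sketch below illustrates the arithmetic; the function names split_constant and rebuild are made up for this example and are not part of any RISC-V toolchain.

```python
from typing import Tuple


def split_constant(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (LUI upper 20 bits, ADDI signed 12-bit immediate)."""
    value &= 0xFFFFFFFF
    # Adding 0x800 rounds the upper part up whenever the low 12 bits
    # will be treated as negative by ADDI's sign extension.
    upper = ((value + 0x800) >> 12) & 0xFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:
        lower -= 0x1000  # reinterpret as a negative 12-bit immediate
    return upper, lower


def rebuild(upper: int, lower: int) -> int:
    """Model LUI followed by ADDI on a 32-bit register."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


# Example: 0xDEADBEEF needs upper = 0xDEADC and lower = -273.
upper, lower = split_constant(0xDEADBEEF)
assert rebuild(upper, lower) == 0xDEADBEEF
```

The immediate ranges used here (20 bits for LUI, signed 12 bits for ADDI) follow the base ISA; everything else is illustrative.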
It’s worth emphasising that each of these instructions corresponds to a single machine instruction. There aren’t a multitude of addressing modes for each assembler mnemonic. Taking just one example, the SUB instruction simply subtracts one register from another, placing the result in a third.
SUB rd, rs1, rs2
This subtracts register rs2 from rs1 and places the result in register rd.
By now you might be wondering about simple instructions that are missing. For example there is no ‘move’ instruction that copies the value in one register to another register. In fact there is a simple answer to this. We can use ADDI (add immediate) with an immediate value of zero to achieve this.
ADDI rd, rs, 0
This copies register rs into register rd.
This operation is so common that it’s convenient to give it its own assembly mnemonic, MV, so we can write this as:
MV rd, rs
In fact RISC-V specifies a whole series of these ‘pseudoinstructions’.
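As a rough illustration of how an assembler might handle these, the Python sketch below rewrites a handful of well-known pseudoinstructions into the base instructions they stand for. The table covers only a few cases, and the names PSEUDO and expand are invented for this example rather than taken from any real assembler.

```python
# A few standard pseudoinstructions and the base instruction each stands for.
PSEUDO = {
    "NOP": lambda: "ADDI x0, x0, 0",
    "MV": lambda rd, rs: f"ADDI {rd}, {rs}, 0",
    "NOT": lambda rd, rs: f"XORI {rd}, {rs}, -1",
    "NEG": lambda rd, rs: f"SUB {rd}, x0, {rs}",
    "SEQZ": lambda rd, rs: f"SLTIU {rd}, {rs}, 1",
    "J": lambda offset: f"JAL x0, {offset}",
}


def expand(line: str) -> str:
    """Rewrite one assembly line, leaving real instructions untouched."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    rule = PSEUDO.get(mnemonic.upper())
    return rule(*operands) if rule else line.strip()


print(expand("MV x5, x6"))       # ADDI x5, x6, 0
print(expand("SUB x1, x2, x3"))  # SUB x1, x2, x3 (already a base instruction)
```

Several of these rely on x0 being hard-wired to zero: NEG subtracts from x0, and J is simply a jump-and-link that throws its return address away by writing it to x0.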
Instruction Encoding
The ISA is also designed to be straightforward for a processor to decode. The base 32-bit integer ISA has just 6 instruction formats and these place things like the opcode and source and destination register in the same place in the 32-bit long instruction.
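To show what ‘the same place’ means in practice, here is a small Python sketch that packs and unpacks the fields of a register-to-register (R-type) instruction. The bit positions (opcode in bits 0 to 6, rd in 7 to 11, funct3 in 12 to 14, rs1 in 15 to 19, rs2 in 20 to 24, funct7 in 25 to 31) follow the base ISA; the function names encode_r_type and decode_fields are just for this illustration.

```python
def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (
        (funct7 & 0x7F) << 25
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1F) << 7
        | (opcode & 0x7F)
    )


def decode_fields(word: int) -> dict:
    """Extract the fields that sit at fixed positions in the word."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


# SUB x3, x1, x2: opcode 0110011, funct3 000, funct7 0100000
sub_word = encode_r_type(0b0100000, 2, 1, 0b000, 3, 0b0110011)
print(hex(sub_word))  # 0x402081b3
```

Because the register fields never move, a decoder can start reading the register file before it has worked out exactly which instruction it is looking at, which is one reason the fixed layout helps simple implementations.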
Extensions
Many implementations of the RISC-V ISA will add extensions that will materially increase the number of instructions - for example the vector extension adds almost two hundred new instructions - but the ISA is still significantly smaller than alternatives like x86 and Arm.
Here is a full list of the ‘standard’ extensions as at the start of August 2023:
In this table ‘G’ isn’t an extension but stands for a combination of the I base and a number of extensions. From the RISC-V instruction set specification:
One goal of the RISC-V project is that it be used as a stable software development target. For this purpose, we define a combination of a base ISA (RV32I or RV64I) plus selected standard extensions (IMAFD, Zicsr, Zifencei) as a “general-purpose” ISA, and we use the abbreviation G for the IMAFDZicsr_Zifencei combination of instruction-set extensions.
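As a quick illustration of how the G shorthand unpacks, the Python sketch below expands a name such as RV64GC into its base and list of extensions. It is deliberately simplified: it assumes a four-character base such as RV32 or RV64, single-letter extensions, and underscore-separated multi-letter names, and it ignores the rest of the official naming rules. The names G_EXPANSION and expand_isa are invented for this example.

```python
from typing import List, Tuple

# G stands for the I base plus these standard extensions.
G_EXPANSION = ["I", "M", "A", "F", "D", "Zicsr", "Zifencei"]


def expand_isa(name: str) -> Tuple[str, List[str]]:
    """Expand an ISA name into (base, extensions), unpacking G (simplified)."""
    head, *multi_letter = name.split("_")
    base, letters = head[:4].upper(), head[4:].upper()
    extensions: List[str] = []
    for letter in letters:
        extensions.extend(G_EXPANSION if letter == "G" else [letter])
    extensions.extend(multi_letter)
    # Drop duplicates while keeping first-seen order.
    return base, list(dict.fromkeys(extensions))


print(expand_isa("RV64GC"))
# ('RV64', ['I', 'M', 'A', 'F', 'D', 'Zicsr', 'Zifencei', 'C'])
```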
Let’s consider the C extension. In the ‘I base’, all instructions are 32 bits long. One of the challenges for an architecture like RISC-V is obtaining code density that is competitive with or better than that of competing ISAs such as x86. The C extension allows a subset of the I instructions to be encoded in 16 bits, giving a significant improvement in code density.
The floating-point extensions are called F and D, and they respectively add single and double precision floating point.
These extensions add 32 new registers to the architecture. These registers are either 32-bit or 64-bit depending on whether the D extension is included in the architecture. In practice, F and D are usually implemented together.
We’ve now listed all the ‘standard’ extensions that are specified by RISC-V standards (we’ll explore what this means in Part 2). Crucially, though, there is nothing to stop a CPU designer from adding their own extensions. These can then be made open or kept proprietary.
That’s the briefest of introductions to the RISC-V architecture. If that’s whetted your appetite for more then there are lots of resources to help explore the architecture in this week’s paid supplementary post.
Note that the RV32E (E for embedded) has only 16 registers.