Wow, what a highly debatable subject!
I think you will always have a soft spot for what you grew up with, and whilst I’ve done my fair share of Z80 and 8086 assembly, I’d always prefer my first 8 and 16 bit loves… 6502 and 68000.
But hey, I still love Pascal so what would I know!
I love the 6502 too! I actually wrote a post 65 Reasons to Celebrate the 6502 [1] a couple of years ago. It took a quirky and unique approach and that's why it's hard to recommend for anyone starting out. The 68k is great too and Pascal is awesome!
[1] https://thechipletter.substack.com/p/65-reasons-to-celebrate-the-6502
That is a great article that I somehow missed. The 6502 just seems so simple and elegant despite its quirks… dare I say RISC like?
The 6502 is as far from RISC as they come. RISC is, above all, a load/store architecture, where memory accesses are separated from all other operations. The 6502 is the opposite: its first 256 bytes of memory effectively act as registers.
It's simple enough, yes, but very far removed from the load/store approach.
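To make the load/store distinction concrete, here's a toy Python model of the two styles (a purely illustrative sketch of register and memory behaviour, not real ISA semantics):

```python
def loadstore_add(mem, regs):
    """RISC load/store style: memory is touched only by explicit loads
    and stores; the add itself works register-to-register."""
    regs["r1"] = mem[0x10]                 # load
    regs["r2"] = mem[0x11]                 # load
    regs["r1"] = regs["r1"] + regs["r2"]   # ALU op: registers only
    mem[0x10] = regs["r1"]                 # store

def c6502_style_add(mem, regs):
    """6502 style: the ALU operation itself reads a memory operand
    directly (as in LDA $10 / ADC $11 / STA $10)."""
    regs["a"] = mem[0x10]
    regs["a"] = regs["a"] + mem[0x11]      # memory operand inside the add
    mem[0x10] = regs["a"]
```

Both compute the same sum; the difference is that on the load/store machine the ALU never touches memory directly.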
Debatable…
A RISC-based CPU (Reduced Instruction Set Computer) is a type of microprocessor architecture that uses a small, highly optimized set of instructions. This design philosophy focuses on executing a large number of simple instructions rather than a smaller number of complex ones, leading to higher performance and efficiency, especially in tasks that can be broken down into smaller operations.
I don't know what to debate there, really. The story of RISC started with the IBM 801 and was always about the idea of separating memory accesses from other operations: perform everything on registers and don't touch memory. Of course it's possible to ignore everything but some words on the tin… and say that some words from the RISC definition match the design philosophy of the 6502… but that would be like saying that a Boeing 747 is just an off-road car. Hey, it has some wheels, it can be driven around… why is it not a car?
I did say… dare I say RISC like. If you don’t get it I can’t help you.
Steve Furber and Sophie Wilson of Acorn Computers were both fans of the 6502. So when they came to design their own microprocessor they took some of the ideas from the 6502 - especially its simplicity, economy and high memory bandwidth - and applied them in the creation of the ARM 1.
Really, it has to depend highly on the definition - when you only have a single accumulator plus X and Y registers, the latter with even less functionality than the X, it has to, by definition, have a reduced instruction count; there are only so many possibilities.
Obviously, it does make up for it a certain amount with page zero and the addressing modes, but still I find that a more elegant abstraction than dozens of registers and all the associated instructions that result.
And I would never have said that page zero acts as registers. It is not like the concept used by the PDP CPUs that had only one register; really it is just the same as any other memory, with the same addressing modes, but with single-byte addressing, so it typically saves at least one clock cycle on every instruction where it is used. When most instructions take 2 or 3 clock cycles, that is a big speed-up.
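The cycle saving is easy to see in the published LDA timings. A quick sketch (base figures from the standard MOS datasheet; page-crossing penalties ignored):

```python
# Base cycle counts for the 6502 LDA instruction by addressing mode
# (standard MOS datasheet figures, no page-crossing penalties).
lda_cycles = {
    "immediate": 2,  # LDA #$42  -- operand in the instruction itself
    "zero_page": 3,  # LDA $42   -- one address byte, first 256 bytes only
    "absolute":  4,  # LDA $1234 -- two address bytes, full 64 KB
}

# Zero page saves exactly one cycle over the absolute form -
# the speed-up discussed above.
saving = lda_cycles["absolute"] - lda_cycles["zero_page"]
assert saving == 1
```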
Sigh. May I recommend opening a super-secret web site called Wikipedia, going to the “RISC” page and scrolling down to “Instruction set philosophy”, which, quite literally, starts with this tidbit: “A common misunderstanding of the phrase 'reduced instruction set computer' is that instructions are simply eliminated, resulting in a smaller set of instructions.”
It ALSO explains the whole concept and includes lots of references! Really good site, believe me!
</sarcasm off>
It's true that there is ambiguity, because English works like that, but RISC is not meant to be read as “reduced (instruction set) computer” but as “(reduced instruction) set computer”.
It's not about the number of instructions (the PDP-8 is classic CISC and yet it has an even smaller number of instructions than the 6502), but about the complexity of ONE, SINGLE instruction.
The idea of RISC is not to reduce the NUMBER of instructions, but to simplify them so that they execute as quickly and as uniformly as possible.
On the 6502 even NOP takes TWO CPU clocks, and instructions may take 5, 6, or 7 CPU cycles.
Yes, the 6502 has a pretty elegant design, but it's closer to the elegance of the PDP-8 or PDP-11, where a small number of instructions is complemented by the pretty complicated semantics of those few, rather than the compiler-oriented uniformity of RISC.
4,528 transistors. A bit later the 6809 came out, at around 9,000 transistors, but it was the last mainstream CPU to come out that did not have microcode - just a bunch of computational logic in silicon.
Just recently retired after programming solely in commercial settings for over 50 years, more than half as a compiler writer for DEC, Apollo Computer, HP, IBM and MathWorks. Of course I started in assembly - there was really nothing else available for building commercial applications. I cut my teeth on PDP-11 assembly, learning from instruction set manuals that DEC handed out for free. Though not perfectly uniform (in particular, no byte add nor byte subtract), the regularity and orthogonality of the architectural features made learning a breeze.
To this day, if I were asked to teach assembly language concepts, the PDP-11 would be my very first choice.
In my case, that choice is not based on any early love. Were I truly to follow my heart, I could pitch the ISP of Apollo Computer's DN10000, which I designed (patent US5051885A) and for which I led the compiler team. After HP bought Apollo, we retargeted our compiler's backend to generate code for PA-RISC and the Intel i860 (though HP never shipped those products). Along the way, I did persuade HP to include multiple DN10000 instruction set features in the PA93, the 64-bit extension of PA-RISC.
In addition to the machines mentioned above, I have deep exposure to a bewildering collection of assembly languages. I have shipped commercial assembly code for the DEC PDP-8, VAX, and Alpha, the Motorola 68000 and 68020, IBM PowerPC (Motorola MPC5xx, IBM/AMCC 440), MIPS32, and the Intel 8080 and 80x86. I further learned the assembly languages well enough to debug code on the DEC PDP-6 and -10, the PDP-9 and -15, the IBM 1130, many ARM variants, and even MIT's one-off PDP-1X during the hacker heyday :-).
Still, I stand by my position that the PDP-11 assembly language is a consummate teaching vehicle.
Getting to the point of being able to read and debug disassembled real code requires delving into the arcana of calling conventions. Not for the faint of heart!
Thanks so much for sharing, John. I'd love to do a post on the DN10000 and PRISM but documentation seems to be thin on the ground. Are there any good sources out there?
I would love to help you tell the story of the DN10000. I have reached out to David Boundy (inventor of its unusual assembly language). He is now a patent attorney and has both Apollo DN10000 (PRISM) and DEC PRISM resources. I have also reached out to Rick Bahr (processor architect, now a professor at Stanford) and Russ Barbour (project manager, hardware and software) to see what either of them may have preserved.
I co-wrote an IBM/370 mainframe Security System product in 1981, all in assembler. Sold it in 1985 and retired, the product is still in operation around the world and is owned by Broadcom today.
That's fantastic. I remember one of the key systems at the company where I worked in the early 2000s dated from around the same time and was written in 370 assembly. Memory was so scarce when it was originally written that they had to squeeze every last byte out using assembly - COBOL just didn't cut it!
Out of interest do you have a favorite 370 assembly reference document from that era?
I've shipped 100% assembly programs commercially for the 65816 and 68000, and worked with 32-bit x86, Hitachi SH, MIPS, "classic" ARM, and PowerPC. I still prefer 65xx and 680x0, which feel like they were made for humans to program.
Like everyone these days I live 5,000 feet above the hardware in C++17 and JavaScript and things of that nature, but my Apple IIGS still works for when I want to touch grass, err, silicon.
Completely agree that modern ISAs feel like they weren't made for humans to program. Even the RV32I feels a bit unfriendly. I think 68k would win the human friendly contest - or maybe ARM 1 as Sophie Wilson created it for her own use.
In high school (Sacramento, 1970s) a guy who had been a field service agent in the 50s and 60s built out a high school computer lab with donated gear from his old customers. We had an early-60s drum-based computer! It had an assembler, but it ran all day and the computer overheated, so... you had to enter machine code through a very ergonomic switch setup. I sat down at a PDP-11 and was appalled at how clunky the switches were.
I would recommend starting at the bottom and learning microcode. After all, this is how SIMD machine code works, and GPUs are all the rage. Then learn a normal assembler.
Ok, my assembler on the job story: my first computer-biz job was in the early 80s at "Fortune Systems". We made a 68000-based "small business computer" and became responsible for many of the weird caveats you find in VC funding offers. I got the motherboard boot ROM. The first few instructions were in assembler: "load a number into an address, jump forward 0x400000, fiddle with two registers, jump into a C subroutine".
Explanation: "program the memory-mapper unit, jump forward from position 0x8 to position 0x400008 so that now we're running through the memory-mapper, call the outer wrapper of the C-based boot code". From then on, everything ran through the memory-mapper.
That's awesome. Thanks for sharing. Really tempted to do a post on microcode!
Didn’t the USGov put out a statement about moving away from memory unsafe programming languages? I wonder how they feel about folks shipping direct assembly code.
Didn't know the US Government had said that. Possibly this from NSA?
https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/Article/3608324/us-and-international-partners-issue-recommendations-to-secure-software-products/
It's not a great idea these days to ship memory-unsafe code when safe alternatives are available. I think assembly today is mostly applied as a 'read' skill - a route to understanding and optimising code written in C or Rust. I don't think we'll see a big uptick in shipped assembly code any time soon!
Makes sense, and to your point there are some real-world examples, like DeepSeek's optimized usage of GPUs, where understanding and analyzing the crux of how your software runs against hardware can bear fruit!
I know (only, barely) enough to read and enjoy the hell out of this post. Thanks for the deep dive.
I'd love to see your take, regardless of how introductory, on FPGA programming (e.g. with VHDL).
My hot take (please politely stifle all laughter at my naive impertinence..) -- even going super fast (i.e. assembly) on a SIMD device is not enough. GPU is arguably parallel SIMD, not quite MIMD. And MIMD itself (..are there even any MIMD chips?...) may not be enough. Future energy efficient computation will be vastly parallel, asynchronous, and lol probably virtually unprogrammable...by humans.
Glad you enjoyed it. Agree 100% with your framing.
One of my long term aspirations is to do a post on all the major forms of parallelism.
Itanium was abandoned in part because we couldn't create compilers that were clever enough. Perhaps AI will change that?
Thanks - do you know of any MIMD chips? (Didn't TI have something DSP-y that might be multi-instruction, multi-data path?)
If 'no' - do you plan to touch on FPGA/ASIC work in VHDL? I realize it's a big ask -- assembly and 'regular' chips are already a heavy lift -- but it might draw in some of the hardware hackers...
As for "AI" (i.e. search on steroids), I'd worry about that in this context: very thin content for the LLMs to train on, and higher risks of hallucination, given the complexity of the domain. Open to counterpoints here.
Thanks again.
> Although RISC-V advocates often claim that the architecture is much simpler than the alternatives, by the time all the extensions required to create a modern system are added it’s still quite complex.
This makes you wonder if the whole RISC vs CISC dichotomy is a moot point to begin with 😅
I just asked Grok "How many total instructions (mnemonics) are in RVA23U64?" and it came up with a very detailed answer listing and counting every instruction in every extension, which came to a total of 394 instructions, including 130 in the vector extension.
The same query about ARMv9-A came up with an answer of 956, and x86_64 up to AVX-512 and APX (which itself adds just 40) came in at 1,800 instructions.
With RVA23, ARMv9-A, and x86_64 with APX, all three ISAs have very similar functionality, right down to RVV, SVE2, and AVX-512 having almost converged, though with quite different approaches.
BUT, remember that RISC is about the complexity of each individual instruction, not the total number of instructions.
But the mighty MOV is all you need ...
https://github.com/xoreaxeaxeax/movfuscator
PS I'm having fun finding out which ISAs LLMs know about too.
To be fair, MOV is technically a bunch of different instructions on x86. There are true OISC (One Instruction Set Computer) architectures like SUBLEQ, where the SUBLEQ instruction is defined as:
mem[B] = mem[B] - mem[A]; if (mem[B] <= 0) goto C;
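For the curious, that single instruction really is sufficient. Here's a minimal Python sketch of a SUBLEQ interpreter (the negative-jump-target-halts convention is an assumption; conventions vary between SUBLEQ dialects):

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Interpret SUBLEQ: each instruction is three cells A, B, C.
    mem[B] -= mem[A]; if the result is <= 0, jump to C, else fall through.
    Here a negative C halts the machine (one common convention)."""
    for _ in range(max_steps):
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        if mem[b] <= 0:
            if c < 0:
                break          # halt
            pc = c
        else:
            pc += 3
    return mem

# `SUBLEQ A, A, halt` always zeroes cell A: x - x = 0, so the branch
# is taken and we stop.
mem = [6, 6, -1,   # subleq 6, 6 -> mem[6] becomes 0, then halt
       0, 0, 0,    # padding
       42]         # cell 6: the value to clear
run_subleq(mem)    # afterwards mem[6] == 0
```

Everything else (addition, copies, unconditional jumps) is built up from sequences of that one instruction.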
Maybe one day I'll deep dive into esoteric programming languages and weird instruction set architectures like this 😅
Wow, that is very impressive. It must have been a tough choice to retire; you obviously have a strong affinity for this realm.
I am (slightly) younger than you at 58, by my guess, but at this point, where some colleagues and similarly aged acquaintances are retiring, I couldn't think of anything worse.
Do you do any relevant "hobbies", or did you take the path of leaving work at work?