31 Comments
Apr 10 · Liked by Babbage

I wish assembler syntax were at least standardized for the x86 ISA, so that we didn't have AT&T vs Intel syntax as well as slightly different mnemonics across various assemblers and compilers. As for the instruction names, maybe in addition to standardized short-form mnemonics, the standard could also define longer descriptive aliases for the same opcode, and/or allow users to define their own aliases in the asm source.

Having a standard where the machine code could optionally retain labels would also help with debugging and disassembly. Or does that exist? I suppose it verges into compiler symbols; maybe such asm labels could be stored in the same section. When I worked with Solaris in the past, one of the most brilliant features the engineers implemented across the OS, compiler and debugger was always pushing the arguments for each function call onto the stack, even if the arguments were actually passed in registers. The debugger, mdb or kmdb, would retrieve this information for each stack frame and display it as arguments. It results in extra overhead, but it is an intentional and worthy compromise IMHO, invaluable for debugging and troubleshooting. It worked for live disassembly with their debugger as well as with core dumps, retaining the current arguments for the whole stack trace.

Apr 9 · edited Apr 9 · Liked by Babbage

One could argue the best way to "read assembly" is to read its LLVM IR code.

The IR is why nearly all compiler development efforts have been switched to LLVM in recent years, and for good reason.

For one, it's actually a language meant to be read. Still, while staying implementation agnostic, it very closely resembles your machine code implementation.

And even if you don't have the source code, there are "lifters" which translate your assembly code into IR, which in turn you can always compile back.

Apr 8 · edited Apr 8 · Liked by Babbage

It’s been well over 30 years since I last programmed in assembly language: initially on process control computers with 256k (words) of memory, later on Digital Equipment’s PDP-11 and VAX/VMS.

Anyway, the “obscurity of assembly language”. Mind you, as far as I remember, a piece of code is as obscure as its programmer intends it to be; bad programmers also write obscure code. I’d rather maintain a well-written assembly program than a C program where all language features are used apart from the comment feature (do refer to the Wikipedia page on “obfuscated c code contest” for examples).

Another point is “why on earth would you program in assembly language”? Valid reasons are “my system doesn’t support anything else”, or “I need to cram a program into 1024 bytes memory”. Invalid reasons are “only thing I know” or “real programmers write assembly” (google “real programmers don’t use Pascal” and read the article).

Yes, I wrote assembly programs, and yes, I wrote and modified operating systems. The most useful approach, however, was to let a compiler generate assembly code, then inspect and optimise it. Usually there was between 10% and 25% in size to be won, and often more in speed, which mattered in process control environments. I stopped looking at compiler-generated assembly code in the mid-1980s. By then compilers had become too smart for me to optimise the result any further.

Summarised: “count your blessings and don’t use assembly code unless you really, really need to”.

Apr 7 · edited Apr 7 · Liked by Babbage

Check out the assembly language for the Analog Devices SHARC DSP series. They use an algebraic notation that is fairly easy to read.

The read-mostly nature of assembly language is a fairly recent phenomenon I think, corresponding to the point where C compilers regularly did better than about 1.2 times hand-coding (let's say gcc3.3 for argument's sake (2003)). Before then it was very often the reverse: vast reams of assembly code would be written for performance-critical programs like machine control, video games and signal processing, and most of it would never be read again. One of the aspects of assembly language is that it requires hard-coding largely irrelevant decisions like register allocation, which make minor code changes difficult. It is usually much simpler to re-write whole sections than it is to try to change functionality or add a feature. This is the great win from low-level-high-level languages: the automation of register allocation and instruction scheduling means that code can now reasonably be patched and changed. IMO. YMMV.

As far as why it is shaped the way it is: that's mostly for simplicity: there is a 1-1 relationship between each line of code and a corresponding machine instruction (give or take macro-expansion, which was very common in assemblers). You run the risk with algebraic notation that it immediately becomes possible to express things that don't correspond to the functionality of any single instruction. And conversely, there are quite a lot of instructions in modern architectures that require a page of pseudo-code to explain exactly what it is that they do, which you could never reasonably express algebraically.

Apr 6 · Liked by Babbage

I have to say that I never used assembly language at work. But when I was in school, boy, I loved it. So intuitive and so addictive. Your faults are yours and not anybody else's. For small programs it's marvellous. Easy to debug.

Apr 13 · Liked by Babbage

No, thank you.

The entire point of assembly is that it is supposed to represent an exact one-to-one relation between the assembly operation code instructions and the underlying machine code binary instructions. Assuming that I know the processor's instruction set architecture (ISA), I should be able to generate the actual binary machine code by looking directly at the assembly code instruction. I can look up the assembly operation code to get its numeric value and size, and calculate the rest of the instruction based on the operands. It's a very simple translation. I cannot do that with an instruction like:

M(k+3)×R to cA

It completely obscures what the underlying binary code is supposed to be, which is totally antithetical to the purpose of assembly language. I cannot even look up the instruction code in an assembly manual because the instruction code is never actually stated. This is reminiscent of other poorly written assembly languages such as x86, where the same 'mov' assembly code can represent 14 different underlying processor instructions (88, 89, 8A, 8B, 8C, 8E, A0, A1, A2, A3, B0, B8, C6, or C7) that must be inferred based on the type of the operands. THAT makes it difficult to read.

Determining the operand types should never be necessary to make the instruction distinctive. The instruction should always make the operands distinctive. The instruction that moves values between registers should be different from the instruction that moves values from registers to memory, and different again from the instruction that moves a value from memory into a register. Design the names of your operation codes correctly and you will not have any trouble determining what the operands should be.
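
To make the 'mov' point concrete, here is a rough sketch of the lookup an x86 assembler has to perform: the opcode byte is chosen from the kinds of the two operands, not from the mnemonic. The operand-kind names follow the notation used in Intel's manuals, 32-bit operand size is assumed, and `select_mov_opcode` is a hypothetical helper written for this illustration, not any real assembler's API.

```python
# Sketch only: maps (destination kind, source kind) to the MOV opcode byte
# listed in the comment above. Assumes 32-bit operand size; the 16-bit forms
# reuse the same opcodes with an operand-size prefix.
MOV_OPCODES = {
    ("r/m8", "r8"): 0x88,
    ("r/m32", "r32"): 0x89,
    ("r8", "r/m8"): 0x8A,
    ("r32", "r/m32"): 0x8B,
    ("r/m16", "sreg"): 0x8C,
    ("sreg", "r/m16"): 0x8E,
    ("al", "moffs8"): 0xA0,
    ("eax", "moffs32"): 0xA1,
    ("moffs8", "al"): 0xA2,
    ("moffs32", "eax"): 0xA3,
    ("r8", "imm8"): 0xB0,     # base opcode; the register number is added to it
    ("r32", "imm32"): 0xB8,   # base opcode; the register number is added to it
    ("r/m8", "imm8"): 0xC6,
    ("r/m32", "imm32"): 0xC7,
}


def select_mov_opcode(dest_kind: str, src_kind: str) -> int:
    """Hypothetical helper: pick the MOV opcode byte from the operand kinds."""
    try:
        return MOV_OPCODES[(dest_kind, src_kind)]
    except KeyError:
        raise ValueError(f"no MOV form for {dest_kind}, {src_kind}") from None


if __name__ == "__main__":
    # One mnemonic, different opcode bytes depending on the operands.
    print(hex(select_mov_opcode("r32", "imm32")))   # 0xb8
    print(hex(select_mov_opcode("r/m32", "r32")))   # 0x89
```

A reader of the source sees only `mov`; the table is the inference step that has to happen in their head.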

Apr 10 · Liked by Babbage

It may be worth trying to dig out some assembly for the Apollo PRISM (that's "A88K", for the DN10000, not to be confused with DEC PRISM or Motorola M88K) if you'd like another example of a slightly more readable assembly language. (Unless I'm misremembering and am thinking of Pyramid Technology minicomputers...)

A fairly obvious point to anyone who's seen both: the Z80 incorporates the whole of the 8080 instruction set, but Z80 assembly uses more readable notation for the 8080 instructions than the original 8080 assembly, by far. (If you're ever coding for the 8080, do yourself a favour and use Z80 assembly for the purpose!)
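
For anyone who hasn't seen both notations side by side, a small sketch: each row below is a single opcode byte with its Intel 8080 spelling and its Zilog Z80 spelling. Only a handful of instructions are listed, and `to_z80` is a hypothetical lookup helper added for illustration.

```python
# Sketch only: a few opcodes shared by the 8080 and Z80, with both spellings.
# n = 8-bit immediate, nn = 16-bit immediate or address.
SHARED_OPCODES = [
    (0x78, "MOV A,B", "LD A,B"),
    (0x7E, "MOV A,M", "LD A,(HL)"),
    (0x3E, "MVI A,n", "LD A,n"),
    (0x21, "LXI H,nn", "LD HL,nn"),
    (0x3A, "LDA nn", "LD A,(nn)"),
    (0x32, "STA nn", "LD (nn),A"),
]


def to_z80(i8080_mnemonic: str) -> str:
    """Hypothetical helper: return the Z80 spelling of an 8080 instruction."""
    for _opcode, i8080, z80 in SHARED_OPCODES:
        if i8080 == i8080_mnemonic:
            return z80
    raise KeyError(i8080_mnemonic)


if __name__ == "__main__":
    for opcode, i8080, z80 in SHARED_OPCODES:
        print(f"{opcode:02X}  {i8080:<10} {z80}")
```

The 8080 needs five different mnemonics for these six rows; the Z80 spells all of them `LD` and lets the operand notation, such as the parentheses for memory access, carry the difference.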

Apr 9 · Liked by Babbage

There is something to be said for the `mov ebx, eax` syntax: it reflects the instruction layout more accurately than `eax to ebx`. As far as opcode abbreviations are concerned, yeah it'd definitely be possible to give them more sane names.
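
As a rough illustration of that layout, here is a sketch that encodes a 32-bit register-to-register `mov` by hand. x86 has two valid encodings for it, and the destination lands in a different ModRM field in each. The function names are made up for this example, and only the register-direct case (mod = 11) is handled.

```python
# Sketch only: register-direct 32-bit MOV, no prefixes, no memory operands.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm_reg_reg(reg_field: int, rm_field: int) -> int:
    """Build a ModRM byte with mod=11 (both operands are registers)."""
    return 0xC0 | (reg_field << 3) | rm_field


def encode_mov_reg_reg(dest: str, src: str, load_form: bool = False) -> bytes:
    """Hypothetical helper: encode `mov dest, src` for two 32-bit registers."""
    d, s = REG32[dest], REG32[src]
    if load_form:
        # 8B /r is MOV r32, r/m32: destination in reg, source in r/m.
        return bytes([0x8B, modrm_reg_reg(d, s)])
    # 89 /r is MOV r/m32, r32: source in reg, destination in r/m.
    return bytes([0x89, modrm_reg_reg(s, d)])


if __name__ == "__main__":
    print(encode_mov_reg_reg("ebx", "eax").hex())                  # 89c3
    print(encode_mov_reg_reg("ebx", "eax", load_form=True).hex())  # 8bd8
```

In the `8B` form the destination register occupies the higher-order ModRM field, ahead of the source, which matches the destination-first order of `mov ebx, eax`; in the `89` form the fields are the other way round.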


Check out the assembly for the BELLMAC-8:

    #define NBYTES 100

    char array[NBYTES];

    sum()
    {
        b0 = &array;
        a1 = 0;
        for (a2 = 0; a2 < NBYTES; ++a2) {
            a1 =+ b0;
            ++b0;
        }
    }

Yes, that's assembly, not C code.


The early 6502 standard for "load accumulator immediate mode" was not

LDA #25

but rather

LDAim 25 (or LdaIm 25)

This matters because reputedly that (overlooked or extraneous) # ends up being the top reason for bugs AND misunderstood code in 6502.
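
A tiny sketch of why the missing # is so easy to get wrong: with and without it, the line still assembles, just to a different opcode with a different meaning. The helper below is hypothetical and handles only decimal operands for `LDA` in its immediate and zero-page forms.

```python
# Sketch only: two of the 6502's LDA addressing modes.
LDA_IMMEDIATE = 0xA9   # LDA #n : load the constant n
LDA_ZERO_PAGE = 0xA5   # LDA n  : load the byte stored at address n


def assemble_lda(operand: str) -> bytes:
    """Hypothetical helper: assemble `LDA <operand>` for a decimal operand."""
    operand = operand.strip()
    if operand.startswith("#"):
        opcode, value = LDA_IMMEDIATE, int(operand[1:])
    else:
        opcode, value = LDA_ZERO_PAGE, int(operand)
    if not 0 <= value <= 0xFF:
        raise ValueError("operand must fit in one byte")
    return bytes([opcode, value])


if __name__ == "__main__":
    print(assemble_lda("#25").hex())  # a919: accumulator = 25
    print(assemble_lda("25").hex())   # a519: accumulator = contents of address 25
```

Both lines are valid, so the assembler has no reason to complain; a mnemonic like `LDAim` would put the addressing mode in the instruction name instead of in a single easily dropped character.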


There’s also a HUGE difference between writing 8-bit 6502 or Z80 assembly and writing modern x86-64 asm.
