A literate assembly language

A recent edition of [Babbage's] The Chip Letter discusses the obscurity of assembly language. He points out, correctly I think, that assembly language is read more often than it is written. Yet nearly all of it is hampered by obscurity left over from the days when punched cards had 80 columns and a six-letter symbol was all you could fit in a computer's limited memory. For example, without looking it up, what does the ARM instruction FJCVTZS do? Its full name is Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero. Not super helpful.

But it occurred to me that nothing stops you from writing a literate assembler designed to be easier to read. First, most C compilers accept some sort of asm statement, and you could probably build something on that with string construction and macros at compile time. However, I think there is a better option.

Reuse, recycle

Since I sometimes develop new CPU architectures, I have a universal cross assembler that is, honestly, an ugly hack, but it works pretty well. I've covered it before. If you don't want to read that whole article, here is the short version: it uses a few simple tricks to convert standard-looking assembly language into C code, which is then compiled. Running the resulting program writes the machine code out in whatever file format you want. It's very easy to set up, and in the middle there's a nice C program that emits machine code. That program isn't much more readable than raw assembly, but you shouldn't ever have to look at it. But what if we started the process there and made that format readable?
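To make the idea concrete, here is a minimal sketch of what the translated output amounts to. It is not the author's actual generator. Each assembly line becomes an ordinary function call, and running the compiled program is what produces the machine code. The function names `LDI`, `PLO`, `SEP`, `assemble`, and `dump`, and the `image` buffer, are hypothetical. The opcodes are the real RCA 1802 encodings: LDI is F8 followed by the immediate byte, PLO is An, and SEP is Dn.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical output buffer: the machine code the "assembled" program produces.
static std::vector<std::uint8_t> image;

// Each mnemonic becomes an ordinary function that appends its 1802 encoding.
void LDI(std::uint8_t imm) { image.push_back(0xF8); image.push_back(imm); }              // LDI: F8 nn
void PLO(unsigned r) { image.push_back(static_cast<std::uint8_t>(0xA0 | (r & 0xF))); }  // PLO: An
void SEP(unsigned r) { image.push_back(static_cast<std::uint8_t>(0xD0 | (r & 0xF))); }  // SEP: Dn

// Roughly what a translator could generate for the source lines
//     LDI 0x10
//     PLO R3
//     SEP R3
void assemble() {
    LDI(0x10);
    PLO(3);
    SEP(3);
}

// Print the image as hex bytes (a stand-in for a real output file format).
void dump() {
    for (std::uint8_t b : image) std::printf("%02X ", static_cast<unsigned>(b));
    std::printf("\n");
}
```

Calling `assemble()` and then `dump()` prints the bytes F8 10 A3 D3. A real generator also has to deal with labels and forward references, which this sketch leaves out.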

At the heart of the system is a C program in soloasm.c. It handles command-line options and output file generation. It calls an external function, genasm, with a single integer argument. When that argument is 1, the assembler is on its first pass, and you only need to fill in the label values with real numbers. When it is 2, you actually fill in the array that holds the code.
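The two-pass scheme is what lets code refer to labels that appear later in the source. The sketch below is a hypothetical stand-in for that mechanism, not soloasm.c itself. On pass 1 it only advances the location counter and records each label's address. On pass 2 it stores bytes, and by then every forward reference resolves. Only the name genasm and its integer pass argument come from the description above. Everything else (`current_pass`, `loc`, `labels`, `code`, `run`) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

static int current_pass = 1;                    // 1 = collect labels, 2 = emit code
static unsigned loc = 0;                        // location counter
static std::map<std::string, unsigned> labels;  // label name -> address
static std::vector<std::uint8_t> code(256, 0);  // hypothetical 256-byte memory image

void define_label(const std::string& name) {
    if (current_pass == 1) labels[name] = loc;  // pass 1: remember where we are
}

unsigned label(const std::string& name) {
    auto it = labels.find(name);
    return it == labels.end() ? 0 : it->second; // unknown on pass 1: use a placeholder 0
}

void emit(std::uint8_t b) {
    if (current_pass == 2) code.at(loc) = b;    // only pass 2 stores bytes
    ++loc;                                      // both passes advance the counter
}

// The "program": an 1802 short branch (BR, 0x30) to a label defined after it.
void genasm(int pass) {
    current_pass = pass;
    loc = 0;
    emit(0x30);                                               // BR
    emit(static_cast<std::uint8_t>(label("Target") & 0xFF));  // low byte of target
    emit(0xC4);                                               // NOP
    define_label("Target");
    emit(0xC4);                                               // NOP
}

void run() {
    genasm(1);  // pass 1: Target is recorded at address 3
    genasm(2);  // pass 2: the branch operand is now correct
}
```

The key design point is that both passes run exactly the same calls. Only pass 2 writes to memory, so the label addresses computed on pass 1 stay valid on pass 2.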

That array is described by the __solo_info structure (in soloasm.h), which holds the memory size, a pointer to the code, the processor word size, the start and end addresses, and an error flag. Normally, the system converts your assembly language input into a series of function calls that it writes inside the genasm function. In this case, though, I want to reuse soloasm.c to create a literate assembly language.
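Here is one guess at how such a structure could look, together with a helper that shows how an emit routine might use the size and error fields. The field names, `SoloInfo`, `make_info`, and `put_word` are all hypothetical. The real declaration lives in soloasm.h and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout modeled on the description of __solo_info; names are assumptions.
struct SoloInfo {
    unsigned memsize;      // size of target memory, in words
    std::uint32_t* code;   // pointer to the code array
    unsigned wordsize;     // processor word size in bits (8 for the 1802)
    unsigned begin;        // lowest address written so far
    unsigned end;          // highest address written so far
    int err;               // nonzero once something has gone wrong
};

// begin starts high and end starts low so the first write sets both.
SoloInfo make_info(std::uint32_t* buf, unsigned memsize, unsigned wordsize) {
    return SoloInfo{memsize, buf, wordsize, memsize, 0, 0};
}

// Store one word at addr, tracking the used address range and flagging overflow.
void put_word(SoloInfo& info, unsigned addr, std::uint32_t value) {
    if (addr >= info.memsize) { info.err = 1; return; }  // past end of memory
    std::uint32_t mask = info.wordsize >= 32 ? 0xFFFFFFFFu
                                             : ((1u << info.wordsize) - 1);
    info.code[addr] = value & mask;                      // truncate to the word size
    if (addr < info.begin) info.begin = addr;
    if (addr > info.end) info.end = addr;
}
```

Tracking the begin and end addresses while writing lets an output stage emit only the used part of memory instead of the whole array.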

Modernize

I wrote all of this a long time ago, but I wanted to make building a literate assembler easier, so I did a quick-and-dirty conversion to C++. That lets you use nice data structures for the symbol table, for example. However, in the interest of time, I didn't use every C++ feature I could have.
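As an illustration of the kind of data structure the C++ port makes easy, here is a hypothetical symbol table built on `std::unordered_map`. The class is not from the project. It assumes the two-pass rules described earlier: an unknown label is fine on pass 1 and an error on pass 2. As an extra assumption, it checks on pass 2 that a label did not move between passes.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical symbol table; not the project's actual class.
class SymbolTable {
public:
    // Pass 1: record a label, rejecting duplicates.
    // Pass 2: succeed only if the label still has the same address (no phase error).
    bool define(const std::string& name, unsigned addr, int pass) {
        if (pass == 2) {
            auto it = table_.find(name);
            return it != table_.end() && it->second == addr;
        }
        return table_.emplace(name, addr).second;
    }

    // Pass 1: an unknown label may be defined later, so return a placeholder 0.
    // Pass 2: an unknown label is a real error, reported as std::nullopt.
    std::optional<unsigned> lookup(const std::string& name, int pass) const {
        auto it = table_.find(name);
        if (it != table_.end()) return it->second;
        if (pass == 1) return 0u;
        return std::nullopt;
    }

private:
    std::unordered_map<std::string, unsigned> table_;
};
```

Returning `std::optional` rather than a magic value keeps the pass 2 "undefined label" case from being confused with a label that really sits at address 0.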

The base class is reasonably processor-independent, and as an example I've provided a literate RCA 1802 assembler. It's just a proof of concept, so the instructions could probably be named more consistently, and there's plenty of room for other improvements, but it makes the point.
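One way to split the work is sketched here with hypothetical class names (`AsmBase`, `Lit1802`). A processor-independent base class owns the output image, and a derived 1802 class gives each opcode a descriptive method name. `Set_PC_To_Register`, `Set_X_To_Register`, and `Store_D_To_Reg_Address` follow names used in the 1802 listing in this article. The other method names are illustrative. The opcodes are the real 1802 encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Processor-independent part: an output image and a location counter.
class AsmBase {
public:
    virtual ~AsmBase() = default;
    const std::vector<std::uint8_t>& image() const { return image_; }
    unsigned here() const { return static_cast<unsigned>(image_.size()); }

protected:
    void emit(unsigned b) { image_.push_back(static_cast<std::uint8_t>(b)); }

private:
    std::vector<std::uint8_t> image_;
};

// 1802-specific part: readable names mapped onto real 1802 opcodes.
class Lit1802 : public AsmBase {
public:
    void Load_D_Imm(std::uint8_t v)         { emit(0xF8); emit(v); }  // LDI nn
    void Put_D_Low(unsigned r)              { emit(0xA0 | reg(r)); }  // PLO r
    void Put_D_High(unsigned r)             { emit(0xB0 | reg(r)); }  // PHI r
    void Store_D_To_Reg_Address(unsigned r) { emit(0x50 | reg(r)); }  // STR r
    void Set_PC_To_Register(unsigned r)     { emit(0xD0 | reg(r)); }  // SEP r
    void Set_X_To_Register(unsigned r)      { emit(0xE0 | reg(r)); }  // SEX r

private:
    // The 1802 has sixteen 16-bit registers, R0 through RF.
    static unsigned reg(unsigned r) { assert(r < 16); return r; }
};
```

Porting to another CPU would mean writing a new derived class with its own vocabulary, while keeping the base class and the output machinery unchanged.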

Here is an excerpt from a flashing-light program written for the 1802 in standard assembler syntax:

    ORG 0
Main:
    LDI HIGH(R3Go)
    PHI R3
    LDI LOW(R3Go)
    PLO R3
    SEP R3
R3Go:
    LDI HIGH(Delay)
    PHI R9
    LDI LOW(Delay)
    PLO R9
    LDI HIGH(Stack)
    PHI R7
    LDI LOW(Stack)
    PLO R7
    SEX R7
    LDI 0
    STR R7
Loop:
    OUT 4
    . . .
    NOP
    BR Delay1
    ORG $F0
Stack:
    DB 0
    END Main

Now here is the same thing written for the literate assembler:

// Simple 1802 Literate Program
#include "lit1802.h"

#define ON 1
#define OFF 0
#define DELAYPC 9      // delay subroutine
#define DELAYR 8       // delay count register
#define MAINPC 3       // main routine PC
#define RX 7           // RX value
#define DELAYVAL 0xFF  // delay time (0-255)

void program(void)
{
    Origin(0x0);
    // Flashing light program
    // Main:
    Define_Label("Main");
    // Force R3 as PC just in case
    Load_R_Label(MAINPC,"R3Go");
    Set_PC_To_Register(MAINPC);
    // Here we are P=3
    // R3Go:
    Define_Label("R3Go");
    // Set R9 to delay routine (default PC = 0)
    Load_R_Label(DELAYPC,"Delay");
    // Set RX=7 to memory 00F0
    Load_R_Label(RX,"Stack");
    Set_X_To_Register(RX);
    Load_D_Imm(0);
    Store_D_To_Reg_Address(RX);
    // Loop:
    Define_Label("Loop");
    Output_Mem_RX_Incr(4);  // write the count to the LED
    . . .
    NOP(10);
    Branch(Label("Delay1"));  // note... could define BRANCH as _BRANCH then #define Branch(l) _BRANCH(Label(l)) if yo...
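Comparing the two listings, a single call such as `Load_R_Label(MAINPC,"R3Go")` stands in for the four-instruction LDI HIGH / PHI / LDI LOW / PLO sequence. Likewise, `Output_Mem_RX_Incr(4)` corresponds to OUT 4. The sketch below guesses at how those helpers could expand, and it is not the contents of lit1802.h. The `out` buffer and the `symbols` map are hypothetical. A real version would get label addresses from the two-pass symbol table rather than a pre-filled map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

static std::vector<std::uint8_t> out;                    // hypothetical output image
static std::map<std::string, std::uint16_t> symbols;     // hypothetical, already-resolved labels

static void emit(unsigned b) { out.push_back(static_cast<std::uint8_t>(b)); }

// Load a 16-bit register with a label's address, the long way the 1802 requires:
//   LDI HIGH(label) ; PHI r ; LDI LOW(label) ; PLO r
void Load_R_Label(unsigned r, const std::string& name) {
    std::uint16_t addr = symbols.at(name);  // throws std::out_of_range if unknown
    emit(0xF8); emit(addr >> 8);            // LDI HIGH(addr)
    emit(0xB0 | (r & 0xF));                 // PHI r
    emit(0xF8); emit(addr & 0xFF);          // LDI LOW(addr)
    emit(0xA0 | (r & 0xF));                 // PLO r
}

// OUT n (n = 1..7): send M(R(X)) to port n, then increment R(X).
void Output_Mem_RX_Incr(unsigned port) {
    assert(port >= 1 && port <= 7);
    emit(0x60 | port);
}
```

As a check against the standard listing: Main's five instructions (LDI, PHI, LDI, PLO, SEP) take 2 + 1 + 2 + 1 + 1 = 7 bytes starting at ORG 0. R3Go therefore lands at address 7, and the expansion above emits F8 00 B3 F8 07 A3 for the first call.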