Diffstat (limited to '02')
-rw-r--r-- 02/Makefile  |   5
-rw-r--r-- 02/README.md | 110
2 files changed, 115 insertions, 0 deletions
diff --git a/02/Makefile b/02/Makefile
index 90df482..23fc8a1 100644
--- a/02/Makefile
+++ b/02/Makefile
@@ -1,2 +1,7 @@
+all: out01 out02 README.html
 out01: in01
 	../01/out00
+out02: out01
+	./out01
+%.html: %.md ../markdown
+	../markdown $<
diff --git a/02/README.md b/02/README.md
new file mode 100644
index 0000000..ec390ff
--- /dev/null
+++ b/02/README.md
@@ -0,0 +1,110 @@
+# stage 02
+
+The compiler for this stage is in the file `in01`, an input for our previous compiler.
+The specifics of how this compiler works are in the comments in that file, but here I'll
+give an overview.
+Let's take a look at `in02`, an example input file for this compiler:
+```
+jm
+:-co jump to code
+::hw
+'H
+'e
+'l
+'l
+'o
+',
+' 
+'w
+'o
+'r
+'l
+'d
+'!
+\n
+::he end of hello world
+::co start of code
+//
+// now we'll calculate the length of the hello world string
+// by subtracting hw from he.
+//
+im
+--he
+BA
+im
+--hw
+nA
++B
+DA put length in rdx
+// okay now we can write it
+im
+##1.
+JA set rdi to 1 (stdout)
+im
+--hw
+IA set rsi to a pointer to "Hello, world!\n"
+im
+##1. write
+sy
+im
+##0. exit code 0
+JA
+im
+##3c. exit = syscall 0x3c
+sy
+```
+
+You can try adding more characters to the hello world message, and it'll just work;
+the length of the text is computed automatically!
+
+This time, commands are separated by newlines instead of semicolons.
+Each line begins with a 2-character command identifier. There are some special identifiers though:
+
+- `::` marks a *label*
+- `--` outputs a label's (absolute) address
+- `:-` outputs a label's relative address
+- `##` outputs a number
+
+All other commands work like they did in the previous compiler—if you scroll down in the
+`in01` source file, you'll see the full command table.
+
+## labels
+
+Labels are the most important new feature of this language.
+
+## two passes?
+
+## other features
+
+Now instead of writing out each of the 8 bytes making up a number,
+we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
+and the compiler will automatically
+extend it to 8 bytes.
+This is especially nice because we don't need to write numbers backwards
+for little-endianness anymore!
+Numbers cannot appear at the end of a line (this was
+to make the compiler simpler to write), so I'm adding a `.` at the end of
+each one to avoid making that mistake.
+
+Anything after a command is treated as a comment;
+additionally `//` can be used for comments on their own lines.
+I decided to implement them as simply as possible:
+I just added the command `//` to the command table, which outputs the byte `0x90`—this
+means "do nothing" (`nop`) in x86-64.
+Note that this means that the following code will not work as expected:
+```
+im
+// load the value 0x333 into rax
+##333.
+```
+since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
+
+## limitations
+
+Many of the limitations of our previous compilers apply to this one. Also,
+if you use a label without defining it, it uses address 0, rather than outputting
+an error message. This could be fixed: if the value in the label table is 0, and if we are
+on the second pass, output an error message. This compiler was already tedious enough
+to implement, though!
+But thanks to labels, for future compilers at least we won't have to calculate
+any jump offsets manually.
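
Note on the README's number and label handling. The Python sketch below is a rough model of the behaviour the README describes, not the code in `in01`. `encode_number` mirrors the `##3c.` to `3c 00 00 00 00 00 00 00` expansion. `assemble` is a hypothetical two-pass loop: the first pass only records where each label falls, and the second pass emits the real addresses. The item kinds (`label`, `raw`, `abs`), the 8-byte width of an absolute address, and the `base` parameter are all assumptions made for illustration. The undefined-label check uses dictionary membership rather than the "value is 0" test the README suggests.

```python
def encode_number(token: str) -> bytes:
    """Encode a '##<hex>.' token as 8 little-endian bytes."""
    if not (token.startswith("##") and token.endswith(".")):
        raise ValueError(f"not a number token: {token!r}")
    value = int(token[2:-1], 16)  # strip the '##' prefix and trailing '.'
    return value.to_bytes(8, "little")


def assemble(items, base=0):
    """Hypothetical two-pass model of label resolution.

    items: list of ('label', name), ('raw', bytes) or ('abs', name).
    base:  assumed load address added to every label offset.
    """
    labels = {}
    out = b""
    for pass_no in (1, 2):
        out = b""
        for kind, arg in items:
            if kind == "label":
                # Record the address of whatever comes next.
                labels[arg] = base + len(out)
            elif kind == "raw":
                out += arg
            elif kind == "abs":
                # Pass 1 may not know forward labels yet, so it emits a
                # placeholder 0; the size is the same, so offsets stay right.
                if pass_no == 2 and arg not in labels:
                    raise KeyError(f"undefined label: {arg}")
                out += labels.get(arg, 0).to_bytes(8, "little")
            else:
                raise ValueError(f"unknown item kind: {kind!r}")
    return out, labels


if __name__ == "__main__":
    program = [
        ("abs", "he"),                  # forward reference, resolved in pass 2
        ("label", "hw"),
        ("raw", b"Hello, world!\n"),
        ("label", "he"),
    ]
    out, labels = assemble(program)
    # The string length falls out as he - hw, as in in02.
    print(labels["he"] - labels["hw"])
    print(encode_number("##3c.").hex(" "))
```

Because every item has the same size in both passes, the offsets recorded in the first pass are still valid when the second pass fills in forward references such as `he`.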