Diffstat (limited to '03/README.md')
-rw-r--r-- | 03/README.md | 168 |
1 files changed, 168 insertions, 0 deletions
diff --git a/03/README.md b/03/README.md
new file mode 100644
index 0000000..b4ab20b
--- /dev/null
+++ b/03/README.md
@@ -0,0 +1,168 @@
+# stage 03
+The code for this compiler (the file `in02`, an input for our [stage 02 compiler](../02/README.md))
+is 2700 lines, quite a bit larger than the previous ones. And as we'll see, it's a lot more powerful too.
+To compile it, run `../02/out01` from this directory.
+Let's take a look at `in03`, the example program I've written for it:
+```
+B=:hello_world
+call :puts
+; exit code 0
+J=d0
+syscall x3c
+
+:hello_world
+str Hello, world!
+xa
+x0
+
+; output null-terminated string in rbx
+:puts
+	R=B
+	call :strlen
+	D=A
+	I=R
+	J=d1
+	syscall d1
+	return
+
+; calculate length of string in rbx
+:strlen
+	; keep pointer to start of string
+	D=B
+	I=B
+	:strlen_loop
+	C=1I
+	?C=0:strlen_loop_end
+	I+=d1
+	!:strlen_loop
+	:strlen_loop_end
+	I-=D
+	A=I
+	return
+```
+This language looks a lot nicer than the previous one. No more obscure two-letter label names
+and commands! Furthermore, try changing `:strlen_loop` on line 31
+to a typo like `:strlen_lop`. You should get:
+```
+Bad label 001f
+```
+Not only do we get an error message, we also get the line number
+of the error! It's in hexadecimal, unfortunately, but that's
+better than nothing.
+
+I spent a while on this compiler (perhaps I went a bit overboard
+on the features), because the 02 language
+was the first that was actually pleasant to use!
+It's much less sophisticated than even most assembly languages,
+but being able to use labels without having to worry about filling
+in the offsets later made it way nicer to use than the previous
+languages.
+
+In addition to `in03`, this directory also has `ex03`,
+which gives examples of all of the instructions supported by this compiler.
+
+Seeing as this is a relatively large compiler,
+here is an overview of how it works:
+
+## functions
+
+Thanks to labels, we can actually use functions in this compiler, without
+it being a complete nightmare. Functions are called like this:
+```
+im
+--fu
+cl (this would call the function ::fu)
+```
+and each function ends with `re`, which returns from the function.
+I've used the convention of storing return values in `rax` and
+passing the argument to a unary function in `rbx`.
+
+This compiler ended up having a lot of functions, some of them used in all sorts
+of different places.
+
+## execution
+
+Just as with the 02 compiler, we need two passes:
+the first one
+computes the address of each label,
+and the second one uses the correct addresses to
+write the executable.
+
+Each pass is a loop, which starts by incrementing
+the line number (`::L#`). Then we read in a line
+from the source file, `in03`. This is done one character
+at a time, until a newline is reached. The line is stored
+in the buffer `::LI`. In the remainder of the program we
+(mostly) use the fact that the line is newline-terminated,
+rather than keeping track of how long it is.
+
+Once the line is read in, a bunch of tests are performed on it.
+We start by looking at the first character: if it's a `;`,
+the line is a comment; if it's a `!`, it's an unconditional jump; etc.
+Failing that, we look at what follows the first character, to see if it's
+`=`, `+=`, `-=`, etc. If it doesn't match any of them, we use
+the `::s=` (string equals) function, which conveniently lets you
+set the terminator. We check if the line is equal to `"syscall"`
+up to a terminator of `' '` to check if it's a syscall, for example.
+
+## `+=`, et al.
+
+We can emit the correct instructions for `D+=C` with:
+
+- `mov rbx, rdx`
+- `mov rax, rcx`
+- `add rax, rbx`
+- `mov rdx, rax`
+
+A similar pattern can be used for `-=`, `&=`, etc.
+This made it pretty easy to write the implementation of all of these:
+there's one function for setting `rbx` to the first operand (`::B1`),
+another for setting `rax` to the second operand (`::A2`), and another for
+setting the first operand to `rax` (`::1A`). The implementations of
+`+=`/`-=`/etc. just call those three functions, with a bit of stuff in between
+to perform the corresponding operation.
+A similar approach also works for loading/storing values in memory.
+
+## label list
+
+Instead of a label table, we now have a "label list" (or array
+if you prefer) at `::LB`.
+A pointer to the current end of the list is stored at `::L$`.
+Each entry is the name of the label, including the `:`, then a newline,
+then the 4-byte address.
+`::ll` is used to look up labels. If it's the first pass,
+`::ll` just returns 0. Otherwise, it looks up the label by
+comparing it to each entry using `::s=` with a terminator of `'\n'`.
+If no label matches, we get an error.
+
+## alignment
+A lot of data used in this program is
+[not correctly aligned](https://en.wikipedia.org/wiki/Bus_error#Unaligned_access), e.g.
+8-byte values are not always stored at an address that is a multiple of 8.
+This would be a problem on some processors, but x86-64 can handle it.
+It's still not a good idea in practice, since reading unaligned memory
+can be slower. But we're not really concerned about performance here,
+and it would be a bit finicky to align everything correctly.
+However, I have introduced `align` into this language,
+which you can put before a label to ensure that its address is aligned
+to 8 bytes.
+
+## errors
+
+Errors are handled in functions beginning with `!`, e.g. `::!n` for "bad number".
+Each of these ends up calling `::er`. `::er` prints
+a string specific to the type of error, then
+converts the line number to a string, and prints it.
+The line number is always converted to a 4-digit hexadecimal number.
+This means it won't fully work past 65,535 lines, but
+let's hope we don't need to write any programs that long!
+
+## limitations
+
+Functions in this 03 language will probably overwrite the previous values
+of registers. This can make it kind of annoying to call functions, since
+you need to make sure you store away any information you'll need after the function.
+And the language definitely won't be as nice to use as something with real variables. But overall,
+I'm very happy with this compiler, considering it's written in a language with 2-letter label
+names.
+