Diffstat (limited to '03')
-rw-r--r-- 03/Makefile | 2
-rw-r--r-- 03/README.md | 168
-rw-r--r-- 03/ex03 | 105
-rw-r--r-- 03/in02 | 3
-rw-r--r-- 03/in03 | 6
5 files changed, 250 insertions, 34 deletions
diff --git a/03/Makefile b/03/Makefile
index 3d71765..2a50640 100644
--- a/03/Makefile
+++ b/03/Makefile
@@ -1,4 +1,4 @@
-all: out02 out03
+all: out02 out03 README.html
out02: in02 ../02/out01
../02/out01
out03: out02 in03
diff --git a/03/README.md b/03/README.md
new file mode 100644
index 0000000..b4ab20b
--- /dev/null
+++ b/03/README.md
@@ -0,0 +1,168 @@
+# stage 03
+The code for this compiler (the file `in02`, an input for our [stage 02 compiler](../02/README.md))
+is 2700 lines—quite a bit larger than the previous ones. And as we'll see, it's a lot more powerful too.
+To compile it, run `../02/out01` from this directory.
+Let's take a look at `in03`, the example program I've written for it:
+```
+B=:hello_world
+call :puts
+; exit code 0
+J=d0
+syscall x3c
+
+:hello_world
+str Hello, world!
+xa
+x0
+
+; output null-terminated string in rbx
+:puts
+ R=B
+ call :strlen
+ D=A
+ I=R
+ J=d1
+ syscall d1
+ return
+
+; calculate length of string in rbx
+:strlen
+ ; keep pointer to start of string
+ D=B
+ I=B
+ :strlen_loop
+ C=1I
+ ?C=0:strlen_loop_end
+ I+=d1
+ !:strlen_loop
+ :strlen_loop_end
+ I-=D
+ A=I
+ return
+```
+This language looks a lot nicer than the previous one. No more obscure two-letter label names
+and commands! Furthermore, try changing `:strlen_loop` on line 31
+to a typo like `:strlen_lop`. You should get:
+```
+Bad label 001f
+```
+Not only do we get an error message, we also get the line number
+of the error! It's in hexadecimal, unfortunately, but that's
+better than nothing.
+
+I spent a while on this compiler (perhaps I went a bit overboard
+on the features), because the 02 language
+was the first that was actually pleasant to use!
+It's much less sophisticated than even most assembly languages,
+but being able to use labels without having to worry about filling
+in the offsets later made it way nicer to use than the previous
+languages.
+
+In addition to `in03`, this directory also has `ex03`,
+which gives examples of all of the instructions supported by this compiler.
+
+Seeing as this is a relatively large compiler,
+here is an overview of how it works:
+
+## functions
+
+Thanks to labels, we can actually use functions in this compiler, without
+it being a complete nightmare. Functions are called like this:
+```
+im
+--fu
+cl (this would call the function ::fu)
+```
+and each function ends with `re`, which returns to the caller.
+I've used the convention of storing return values in `rax` and
+passing the argument to a unary function in `rbx`.
+
+This compiler ended up having a lot of functions, some of them used in all sorts
+of different places.
+
+## execution
+
+Just as with the 02 compiler, we need two passes:
+the first one
+computes the address of each label,
+and the second one uses the correct addresses to
+write the executable.
+
+Each pass is a loop, which starts by incrementing
+the line number (`::L#`). Then we read in a line
+from the source file, `in03`. This is done one character
+at a time, until a newline is reached. The line is stored
+in the buffer `::LI`. In the remainder of the program we
+(mostly) use the fact that the line is newline-terminated,
+rather than keeping track of how long it is.
+
+Once the line is read in, a bunch of tests are performed on it.
+We start by looking at the first character: if it's a `;`,
+the line is a comment; if it's a `!`, it's an unconditional jump; etc.
+Failing that, we look at the second character, to see if it's
+`=`, `+=`, `-=`, etc. If it doesn't match any of them, we use
+the `::s=` (string equals) function, which conveniently lets you
+set the terminator. We check if the line is equal to `"syscall"`
+up to a terminator of `' '` to check if it's a syscall, for example.
+
+## `+=`, et al.
+
+We can emit the correct instructions for `D+=C` with:
+
+- `mov rbx, rdx`
+- `mov rax, rcx`
+- `add rax, rbx`
+- `mov rdx, rax`
+
+A similar pattern can be used for `-=`, `&=`, etc.
+This made it pretty easy to write the implementation of all of these:
+there's one function for setting `rbx` to the first operand (`::B1`),
+another for setting `rax` to the second operand (`::A2`), and another for
+setting the first operand to `rax` (`::1A`). The implementations of
+`+=`/`-=`/etc. just call those three functions, with a bit of stuff in between
+to perform the corresponding operation.
+A similar approach also works for loading/storing values in memory.
+
+## label list
+
+Instead of a label table, we now have a "label list" (or array
+if you prefer) at `::LB`.
+A pointer to the current end of the list is stored at `::L$`.
+Each entry is the name of the label, including the `:`, then a newline,
+then the 4-byte address.
+`::ll` is used to look up labels. If it's the first pass,
+`::ll` just returns 0. Otherwise, it looks up the label by
+comparing it to each entry using `::s=` with a terminator of `'\n'`.
+If no label matches, we get an error.
+
+## alignment
+A lot of data used in this program is
+[not correctly aligned](https://en.wikipedia.org/wiki/Bus_error#Unaligned_access)—e.g.
+8-byte values are not always stored at an address that is a multiple of 8.
+This would be a problem on some processors, but x86-64 can handle it.
+It's still not a good idea in practice—reading unaligned memory
+is much slower. But we're not really concerned about performance here,
+and it would be a bit finicky to align everything correctly.
+However, I have introduced `align` into this language,
+which you can put before a label to ensure that its address is aligned
+to 8 bytes.
+
+## errors
+
+Errors are handled in functions beginning with `!`, e.g. `::!n` for "bad number".
+Each of these ends up calling `::er`. `::er` prints
+a string specific to the type of error, then
+converts the line number to a string, and prints it.
+The line number is always converted to a 4-digit hexadecimal number.
+This means it won't fully work past 65,535 lines, but
+let's hope we don't need to write any programs that long!
+
+## limitations
+
+Functions in this 03 language will probably overwrite the previous values
+of registers. This can make it kind of annoying to call functions, since
+you need to make sure you store away any information you'll need after the function.
+And the language definitely won't be as nice to use as something with real variables. But overall,
+I'm very happy with this compiler, considering it's written in a language with 2-letter label
+names.
+
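The "label list" and "errors" sections of the README above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not a transcription of `in02`: the function names (`string_equals`, `add_label`, `lookup_label`, `format_error`), the little-endian byte order of the 4-byte address, and the example addresses are all hypothetical. What it takes from the README is the entry layout (label name including the `:`, a newline, then a 4-byte address), the comparison with a `'\n'` terminator, the first pass returning 0, and the 4-digit hexadecimal line number in the error message.

```python
# Sketch of the label list described in the README. All names and the
# little-endian address encoding are assumptions made for illustration.

NEWLINE = ord("\n")


def string_equals(a, b, terminator):
    """Compare a and b byte by byte up to `terminator` (modelled on ::s=).

    Running off the end of either buffer is treated like the terminator.
    """
    i = 0
    while True:
        ca = a[i] if i < len(a) else terminator
        cb = b[i] if i < len(b) else terminator
        if ca != cb:
            return False
        if ca == terminator:
            return True
        i += 1


def format_error(message, line_number):
    """Error text followed by the line number as 4 hexadecimal digits."""
    return f"{message} {line_number:04x}"


def add_label(label_list, name, address):
    """Append one entry: name (with ':'), a newline, then 4 address bytes."""
    label_list.extend(name + b"\n" + address.to_bytes(4, "little"))


def lookup_label(label_list, name, line_number, first_pass):
    """Return the address of `name`; on the first pass always return 0."""
    if first_pass:
        return 0  # addresses are not known yet, so any placeholder will do
    pos = 0
    while pos < len(label_list):
        end = label_list.index(b"\n", pos)  # newline that ends the name
        if string_equals(label_list[pos:], name + b"\n", NEWLINE):
            return int.from_bytes(label_list[end + 1:end + 5], "little")
        pos = end + 5  # skip the newline and the 4-byte address
    raise LookupError(format_error("Bad label", line_number))


labels = bytearray()
add_label(labels, b":puts", 0x400080)      # hypothetical addresses
add_label(labels, b":strlen", 0x4000A0)
found = lookup_label(labels, b":strlen", 31, first_pass=False)
```

Looking up a misspelled label such as `:strlen_lop` on line 31 raises an error whose text is `Bad label 001f`, matching the message quoted in the README. A linear scan like this is slow for large programs, but it needs no hashing or sorting, which is presumably why it suits a compiler written in such a low-level language.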
diff --git a/03/ex03 b/03/ex03
index 510018e..0270bb9 100644
--- a/03/ex03
+++ b/03/ex03
@@ -1,42 +1,87 @@
+; You can use registers like variables: rax = A, rbx = B, rcx = C, rdx = D, rsi = I, rdi = J, rsp = S, rbp = R
+; However, because of the way things are implemented, you should be careful about using A/B as variables:
+; they sometimes might not work correctly, and will be overwritten by a lot of statements
+
+; set register to...
+; decimal
+D=d123
+; hexadecimal
+D=x1ef
+; another register
+D=R we can have a comment here and in some other places. not after numbers or labels though.
+; label address
+D=:label
+; add
D+=d4
+D+=R
+; subtract
+D-=d123
+D-=R
+; left/right shift (only rcx is supported for variable shifts)
+D<=C
+D<=d33
+D>=C
+D>=x12
+; arithmetic right shift
D]=d7
D]=C
-D^=C
-D|=C
-D&=C
-~C
-B|=A
-8D=C
-A=1B
-B>=d33
-call :funciton
+; bitwise xor, or, and
+D^=R
+D|=R
+D&=R
+D^=d1
+D|=d1
+D&=d1
+; bitwise not
+; (this sets D to ~D)
+~D
+; dereference
+; set 8 bytes at rdx to rbp
+8D=R
+; set 4 bytes at rdx to ebp
+4D=R
+2D=R
+1D=R
+; set rcx/ecx/cx/cl to 8/4/2/1 bytes at rdx
+C=8D
+C=4D
+C=2D
+C=1D
+; call a function
+call :function
+; return
+return
+; label declarations
+;:function
+;:label
+; literal byte
x4b
+'H
+'i
+; string
+str This text will appear in the executable!
+; unconditional jump
!:label
-?J<B:label
-:label
-1B=C
-; :l ba b
-J=d0
-A=d60
+; conditional jump
+?R<S:label
+?R=S:label
+?R!S:label
+?R>S:label
+; (unsigned comparisons above/below)
+?RaS:label
+?RbS:label
+; syscall
syscall x3c
+; align to 8 bytes
align
-:label
+; reserve some number of bytes of memory
reserve d1000
-B+=J
-B<=d9
-B-=J
-?J=B:label
-?A!B:label
-?A>B:label
-A=:label
-x3c
-return
+; signed/unsigned multiply/divide
imul
idiv
mul
div
-:funciton
-call A
-str Here is some text which will be put in the executable!
-?CaD:label
-
+; e.g. to compute 5*3 into rcx (note rdx is wiped in the process):
+A=d5
+B=d3
+mul
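The note in `ex03` that "rdx is wiped" by `mul` can be illustrated with a small Python model. It assumes, and this is an assumption rather than something the diff states, that the 03 statement `mul` behaves like the x86-64 instruction `mul rbx`: an unsigned multiply of `rax` by `rbx` whose 128-bit result is split across `rdx` (high half) and `rax` (low half). The function name `mul` and the constant `MASK64` are illustrative.

```python
# Model of an unsigned 64-bit multiply whose 128-bit result is split
# across two registers. Assumes 03's `mul` acts like x86-64 `mul rbx`.

MASK64 = (1 << 64) - 1


def mul(rax, rbx):
    """Return the new (rax, rdx) pair after multiplying rax by rbx."""
    product = (rax & MASK64) * (rbx & MASK64)
    new_rax = product & MASK64   # low 64 bits of the product
    new_rdx = product >> 64      # high 64 bits: the old rdx is lost
    return new_rax, new_rdx


low, high = mul(5, 3)
```

For 5*3 the low half is 15 and the high half is 0, so whatever was in `rdx` beforehand is overwritten with zero. Any value held in `D` that is still needed has to be saved before the multiply.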
diff --git a/03/in02 b/03/in02
index 879e17a..1632de1 100644
--- a/03/in02
+++ b/03/in02
@@ -2886,6 +2886,9 @@ jm
~~
::LI line buffer
~~
+~~
+~~
+~~
::L$ end of current label list
--LB
::LB labels
diff --git a/03/in03 b/03/in03
index ef0640a..a8d8744 100644
--- a/03/in03
+++ b/03/in03
@@ -1,6 +1,6 @@
-; write to stdout
B=:hello_world
call :puts
+; exit code 0
J=d0
syscall x3c
@@ -11,15 +11,15 @@ x0
; output null-terminated string in rbx
:puts
+ R=B
call :strlen
- I=D
D=A
+ I=R
J=d1
syscall d1
return
; calculate length of string in rbx
-; keeps pointer to start of string in rdx, end of string in rsi
:strlen
; keep pointer to start of string
D=B