-rw-r--r--  00/Makefile         2
-rw-r--r--  00/README.md       26
-rw-r--r--  01/Makefile         2
-rw-r--r--  01/README.md       32
-rw-r--r--  01/commands.txt     5
-rw-r--r--  02/Makefile         4
-rw-r--r--  02/README.md      116
-rw-r--r--  02/in01            24
-rw-r--r--  02/in02            11
-rw-r--r--  Makefile            6
-rw-r--r--  README.md          27
-rw-r--r--  instructions.txt    1
-rw-r--r--  markdown.c          5
13 files changed, 177 insertions, 84 deletions
diff --git a/00/Makefile b/00/Makefile
index a328fee..b882d3e 100644
--- a/00/Makefile
+++ b/00/Makefile
@@ -3,3 +3,5 @@ out00: in00
./hexcompile
%.html: %.md ../markdown
../markdown $<
+clean:
+ rm -f out00 README.html
diff --git a/00/README.md b/00/README.md
index b30adb8..41b50bf 100644
--- a/00/README.md
+++ b/00/README.md
@@ -102,7 +102,7 @@ execute-enabled. Normally people don't do this, for security, but we won't worry
about that (don't compile any untrusted code with any compiler from this series!)
Without further ado, here's the contents of the program header:
-- `01 00 00 00` Segment type 1 (this should be loaded into memory)
+- `01 00 00 00` Segment type 1 (this segment should be loaded into memory)
- `07 00 00 00` Flags = RWE (readable, writeable, and executable)
- `78 00 00 00 00 00 00 00` Offset in file = 120 bytes
- `78 00 40 00 00 00 00 00` Virtual address = 0x400078
@@ -114,7 +114,7 @@ memory address that the segment will be loaded to.
Nowadays, computers use virtual memory, meaning that
addresses in our program don't actually correspond to where the memory is
physically stored in RAM (the CPU translates between virtual and physical
-memory addresses). There are many reasons for this: making sure each process has
+addresses). There are many reasons for this: making sure each process has
its own memory space, memory protection, etc. You can read more about it
elsewhere.
@@ -130,7 +130,7 @@ each page (block) of memory is 4096 bytes long, and has to start at an address
that is a multiple of 4096. Our program needs to be loaded into a memory page,
so its *virtual address* needs to be a multiple of 4096. We're using `0x400000`.
But wait! Didn't we use `0x400078` for the virtual address? Well, yes but that's
-because the *data in the file* is loaded to address `0x400078`. The actual page
+because the segment's data is loaded to address `0x400078`. The actual page
of memory that the OS will allocate for our segment will start at `0x400000`. The
reason we need to start `0x78` bytes in is that Linux expects the data in the
file to be at the same position in the page as when it will be loaded, and it
@@ -156,7 +156,8 @@ These instructions execute syscall `2` with arguments `0x40026d`, `0`.
If you're familiar with C code, this is `open("in00", O_RDONLY)`.
A syscall is the mechanism which lets software ask the kernel to do things.
[Here](https://filippo.io/linux-syscall-table/) is a nice table of syscalls you
-can look through if you're interested. You can also install `strace` (e.g. with
+can look through if you're interested. You can also install
+[strace](https://strace.io) (e.g. with
`sudo apt install strace`) and run `strace ./hexcompile` to see all the syscalls
our program does.
Syscall #2, on 64-bit Linux, is `open`. It's used to open a file. You can read
@@ -175,13 +176,13 @@ descriptor Linux gave us. This is because Linux assigns file descriptor numbers
sequentially, starting from
[0 for stdin, 1 for stdout, 2 for stderr](https://en.wikipedia.org/wiki/Standard_streams),
and then 3, 4, 5, ... for any files our program opens. So
-this file, the first one our program opens, will have descriptor `3`.
+this file, the first one our program opens, will have descriptor 3.
Now we open our output file:
- `48 b8 72 02 40 00 00 00 00 00` `mov rax, 0x400272`
- `48 89 c7` `mov rdi, rax`
-- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x41`
+- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x241`
- `48 89 c6` `mov rsi, rax`
- `48 b8 ed 01 00 00 00 00 00 00` `mov rax, 0o755`
- `48 89 c2` `mov rdx, rax`
@@ -193,11 +194,12 @@ similar to our first call, with two important differences: first, we specify
`0x241` as the second argument. This tells Linux that we are writing to the
file (`O_WRONLY = 0x01`), that we want to create it if it doesn't exist
(`O_CREAT = 0x40`), and that we want to delete any previous contents it had
-(`O_TRUNC = 0x200`). Secondly, we are setting the third argument this time. It
+(`O_TRUNC = 0x200`). Secondly, we're setting the third argument this time. It
specifies the permissions our file is created with (`0o755` means user
read/write/execute, group/other read/execute). This is not very important to
the actual execution of the program, so don't worry if you don't know
about UNIX permissions.
+Note that the output file's descriptor will be 4.
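To see where the corrected `0x241` comes from, here is a minimal Python sketch (not part of the repository) that ORs together the three flag values quoted above and formats the result the way it appears after `48 b8`. The helper name `imm64` is made up for this illustration, and the flag values are the ones the README gives for 64-bit Linux.

```python
# Flag values as quoted in the README text above (64-bit Linux).
O_WRONLY = 0x01
O_CREAT = 0x40
O_TRUNC = 0x200

def imm64(value):
    """Format a value as the 8 little-endian bytes that follow `48 b8`."""
    return value.to_bytes(8, "little").hex(" ")

flags = O_WRONLY | O_CREAT | O_TRUNC
print(hex(flags))      # 0x241
print(imm64(flags))    # 41 02 00 00 00 00 00 00
print(imm64(0o755))    # ed 01 00 00 00 00 00 00
```

The second and third lines of output match the immediate bytes in the two `mov rax, ...` entries of the list above.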
Now we can start reading from the file. We're going to loop back to this part of
the code every time we want to read a new hexadecimal number from the input
@@ -223,13 +225,13 @@ We're telling Linux to output to `0x40026a`, which is just a part of this
segment (see further down). Normally you would read to a different segment of
the program from where the code is, but we want this to be as simple as
possible.
-The number of bytes *actually read*, taking into account that we might have
+The number of bytes *actually* read, taking into account that we might have
reached the end of the file, is stored in `rax`.
- `48 89 c3` `mov rbx, rax`
- `48 b8 03 00 00 00 00 00 00 00` `mov rax, 3`
- `48 39 d8` `cmp rax, rbx`
-- `0f 8f 50 01 00 00` `jg 0x400250`
+- `0f 8f 50 01 00 00` `jg +0x150 (0x400250)`
This tells the CPU to jump to a later part of the code (address `0x400250`) if 3
is greater than the number of bytes we got, in other words, if we reached the
@@ -307,7 +309,7 @@ Okay, now `rax` contains the byte specified by the two hex digits we read.
- `48 93` `xchg rax, rbx`
- `88 03` `mov byte [rbx], al`
-Write the byte to a specific memory location (address `0x40026c`).
+Put the byte in a specific memory location (address `0x40026c`).
- `48 b8 04 00 00 00 00 00 00 00` `mov rax, 4`
- `48 89 c7` `mov rdi, rax`
@@ -356,7 +358,7 @@ This is where we conditionally jumped to way back when we determined if we
reached the end of the file. This calls syscall #60, `exit`, with one argument,
0 (exit code 0, indicating we exited successfully).
-Normally, you should close files descriptors (with syscall #3), to tell Linux you're
+Normally, you would close files descriptors (with syscall #3), to tell Linux you're
done with them, but we don't need to. It'll automatically close all our open
file descriptors when our program exits.
@@ -387,4 +389,4 @@ a while.
But these problems aren't really a big deal. We'll only be running this on
little programs and we'll be sure to check that our input is in the right
format. And with that, we are ready to move on to the
-[next stage...](../01/README.md).
+[next stage...](../01/README.md)
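The `jg +0x150 (0x400250)` notation introduced in this file's diff reflects how x86-64 encodes near jumps: the 4 bytes after `0f 8f` are a signed offset measured from the end of the instruction. Here is a rough Python sketch of that arithmetic. The function name `jump_target` is made up, and the instruction address `0x4000fa` is not stated in the diff. It is simply the value implied by the target address.

```python
def jump_target(insn_addr, insn_bytes):
    """Target of a near conditional jump encoded as `0f 8x` + rel32."""
    # The displacement is a signed 32-bit little-endian number.
    disp = int.from_bytes(insn_bytes[2:6], "little", signed=True)
    # It is added to the address of the *next* instruction.
    return insn_addr + len(insn_bytes) + disp

jg = bytes.fromhex("0f 8f 50 01 00 00")
# Hypothetical address, chosen so the result matches the README's 0x400250.
print(hex(jump_target(0x4000fa, jg)))  # 0x400250
```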
diff --git a/01/Makefile b/01/Makefile
index 5dde439..f40b401 100644
--- a/01/Makefile
+++ b/01/Makefile
@@ -5,3 +5,5 @@ out00: in00
../00/hexcompile
%.html: %.md ../markdown
../markdown $<
+clean:
+ rm -f out00 out01 README.html
diff --git a/01/README.md b/01/README.md
index 5ba8c52..a67d28b 100644
--- a/01/README.md
+++ b/01/README.md
@@ -8,7 +8,7 @@ is the executable for this stage's compiler. Run it (it'll read from the file
`Hello, world!` when run. Let's take a look at the input we're providing to the
stage 01 compiler, `in01`:
-<pre><code>
+```
|| ELF Header
;im;01;00;00;00;00;00;00;00 file descriptor for stdout
;JA
@@ -24,9 +24,9 @@ stage 01 compiler, `in01`:
;sy
;'H;'e;'l;'l;'o;',;' ;'w;'o;'r;'l;'d;'!;\n the string we're printing
;
-</code></pre>
+```
-Look at that! There are comments! Much nicer than just hexadecimal digit pairs.
+Look at that! There are even comments! Much nicer than just hexadecimal digit pairs.
## end result
@@ -50,9 +50,9 @@ actually print out an error message and exit, rather than continuing as if
nothing happened! Try adding `xx;` to the end of the file `in01`, and running
`./out00`. You should get the error message:
-<pre><code>
+```
xx not recognized.
-</code></pre>
+```
Pretty cool, huh?
Anyways let's see how this compiler actually works.
@@ -63,7 +63,7 @@ Writing in our stage 00 language is much nicer than editing an
executable, because it's easier to move things around, and also, we can separate
our program into lines! Let's take a look at the start:
-<pre><code>
+```
7f 45 4c 46
02
01
@@ -90,7 +90,7 @@ a8 00 40 00 00 00 00 00
00 10 02 00 00 00 00 00
00 10 02 00 00 00 00 00
00 10 00 00 00 00 00 00
-</code></pre>
+```
This is the ELF header and program header. It's just like our last one, but with
a couple of differences. First, our entry point is at offset 0xa8 instead of 0x78.
@@ -113,7 +113,7 @@ recognized."`
- `00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)
Here's the data for our program. As you can see from my annotations, we have the
-input and output file, as well as the error message. The command part of the
+input and output file names, as well as the error message. The command part of the
error message is left blank for now (we'll fill it in when the code is actually
run).
@@ -182,8 +182,8 @@ program with exit code 0 (successful).
- `48 01 d8` `add rax, rbx`
This here looks at the two bytes we read in (we'll call them `b1` and `b2`) and
-computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the index
-in our command table corresponding to the two characters from the input file.
+computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the corresponding index
+in our command table.
- `48 c1 e0 03` `shl rax, 3`
- `48 89 c3` `mov rbx, rax`
@@ -211,7 +211,7 @@ is `03 48 89 c3`. We set the length to 0 for unused entries.
So this code checks if the entry for this command starts with a zero byte. If it
does, that means the two characters we read in don't actually correspond to a
real command. If that's the case, this next bit of code is executed (otherwise
-it's skiped over):
+it's skipped over):
- `48 b8 02 00 00 00 00 00 00 00` `mov rax, 2 (stderr)`
- `48 89 c7` `mov rdi, rax`
@@ -228,7 +228,7 @@ it's skiped over):
- `00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)
This prints our error message, now filled in with the specific unrecognized
-instruction, to standard error, and exits with code 1, to indicate failure.
+instruction, to standard error, then exits with code 1, to indicate failure.
- `48 89 eb` `mov rbx, rax`
- `31 c0` `mov rax, 0`
@@ -273,7 +273,7 @@ all the way back to read the next command. Otherwise, we keep looping. This
skips over any comments/whitespace we might have between a command and the
following command.
-And that's all the *code* for this compiler. Next comes some data.
+And that's all the *code* for this compiler. Next comes the command table.
First, there's a whole bunch of unused 0s. Then there's the line
@@ -293,7 +293,7 @@ Which is the encoding of the `syscall` instruction.
You can look through the rest of the table, if you want. But let's look at the
very end:
-<code><pre>
+```
78
7f 45 4c 46
02
@@ -321,7 +321,7 @@ very end:
00 00 08 00 00 00 00 00
00 00 08 00 00 00 00 00
00 10 00 00 00 00 00 00
-</code></pre>
+```
This is at the position for `||`, and it contains an ELF header. One thing you
might notice is that we decided that each entry is 8 bytes long, but this one is
@@ -340,5 +340,5 @@ fixed this, but frankly I've had enough of writing code in hexadecimal. So let's
move on to [stage 02](../02/README.md),
now that we have a nicer language on our hands. From now
on, since we have comments, I'm gonna do most of the explaining in the source file
-itself, rather than the README. But there'll still be a bit of stuff there each
+itself, rather than the README. But there'll still be some stuff there each
time.
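The command table lookup described in this README (index `(b1 << 7) + b2`, 8 bytes per entry, first byte is the length, length 0 means "unused") can be modelled in a few lines of Python. This is only a sketch of the idea, not the compiler's actual code. The function names are made up, and the single `sy` entry is filled in from the `syscall` encoding `0f 05` purely for illustration.

```python
ENTRY_SIZE = 8            # `shl rax, 3` multiplies the index by 8
TABLE_ENTRIES = 128 * 128  # one slot per pair of 7-bit ASCII characters

def command_index(cmd):
    """Index of a 2-character command: b1 * 128 + b2."""
    b1, b2 = cmd.encode("ascii")
    return (b1 << 7) + b2

def make_table(commands):
    """Build a zero-filled table, then store each entry as length + bytes."""
    table = bytearray(TABLE_ENTRIES * ENTRY_SIZE)
    for cmd, code in commands.items():
        off = command_index(cmd) * ENTRY_SIZE
        table[off] = len(code)
        table[off + 1:off + 1 + len(code)] = code
    return table

def lookup(table, cmd):
    """Return the bytes for a command, or fail like the compiler's error path."""
    off = command_index(cmd) * ENTRY_SIZE
    length = table[off]
    if length == 0:
        raise KeyError(f"{cmd} not recognized.")
    return bytes(table[off + 1:off + 1 + length])

table = make_table({"sy": bytes.fromhex("0f05")})
print(lookup(table, "sy").hex(" "))  # 0f 05
```

Looking up `xx` in this table raises an error carrying the same text as the `xx not recognized.` message shown earlier.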
diff --git a/01/commands.txt b/01/commands.txt
index 812b026..9cfdb1e 100644
--- a/01/commands.txt
+++ b/01/commands.txt
@@ -7,11 +7,12 @@ ff - Byte ff
'a - Character a (byte 0x61)
'! - Character ! (byte 0x21)
etc.
+\n - Newline (byte 0x0a)
zA - Zero rax
im - Set rax to an immediate value, e.g.
- im;05;00;00;00;00;00;00;00;
- will set rax to 5.
+ im;05;00;00;00;00;00;00;00;
+ will set rax to 5.
ax bx cx dx sp bp si di
A B C D S R I J
diff --git a/02/Makefile b/02/Makefile
index 23fc8a1..17b0b0e 100644
--- a/02/Makefile
+++ b/02/Makefile
@@ -1,7 +1,9 @@
all: out01 out02 README.html
out01: in01
../01/out00
-out02: out01
+out02: out01 in02
./out01
%.html: %.md ../markdown
../markdown $<
+clean:
+ rm -f out01 out02 README.html
diff --git a/02/README.md b/02/README.md
index ec390ff..a114413 100644
--- a/02/README.md
+++ b/02/README.md
@@ -1,13 +1,15 @@
# stage 02
The compiler for this stage is in the file `in01`, an input for our previous compiler.
-The specifics of how this compiler works are in the comments in that file, but here I'll
+So if you run `../01/out00`, you'll get the file `out01`, which is
+this stage's compiler.
+The specifics of how this compiler works are in the comments in `in01`, but here I'll
give an overview.
Let's take a look at `in02`, an example input file for this compiler:
```
jm
:-co jump to code
-::hw
+::hw start of hello world
'H
'e
'l
@@ -23,11 +25,12 @@ jm
'!
\n
::he end of hello world
+
+
+
::co start of code
-//
-// now we'll calculate the length of the hello world string
+// calculate the length of the hello world string
// by subtracting hw from he.
-//
im
--he
BA
@@ -36,7 +39,7 @@ im
nA
+B
DA put length in rdx
-// okay now we can write it
+// okay now write it
im
##1.
JA set rdi to 1 (stdout)
@@ -54,56 +57,123 @@ im
sy
```
-You can try adding more characters to the hello world message, and it'll just work;
-the length of the text is computed automatically!
+We can compile it by running `./out01`. This will produce
+the executable `out02`, which you can run. It prints
+`Hello, world!`.
-This time, commands are separated by newlines instead of semicolons.
-Each line begins with a 2-character command identifier. There are some special identifiers though:
+In this language,
+commands are separated by newlines instead of semicolons.
+Each line begins with a 2-character command.
+All of the commands from the previous compiler are here,
+plus six new ones:
- `::` marks a *label*
- `--` outputs a label's (absolute) address
- `:-` outputs a label's relative address
- `##` outputs a number
-
-All other commands work like they did in the previous compiler—if you scroll down in the
-`in01` source file, you'll see the full command table.
+- `//` is for comments
+- `\n\n` does nothing (used for spacing)
## labels
Labels are the most important new feature of this language.
+A line like
+```
+::xy
+```
+associates the name `xy` with the address of the next byte of the program.
+In the example program, `hw` is associated with `0x40007d`,
+which is the virtual memory address of the `Hello, world!` data.
+We can then use
+```
+--xy
+```
+to output that address, and
+```
+:-xy
+```
+to output it relative to the current address.
+So now instead of computing how far to jump, we can just jump to a label, e.g.
+```
+jm
+:-xy (use the relative address, because jumps are relative in x86-64)
+```
+And instead of figuring out the address of a piece of data, we can just use its label:
+```
+im
+--xy
+// rax now points to the data at the label "::xy"
+```
+
+This also lets us compute the length of the hello world string automatically!
+By taking the address of the end of the string (`he`) and subtracting the
+start (`hw`), we get the length in bytes.
+So you can try adding more characters to the hello world message, and it'll just work.
+
+All labels must be two ASCII characters. The address of each label is stored
+as a 32-bit number in the "label table". This is sort of like the command table—the
+index of the label `xy` is `128 * x + y`. Specifically, the entry for `xy` is at
+`0x420000 + 4 * (128 * x + y)`, since the label table starts at `0x420000`
+and each entry is 4 bytes.
+When we encounter `::xy`, we get the current position in the output file
+(using `lseek`), add the address of the start of the file (`0x400000`),
+and store that in the label table.
+When we encounter `:-xy` or `--xy`, we look up `xy` in the label table,
+and write the address (subtracting the current address for `:-`) to the output file.
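As a quick check of the label table formula above, this Python sketch computes where the entry for a given label lives. It only restates the arithmetic from the text. The function name `label_entry_address` is made up for the example.

```python
LABEL_TABLE = 0x420000  # start of the label table, per the text above

def label_entry_address(label):
    """Address of the 4-byte table entry for a 2-character ASCII label."""
    x, y = label.encode("ascii")
    return LABEL_TABLE + 4 * (128 * x + y)

print(hex(label_entry_address("hw")))  # 0x42d1dc
```

So in the example program, the 32-bit value `0x40007d` for `hw` would end up stored at `0x42d1dc`.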
## two passes?
+This compiler actually needs to read through the source code,
+and output an executable, twice.
+This is because a label may be defined *after* it is used, e.g.:
+```
+jm
+:-aa jump forward
+...
+::aa this is where we're jumping to
+...
+```
+In the first pass, the `:-aa` will
+treat `aa` as having an address of 0. Then when
+we get to `::aa`, the address in the label table will be corrected.
+At the end of the first pass, we seek back to the start
+of the input and output files,
+and run the exact same code for the second pass.
+But this time, the correct address of `aa` is used, namely the
+one we calculated in the first pass.
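The two-pass idea can be sketched in Python like this. It is a toy model, not the real compiler: the program is a list of made-up `(kind, value)` pairs instead of source lines, there is no ELF header, and it assumes a `:-` offset is measured from the end of its own 4-byte field (which is what an x86-64 relative jump needs).

```python
BASE = 0x400000  # address of the start of the file, per the README

def run_pass(program, labels):
    """Emit bytes once, recording label addresses as they are defined."""
    out = bytearray()
    for kind, arg in program:
        if kind == "bytes":
            out += arg
        elif kind == "label":            # ::xy
            labels[arg] = BASE + len(out)
        elif kind == "rel":              # :-xy
            # Assumption: measured from the end of the 4-byte field.
            here = BASE + len(out) + 4
            target = labels.get(arg, 0)  # unknown labels read as address 0
            out += ((target - here) & 0xFFFFFFFF).to_bytes(4, "little")
    return bytes(out)

def assemble(program):
    labels = {}
    run_pass(program, labels)         # first pass: forward references are wrong
    return run_pass(program, labels)  # second pass: every label is known

program = [
    ("bytes", b"\xe9"),      # jm
    ("rel", "aa"),           # :-aa
    ("bytes", b"\x90" * 3),  # three bytes to jump over
    ("label", "aa"),         # ::aa
    ("bytes", b"\x0f\x05"),  # sy
]
print(assemble(program).hex(" "))  # e9 03 00 00 00 90 90 90 0f 05
```

Running `run_pass` only once with an empty label dictionary gives a wrong jump offset, which is exactly why the second pass is needed.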
+
+
## other features
Now instead of writing out each of the 8 bytes making up a number,
-we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
-and the compiler will automatically
-extend it to 8 bytes.
+we can just write it in hexadecimal, e.g. `##1c4.` for `c4 01 00 00 00 00 00 00`.
This is especially nice because we don't need to write numbers backwards
for little-endianness anymore!
-Numbers cannot appear at the end of a line (this was
-to make the compiler simpler to write), so I'm adding a `.` at the end of
+Numbers cannot appear at the end of a line (this made
+the compiler simpler to write), so I'm adding a `.` at the end of
each one to avoid making that mistake.
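For reference, the expansion of a `##` number into 8 little-endian bytes looks like this in Python. This is a sketch of the result, not of how the compiler computes it, and the function name `encode_number` is made up.

```python
def encode_number(token):
    """Turn `##<hex digits>.` into the 8 little-endian bytes it stands for."""
    digits = token[2:].rstrip(".")  # drop the `##` prefix and the trailing `.`
    return int(digits, 16).to_bytes(8, "little")

print(encode_number("##1c4.").hex(" "))  # c4 01 00 00 00 00 00 00
print(encode_number("##1.").hex(" "))    # 01 00 00 00 00 00 00 00
```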
Anything after a command is treated as a comment;
additionally `//` can be used for comments on their own lines.
-I decided to implement them as simply as possible:
+I decided to implement this as simply as possible:
I just added the command `//` to the command table, which outputs the byte `0x90`—this
-means "do nothing" (`nop`) in x86-64.
-Note that this means that the following code will not work as expected:
+means ["do nothing"](https://en.wikipedia.org/wiki/No-op)
+in x86-64.
+Note that the following code will not work as expected:
```
im
// load the value 0x333 into rax
##333.
```
-since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
+since `0x90` gets inserted between the "load immediate" instruction code and the immediate.
+`\n\n` works identically, and lets us space out code a bit. But be careful:
+the number of blank lines must be a multiple of 3!
## limitations
Many of the limitations of our previous compilers apply to this one. Also,
if you use a label without defining it, it uses address 0, rather than outputting
-an error message. This could be fixed: if the value in the label table is 0, and if we are
+an error message. This could be fixed: if the value in the label table is 0 and we are
on the second pass, output an error message. This compiler was already tedious enough
to implement, though!
But thanks to labels, for future compilers at least we won't have to calculate
diff --git a/02/in01 b/02/in01
index 1615667..f72459c 100644
--- a/02/in01
+++ b/02/in01
@@ -3,7 +3,7 @@
;'i;'n;'0;'2;00 (0x40007d) input filename
;'o;'u;'t;'0;'2;00 (0x400082) output filename
;00;00;' ;'n;'o;'t;' ;'r;'e;'c;'o;'g;'n;'i;'z;'e;'d;\n;00;00;00;00;00;00 (0x400088) error message/where we read to
-;00 (0x4000a0) stores which pass we're on (1 for second pass)
+;00 (0x4000a0) stores which pass we're on (0 for first pass, 1 for second pass)
;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00 (0x4000a8) used for output
unused padding
@@ -180,11 +180,11 @@ okay it's 0-9
;+B
;BA
-okay we now have a digit in RBX
+okay we now have a digit in rbx
;AR
;<I;04
;+B
-;RA store away in RBP
+;RA store away in rbp
;jm;38;ff;ff;ff continue loop
unused padding
@@ -195,7 +195,7 @@ unused padding
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
-okay we have a full number in RBP, time to write it to the file
+okay we have a full number in rbp, time to write it to the file.
start by putting it at address 0x4000a8
;im;a8;00;40;00;00;00;00;00
;BA
@@ -210,7 +210,7 @@ now write
;IA
;im;08;00;00;00;00;00;00;00 write 8 bytes
;DA
-;im;01;00;00;00;00;00;00;00 write
+;im;01;00;00;00;00;00;00;00 write
;sy
;jm;c3;03;00;00 skip to newline
@@ -327,11 +327,11 @@ subtract current address
;nA;+B
;RA store relative address in rbp
-now we want to write eax to the output file.
+now we want to write ebp to the output file.
start by putting it at address 0x4000a8
;im;a8;00;40;00;00;00;00;00
;BA
-;AR put relative address in rax
+;AR
;sd
now write
@@ -341,7 +341,7 @@ now write
;IA
;im;04;00;00;00;00;00;00;00 4 bytes
;DA
-;im;01;00;00;00;00;00;00;00 write
+;im;01;00;00;00;00;00;00;00 write
;sy
;jm;66;01;00;00 skip to newline
@@ -368,7 +368,7 @@ it's not a label or a number. let's look it up in the instruction table.
;BA
;RA store away address of command text in rbp
;zA;lb
-;DA number of bytes to write (used for syscall if no error)
+;DA number of bytes to write (used for syscall if command exists)
;BA
;zA
;cm;jn;54;00;00;00 check if # of bytes is 0, if not, skip outputting error
@@ -392,7 +392,7 @@ this is a real command
;im;01;00;00;00;00;00;00;00 add 1 because we don't want to write the length
;+B
;IA address of data to write
-;im;04;00;00;00;00;00;00;00 out file descriptor
+;im;04;00;00;00;00;00;00;00 out file descriptor
;JA
;im;01;00;00;00;00;00;00;00 write
;sy
@@ -1777,7 +1777,7 @@ the formatting changed appropriately.
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
-;00;00;00;00;00;00;00;00
+;01;90;00;00;00;00;00;00 \n\n
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
@@ -6550,7 +6550,7 @@ the formatting changed appropriately.
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
-;01;90;00;00;00;00;00;00
+;01;90;00;00;00;00;00;00 // comments
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
diff --git a/02/in02 b/02/in02
index 8355546..987d32e 100644
--- a/02/in02
+++ b/02/in02
@@ -1,6 +1,6 @@
jm
:-co jump to code
-::hw
+::hw start of hello world
'H
'e
'l
@@ -16,11 +16,12 @@ jm
'!
\n
::he end of hello world
+
+
+
::co start of code
-//
-// now we'll calculate the length of the hello world string
+// calculate the length of the hello world string
// by subtracting hw from he.
-//
im
--he
BA
@@ -29,7 +30,7 @@ im
nA
+B
DA put length in rdx
-// okay now we can write it
+// okay now write it
im
##1.
JA set rdi to 1 (stdout)
diff --git a/Makefile b/Makefile
index bd52445..07b4955 100644
--- a/Makefile
+++ b/Makefile
@@ -2,6 +2,12 @@ all: markdown README.html
$(MAKE) -C 00
$(MAKE) -C 01
$(MAKE) -C 02
+clean:
+ $(MAKE) -C 00 clean
+ $(MAKE) -C 01 clean
+ $(MAKE) -C 02 clean
+ rm -f markdown
+ rm -f README.html
markdown: markdown.c
$(CC) -O2 -o markdown -Wall -Wconversion -Wshadow -std=c89 markdown.c
README.html: markdown README.md
diff --git a/README.md b/README.md
index 2c4d34e..9a97c8a 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,14 @@ Note that the executables produced in this series will only run on
64-bit Linux, because each OS/architecture combination would need its own separate
executable.
-The README for the first stage is [here](00/README.md).
+## table of contents
+
+- [stage 00](00/README.md) - a program converting a text file with
+hexadecimal digit pairs to a binary file.
+- [stage 01](01/README.md) - a language with comments, and 2-character
+command codes.
+- [stage 02](02/README.md) - a language with labels
+- more coming soon (hopefully)
## prerequisite knowledge
@@ -44,8 +51,7 @@ decimal.
- ASCII, null-terminated strings
- how pointers work
- how floating-point numbers work
-- maybe some basic Intel-style x86-64 assembly (you can probably pick it up on
-the way though)
+- some basic Intel-style x86-64 assembly
It will help you a lot to know how to program (with any programming language),
but it's not strictly necessary.
@@ -53,12 +59,11 @@ but it's not strictly necessary.
## instruction set
x86-64 has a *gigantic* instruction set. The manual for it is over 2,000 pages
-long! So, it makes sense to select only a small subset of it to use for all the
-stages of our compiler. The set I've chosen can be found in `instructions.txt`.
+long! So it makes sense to select only a small subset of it to use.
+The set I've chosen can be found in `instructions.txt`.
I think it achieves a pretty good balance between having few enough
instructions to be manageable and having enough instructions to be useable.
-To be clear, you don't need to read that file to understand the series, at least
-not right away.
+To be clear, you don't need to read that file to understand the series.
## principles
@@ -91,15 +96,15 @@ project can't necessarily even do that though, because the Linux kernel, which
we depend on, is compiled from C, so we can't fully trust *it*. To *truly*
create a fully trustable compiler, you'd need to manually write to a USB with a
circuit, create an operating system from nothing (without even a text editor),
-and then follow this series, or maybe you don't even trust your CPU vendor...
-I'll leave that to someone else
+and then follow this series, or maybe you don't even trust your CPU...
+I'll leave that to someone else.
## license
```
This project is in the public domain. Any copyright protections from any law
-for this project are forfeited by the author(s). No warranty is provided for
-this project, and the author(s) shall not be held liable in connection with it.
+are forfeited by the author(s). No warranty is provided, and the author(s)
+shall not be held liable in connection with it.
```
## contributing
diff --git a/instructions.txt b/instructions.txt
index 8553c1d..88c2a82 100644
--- a/instructions.txt
+++ b/instructions.txt
@@ -101,3 +101,4 @@ syscall
>0f 05
nop
>90
+(more will be added as needed)
diff --git a/markdown.c b/markdown.c
index f57e40f..c366904 100644
--- a/markdown.c
+++ b/markdown.c
@@ -58,7 +58,8 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
case '[': {
/* link */
char url2[256] = {0};
- const char *label, *url, *label_end, *url_end, *dot;
+ const char *label, *url, *label_end, *url_end;
+ char *dot;
int n_label, n_url;
label = p+1;
@@ -88,7 +89,7 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
/* replace links to md files with links to html files */
strcpy(dot, ".html");
}
- fprintf(out, "<a href=\"%s\" target=\"_blank\">%.*s</a>",
+ fprintf(out, "<a href=\"%s\">%.*s</a>",
url2, n_label, label);
p = url_end;
} break;