author pommicket <pommicket@gmail.com> 2022-02-19 19:43:13 -0800
committer pommicket <pommicket@gmail.com> 2022-02-19 19:43:13 -0800
commit a8c884e6cd294942f1392908284808860e49fe54
tree 28a717fc60dd115d0884a90eae371aaf01d79f39
parent 54a191a117bb3e4c217ff3fb06e6278b362f6309
finish 05
-rw-r--r-- 05/README.md 103
-rw-r--r-- 05/diffs.txt 647
-rw-r--r-- 05/tcc-0.9.27/config.h 5
-rw-r--r-- Makefile 2
-rw-r--r-- README.md 16
-rwxr-xr-x bootstrap.sh 9
6 files changed, 747 insertions, 35 deletions
diff --git a/05/README.md b/05/README.md
index 85c1dd9..2347661 100644
--- a/05/README.md
+++ b/05/README.md
@@ -9,7 +9,7 @@ $ make
```
to build our C compiler and TCC. This will take some time (approx. 25 seconds on my computer).
-This also compiles a "Hello, world!" with our compiler, `a.out`.
+This also compiles a "Hello, world!" executable, `a.out`, with our compiler.
We can now compile TCC with itself. But first, you'll need to install the header files and library files
which are needed to compile (almost) any program with TCC:
@@ -20,8 +20,8 @@ $ sudo make install-tcc0
The files will be installed to `/usr/local/lib/tcc-bootstrap`. If you want to change this, make sure to change
both the `TCCINST` variable in the makefile, and the `CONFIG_TCCDIR` macro in `config.h`.
-Anyways, once this installation is done, you should be able to compile any C program with `tcc-0.9.27/tcc0`!
-We can even compile TCC with itself:
+Anyways, once this installation is done, you should be able to compile any C program with `tcc-0.9.27/tcc0`,
+including TCC itself:
```
$ cd tcc-0.9.27
@@ -44,7 +44,7 @@ $ diff tcc1 tcc1a
Binary files tcc1 and tcc1a differ
```
-!!! Is there some malicious code hiding in the difference between these two files? Well, unfortunately (or fortunately, rather) the
+!!! Is there some malicious code hiding in the difference between these two files? Well unfortunately (fortunately, really) the
truth is more boring than that:
```
@@ -267,48 +267,67 @@ our header files use, and then we define each of the necessary C standard librar
## limitations
There are various minor ways in which this compiler doesn't actually handle all of C89.
-Here is a list of things we do wrong (this list is probably missing things, though):
+Here is a (probably incomplete) list of things we do wrong:
- [trigraphs](https://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C) are not handled
- `char[]` string literal initializers can't contain null characters (e.g. `char x[] = "a\0b";` doesn't work)
-- you can only access members of l-values (e.g. `int x = function_which_returns_struct().member` doesn't work)
-- no default-int (this is a legacy feature of C, e.g. `main() { }` can technically stand in for `int main() {}`)
+- you can only access members of l-values (e.g. `int x = function_which_returns_struct().member;` doesn't work)
+- no default-int (this is a legacy feature of C, e.g. `main() {}` can technically stand in for `int main() {}`)
- the keyword `auto` is not handled (again, a legacy feature of C)
-- `default:` must be the last label in a switch statement.
-- external variable declarations are ignored (e.g. `extern int x; int main() { return x; } int x = 5; ` doesn't work)
-- `typedef`s, and `struct`/`union`/`enum` declarations aren't allowed inside functions
+- `default:` must come after all `case` labels in a switch statement.
+- external variable declarations are ignored, and global variables can only be declared once
+(e.g. `extern int x; int main() { return x; } int x = 5; ` doesn't work)
+- `typedef`s, and `struct`/`union`/`enum` definitions aren't allowed inside functions
- conditional expressions aren't allowed inside `case` (horribly, `switch (x) { case 5 ? 6 : 3: ; }` is legal C).
- bit-fields aren't handled
- Technically, `1[array]` is equivalent to `array[1]`, but we don't handle that.
- C89 has *very* weird typing rules about `void*`/`non-void*` inside conditional expressions. We don't handle that properly.
- C89 allows calling functions without declaring them, for legacy reasons. We don't handle that.
- Floating-point constant expressions are very limited. Only `double` literals and 0 are supported.
-- Floating-point literals can't have their integer part greater than 2<sup>64</sup>-1.
+- In floating-point literals, the numbers before and after the decimal point must be less than 2<sup>64</sup>.
+- The only "address constants" we allow are string literals, e.g. `int y, x = &y;` is not allowed as a global declaration.
- Redefining a macro is always an error, even if it's the same definition.
- You can't have a variable/function/etc. called `defined`.
- Various little things about when macros are evaluated in some contexts.
-- The horrible, horrible, function `setjmp`, which surely no one uses is not properly supported.
+- The horrible, horrible function `setjmp`, which surely no one uses, is not properly supported.
Oh wait, TCC uses it. Fortunately it's not critically important to TCC.
-- `wchar_t` and wide character string literals are not supported.
+- Wide characters and wide character strings are not supported.
- The `localtime()` function assumes you are in the UTC+0 timezone.
- `mktime()` always fails.
-
-Also, the keywords `signed`, `volatile`, `register`, and `const` are all ignored. This shouldn't have an effect
-on any legal C program, though.
+- The keywords `signed`, `volatile`, `register`, and `const` are all ignored, but this should almost never
+have an effect on a legal C program.
## anecdotes
Making this C compiler took over a month. Here are some interesting things
which happened along the way:
-- A very difficult part of this compiler was parsing floating-point numbers in a language which
-doesn't have floats. Originally, there was a bug where negative powers of 2 were
+- Writing code to parse floating-point numbers in a language which
+doesn't have floats turned out to be quite a fun challenge!
+Not all decimal numbers have a perfect floating point representation. You could
+round 0.1 up to ~0.1000000000000000056, or down to ~0.0999999999999999917.
+This stage's C compiler should be entirely correct, up to rounding (which is all that the
+C standard requires).
+But typically C compilers
+will round to whichever is closest to the decimal value. Implementing this correctly
+is a lot harder than you might expect. For example,
+```
+0.09999999999999999861222121921855432447046041488647460937499
+rounds down, but
+0.09999999999999999861222121921855432447046041488647460937501
+rounds up.
+```
+Good luck writing a function which handles that!
+- Originally, there was a bug where negative powers of 2 were
being interpreted as half of their actual value, e.g. `x = 0.25;` would set `x` to
`0.125`, but `x = 4;`, `x = 0.3;`, etc. would all work just fine.
+- Writing the functions in `math.h`, although probably not necessary for compiling TCC,
+was fun! There are quite a few interesting optimizations you can make, and little
+tricks for avoiding losses in floating-point accuracy.
- The <s>first</s> second non-trivial program I successfully compiled worked perfectly the first time I ran it!
- A very difficult to track down bug happened the first time I ran `tcc`: there was a declaration along
the lines of `char x[] = "a\0b\0c";` but it got compiled as `char x[] = "a";`!
-- Originally, I was just treating labels as statements, but `tcc` actually has code like:
+- Originally, I was just treating labels the same as any other statements, but `tcc` actually has code like:
```
...
goto lbl;
@@ -318,7 +337,7 @@ if (some_condition)
```
so the `do_something();` was not being considered as part of the `if` statement.
- The first time I compiled tcc with itself (and then with itself again), I actually got a different
-executable. After spending a long time looking at disassemblies, I found the culprit:
+executable from the GCC one. After spending a long time looking at disassemblies, I found the culprit:
```
# if defined(__linux__)
tcc_define_symbol(s, "__linux__", NULL);
@@ -332,6 +351,43 @@ with itself!
## modifications of tcc's source code
+Some modifications were needed to bring tcc's source code in line with what our compiler expects.
+
+You can find a full list of modifications in `diffs.txt`, but I'll provide an overview (and explanation)
+here.
+
+- First, we (and C89) don't allow a comma after the last member in an initializer. In several places,
+the last comma in an initializer/enum definition was removed, or an irrelevant entry was added to the end.
+- Global variables were sometimes declared twice, which we don't support.
+So, a bunch of duplicate declarations were removed.
+- The `# if defined(__linux__)` and `# endif` mentioned above were removed.
+- In a bunch of places, `ELFW(something)` had to be replaced with `ELF64_something` due to
+subtleties of how we evaluate macros.
+- `offsetof(type, member)` isn't considered a constant expression by our compiler, so
+some initializers were replaced by functions called at the top of `main`.
+- In several places, `default:` had to be moved to after every `case` label.
+- In two places, `-some_long_double_expression` had to be replaced with
+a function call to `negate_ld` (a function I wrote for negating long doubles).
+This is because TCC only supports negating long doubles if
+the compiler used to compile it has an 80-bit long double type, which our compiler doesn't.
+- `\0` was replaced with `\n` as a separator for keyword names.
+- Forced TCC to use `R_X86_64_PC32` relocations, because its `plt` code doesn't seem to work for static
+executables.
+- Lastly, there's the `config.h` file, which is normally produced by TCC's `configure` script,
+but it's easy to write one manually:
+```
+#define TCC_VERSION "0.9.27"
+#define CONFIG_TCC_STATIC 1
+#define TCC_TARGET_X86_64 1
+#define ONE_SOURCE 1
+#define CONFIG_LDDIR "lib/x86_64-linux-gnu"
+#define CONFIG_TCCDIR "/usr/local/lib/tcc-bootstrap"
+#define inline
+```
+The last line causes the `inline` keyword (added in C99) to be ignored.
+
+Fewer changes would've been needed for an older version of TCC, but older versions didn't support
+x86-64 assembly, which might end up being relevant...
## \*the nightmare begins
@@ -345,12 +401,13 @@ if there a security bug were found in `printf`, it would be much easier to repla
every program which uses `printf`.
Now this library file is itself compiled from C source files (typically glibc).
-So, we *can't* really say that the self-compiled TCC was built from scratch. And there could be malicious
+So, we can't really say that the self-compiled TCC was built from scratch. And there could be malicious
self-replicating code in glibc!
So, why not just compile glibc with TCC?
-Well, it's not actually possible. glibc can pretty much only be compiled with GCC. And we can't compile GCC
-without a libc. Hmm...
+Well, it's not actually possible. glibc can pretty much only be compiled with GCC.
+This stage's C compiler definitely can't compile GCC, so we'll need a libc implementation to
+compile GCC. Hmm...
Other libc implementations don't seem to like TCC either, so it seems that the only option left is to
make a new libc implementation, use that to compile GCC (probably an old version of it which TCC can compile),
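
Note on the floating-point anecdote in the 05/README.md hunk above: the two long literals sit just below and just above the midpoint between two adjacent doubles near 0.1, which is why they round in opposite directions. The sketch below checks this with the C library's `strtod`. It assumes a `strtod` that rounds to nearest (glibc's does, as far as I know; the C standard does not require it), and the names `BELOW_MIDPOINT`, `ABOVE_MIDPOINT` and `parse_decimal` are made up for this illustration, not taken from the commit.

```c
#include <stdlib.h>

/* The two literals from the anecdote: one digit pattern just below and one
   just above the midpoint between two adjacent doubles near 0.1. */
static const char *const BELOW_MIDPOINT =
    "0.09999999999999999861222121921855432447046041488647460937499";
static const char *const ABOVE_MIDPOINT =
    "0.09999999999999999861222121921855432447046041488647460937501";

/* Parse a decimal string at run time with the C library's strtod,
   so the host compiler's own literal parser is not involved. */
static double parse_decimal(const char *text) {
    return strtod(text, NULL);
}
```

With a round-to-nearest `strtod`, `parse_decimal(BELOW_MIDPOINT)` should give the double just below `0.1` and `parse_decimal(ABOVE_MIDPOINT)` should give `0.1` itself, even though the inputs differ only in their last two digits.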
diff --git a/05/diffs.txt b/05/diffs.txt
new file mode 100644
index 0000000..869d9bb
--- /dev/null
+++ b/05/diffs.txt
@@ -0,0 +1,647 @@
+---- arm-asm.c ----
+---- arm-gen.c ----
+---- arm-link.c ----
+---- arm64-gen.c ----
+---- arm64-link.c ----
+---- c67-gen.c ----
+---- c67-link.c ----
+---- conftest.c ----
+---- i386-asm.c ----
+209c209
+< 0x0f, /* g */
+---
+> 0x0f /* g */
+238c238
+< { 0, },
+---
+> { 0 }
+252a253,254
+> /* last operation */
+> 0
+1576,1578d1577
+< default:
+< reg = TOK_ASM_eax + reg;
+< break;
+1583a1583,1585
+> default:
+> reg = TOK_ASM_eax + reg;
+> break;
+---- i386-gen.c ----
+---- i386-link.c ----
+---- il-gen.c ----
+---- libtcc.c ----
+27c27
+< ST_DATA int gnu_ext = 1;
+---
+> //ST_DATA int gnu_ext = 1;
+30c30
+< ST_DATA int tcc_ext = 1;
+---
+> //ST_DATA int tcc_ext = 1;
+33c33
+< ST_DATA struct TCCState *tcc_state;
+---
+> //ST_DATA struct TCCState *tcc_state;
+820c820
+< # if defined(__linux__)
+---
+> //# if defined(__linux__)
+823c823
+< # endif
+---
+> //# endif
+1177c1177
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1552c1552
+< { NULL, 0, 0 },
+---
+> { NULL, 0, 0 }
+1555c1555
+< static const FlagDef options_W[] = {
+---
+> static FlagDef options_W[] = {
+1557,1562c1557,1561
+< { offsetof(TCCState, warn_unsupported), 0, "unsupported" },
+< { offsetof(TCCState, warn_write_strings), 0, "write-strings" },
+< { offsetof(TCCState, warn_error), 0, "error" },
+< { offsetof(TCCState, warn_gcc_compat), 0, "gcc-compat" },
+< { offsetof(TCCState, warn_implicit_function_declaration), WD_ALL,
+< "implicit-function-declaration" },
+---
+> { 0, 0, "unsupported" },
+> { 0, 0, "write-strings" },
+> { 0, 0, "error" },
+> { 0, 0, "gcc-compat" },
+> { 0, WD_ALL, "implicit-function-declaration" },
+1566,1572c1565,1571
+< static const FlagDef options_f[] = {
+< { offsetof(TCCState, char_is_unsigned), 0, "unsigned-char" },
+< { offsetof(TCCState, char_is_unsigned), FD_INVERT, "signed-char" },
+< { offsetof(TCCState, nocommon), FD_INVERT, "common" },
+< { offsetof(TCCState, leading_underscore), 0, "leading-underscore" },
+< { offsetof(TCCState, ms_extensions), 0, "ms-extensions" },
+< { offsetof(TCCState, dollars_in_identifiers), 0, "dollars-in-identifiers" },
+---
+> static FlagDef options_f[] = {
+> { 0, 0, "unsigned-char" },
+> { 0, FD_INVERT, "signed-char" },
+> { 0, FD_INVERT, "common" },
+> { 0, 0, "leading-underscore" },
+> { 0, 0, "ms-extensions" },
+> { 0, 0, "dollars-in-identifiers" },
+1576,1577c1575,1576
+< static const FlagDef options_m[] = {
+< { offsetof(TCCState, ms_bitfields), 0, "ms-bitfields" },
+---
+> static FlagDef options_m[] = {
+> { 0, 0, "ms-bitfields" },
+1579c1578
+< { offsetof(TCCState, nosse), FD_INVERT, "sse" },
+---
+> { 0, FD_INVERT, "sse" },
+1582a1582,1599
+>
+> void _init_options(void) {
+> options_W[1].offset = offsetof(TCCState, warn_unsupported);
+> options_W[2].offset = offsetof(TCCState, warn_write_strings);
+> options_W[3].offset = offsetof(TCCState, warn_error);
+> options_W[4].offset = offsetof(TCCState, warn_gcc_compat);
+> options_W[5].offset = offsetof(TCCState, warn_implicit_function_declaration);
+> options_f[0].offset = offsetof(TCCState, char_is_unsigned);
+> options_f[1].offset = offsetof(TCCState, char_is_unsigned);
+> options_f[2].offset = offsetof(TCCState, nocommon);
+> options_f[3].offset = offsetof(TCCState, leading_underscore);
+> options_f[4].offset = offsetof(TCCState, ms_extensions);
+> options_f[5].offset = offsetof(TCCState, dollars_in_identifiers);
+> options_m[0].offset = offsetof(TCCState, ms_bitfields);
+> #ifdef TCC_TARGET_X86_64
+> options_m[1].offset = offsetof(TCCState, nosse);
+> #endif
+> }
+---- tcc.c ----
+239c239
+< #else
+---
+> #elif 0
+242a243,244
+> #else
+> return 0;
+254c256
+<
+---
+> _init_options();
+---- tccasm.c ----
+222d221
+< default:
+223a223
+> default:
+251d250
+< default:
+252a252
+> default:
+---- tcccoff.c ----
+---- tccelf.c ----
+28a29
+> #if 0
+43a45
+> #endif
+171,172c173,174
+< && ELFW(ST_BIND)(sym->st_info) == STB_LOCAL)
+< sym->st_info = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
+---
+> && ELF64_ST_BIND(sym->st_info) == STB_LOCAL)
+> sym->st_info = ELF64_ST_INFO(STB_GLOBAL, ELF64_ST_TYPE(sym->st_info));
+183c185
+< int n = ELFW(R_SYM)(rel->r_info) - first_sym;
+---
+> int n = ELF64_R_SYM(rel->r_info) - first_sym;
+185c187
+< rel->r_info = ELFW(R_INFO)(tr[n], ELFW(R_TYPE)(rel->r_info));
+---
+> rel->r_info = ELF64_R_INFO(tr[n], ELF64_R_TYPE(rel->r_info));
+375c377
+< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
+415c417
+< if (ELFW(ST_BIND)(info) != STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(info) != STB_LOCAL) {
+497,499c499,501
+< sym_bind = ELFW(ST_BIND)(info);
+< sym_type = ELFW(ST_TYPE)(info);
+< sym_vis = ELFW(ST_VISIBILITY)(other);
+---
+> sym_bind = ELF64_ST_BIND(info);
+> sym_type = ELF64_ST_TYPE(info);
+> sym_vis = ELF64_ST_VISIBILITY(other);
+511c513
+< esym_bind = ELFW(ST_BIND)(esym->st_info);
+---
+> esym_bind = ELF64_ST_BIND(esym->st_info);
+514c516
+< esym_vis = ELFW(ST_VISIBILITY)(esym->st_other);
+---
+> esym_vis = ELF64_ST_VISIBILITY(esym->st_other);
+522c524
+< esym->st_other = (esym->st_other & ~ELFW(ST_VISIBILITY)(-1))
+---
+> esym->st_other = (esym->st_other & ~ELF64_ST_VISIBILITY(-1))
+560c562
+< esym->st_info = ELFW(ST_INFO)(sym_bind, sym_type);
+---
+> esym->st_info = ELF64_ST_INFO(sym_bind, sym_type);
+570c572
+< ELFW(ST_INFO)(sym_bind, sym_type), other,
+---
+> ELF64_ST_INFO(sym_bind, sym_type), other,
+598c600
+< rel->r_info = ELFW(R_INFO)(symbol, type);
+---
+> rel->r_info = ELF64_R_INFO(symbol, type);
+737c739
+< if (ELFW(ST_BIND)(p->st_info) == STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(p->st_info) == STB_LOCAL) {
+750c752
+< if (ELFW(ST_BIND)(p->st_info) != STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(p->st_info) != STB_LOCAL) {
+766,767c768,769
+< sym_index = ELFW(R_SYM)(rel->r_info);
+< type = ELFW(R_TYPE)(rel->r_info);
+---
+> sym_index = ELF64_R_SYM(rel->r_info);
+> type = ELF64_R_TYPE(rel->r_info);
+769c771
+< rel->r_info = ELFW(R_INFO)(sym_index, type);
+---
+> rel->r_info = ELF64_R_INFO(sym_index, type);
+810c812
+< sym_bind = ELFW(ST_BIND)(sym->st_info);
+---
+> sym_bind = ELF64_ST_BIND(sym->st_info);
+838c840
+< sym_index = ELFW(R_SYM)(rel->r_info);
+---
+> sym_index = ELF64_R_SYM(rel->r_info);
+840c842
+< type = ELFW(R_TYPE)(rel->r_info);
+---
+> type = ELF64_R_TYPE(rel->r_info);
+873,874c875,876
+< sym_index = ELFW(R_SYM)(rel->r_info);
+< type = ELFW(R_TYPE)(rel->r_info);
+---
+> sym_index = ELF64_R_SYM(rel->r_info);
+> type = ELF64_R_TYPE(rel->r_info);
+881c883
+< rel->r_info = ELFW(R_INFO)(sym_index, R_386_RELATIVE);
+---
+> rel->r_info = ELF64_R_INFO(sym_index, R_386_RELATIVE);
+916c918
+< set_elf_sym(symtab_section, 0, 4, ELFW(ST_INFO)(STB_GLOBAL, STT_OBJECT),
+---
+> set_elf_sym(symtab_section, 0, 4, ELF64_ST_INFO(STB_GLOBAL, STT_OBJECT),
+963c965
+< if (ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(sym->st_info) == STB_LOCAL) {
+1008c1010
+< ELFW(ST_INFO)(STB_GLOBAL, STT_FUNC), 0, s1->plt->sh_num, plt_name);
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), 0, s1->plt->sh_num, plt_name);
+1034c1036
+< type = ELFW(R_TYPE)(rel->r_info);
+---
+> type = ELF64_R_TYPE(rel->r_info);
+1036c1038
+< sym_index = ELFW(R_SYM)(rel->r_info);
+---
+> sym_index = ELF64_R_SYM(rel->r_info);
+1068,1070c1070,1072
+< && (ELFW(ST_TYPE)(esym->st_info) == STT_FUNC
+< || (ELFW(ST_TYPE)(esym->st_info) == STT_NOTYPE
+< && ELFW(ST_TYPE)(sym->st_info) == STT_FUNC)))
+---
+> && (ELF64_ST_TYPE(esym->st_info) == STT_FUNC
+> || (ELF64_ST_TYPE(esym->st_info) == STT_NOTYPE
+> && ELF64_ST_TYPE(sym->st_info) == STT_FUNC)))
+1083,1085c1085,1087
+< (ELFW(ST_VISIBILITY)(sym->st_other) != STV_DEFAULT ||
+< ELFW(ST_BIND)(sym->st_info) == STB_LOCAL)) {
+< rel->r_info = ELFW(R_INFO)(sym_index, R_X86_64_PC32);
+---
+> (ELF64_ST_VISIBILITY(sym->st_other) != STV_DEFAULT ||
+> ELF64_ST_BIND(sym->st_info) == STB_LOCAL)) {
+> rel->r_info = ELF64_R_INFO(sym_index, R_X86_64_PC32);
+1105c1107
+< rel->r_info = ELFW(R_INFO)(attr->plt_sym, type);
+---
+> rel->r_info = ELF64_R_INFO(attr->plt_sym, type);
+1140c1142
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1144c1146
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1168c1170
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1172c1174
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1221c1223
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1225c1227
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1229c1231
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1260c1262
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1265c1267
+< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
+1314c1316
+< int sym_index = ELFW(R_SYM) (rel->r_info);
+---
+> int sym_index = ELF64_R_SYM (rel->r_info);
+1344c1346
+< switch (ELFW(R_TYPE) (rel->r_info)) {
+---
+> switch (ELF64_R_TYPE (rel->r_info)) {
+1363,1364c1365,1366
+< if (ELFW(R_TYPE)(rel->r_info) == R_RELATIVE) {
+< int sym_index = ELFW(R_SYM) (rel->r_info);
+---
+> if (ELF64_R_TYPE(rel->r_info) == R_RELATIVE) {
+> int sym_index = ELF64_R_SYM (rel->r_info);
+1370c1372
+< rel->r_info = ELFW(R_INFO)(0, R_RELATIVE);
+---
+> rel->r_info = ELF64_R_INFO(0, R_RELATIVE);
+1400c1402
+< type = ELFW(ST_TYPE)(esym->st_info);
+---
+> type = ELF64_ST_TYPE(esym->st_info);
+1411c1413
+< ELFW(ST_INFO)(STB_GLOBAL,STT_FUNC), 0, 0,
+---
+> ELF64_ST_INFO(STB_GLOBAL,STT_FUNC), 0, 0,
+1428c1430
+< if (ELFW(ST_BIND)(esym->st_info) == STB_WEAK) {
+---
+> if (ELF64_ST_BIND(esym->st_info) == STB_WEAK) {
+1431c1433
+< && (ELFW(ST_BIND)(dynsym->st_info) == STB_GLOBAL)) {
+---
+> && (ELF64_ST_BIND(dynsym->st_info) == STB_GLOBAL)) {
+1450c1452
+< if (ELFW(ST_BIND)(sym->st_info) == STB_WEAK ||
+---
+> if (ELF64_ST_BIND(sym->st_info) == STB_WEAK ||
+1456c1458
+< } else if (s1->rdynamic && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
+---
+> } else if (s1->rdynamic && ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
+1481c1483
+< && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
+---
+> && ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
+1486c1488
+< if (ELFW(ST_BIND)(esym->st_info) != STB_WEAK)
+---
+> if (ELF64_ST_BIND(esym->st_info) != STB_WEAK)
+1503c1505
+< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
+1909,1913d1910
+< default:
+< case TCC_OUTPUT_EXE:
+< ehdr.e_type = ET_EXEC;
+< ehdr.e_entry = get_elf_sym_addr(s1, "_start", 1);
+< break;
+1920a1918,1922
+> case TCC_OUTPUT_EXE:
+> default:
+> ehdr.e_type = ET_EXEC;
+> ehdr.e_entry = get_elf_sym_addr(s1, "_start", 1);
+> break;
+2481c2483
+< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
+---
+> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
+2520,2521c2522,2523
+< type = ELFW(R_TYPE)(rel->r_info);
+< sym_index = ELFW(R_SYM)(rel->r_info);
+---
+> type = ELF64_R_TYPE(rel->r_info);
+> sym_index = ELF64_R_SYM(rel->r_info);
+2537c2539
+< rel->r_info = ELFW(R_INFO)(sym_index, type);
+---
+> rel->r_info = ELF64_R_INFO(sym_index, type);
+2766c2768
+< sym_bind = ELFW(ST_BIND)(sym->st_info);
+---
+> sym_bind = ELF64_ST_BIND(sym->st_info);
+---- tccgen.c ----
+24a25,26
+> #define NODATA_WANTED (nocode_wanted > 0) /* no static data output wanted either */
+> #define STATIC_DATA_WANTED (nocode_wanted & 0xC0000000) /* only static data output */
+31c33,39
+< ST_DATA int rsym, anon_sym, ind, loc;
+---
+> static int local_scope;
+> static int in_sizeof;
+> static int section_sym;
+>
+> ST_DATA int vlas_in_scope; /* number of VLAs that are currently in scope */
+> ST_DATA int vla_sp_root_loc; /* vla_sp_loc for SP before any VLAs were pushed */
+> ST_DATA int vla_sp_loc; /* Pointer to variable holding location to store stack pointer on the stack when modifying stack pointer */
+32a41,42
+> #if 0
+> ST_DATA int rsym, anon_sym, ind, loc;
+42,48d51
+< static int local_scope;
+< static int in_sizeof;
+< static int section_sym;
+<
+< ST_DATA int vlas_in_scope; /* number of VLAs that are currently in scope */
+< ST_DATA int vla_sp_root_loc; /* vla_sp_loc for SP before any VLAs were pushed */
+< ST_DATA int vla_sp_loc; /* Pointer to variable holding location to store stack pointer on the stack when modifying stack pointer */
+54,55d56
+< #define NODATA_WANTED (nocode_wanted > 0) /* no static data output wanted either */
+< #define STATIC_DATA_WANTED (nocode_wanted & 0xC0000000) /* only static data output */
+63,64c64,66
+<
+< ST_DATA CType char_pointer_type, func_old_type, int_type, size_type, ptrdiff_type;
+---
+> ST_DATA CType char_pointer_type, func_old_type, int_type, size_type;
+> #endif
+> ST_DATA CType ptrdiff_type;
+161c163
+< ELFW(ST_INFO)(STB_LOCAL, STT_SECTION), 0,
+---
+> ELF64_ST_INFO(STB_LOCAL, STT_SECTION), 0,
+179c181
+< ELFW(ST_INFO)(STB_LOCAL, STT_FILE), 0,
+---
+> ELF64_ST_INFO(STB_LOCAL, STT_FILE), 0,
+302c304
+< esym->st_other = (esym->st_other & ~ELFW(ST_VISIBILITY)(-1))
+---
+> esym->st_other = (esym->st_other & ~ELF64_ST_VISIBILITY(-1))
+311c313
+< old_sym_bind = ELFW(ST_BIND)(esym->st_info);
+---
+> old_sym_bind = ELF64_ST_BIND(esym->st_info);
+313c315
+< esym->st_info = ELFW(ST_INFO)(sym_bind, ELFW(ST_TYPE)(esym->st_info));
+---
+> esym->st_info = ELF64_ST_INFO(sym_bind, ELF64_ST_TYPE(esym->st_info));
+410c412
+< info = ELFW(ST_INFO)(sym_bind, sym_type);
+---
+> info = ELF64_ST_INFO(sym_bind, sym_type);
+1904d1905
+< default: l1 = gen_opic_sdiv(l1, l2); break;
+1907a1909
+> default: l1 = gen_opic_sdiv(l1, l2); break;
+2458a2461,2470
+> static long double negate_ld(long double d) {
+> #if LDBL_MANT_DIG == 64
+> register unsigned long long *p = (unsigned long long *)&d;
+> p[1] ^= 1ul<<15;
+> return *(long double *)p;
+> #else
+> return -d;
+> #endif
+> }
+>
+2500c2512
+< vtop->c.ld = -(long double)-vtop->c.i;
+---
+> vtop->c.ld = negate_ld((long double)-vtop->c.i);
+2505c2517
+< vtop->c.ld = -(long double)-(uint32_t)vtop->c.i;
+---
+> vtop->c.ld = negate_ld((long double)-(uint32_t)vtop->c.i);
+6517,6518c6529,6530
+< ELFW(R_TYPE)(rel->r_info),
+< ELFW(R_SYM)(rel->r_info),
+---
+> ELF64_R_TYPE(rel->r_info),
+> ELF64_R_SYM(rel->r_info),
+---- tccpe.c ----
+---- tccpp.c ----
+25a26
+> #if 0
+39a41
+> #endif
+62c64
+< #define DEF(id, str) str "\0"
+---
+> #define DEF(id, str) str "\n"
+1506c1508
+< if (varg < TOK_IDENT)
+---
+> if (varg < TOK_IDENT) {
+1508a1511
+> }
+1554c1557
+< if (3 == spc)
+---
+> if (3 == spc) {
+1556a1560
+> }
+3671c3675
+< if (c == '\0')
+---
+> if (c == '\n')
+---- tccrun.c ----
+---- tcctools.c ----
+---- x86_64-gen.c ----
+111,141d110
+< ST_DATA const int reg_classes[NB_REGS] = {
+< /* eax */ RC_INT | RC_RAX,
+< /* ecx */ RC_INT | RC_RCX,
+< /* edx */ RC_INT | RC_RDX,
+< 0,
+< 0,
+< 0,
+< 0,
+< 0,
+< RC_R8,
+< RC_R9,
+< RC_R10,
+< RC_R11,
+< 0,
+< 0,
+< 0,
+< 0,
+< /* xmm0 */ RC_FLOAT | RC_XMM0,
+< /* xmm1 */ RC_FLOAT | RC_XMM1,
+< /* xmm2 */ RC_FLOAT | RC_XMM2,
+< /* xmm3 */ RC_FLOAT | RC_XMM3,
+< /* xmm4 */ RC_FLOAT | RC_XMM4,
+< /* xmm5 */ RC_FLOAT | RC_XMM5,
+< /* xmm6 an xmm7 are included so gv() can be used on them,
+< but they are not tagged with RC_FLOAT because they are
+< callee saved on Windows */
+< RC_XMM6,
+< RC_XMM7,
+< /* st0 */ RC_ST0
+< };
+<
+633c602
+< greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PLT32, (int)(vtop->c.i-4));
+---
+> greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PC32, (int)(vtop->c.i-4)); // tcc's PLT code doesn't seem to work with static builds
+1194a1164,1166
+> enum __va_arg_type {
+> __va_gen_reg, __va_float_reg, __va_stack
+> };
+1198,1200d1169
+< enum __va_arg_type {
+< __va_gen_reg, __va_float_reg, __va_stack
+< };
+1204d1172
+< default: return __va_stack;
+1206a1175
+> default: return __va_stack;
+1244c1213
+< char _onstack[nb_args], *onstack = _onstack;
+---
+> char _onstack[/*nb_args*/1000/*fucking vlas*/], *onstack = _onstack;
+1461,1465d1429
+< default:
+< stack_arg:
+< seen_stack_size = ((seen_stack_size + align - 1) & -align) + size;
+< break;
+<
+1476a1441,1445
+> default:
+> stack_arg:
+> seen_stack_size = ((seen_stack_size + align - 1) & -align) + size;
+> break;
+>
+1940,1943d1908
+< default:
+< case '+':
+< a = 0;
+< break;
+1956a1922,1925
+> case '+':
+> default:
+> a = 0;
+> break;
+2016,2019d1984
+< default:
+< case '+':
+< a = 0;
+< break;
+2027a1993,1996
+> break;
+> case '+':
+> default:
+> a = 0;
+---- x86_64-link.c ----
+177c177
+< sym_index = ELFW(R_SYM)(rel->r_info);
+---
+> sym_index = ELF64_R_SYM(rel->r_info);
+185c185
+< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_64);
+---
+> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_64);
+190c190
+< qrel->r_info = ELFW(R_INFO)(0, R_X86_64_RELATIVE);
+---
+> qrel->r_info = ELF64_R_INFO(0, R_X86_64_RELATIVE);
+202c202
+< qrel->r_info = ELFW(R_INFO)(0, R_X86_64_RELATIVE);
+---
+> qrel->r_info = ELF64_R_INFO(0, R_X86_64_RELATIVE);
+216c216
+< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_PC32);
+---
+> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_PC32);
+249c249
+< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_PC64);
+---
+> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_PC64);
+---- lib/armeabi.c ----
+---- lib/armflush.c ----
+---- lib/bcheck.c ----
+---- lib/lib-arm64.c ----
+---- lib/libtcc1.c ----
+615a616,622
+>
+> static long double negate_ld(long double d) {
+> register unsigned long long *p = (unsigned long long *)&d;
+> p[1] ^= 1ul<<15;
+> return *(long double *)p;
+> }
+>
+619c626
+< ret = __fixunsxfdi((s = a1 >= 0) ? a1 : -a1);
+---
+> ret = __fixunsxfdi((s = a1 >= 0) ? a1 : negate_ld(a1));
+---- lib/va_list.c ----
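
Note on `negate_ld` in the 05/diffs.txt hunk above: it negates a long double by flipping the sign bit directly instead of using unary minus. The sketch below shows the same idea in a form that avoids pointer casts. It assumes the x87 80-bit extended format stored little-endian (as on x86-64 Linux), where the sign is the top bit of byte 9; the name `flip_sign_ld` is hypothetical, and on other `long double` formats it just falls back to `-d`.

```c
#include <float.h>
#include <string.h>

/* Hypothetical helper: negate a long double by toggling its sign bit.
   Assumes x87 80-bit extended precision, little-endian: bytes 0-7 hold the
   significand, bytes 8-9 hold the exponent with the sign in the top bit. */
static long double flip_sign_ld(long double d) {
#if LDBL_MANT_DIG == 64
    unsigned char bytes[sizeof d];
    memcpy(bytes, &d, sizeof d);   /* copy out the object representation */
    bytes[9] ^= 0x80;              /* toggle the sign bit */
    memcpy(&d, bytes, sizeof d);   /* copy the modified bytes back */
    return d;
#else
    return -d;                     /* other formats: plain negation */
#endif
}
```

Going through `memcpy` rather than casting to `unsigned long long *` is a design choice for this sketch only: it sidesteps strict-aliasing questions in an optimizing compiler, which are not a concern for the stage 05 compiler itself.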
diff --git a/05/tcc-0.9.27/config.h b/05/tcc-0.9.27/config.h
index 95ec14d..2ed06d9 100644
--- a/05/tcc-0.9.27/config.h
+++ b/05/tcc-0.9.27/config.h
@@ -1,10 +1,7 @@
#define TCC_VERSION "0.9.27"
#define CONFIG_TCC_STATIC 1
-//#define CONFIG_TCC_ELFINTERP "/XXX"
-//#define CONFIG_TCC_CRT_PREFIX "/XXX"
-//#define CONFIG_SYSROOT "/XXX"
-#define inline
#define TCC_TARGET_X86_64 1
#define ONE_SOURCE 1
#define CONFIG_LDDIR "lib/x86_64-linux-gnu"
#define CONFIG_TCCDIR "/usr/local/lib/tcc-bootstrap"
+#define inline
diff --git a/Makefile b/Makefile
index 20d8dcb..323bff9 100644
--- a/Makefile
+++ b/Makefile
@@ -5,6 +5,8 @@ all: markdown README.html
$(MAKE) -C 03
$(MAKE) -C 04
$(MAKE) -C 04a
+ # don't compile all of 05 because it takes a while
+ $(MAKE) -C 05 README.html
clean:
$(MAKE) -C 00 clean
$(MAKE) -C 01 clean
diff --git a/README.md b/README.md
index e4e47f6..6509198 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ command codes.
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
- [stage 04](04/README.md) - a language with nice functions and local variables
- [stage 04a](04a/README.md) - (interlude) a simple preprocessor
-- more coming soon (hopefully)
+- [stage 05](05/README.md) - a C compiler capable of compiling TCC
## prerequisite knowledge
@@ -59,21 +59,21 @@ If you're unfamiliar with x86-64 assembly, you should check out the instruction
Bootstrapping a compiler is not an easy task, so we're trying to make it as easy
as possible. We don't even necessarily need a standard-compliant C compiler, we
-only need enough to compile someone else's C compiler, specifically we'll be
+only need enough to compile someone else's C compiler. Specifically, we'll be
using [TCC](https://bellard.org/tcc/) since it's written (mostly) in standard C89.
- efficiency is not a concern
We will create big and slow executables, and that's okay. It doesn't really
-matter if compiling TCC takes 8 as opposed to 0.01 seconds; once we compile TCC
-with itself, we'll get the same executable either way.
+matter if compiling TCC takes 30 as opposed to 0.01 seconds; once the process
+is finished, we'll get the same executable either way.
## reflections on trusting trust
In 1984, Ken Thompson wrote the well-known article
[Reflections on Trusting Trust](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf).
-This is one of the inspirations for this project. To summarize
-the article: it is possible to create a malicious C compiler which will
+This is one of the inspirations for this project. A brief summary is:
+it's possible to create a malicious C compiler which will
replicate its own malicious functionalities (e.g. detecting password-checking
routines to make them also accept another password the attacker knows) when used
to compile other C compilers. For all we know, such a compiler was used to
@@ -224,10 +224,10 @@ Arguments are passed in
The return value is placed in rax.
```
-More will be added in the future as needed.
-
## license
+Note that this does not apply to TCC's source code (`05/tcc-0.9.27`).
+
```
This project is in the public domain. Any copyright protections from any law
are forfeited by the author(s). No warranty is provided, and the author(s)
diff --git a/bootstrap.sh b/bootstrap.sh
index 2597065..cdbb261 100755
--- a/bootstrap.sh
+++ b/bootstrap.sh
@@ -88,5 +88,14 @@ if [ "$(sed '/^#/d;/^$/d' out04a)" != 'Hello, world!' ]; then
fi
cd ..
+echo 'Processing stage 05 (this will take some time)...'
+cd 05
+rm -f test.out out04 in04 *.o tcc-0.9.27/tcc0
+make -s test.out > /dev/null
+if [ "$(./test.out)" != 'Hello, world!' ]; then
+ echo_red 'Stage 05 failed.'
+ exit 1
+fi
+cd ..
echo_green 'all stages completed successfully!'
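
Note on the `offsetof` change recorded in 05/diffs.txt: since the stage 05 compiler does not treat `offsetof(type, member)` as a constant expression, the tables are initialized with `0` and the real offsets are filled in by a function called at the top of `main`. A minimal sketch of that pattern follows. `struct State`, `struct Flag`, `flags`, `init_flags` and `flag_target` are hypothetical stand-ins for illustration, not TCC's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for TCC's TCCState and FlagDef. */
struct State { int warn_error; int nocommon; };
struct Flag { size_t offset; int flags; const char *name; };

/* Offsets start as 0 because offsetof() is not usable in the initializer
   under the restricted compiler; note the table can no longer be const. */
static struct Flag flags[] = {
    { 0, 0, "error" },
    { 0, 0, "common" }
};

/* Fill in the real offsets at run time; call once before using the table. */
static void init_flags(void) {
    flags[0].offset = offsetof(struct State, warn_error);
    flags[1].offset = offsetof(struct State, nocommon);
}

/* Locate the int field of `s` that a table entry refers to. */
static int *flag_target(struct State *s, const struct Flag *f) {
    return (int *)((char *)s + f->offset);
}
```

The cost of this approach is that the table must be writable and the init function must run before any lookup, which matches the `_init_options();` call added to `tcc.c` in the diff.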