Diffstat (limited to '05/README.md')
-rw-r--r-- 05/README.md 116
1 files changed, 74 insertions, 42 deletions
diff --git a/05/README.md b/05/README.md
index 0244a4b..a68cbbb 100644
--- a/05/README.md
+++ b/05/README.md
@@ -1,42 +1,42 @@
# [bootstrap](../README.md) stage 05
-This stage consists of a C compiler capable of compiling TCC (after some modifications
-to TCC's source code).
+This stage consists of a C compiler capable of compiling tcc (after some modifications
+to tcc's source code).
Run
```
$ make
```
-to build our C compiler and TCC. This will take some time (approx. 25 seconds on my computer).
+to build our C compiler and tcc. This will take some time (approx. 25 seconds on my computer).
This also compiles a "Hello, world!" executable, `a.out`, with our compiler.
-We can now compile TCC with itself. But first, you'll need to install the header files and library files
-which are needed to compile (almost) any program with TCC:
+We can now compile tcc with itself. But first, you'll need to install the header files and library files
+which are needed to compile (almost) any program with tcc:
```
$ sudo make install-tcc0
```
The files will be installed to `/usr/local/lib/tcc-bootstrap`. If you want to change this, make sure to change
-both the `TCCINST` variable in the makefile, and the `CONFIG_TCCDIR` macro in `config.h`.
+both the `TCCINST` variable in the makefile, and the `CONFIG_TCCDIR` macro in `tcc-0.9.27/config.h`.
Anyways, once this installation is done, you should be able to compile any C program with `tcc-0.9.27/tcc0`,
-including TCC itself:
+including tcc itself:
```
$ cd tcc-0.9.27
$ ./tcc0 tcc.c -o tcc1
```
-Now, let's try doing the same thing, but starting with GCC instead of our C compiler:
+Now, let's try doing the same thing, but starting with gcc instead of our C compiler:
```
$ gcc tcc.c -o tcc0a
$ ./tcc0a tcc.c -o tcc1a
```
-In theory, these should produce the same files, since the output of TCC shouldn't depend on which compiler it was compiled with.
-If they are different, then perhaps a bug *was* introduced in some early version of GCC, and replicated in all C compilers since then!
+In theory, these should produce the same files, since the output of tcc shouldn't depend on which compiler it was compiled with.
+If they are different, then perhaps a bug *was* introduced in some early version of gcc, and replicated in all C compilers since then!
Well, only one way to find out:
```
@@ -53,9 +53,9 @@ $ diff tcc2 tcc1a
$
```
-Yes, after compiling TCC with itself one more time, we get the same executable as the GCC-TCC one.
+Yes, after compiling tcc with itself one more time, we get the same executable as the gcc-tcc one.
I'm not sure why `tcc1` differs from `tcc2`, but there you go. Turns out there isn't some malicious
-self-replicating code hiding in GCC after all.\*
+self-replicating code hiding in gcc after all.\*
## the C compiler
@@ -69,7 +69,7 @@ idents.b - functions for creating mappings from identifiers to arbitrary 6
preprocess.b - preprocesses C files
tokenize.b - turns preprocessing tokens into tokens (see explanation below)
parse.b - turns tokens into a nice representation of the program
-codegen.b - turns parse.b's representation into actual code
+codegen.b - turns parse.b's representation into CPU instructions
main.b - puts everything together
```
@@ -290,7 +290,7 @@ Here is a (probably incomplete) list of things we do wrong:
- You can't have a variable/function/etc. called `defined`.
- Various little things about when macros are evaluated in some contexts.
- The horrible, horrible function `setjmp`, which surely no one uses, is not properly supported.
-Oh wait, TCC uses it. Fortunately it's not critically important to TCC.
+Oh wait, tcc uses it. Fortunately it's not critically important to tcc.
- Wide characters and wide character strings are not supported.
- The `localtime()` function assumes you are in the UTC+0 timezone.
- `mktime()` always fails.
@@ -317,17 +317,16 @@ rounds down, but
0.09999999999999999861222121921855432447046041488647460937501
rounds up.
```
-Good luck writing a function which handles that!
+Good luck writing code which handles that!
- Originally, there was a bug where negative powers of 2 were
being interpreted as half of their actual value, e.g. `x = 0.25;` would set `x` to
`0.125`, but `x = 4;`, `x = 0.3;`, etc. would all work just fine.
-- Writing the functions in `math.h`, although probably not necessary for compiling TCC,
+- Writing the functions in `math.h`, although probably not necessary for compiling tcc,
was fun! There are quite a few interesting optimizations you can make, and little
tricks for avoiding losses in floating-point accuracy.
-- The <s>first</s> second non-trivial program I successfully compiled worked perfectly the first time I ran it!
-- A very difficult to track down bug happened the first time I ran `tcc`: there was a declaration along
+- A very difficult to track down bug happened the first time I ran tcc: there was a declaration along
the lines of `char x[] = "a\0b\0c";` but it got compiled as `char x[] = "a";`!
-- Originally, I was just treating labels the same as any other statements, but `tcc` actually has code like:
+- Originally, I was just treating labels the same as any other statements, but tcc actually has code like:
```
...
goto lbl;
@@ -337,7 +336,7 @@ if (some_condition)
```
so the `do_something();` was not being considered as part of the `if` statement.
- The first time I compiled tcc with itself (and then with itself again), I actually got a different
-executable from the GCC one. After spending a long time looking at disassemblies, I found the culprit:
+executable from the gcc one. After spending a long time looking at disassemblies, I found the culprit:
```
# if defined(__linux__)
tcc_define_symbol(s, "__linux__", NULL);
@@ -345,8 +344,9 @@ executable from the GCC one. After spending a long time looking at disassemblies
# endif
```
If the `__linux__` macro is defined (to indicate that the target OS is linux),
-TCC will also define the `__linux__` macro. Unlike GCC, our compiler doesn't define the `__linux__` macro,
-so when it's used to compile TCC, TCC won't define it either, no matter how many times you compile it
+tcc will also define the `__linux__` macro in any programs it compiles.
+Unlike gcc, our compiler doesn't define the `__linux__` macro,
+so when it's used to compile tcc, tcc won't define it either, no matter how many times you compile it
with itself!
## modifications of tcc's source code
@@ -359,7 +359,7 @@ here.
- First, we (and C89) don't allow a comma after the last member in an initializer. In several places,
the last comma in an initializer/enum definition was removed, or an irrelevant entry was added to the end.
- Global variables were sometimes declared twice, which we don't support.
-So, a bunch of duplicate declarations were removed.
+So a bunch of duplicate declarations were removed.
- The `# if defined(__linux__)` and `# endif` mentioned above were removed.
- In a bunch of places, `ELFW(something)` had to be replaced with `ELF64_something` due to
subtleties of how we evaluate macros.
@@ -368,12 +368,12 @@ some initializers were replaced by functions called at the top of `main`.
- In several places, `default:` had to be moved to after every `case` label.
- In two places, `-some_long_double_expression` had to be replaced with
a function call to `negate_ld` (a function I wrote for negating long doubles).
-This is because TCC only supports negating long doubles if
-the compiler used to compile it has an 80-bit long double type, which our compiler doesn't.
-- `\0` was replaced with `\n` as a separator for keyword names.
-- Forced TCC to use `R_X86_64_PC32` relocations, because its `plt` code doesn't seem to work for static
+This is because tcc only supports negating long doubles if
+the compiler which compiled it has an 80-bit long double type, and our compiler doesn't.
+- `\0` was replaced with `\n` as a separator for keyword names in the `tcc_keywords` global variable.
+- Forced tcc to use `R_X86_64_PC32` relocations, because its `plt` code doesn't seem to work for static
executables.
-- Lastly, there's the `config.h` file, which is normally produced by TCC's `configure` script,
+- Lastly, there's the `config.h` file, which is normally produced by tcc's `configure` script,
but it's easy to write one manually:
```
#define TCC_VERSION "0.9.27"
@@ -386,30 +386,62 @@ but it's easy to write one manually:
```
The last line causes the `inline` keyword (added in C99) to be ignored.
-Fewer changes would've been needed for an older version of TCC, but older versions didn't support
-x86-64 assembly, which might end up being relevant...
-
## \*libc
-If you look in TCC's source code, you will not find implementations of any of the C standard library functions.
-So how can programs compiled with TCC use those functions?
+If you look in tcc's source code, you will not find implementations of any of the C standard library functions.
+So how can programs compiled with tcc use those functions?
-When a program compiled with TCC (under default settings) calls `printf`, say, it actually gets the instructions
+When a program compiled with tcc (under default settings) calls `printf`, say, it actually gets the instructions
for `printf` from a separate library file
(called something like `/usr/lib/x86_64-linux-gnu/libc-2.31.so`). There are very good reasons for this: for example,
if a security bug were found in `printf`, it would be much easier to replace the library file than to re-compile
every program which uses `printf`.
Now this library file is itself compiled from C source files (typically glibc).
-So, we can't really say that the self-compiled TCC was built from scratch, and there could be malicious
+So, we can't really say that the self-compiled tcc was built from scratch, and there could be malicious
self-replicating code in glibc.
-You can't compile glibc with TCC, but
-it's possible to build an old version of `musl`, an alternate libc
+### compiling glibc
+
+You can't compile glibc with tcc, but
+it's possible to build an old version of musl, an alternate libc
(you can run `CC=../tcc-0.9.27/tcc0 make` in the `musl-0.6.0` directory here).
-You should be able to use musl alongside TCC to build an old version of GCC (git revision
-`79a6d9b7ff3822675ee44d8d6cad86027dadd664` seems workable). This also requires
-building several tools needed to compile GCC. You should then be able to build (possibly an old version of)
-glibc, and with that, a modern version of GCC.
-This is all extremely tedious, though, so I'm not planning on doing it anytime soon.
+You should be able to use musl alongside tcc to build an old version of gcc. This also requires
+building several tools needed to compile gcc. You should then be able to build an old version of
+glibc, and with that, a modern version of gcc.
+
+Well, I tried this. And it is an absolute nightmare.
+GNU has created a horrible web of programs that all depend on each other.
+According to the recommended build process, you need awk to build awk, sed to build sed,
+sed to build grep, etc. Here was a "guide" I was starting to write for how to
+get to glibc:
+
+- install tcc, musl
+- build mrsh, make, basic utilities
+- chroot
+- build & install coreutils
+- build & install dash
+- build & install sed-4.2
+- build & install ld, as (from binutils)
+- build gcc
+- build & install grep-3.7
+- build & install awk
+- build & install bash
+- build & install glibc (didn't work)
+
+Each of these programs uses a `./configure` script to set up the code and Makefiles.
+These scripts are basically impossible to use without already having
+most of these programs. So, I resorted to configuring the build with
+the ordinary binary versions of `sed`, etc. I had on my machine.
+This made broken Makefiles which I spent hours editing by hand
+-- and is it really compiled from scratch if it's built from
+computer-generated source files and Makefiles?
+And although the developers at GNU
+refrain from declaring variables after statements, and keep old-style function declarations
+to support compilers from the 80s, they *still* manage to use gcc-specific extensions, and
+not even extensions that all versions of gcc support!
+After hours and hours of fixing compiler errors, I decided to give up.
+
+THIS WAY LIES MADNESS.
+