summaryrefslogtreecommitdiffstats
path: root/_posts/2017-02-22-cozy-devnotes-machine-specs.md
blob: 990cead9046ee47bfbd167ba16da6eb78b8b734e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
---
# vim: tw=80 spell :
title: "Compiler devnotes: Machine specs"
layout: post
tags: [C, language design]
---

I have a number of long-term projects that I plan for on long timelines, on the
order of decades or more. One of these projects is cozy, a C toolchain. I
haven't talked about this project in public before, so I'll start by introducing
you to the project. The main C toolchains in the "actually usable" category are
GNU and LLVM, but I'm satisfied with neither and I want to build my own
toolchain. I see no reason why compilers should be deep magic. Here are my goals
for cozy:

- Self hosting and written in C
- An easy to grok codebase and internal design
- Focused on C. No built-in support for other languages
- Adding new targets architectures and ports should be straightforward
- Modular build pipeline with lots of opportunities for external integrations
- Trivially cross-compiles without building another version of the toolchain
- Includes a decent optimizer

Some other plans include opinionated warnings about code and minimal support for
language extensions. Ambitious goals, right? That's why this project is on my
long-term schedule. I've found that large projects are entirely feasible, so
long as you (1) start them and (2) keep working on them for a long time. I don't
need to rush this - gcc and clang may not be ideal, but they work today. In
support of these goals, I'll be writing these dev notes to explain my design
choices and gather feedback — please [email me](mailto:sir@cmpwn.com) if
you have some!

Since I want to place an emphasis on portability and retargetability, I'm
starting by designing the machine spec and its support code, which is used to
add support for new architectures. I don't like gcc's lisp specs, and I *really*
don't like LLVM's "huge pile of C++" approach. I think a really good machine
spec meets these goals:

- Easy to write and human friendly
- More about data than code, but
- Easily extended with C to support architecture-specific nuances
- Provides loads of useful metadata about the target architecture
- Exposes information about the speed and side-effects of each instruction
- Can also be used to generate an assembler and disassembler
- Easily reused to create derivative architectures

Adding a new architecture should be a weekend project, and when you're done the
entire toolchain should both support and run on your new architecture. I set out
to come up with a new syntax that could potentially meet these goals. I started
with the Z80 architecture in mind because it's simple, I'm intimately familiar
with it, and I want cozy to be able to target 8-bit machines just as easily as
32 or 64 bit.

For reference, here are the gcc and LLVM guides on adding new targets:

- [gcc - Anatomy of a Target Back End](https://gcc.gnu.org/onlinedocs/gccint/Back-End.html)
- [Writing an LLVM Backend](http://llvm.org/docs/WritingAnLLVMBackend.html)

The cozy machine spec is a cross between ini files, yaml, and a custom syntax.
The format is somewhat complex, but once understood is intuitive and flexible.
At the top level, it looks like an ini file:

```yaml
[metadata]
# ...

[registers]
# ...

[macros]
# ...

[instructions]
# ...
```

### Metadata

The **metadata** section contains some high-level information about the
architecture design, and is the simplest section to understand. It currently
looks like this for z80:

```yaml
[metadata]
name: z80
bits: 8
endianness: little
signedness: twos-complement
cache: none
pipeline: none
```

This isn't comprehensive, and I'll be adding more metadata as it becomes
necessary. On LLVM, this sort of information is encoded into a string that looks
something like this: `"e-p:16:8:8-i8:8:8-i16:8:8-n8:16"`. This string is passed
into the `LLVMTargetMachine` base constructor in C++. I think we can do a hell
of a lot better than that!

### Registers

The **registers** section describes the registers on this architecture.

```yaml
[registers]
BC: 16
    B: 8
    C: 8; offset=8
DE: 16
    D: 8
    E: 8; offset=8
HL: 16
    H: 8
    L: 8; offset=8
SP: 16; stack
PC: 16; program
```

Here we can start to see some interesting syntax and get an idea of the design
of cozy machine specs. The contents of each section are keys, which have values,
attributes, and children. The format looks like this:

```yaml
key: value; attributes, ...
    children...
```

In this example, we've defined the BC, DE, HL, SP, and PC registers. HL, DE, and
BC are general purpose 16-bit registers, and each can also be used as two
separate 8-bit registers. The attributes for these sub-registers indicates their
offsets in the parent register. We also define the stack and program registers,
SP and PC, which use the stack and program attributes to indicate their special
purposes.

We can also describe CPU flags in this section:

```yaml
[registers]
AF: 16; special
    A: 8; accumulator
    F: 8; flags, offset 8;; flag
        _C:  1
        _N:  1; offset 1
        _PV: 1; offset 2
        _3:  1; offset 3, undocumented
        _H:  1; offset 4
        _5:  1; offset 5, undocumented
        _Z:  1; offset 6
        _S:  1; offset 7
```

Here we introduce another feature of cozy specs with `F: 8; flags, offset 8;;
flag`. Using `;;` adds those attributes to all children of this key, so each of
\_C, \_N, etc have the `flag` attribute.

Take note of the "undocumented" attribute here. Some of the metadata included
in a spec can be applied to cozy tools. Some of it, however, is there for other
tools to utilize. We have a good opportunity to make a machine-readable
description of the architecture, so I've opted to include a lot of extra details
in machine specs that third parties could utilize (though there might be a
-fno-undocumented compiler flag some day, I guess).

### Macros

The **macros** section is heavily tied to the instructions section. Most instruction
sets are quite large, and I don't want to burden spec authors with writing out
the entire thing. We can speed up their work by providing macros.

z80 instructions have a few sets of common patterns in their encodings. Register
groups are often represented by the same set of bits, and we can make our
instruction set specification more concise by taking advantage of this. For
example, here's a macro that we can use for instructions that can use either the
BC, DE, HL, or SP registers:

```yaml
[macros]
reg_BCDEHLSP:
    BC: 00
    DE: 01
    HL: 10
    SP: 11
```

We have the name of the macro as the top-level key, in this case `reg_BCDEHLSP`.
We can later refer to this macro with `@reg_BCDEHLSP`. Then, we have each of the
cases it can match on, and the binary values these correspond to when encoded in
an instruction.

### Instructions

The instructions section brings everything together and defines the actual
instructions available on this architecture. Instructions can be organized into
groups at the spec author's pleasure, which can be referenced by derivative
architectures. Here we can take a look at the "load" group:

```yaml
[instructions]
.load:
    ld:
        @reg_BCDEHLSP, @imm[16]: 00 $1 0001 $2
```

On z80, the `ld` instruction is similar to the `mov` instruction on Intel
architectures. It assigns the second argument to the first. This could be used
to assign registers to each other (e.g. `ld a, b` to set A = B), to set
registers to constants, and so on. Our example here uses our macro from earlier
to match instructions like this:

    ld hl, 0x1234

The value for this key may reference the arguments with variables. $1 here
equals `10`, from the macro. The `imm` built-in is implemented in C to match
constants and provides $2. An assembler could use this information to assemble
our example instruction into this machine code:

    00100001 00110100 00010010

Which will load HL with the value 0x1234 when executed.

### Lots more metadata

Now that we have the basics down, let's dive into some deeper details. Cozy
specs are designed to provide most of the information the *entire toolchain*
needs to support an architecture. The information we have so far could be used
to generate assemblers and disassemblers, but I want this file to be able to
generate things like optimizers as well. You can add the necessary metadata to
each instruction by utilizing attributes.

Consider the z80 instruction LDIR, which stands for
"load/decrement/increment/repeat". This instruction is used for memcpy
operations. To use it, you set the HL register to a source address, the DE
register to a destination address, and BC to a length. This instruction looks
like this in the spec:

```yaml
ldir: 11101101 10110000; uses[HL, DE, BC], \
    affects[HL[+BC], DE[+BC], BC[0]], \
    flags[_H:0,_N:0,_PV:0], cycles[16 + BC * 5]
````

That's a lot of attributes! The purpose of these attributes are to give the
toolchain insights into the registers this instruction uses, its side effects,
and how fast it is. These attributes can help us compare the efficiency of
different approaches and understand the how the state of registers evolves
during a function, which leads to all sorts of useful optimizations.

The `affects` attribute, for example, tells us how each register is affected by
this instruction. We can see that after this instruction, HL and DE will have
had BC added to them, and BC will have been set to 0. We can make all sorts of
optimizations based on this knowledge. Here are some examples:

```c
char *dest, *src;
int len = 10;
memcpy(dest, src, len);
src += len;
```

The compiler can assign `src` to HL, `dest` to DE, and `len` to BC. We can then
optimize out the final statement entirely because we know that the LDIR
instruction will have already added BC to HL for us.

```c
char *dest, *src;
int len = 10;
memcpy(dest, src, len);
int foobar = 0;
```

In this case, the register allocator can just assign BC to `foobar` and avoid
initializing it because we know it's already going to be zero. Many other
optimizations are made possible when we are keeping track of the side effects of
each instruction.

## Next steps

I've iterated over this spec design for a while now, and I'm pretty happy with
it. I would love to hear your feedback. Assuming that this looks good, my next
step is writing more specs, and a tool that parses and compiles them to C. These
C files are going to be linked into `libcozyspec`, which will provide an API to
access all of this metadata from C. It will also include an instruction matcher,
which will be utilized by the next step - writing the assembler.

The assembler is going to take a while, because I don't want to go the gas route
of making a half-baked assembler that's more useful for compiling the C
compiler's output than for anything else. I want to make an assembler that
assembly programmers would *want* to use.

I have not yet designed an intermediate bytecode for the compiler to use, but
one will have to be made. The machine spec will likely change somewhat to
accommodate this. Some of the conversion from internal bytecode to target
assembly can likely be inferred from metadata, but some will have to be done
manually for each architecture.

[Here's the entire z80 spec](https://sr.ht/7_Pe.txt) I've been working on, for
your reading pleasure.