Commit 1e8a345a authored by sewardj's avatar sewardj
Browse files

[project @ 2002-01-31 18:01:34 by sewardj]

Make a quite-large start on native code generator documentation.
parent b27560c4
......@@ -56,6 +56,7 @@
<li><a href="the-beast/simplifier.html">The Mighty Simplifier</a>
<li><a href="the-beast/mangler.html">The Evil Mangler</a>
<li><a href="the-beast/alien.html">Alien Functions</a>
<li><a href="the-beast/ncg.html">The Native Code Generator</a>
</ul>
<h2>RTS &amp; Libraries</h2>
......
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - The Native Code Generator</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The Native Code Generator</h1>
<p>
On x86 and sparc platforms, GHC can generate assembly code
directly, without having to go via C. This can sometimes almost
halve compilation time, and avoids the fragility and
horribleness of the mangler. The NCG is enabled by default for
non-optimising compilation on x86 and sparc. For most programs
it generates code which runs only about 1% slower than that
created by gcc on x86s, so it is well worth using even with
optimised compilation. FP-intensive x86 programs see a bigger
slowdown, and all sparc code runs about 5% slower due to
us not filling branch delay slots.
<p>
In the distant past
the NCG could also generate Alpha code, and that machinery
is still there, but will need extensive refurbishment to
get it going again, due to underlying infrastructural changes.
Budding hackers thinking of doing a PowerPC port would do well
to use the sparc bits as a starting point.
<p>
The NCG has always been something of a second-class citizen
inside GHC, an unloved child, rather. This means that its
integration into the compiler as a whole is rather clumsy, which
brings some problems described below. That apart, the NCG
proper is fairly cleanly designed, as target-independent as it
reasonably can be, and so should not be difficult to retarget.
<p>
The following details are correct as per the CVS head of end-Jan
2002.
<h2>Overview</h2>
The top-level code generator fn is
<p>
<code>absCtoNat :: AbstractC -> UniqSM (SDoc, Pretty.Doc)</code>
<p>
The returned <code>SDoc</code> is for debugging, so is empty unless
you specify <code>-ddump-stix</code>. The <code>Pretty.Doc</code>
bit is the final assembly code. Translation involves three main
phases, the first and third of which are target-independent.
<ul>
<li><b>Translation into the <code>Stix</code> representation.</b> Stix
is a simple tree-like RTL-style language, in which you can
mention:
<p>
<ul>
<li>An infinite number of temporary, virtual registers.
<li>The STG "magic" registers (<code>MagicId</code>), such as
the heap and stack pointers.
<li>Literals and low-level machine ops (<code>MachOp</code>).
<li>Simple address computations.
<li>Reads and writes of: memory, virtual regs, and various STG
regs.
<li>Labels and <code>if ... goto ...</code> style control-flow.
</ul>
<p>
Stix has two main associated types:
<p>
<ul>
<li><code>StixStmt</code> -- trees executed for their side
effects: assignments, control transfers, and auxiliary junk
such as segment changes and literal data.
<li><code>StixExpr</code> -- trees which denote a value.
</ul>
<p>
Translation into Stix is almost completely
target-independent. Needed dependencies are knowledge of
word size and endianness, used when generating code to do
deal with half-word fields in info tables. This could be
abstracted out easily enough. Also, the Stix translation
needs to know which <code>MagicId</code>s map to registers
on the given target, and which are stored in offsets from
<code>BaseReg</code>.
<p>
After initial Stix generation, the trees are cleaned up with
constant-folding and a little copy-propagation ("Stix
inlining", as the code misleadingly calls it). We take
the opportunity to translate <code>MagicId</code>s which are
stored in memory on the given target, into suitable memory
references. Those which are stored in registers are left
alone. There is also a half-hearted attempt to lift literal
strings to the top level in cases where nested strings have
been observed to give incorrect code in the past.
<p>
Primitive machine-level operations will already be phrased in
terms of <code>MachOp</code>s in the presented Abstract C, and
these are passed through unchanged. We comment only that the
<code>MachOp</code>s have been chosen so as to be easy to
implement on all targets, and their meaning is intended to be
unambiguous, and the same on all targets, regardless of word
size or endianness.
<p>
<li><b>Instruction selection.</b> This is the only majorly
target-specific phase. It turns Stix statements and
expressions into sequences of <code>Instr</code>, a data
type which is different for each architecture.
<code>Instr</code>, unsurprisingly, has various supporting
types, such as <code>Reg</code>, <code>Operand</code>,
<code>Imm</code>, etc. The generated instructions may refer
to specific machine registers, or to arbitrary virtual
registers, either those created within the instruction
selector, or those mentioned in the Stix passed to it.
<p>
The instruction selectors live in <code>MachCode.lhs</code>.
The core functions, for each target, are:
<p>
<code>
getAmode :: StixExpr -> NatM Amode
<br>getRegister :: StixExpr -> NatM Register
<br>assignMem_IntCode :: PrimRep -> StixExpr -> StixExpr -> NatM InstrBlock
<br>assignReg_IntCode :: PrimRep -> StixReg -> StixExpr -> NatM InstrBlock
</code>
<p>
The insn selectors use the "maximal munch" property. The
bizarrely-misnamed <code>getRegister</code> translates
expressions. A simplified version of its type is:
<p>
<code>getRegister :: StixExpr -> NatM (OrdList Instr, Reg)</code>
<p>
That is: it (monadically) turns a <code>StixExpr</code> into a
sequence of instructions, and a register, with the meaning
that after executing the (possibly empty) sequence of
instructions, the (possibly virtual) register will
hold the resulting value. The real situation is complicated
by the presence of fixed registers, and is detailed below.
<p>
Maximal munch is a greedy algorithm and is known not to give
globally optimal code sequences, but it is good enough, and
fast and simple. Early incarnations of the NCG used something
more sophisticated, but that is long gone now.
<p>
Similarly, <code>getAmode</code> translates a value, intended
to denote an address, into a sequence of insns leading up to
a (processor-specific) addressing mode. This stuff could be
done using the general <code>getRegister</code> selector, but
would necessarily generate poorer code, because the calculated
address would be forced into a register, which might be
unnecessary if it could partially or wholly be calculated
using an addressing mode.
<p>
Finally, <code>assignMem_IntCode</code> and
<code>assignReg_IntCode</code> create instruction sequences to
calculate a value and store it in the given register, or at
the given address. Because these guys translate a statement,
not a value, they just return a sequence of insns and no
associated register. Floating-point and 64-bit integer
assignments have analogous selectors.
<p>
Apart from the complexities of fixed vs floating registers,
discussed below, the instruction selector is as simple
as it can be. It looks long and scary but detailed
examination reveals it to be fairly straightforward.
<p>
<li><b>Register allocation.</b> The register allocator,
<code>AsmRegAlloc.lhs</code> takes sequences of
<code>Instr</code>s which mention a mixture of real and
virtual registers, and returns a modified sequence referring
only to real ones. It is gloriously and entirely
target-independent. Well, not exactly true. Instead it
regards <code>Instr</code> (instructions) and <code>Reg</code>
(virtual and real registers) as abstract types, to which it has
the following interface:
<p>
<code>
insnFuture :: Instr -> InsnFuture
<br>regUsage :: Instr -> RegUsage
<br>patchRegs :: Instr -> (Reg -> Reg) -> Instr
</code>
<p>
<code>insnFuture</code> is used to (re)construct the graph of
all possible control transfers between the insns to be
allocated. <code>regUsage</code> returns the sets of registers
read and written by an instruction. And
<code>patchRegs</code> is used to apply the allocator's final
decision on virtual-to-real reg mapping to an instruction.
<p>
Clearly these 3 fns have to be written anew for each
architecture. They are defined in
<code>RegAllocInfo.lhs</code>. Think twice, no, thrice,
before modifying them: making false claims about insn
behaviour will lead to hard-to-find register allocation
errors.
<p>
<code>AsmRegAlloc.lhs</code> contains detailed comments about
how the allocator works. Here is a summary. The head honcho
<p>
<code>allocUsingTheseRegs :: [Instr] -> [Reg] -> (Bool, [Instr])</code>
<p>
takes a list of instructions and a list of real registers
available for allocation, and maps as many of the virtual regs
in the input into real ones as it can. The returned
<code>Bool</code> indicates whether or not it was
successful. If so, that's the end of it. If not, the caller
of <code>allocUsingTheseRegs</code> will attempt spilling.
More of that later. What <code>allocUsingTheseRegs</code>
does is:
<p>
<ul>
<li>Implicitly number each instruction by its position in the
input list.
<p>
<li>Using <code>insnFuture</code>, create the set of all flow
edges -- possible control transfers -- within this set of
insns.
<p>
<li>Using <code>regUsage</code> and iterating around the flow
graph from the previous step, calculate, for each virtual
register, the set of flow edges on which it is live.
<p>
<li>Make a real-register committment map, which gives the set
of edges for which each real register is committed (in
use). These sets are initially empty. For each virtual
register, attempt to find a real register whose current
committment does not intersect that of the virtual
register -- ie, is uncommitted on all edges that the
virtual reg is live. If successful, this means the vreg
can be assigned to the realreg, so add the vreg's set to
the realreg's committment.
<p>
<li>If all the vregs were assigned to a realreg, use
<code>patchInstr</code> to apply the mapping to the insns themselves.
</ul>
<p>
<b>Spilling</b>
<p>
If <code>allocUsingTheseRegs</code> fails, a baroque
mechanism comes into play. We now know that much simpler
schemes are available to do the same thing. Anyways:
<p>
The logic above <code>allocUsingTheseRegs</code>, in
<code>doGeneralAlloc</code> and <code>runRegAllocate</code>,
observe that allocation has failed with some set R of real
registers. So they apply <code>runRegAllocate</code> a second
time to the code, but remove (typically) two registers from R
before doing so. This naturally fails too, but returns a
partially-allocated sequence. <code>doGeneralAlloc</code>
then inserts spill code into the sequence, and finally re-runs
<code>allocUsingTheseRegs</code>, but supplying the original,
unadulterated R. This is guaranteed to succeed since the two
registers previously removed from R are sufficient to allocate
all the spill/restore instructions added.
<p>
Because x86 is very short of registers, and in the worst case
needs three removed from R, a softly-softly approach is used.
<code>doGeneralAlloc</code> first tries with zero regs removed
from R, then if that fails one, then two, etc. This means
<code>allocUsingTheseRegs</code> may get run several times
before a successful arrangement is arrived at.
<code>findReservedRegs</code> cooks up the sets of spill
registers to try with.
<p>
The resulting machinery is complicated and the generated spill
code is appalling. The saving grace is that spills are very
rare so it doesn't matter much. I did not invent this -- I inherited it.
<p>
<b>Dealing with common cases fast</b>
<p>
The entire reg-alloc mechanism described so far is general and
correct, but expensive overkill for many simple code blocks.
So to begin with we use start off with
<code>doSimpleAlloc</code>, which attempts to do something
simple. It exploits the observation that if the total number
of virtual registers does not exceed the number of real ones
available, we can simply dole out a new realreg each time we
see mention of a new vreg, with no regard for control flow.
<code>doSimpleAlloc</code> therefore attempts this in a
single pass over the code. It gives up if it runs out of real
regs or sees any condition which renders the above observation
invalid (fixed reg uses, for example).
<p>
This clever hack handles the majority of code blocks quickly.
It was copied from the previous reg-allocator (the
Mattson/Partain/Marlow/Gill one).
</ul>
<p>
<h2>Complications, observations, and possible improvements</h2>
<h3>How to debug the NCG without losing your sanity</h3>
<h3>How to debug the NCG without losing your sanity</h3>
<p><small>
<!-- hhmts start -->
Last modified: Fri Aug 10 11:47:41 EST 2001
<!-- hhmts end -->
</small>
</body>
</html>
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment