Types in the back end (aka "The Rep swamp")
I have completed a major representation change, affecting both old and new code generators, of the various Rep types. It's pervasive in that it touches a lot of files; and in the native code-gen very many lines are changed. The new situation is much cleaner.
Here are the highlights of the new design.
CmmType
There is a new type CmmType, defined in module CmmExpr, which is just what it sounds like: it's the type of a CmmExpr or a CmmReg.
- A CmmType is abstract: its representation is private to CmmExpr. That makes it easy to change representation.
- A CmmType is actually just a pair of a Width and a category (CmmCat).
- The Width type is exported and widely used in pattern-matching, but it does what it says on the tin: width only.
- In contrast, the CmmCat type is entirely private to CmmExpr. It is just an enumeration that allows us to distinguish: floats, gc pointers, and other.
Other important points are these:
- Each LocalReg has a CmmType attached; this replaces the previous unsavoury combination of MachRep and CmmKind. Indeed, both of the latter are gone entirely.
- Notice that a CmmType accurately knows about gc-pointer-hood. Ultimately we will abandon static-reference-table generation in STG syntax, and instead generate SRTs from the Cmm code. We'll need to update the RTS .cmm files to declare pointer-hood.
- The type TyCon.PrimRep remains; it enumerates the representations that a Haskell value can take. Differences from CmmType:
  - PrimRep contains VoidRep, but CmmType has no zero-width form.
  - CmmType includes sub-word width values (e.g. 8-bit) which PrimRep does not.
  The function primRepCmmType converts a non-void PrimRep to a CmmType.
- CmmLint complains if you assign a gc-ptr to a non-gc-ptr and vice versa. It treats "gc-ptr + constant" as a gc-ptr.
  - NB: you'd better not make an interior pointer live across a call, else we'll save it on the stack and treat it as a GC root. It's not clear how to guarantee this doesn't happen as the result of some optimisation.
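The PrimRep conversion and the lint rule can be sketched as follows. This is an illustration only: the cut-down PrimRep, the tiny Expr language, the 64-bit word assumption, and the names exprType and lintAssign are all invented for the example. The sketch's primRepCmmType returns Maybe, whereas the text only says the real function takes a non-void PrimRep.

```haskell
data Width = W8 | W32 | W64 deriving (Eq, Show)
data Cat = GcPtr | Bits | Float deriving (Eq, Show)
data CmmType = CmmType Cat Width deriving (Eq, Show)

-- A cut-down stand-in for TyCon.PrimRep (the real type has more cases).
data PrimRep = VoidRep | PtrRep | IntRep | DoubleRep deriving (Eq, Show)

-- VoidRep has no zero-width CmmType, so this sketch returns Nothing for it.
-- A 64-bit word size is assumed throughout.
primRepCmmType :: PrimRep -> Maybe CmmType
primRepCmmType VoidRep   = Nothing
primRepCmmType PtrRep    = Just (CmmType GcPtr W64)
primRepCmmType IntRep    = Just (CmmType Bits W64)
primRepCmmType DoubleRep = Just (CmmType Float W64)

-- A tiny expression language, just enough to state the lint rule.
data Expr
  = Reg String CmmType   -- a register with its attached type
  | Lit Integer Width    -- an integer constant
  | Add Expr Expr

exprType :: Expr -> CmmType
exprType (Reg _ ty) = ty
exprType (Lit _ w)  = CmmType Bits w
exprType (Add a b)  = case (exprType a, b) of
  (ty@(CmmType GcPtr _), Lit _ _) -> ty            -- "gc-ptr + constant" is a gc-ptr
  (CmmType _ w, _)                -> CmmType Bits w

-- Complain when pointer-hood differs between the register and the value.
lintAssign :: CmmType -> Expr -> Either String ()
lintAssign regTy rhs
  | isGc regTy == isGc rhsTy = Right ()
  | otherwise = Left ("gc-pointer mismatch: " ++ show regTy ++ " := " ++ show rhsTy)
  where
    rhsTy = exprType rhs
    isGc (CmmType c _) = c == GcPtr
```

Note that the sketch accepts an interior pointer such as p + 8 as a gc-ptr, which is exactly the case the NB above warns about keeping live across a call.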
Parsing .cmm RTS files. The global register P0 is a gc-pointer version of R0. They both map to the same physical register, though!
The MachOp type
The MachOp type enumerates (in machine-independent form) the available machine instructions. The principle they embody is that everything except the width is embodied in the opcode. In particular, we have:
- MO_S_Lt, MO_U_Lt, and MO_F_Lt for comparison (signed, unsigned, and float).
- MO_SS_Conv, MO_SF_Conv etc, for conversion (SS is signed-to-signed, SF is signed-to-float, etc).
These constructors all take Width arguments.
The MachOp data type is defined in CmmExpr, not in a separate MachOp module.
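To make the "everything except the width is in the opcode" principle concrete, here is a small sketch. The constructor names come from the text; the argument shapes (one Width for comparisons, two for conversions) and the helpers toUnsigned, toSigned, and evalCmp are assumptions for illustration, not the real CmmExpr definitions.

```haskell
data Width = W8 | W16 | W32 | W64 deriving (Eq, Show)

widthInBits :: Width -> Int
widthInBits W8  = 8
widthInBits W16 = 16
widthInBits W32 = 32
widthInBits W64 = 64

-- Signedness and float-ness live in the constructor; only the width is a parameter.
data MachOp
  = MO_S_Lt Width            -- signed less-than
  | MO_U_Lt Width            -- unsigned less-than
  | MO_F_Lt Width            -- floating-point less-than
  | MO_SS_Conv Width Width   -- signed-to-signed conversion (from, to)
  | MO_SF_Conv Width Width   -- signed-to-float conversion (from, to)
  deriving (Eq, Show)

-- Interpret a bit pattern as an unsigned number of the given width.
toUnsigned :: Width -> Integer -> Integer
toUnsigned w n = n `mod` (2 ^ widthInBits w)

-- Interpret the same bit pattern as a two's-complement signed number.
toSigned :: Width -> Integer -> Integer
toSigned w n
  | u >= 2 ^ (widthInBits w - 1) = u - 2 ^ widthInBits w
  | otherwise                    = u
  where u = toUnsigned w n

-- Evaluate the integer comparisons; other opcodes are outside this sketch.
evalCmp :: MachOp -> Integer -> Integer -> Maybe Bool
evalCmp (MO_S_Lt w) x y = Just (toSigned w x < toSigned w y)
evalCmp (MO_U_Lt w) x y = Just (toUnsigned w x < toUnsigned w y)
evalCmp _ _ _           = Nothing
```

For the 8-bit pattern 0xFF compared with 1, the signed opcode sees -1 < 1 while the unsigned opcode sees 255 < 1, so the operands' types never need to carry signedness.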
Foreign calls and hints
In the new Cmm representation (ZipCfgCmmRep), but not the old one, arguments and results to all calls, including foreign ones, are ordinary CmmExpr or CmmReg respectively. The extra information we need for foreign calls (is this signed? is this an address?) is kept in the calling convention. Specifically:
- MidUnsafeCall calls a MidCallTarget
- MidCallTarget is either a CallishMachOp or a ForeignTarget
- In the latter case we supply a CmmExpr (the function to call) and a ForeignConvention
- A ForeignConvention contains the C calling convention (stdcall, ccall etc), and a list of ForeignHints for arguments and for results. (We might want to rename this type.)
This simple change was horribly pervasive. The old Cmm rep (and Michael Adams's stuff) still has arguments and results being (argument,hint) pairs, as before.
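The shape described above might look roughly like the following. This is a sketch under assumptions: the constructor names PrimTarget, NoHint, AddrHint, SignedHint, the cut-down CmmExpr and CallishMachOp, and the helper argsWithHints are chosen for the example and are not taken from ZipCfgCmmRep.

```haskell
-- Cut-down stand-ins so the example is self-contained.
data CmmExpr = CmmLit Integer | CmmRegExpr String deriving (Eq, Show)
data CallishMachOp = MO_F64_Sin | MO_Memcpy deriving (Eq, Show)

data CCallConv = CCallConv | StdCallConv deriving (Eq, Show)

-- What a foreign call needs to know about each argument or result.
data ForeignHint = NoHint | AddrHint | SignedHint deriving (Eq, Show)

-- Calling convention, hints for the arguments, hints for the results.
data ForeignConvention = ForeignConvention CCallConv [ForeignHint] [ForeignHint]
  deriving (Eq, Show)

-- A call target is either a callish primitive or a foreign function.
data MidCallTarget
  = PrimTarget CallishMachOp
  | ForeignTarget CmmExpr ForeignConvention
  deriving (Eq, Show)

-- Recover old-style (argument, hint) pairs from a target and its plain
-- argument list. Primitive targets carry no hints in this sketch.
argsWithHints :: MidCallTarget -> [CmmExpr] -> [(CmmExpr, ForeignHint)]
argsWithHints (ForeignTarget _ (ForeignConvention _ argHints _)) args =
  zip args argHints
argsWithHints (PrimTarget _) args = [(a, NoHint) | a <- args]
```

Keeping the hints in the convention means code that does not care about foreign calls handles plain argument lists, and only the code generating the call pairs them up again.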
Native code generation and the Size type
The native code generator has an instruction data type for each architecture. Many of the instructions in these data types used to have a MachRep argument, but now have a Size argument instead. In fact, so far as the native code generators are concerned, these Size types (which can be machine-specific) are simply a plug-in replacement for MachRep, with one big difference: Size is completely local to the native code generator and hence can be changed at will without affecting the rest of the compiler.
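As an illustration of how such a Size type relates to Width, here is a hypothetical sketch for one imagined architecture. The constructors (II8 .. FF64) and the functions intSize, floatSize, and sizeToWidth are assumptions for the example; each real native code generator is free to define its own.

```haskell
data Width = W8 | W16 | W32 | W64 deriving (Eq, Show)

-- A machine-specific Size: it fuses "integer or float" with the width,
-- which is more than Width alone says.
data Size = II8 | II16 | II32 | II64 | FF32 | FF64
  deriving (Eq, Show)

-- Every width has an integer size on this imagined machine.
intSize :: Width -> Size
intSize W8  = II8
intSize W16 = II16
intSize W32 = II32
intSize W64 = II64

-- Only 32- and 64-bit floats exist here, so the conversion can fail.
floatSize :: Width -> Maybe Size
floatSize W32 = Just FF32
floatSize W64 = Just FF64
floatSize _   = Nothing

-- Forget the integer/float distinction and keep only the width.
sizeToWidth :: Size -> Width
sizeToWidth II8  = W8
sizeToWidth II16 = W16
sizeToWidth II32 = W32
sizeToWidth II64 = W64
sizeToWidth FF32 = W32
sizeToWidth FF64 = W64
```

Since only the native code generator mentions Size, adding or removing a constructor here would not affect the Cmm-level types.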
Size is badly named, but I inherited the name from the previous code.
I rather think that many instructions should have a Width parameter, not a Size parameter. But I didn't feel confident enough to change this. Generally speaking the NCG is a huge swamp and needs re-factoring.