Old Code Generator (prior to GHC 7.8)
The material below describes the old code generator, which was used up to GHC 7.6 and was retired in 2012. This page is not maintained and is kept only for historical purposes. See the Code generator page for an up-to-date description of the current code generator.
Storage manager representations
See The Storage Manager for the layout of the stack.
The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects. The runtime also needs to know about the layout of heap objects, because it contains the garbage collector. How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync?
Currently we solve the problem this way:
- C types representing heap objects are defined in the C header files; see for example includes/rts/storage/Closures.h.
- A C program, includes/mkDerivedConstants.c, #includes the runtime headers. This program is built and run when you type make or make boot in includes/. It is run twice: once to generate includes/DerivedConstants.h, and again to generate includes/GHCConstants.h.
- The file DerivedConstants.h contains lots of #defines like this: #define OFFSET_StgTSO_why_blocked 18 which says that the offset to the why_blocked field of an StgTSO is 18 bytes. This file is #included into includes/Cmm.h, so these offsets are available to the hand-written .cmm files.
- The file GHCConstants.h contains similar definitions: oFFSET_StgTSO_why_blocked = 18::Int This time the definitions are in Haskell syntax, and this file is #included directly into compiler/main/Constants.lhs. This is how these offsets are made available to GHC's code generator.
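The two generated files carry the same numbers in two syntaxes. The following is a minimal sketch of that idea, written in Haskell rather than the C of mkDerivedConstants.c. The names FieldOffset, renderC and renderHs are hypothetical and do not appear in GHC, and the offset is passed in by hand, whereas the real program computes it from the runtime headers.

```haskell
import Data.Char (toLower)

-- Hypothetical record: one struct field and its byte offset.
data FieldOffset = FieldOffset
  { foStruct :: String
  , foField  :: String
  , foOffset :: Int
  }

-- Shared base name, e.g. "OFFSET_StgTSO_why_blocked".
baseName :: FieldOffset -> String
baseName fo = "OFFSET_" ++ foStruct fo ++ "_" ++ foField fo

-- C form, in the style of DerivedConstants.h.
renderC :: FieldOffset -> String
renderC fo = "#define " ++ baseName fo ++ " " ++ show (foOffset fo)

-- Haskell form, in the style of GHCConstants.h. The first letter is
-- lower-cased so that the name is a valid Haskell variable.
renderHs :: FieldOffset -> String
renderHs fo = lowerFirst (baseName fo) ++ " = " ++ show (foOffset fo) ++ "::Int"
  where
    lowerFirst []       = []
    lowerFirst (c : cs) = toLower c : cs
```

Applied to FieldOffset "StgTSO" "why_blocked" 18, renderC and renderHs produce the two example lines quoted above.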
Generated Cmm Naming Convention
See compiler/GHC/Cmm/CLabel.hs
Labels generated by the code generator are of the form <name>_<type>, where <name> is <Module>_<name> for external names and <unique> for internal names. <type> is one of the following:
Suffix | Meaning |
---|---|
info | Info table |
srt | Static reference table |
srtd | Static reference table descriptor |
entry | Entry code (function, closure) |
slow | Slow entry code (if any) |
ret | Direct return address |
vtbl | Vector table |
n_alt | Case alternative (tag n) |
dflt | Default case alternative |
btm | Large bitmap vector |
closure | Static closure |
con_entry | Dynamic Constructor entry code |
con_info | Dynamic Constructor info table |
static_entry | Static Constructor entry code |
static_info | Static Constructor info table |
sel_info | Selector info table |
sel_entry | Selector entry code |
cc | Cost centre |
ccs | Cost centre stack |
Many of these distinctions are only for documentation reasons. For example, _ret is only distinguished from _entry to make it easy to tell whether a code fragment is a return point or a closure/function entry.
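As a rough illustration of the naming scheme, the sketch below builds label strings from a name and a suffix. All types and functions here are hypothetical stand-ins, not the definitions in CLabel; only a few suffixes are modelled, name encoding is ignored, and uniques are printed as plain integers, which is an assumption made for simplicity.

```haskell
-- Hypothetical model of the two kinds of <name>.
data LabelName
  = External String String  -- module name, entity name
  | Internal Int            -- compiler-generated unique

-- A handful of the <type> suffixes from the table above.
data LabelType = Info | Entry | Ret | Closure | ConInfo | Alt Int | Dflt

suffix :: LabelType -> String
suffix Info    = "info"
suffix Entry   = "entry"
suffix Ret     = "ret"
suffix Closure = "closure"
suffix ConInfo = "con_info"
suffix (Alt n) = show n ++ "_alt"   -- case alternative for tag n
suffix Dflt    = "dflt"

renderName :: LabelName -> String
renderName (External m n) = m ++ "_" ++ n
renderName (Internal u)   = show u  -- assumption: uniques shown as decimal

-- <name>_<type>
renderLabel :: LabelName -> LabelType -> String
renderLabel name ty = renderName name ++ "_" ++ suffix ty
```

For instance, renderLabel (External "Foo" "bar") Info gives "Foo_bar_info".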
Modules
CodeGen
Top level, only exports codeGen.
Called from HscMain for each module that needs to be converted from Stg to Cmm.
For each such module codeGen does three things:
- cgTopBinding for the StgBinding
- cgTyCon for the TyCon (these are constructors, not constructor calls)
- mkModuleInit for the module
mkModuleInit generates several boilerplate initialization functions that:
- register the module,
- create an Hpc table,
- set up its profiling info (initCostCentres) and code coverage info (initHpc), and
- call the initialization functions of the modules it imports.
If neither SCC profiling nor HPC is used, then the initialization code short-circuits to return.
If the module has already been initialized, the initialization function just returns.
The GHC.TopHandler and GHC.Prim modules get special treatment.
cgTopBinding is a small wrapper around cgTopRhs, which in turn dispatches to:
- cgTopRhsCon for StgRhsCon (these are bindings of constructor applications, not constructors themselves) and
- cgTopRhsClosure for StgRhsClosure.
cgTopRhsCon and cgTopRhsClosure are located in CgCon and CgClosure, which are the primary modules called by CodeGen.
CgCon
TODO
CgClosure
TODO
CgMonad
The monad that most of codeGen operates inside
- Reader
- State
- (could be Writer?)
- fork
- flatten
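The Reader and State aspects listed above can be pictured as a function from a read-only environment and a state to a result and a new state. The sketch below is a simplified model under that assumption, not the real definition in CgMonad: Env, St, Cg, askEnv, emit and forkCode are hypothetical names, the state is reduced to a list of emitted statements, and "flatten" is not modelled.

```haskell
-- Hypothetical stand-ins for the environment and the state.
type Env = String     -- read-only, e.g. the current module name
type St  = [String]   -- emitted statements, oldest first

newtype Cg a = Cg { runCg :: Env -> St -> (a, St) }

instance Functor Cg where
  fmap f (Cg g) = Cg $ \env st -> let (a, st') = g env st in (f a, st')

instance Applicative Cg where
  pure a = Cg $ \_ st -> (a, st)
  Cg f <*> Cg g = Cg $ \env st ->
    let (h, st1) = f env st
        (a, st2) = g env st1
    in (h a, st2)

instance Monad Cg where
  Cg g >>= k = Cg $ \env st ->
    let (a, st') = g env st in runCg (k a) env st'

-- Reader part: consult the environment without changing the state.
askEnv :: Cg Env
askEnv = Cg $ \env st -> (env, st)

-- State part: append one statement to the code emitted so far.
emit :: String -> Cg ()
emit s = Cg $ \_ st -> ((), st ++ [s])

-- "fork": run a sub-computation against an empty code buffer and return
-- its code to the caller instead of appending it to the main stream.
forkCode :: Cg a -> Cg (a, [String])
forkCode (Cg g) = Cg $ \env st -> let (a, sub) = g env [] in ((a, sub), st)
```

In this model a forked computation sees the same environment as its parent but cannot disturb the parent's emitted code, which is the property a fork operation needs when generating code for a nested body.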
CgExpr
Called by CgClosure and CgCon.
Since everything in STG is an expression, almost everything branches off from here.
This module exports only one function, cgExpr, which for the most part just dispatches to other functions to handle each specific constructor in StgExpr.
Here are the core functions that each constructor is dispatched to (though some may have little helper functions called in addition to the core function):
StgExpr constructor | How cgExpr handles it |
---|---|
StgApp | Calls to cgTailCall in CgTailCall |
StgConApp | Calls to cgReturnDataCon in CgCon |
StgLit | Calls to cgLit in CgUtils and performPrimReturn in CgTailCall |
StgOpApp | Is a bit more complicated; see below. |
StgCase | Calls to cgCase in CgCase |
StgLet | Calls to cgRhs in CgExpr |
StgLetNoEscape | Calls to cgLetNoEscapeBindings in CgExpr, but with a little bit of wrapping by nukeDeadBindings and saveVolatileVarsAndRegs. |
StgSCC | Calls to emitSetCCC in CgProf |
StgTick | Calls to cgTickBox in CgHpc |
StgLam | Does not have a case because it is only for CoreToStg's work. |
Some of these cases call functions defined in CgExpr itself, because they need a little bit of wrapping and processing before calling out to their main worker function:
- cgRhs
- cgLetNoEscapeBindings
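The dispatch described above can be read as one case analysis over the constructors of StgExpr. The sketch below models only that mapping: StgExprTag and worker are hypothetical names, the constructors carry no arguments here, and the result is just the name of the worker function and its module as given in the table.

```haskell
-- Toy stand-in for StgExpr: only the constructor names matter here.
data StgExprTag
  = StgApp | StgConApp | StgLit | StgOpApp | StgCase
  | StgLet | StgLetNoEscape | StgSCC | StgTick | StgLam
  deriving (Show, Eq, Enum, Bounded)

-- (worker function, module) for each constructor.
-- Nothing marks constructors without a single worker.
worker :: StgExprTag -> Maybe (String, String)
worker StgApp         = Just ("cgTailCall", "CgTailCall")
worker StgConApp      = Just ("cgReturnDataCon", "CgCon")
worker StgLit         = Just ("cgLit", "CgUtils")  -- then performPrimReturn
worker StgOpApp       = Nothing                    -- several sub-cases
worker StgCase        = Just ("cgCase", "CgCase")
worker StgLet         = Just ("cgRhs", "CgExpr")
worker StgLetNoEscape = Just ("cgLetNoEscapeBindings", "CgExpr")
worker StgSCC         = Just ("emitSetCCC", "CgProf")
worker StgTick        = Just ("cgTickBox", "CgHpc")
worker StgLam         = Nothing                    -- has no case in cgExpr
```

Mapping worker over [minBound .. maxBound] lists every constructor next to its handler, which makes it easy to see that only StgOpApp and StgLam lack a single worker.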
StgOpApp has a number of sub-cases:
- StgFCallOp
- StgPrimOp of a TagToEnumOp
- StgPrimOp that is primOpOutOfLine
- StgPrimOp that returns Void
- StgPrimOp that returns a single primitive
- StgPrimOp that returns an unboxed tuple
- StgPrimOp that returns an enumeration type
(It appears that non-foreign-call, inline PrimOps are not allowed to return complex data types (e.g. a Maybe), but this fact needs to be verified.)
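One way to picture the StgOpApp sub-cases is as a classification that is tried in the order listed. The sketch below is an illustrative model only: Op, Result, SubCase and classify are hypothetical names, and the operator is flattened into two flags and a result kind, which is an assumption and not how StgOp is actually represented.

```haskell
-- Hypothetical summary of what a primop returns.
data Result = RetVoid | RetPrim | RetUnboxedTuple | RetEnum
  deriving (Show, Eq)

-- Hypothetical, flattened view of the operator in an StgOpApp.
data Op
  = FCallOp
  | PrimOp Bool Bool Result  -- is TagToEnumOp?, is out of line?, result kind
  deriving (Show, Eq)

data SubCase
  = ForeignCall | TagToEnum | OutOfLine
  | ReturnsVoid | ReturnsPrim | ReturnsUnboxedTuple | ReturnsEnum
  deriving (Show, Eq)

-- Checks are tried in the same order as the list of sub-cases.
classify :: Op -> SubCase
classify FCallOp = ForeignCall
classify (PrimOp tagToEnum outOfLine res)
  | tagToEnum = TagToEnum
  | outOfLine = OutOfLine
  | otherwise = case res of
      RetVoid         -> ReturnsVoid
      RetPrim         -> ReturnsPrim
      RetUnboxedTuple -> ReturnsUnboxedTuple
      RetEnum         -> ReturnsEnum
```

Because the out-of-line check comes before the result kind is inspected, an out-of-line primop is classified the same way whatever it returns.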
Each of these cases centers around one of these three core calls:
- emitForeignCall in CgForeignCall
- tailCallPrimOp in CgTailCall
- cgPrimOp in CgPrimOp
There is also a little bit of argument and return marshalling with the following functions:
Step | Functions |
---|---|
Argument marshalling | shimForeignCallArg, getArgAmods |
Return marshalling | dataReturnConvPrim, primRepToCgRep, newUnboxedTupleRegs |
Performing the return | emitReturnInstr, performReturn, returnUnboxedTuple, ccallReturnUnboxedTuple |
In summary, the modules that get called in order to handle a specific expression case are:
Also called for top level bindings by CodeGen
Module | Handles |
---|---|
CgCon | for StgConApp and the StgRhsCon part of StgLet |
CgClosure | for the StgRhsClosure part of StgLet |
Core code generation
Module | Handles |
---|---|
CgTailCall | for StgApp, StgLit, and StgOpApp |
CgPrimOp | for StgOpApp |
CgLetNoEscapeClosure | for StgLetNoEscape |
CgCase | for StgCase |
Profiling and Code coverage related
Module | Handles |
---|---|
CgProf | for StgSCC |
CgHpc | for StgTick |
Utility modules that happen to have the functions for code generation
Module | Handles |
---|---|
CgForeignCall | for StgOpApp |
CgUtils | for cgLit |
Note that the first two are the same modules that are called for top level bindings by CodeGen, and the last two are really utility modules that happen to have the functions needed for those code generation cases.
Memory and Register Management
Module | Purpose |
---|---|
CgBindery | Module for CgBindings, which maps variable names to all the volatile or stable locations where they are stored (e.g. register, stack slot, computed from other expressions, etc.). Provides the addBindC, modifyBindC and getCgIdInfo functions for adding, modifying and looking up bindings. |
CgStackery | Mostly utility functions for allocating and freeing stack slots, but also has code for setting up update frames. |
CgHeapery | Functions for allocating objects that appear on the heap, such as closures and constructors. Also includes code for stack and heap checks and emitSetDynHdr. |
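To make the idea of a binding environment concrete, here is a small pure model of a map from variable names to a volatile and a stable location. It is a sketch under simplifying assumptions: the types and the functions addBind, modifyBind, lookupBind and forgetVolatile are hypothetical, they are not the monadic addBindC, modifyBindC and getCgIdInfo, and an association list stands in for whatever map the real module uses.

```haskell
-- Hypothetical, simplified locations for one variable.
data VolatileLoc = NoVolatile | InRegister String deriving (Show, Eq)
data StableLoc   = NoStable | StackSlot Int       deriving (Show, Eq)

data IdInfo = IdInfo
  { volatileLoc :: VolatileLoc
  , stableLoc   :: StableLoc
  } deriving (Show, Eq)

type Bindings = [(String, IdInfo)]

-- Add a binding, replacing any earlier one for the same name.
addBind :: String -> IdInfo -> Bindings -> Bindings
addBind name info binds = (name, info) : filter ((/= name) . fst) binds

-- Update the binding for one name, leaving the others untouched.
modifyBind :: (IdInfo -> IdInfo) -> String -> Bindings -> Bindings
modifyBind f name = map step
  where
    step (n, i)
      | n == name = (n, f i)
      | otherwise = (n, i)

lookupBind :: String -> Bindings -> Maybe IdInfo
lookupBind = lookup

-- Illustrative operation: drop every volatile location, keeping only
-- the stable ones.
forgetVolatile :: Bindings -> Bindings
forgetVolatile = map (\(n, i) -> (n, i { volatileLoc = NoVolatile }))
```

Keeping the two kinds of location side by side means a variable can still be found in its stack slot after its register copy has been forgotten.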
Function Calls and Parameter Passing
(Note: these will largely go away once CPS conversion is fully implemented.)
Module | Purpose |
---|---|
CgPrimOp, CgTailCall, CgForeignCall | Handle different types of calls. |
CgCallConv | Used by the others in this category to determine liveness and to select in what registers and stack locations arguments and return values get stored. |
Misc utilities
Module | Purpose |
---|---|
Bitmap | Utility functions for making bitmaps (e.g. mkBitmap with type [Bool] -> Bitmap) |
ClosureInfo | Stores info about closures and bindings. Includes information about memory layout, how to call a binding (LambdaFormInfo) and information used to build the info table (ClosureInfo). |
SMRep | Storage manager representation of closures. Part of ClosureInfo but kept separate to "keep nhc happy." |
CgUtils | TODO |
CgInfoTbls | TODO |
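The type [Bool] -> Bitmap mentioned for mkBitmap suggests packing a list of flags into machine words. The sketch below shows one way to do that; it is not the code from the Bitmap module. It assumes a Bitmap is a list of 64-bit words and that the first flag goes into the least significant bit, whereas GHC's own version depends on the target word size.

```haskell
import Data.Bits (setBit)
import Data.Word (Word64)

-- Assumption: a bitmap is a list of 64-bit words.
type Bitmap = [Word64]

wordBits :: Int
wordBits = 64

-- Pack flags into words, first flag in the least significant bit of the
-- first word. A final partial chunk is padded with zero bits.
mkBitmap :: [Bool] -> Bitmap
mkBitmap []    = []
mkBitmap flags = chunkWord chunk : mkBitmap rest
  where
    (chunk, rest) = splitAt wordBits flags

    chunkWord :: [Bool] -> Word64
    chunkWord bs = foldl setIf 0 (zip [0 ..] bs)

    setIf :: Word64 -> (Int, Bool) -> Word64
    setIf w (i, b) = if b then setBit w i else w
```

With this layout, mkBitmap [True, False, True] is [5], and a list of 65 flags produces two words.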
Special runtime support
Module | Purpose |
---|---|
CgTicky | Ticky-ticky profiling |
CgProf | Cost-centre profiling |
CgHpc | Support for the Haskell Program Coverage (hpc) toolkit, inside GHC. |
CgParallel | Code generation for GranSim (GRAN) and parallel (PAR). All the functions are dead stubs except granYield and granFetchAndReschedule. |