
Last edited by Norman Ramsey Oct 13, 2021

old code gen

Old Code Generator (prior to GHC 7.8)

The material below describes the old code generator, which was used up to GHC 7.6 and was retired in 2012. This page is not maintained and is kept only for historical purposes. See the Code generator page for an up-to-date description of the current code generator.

Storage manager representations

See The Storage Manager for the layout of the stack.

The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects. The runtime also needs to know about the layout of heap objects, because it contains the garbage collector. How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync?

Currently we solve the problem this way:

  • C types representing heap objects are defined in the C header files, see for example includes/rts/storage/Closures.h.

  • A C program, includes/mkDerivedConstants.c, #includes the runtime headers. This program is built and run when you type make or make boot in includes/. It is run twice: once to generate includes/DerivedConstants.h, and again to generate includes/GHCConstants.h.

  • The file DerivedConstants.h contains lots of #defines like this:

    #define OFFSET_StgTSO_why_blocked 18

    which says that the offset to the why_blocked field of an StgTSO is 18 bytes. This file is #included into includes/Cmm.h, so these offsets are available to the hand-written .cmm files.

  • The file GHCConstants.h contains similar definitions:

    oFFSET_StgTSO_why_blocked = 18::Int

    This time the definitions are in Haskell syntax, and this file is #included directly into compiler/main/Constants.lhs. This is the way that these offsets are made available to GHC's code generator.
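The idea of emitting the same offsets in two syntaxes can be sketched in a few lines of Haskell. This is only a toy model: the real generator is the C program includes/mkDerivedConstants.c, and the Offset type and the renderC and renderHs functions below are hypothetical names invented for illustration.

```haskell
-- Toy model of rendering one list of offsets in two syntaxes.
-- Offset, renderC and renderHs are hypothetical; this is not GHC's code.
import Data.Char (toLower)

-- One struct field and its byte offset, as the C program would compute it.
data Offset = Offset
  { structName :: String
  , fieldName  :: String
  , byteOffset :: Int
  }

-- C form, in the style of DerivedConstants.h.
renderC :: Offset -> String
renderC (Offset s f n) =
  "#define OFFSET_" ++ s ++ "_" ++ f ++ " " ++ show n

-- Haskell form, in the style of GHCConstants.h. The first letter is
-- lower-cased because a Haskell variable cannot start with a capital.
renderHs :: Offset -> String
renderHs (Offset s f n) =
  lowerFirst "OFFSET_" ++ s ++ "_" ++ f ++ " = " ++ show n ++ "::Int"
  where
    lowerFirst (c:cs) = toLower c : cs
    lowerFirst []     = []

-- The example used in the text.
example :: Offset
example = Offset "StgTSO" "why_blocked" 18
```

Applying renderC and renderHs to example reproduces the two lines quoted above, which is the point of the scheme: one source of truth, two renderings.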

Generated Cmm Naming Convention

See compiler/GHC/Cmm/CLabel.hs

Labels generated by the code generator are of the form <name>_<type> where <name> is <Module>_<name> for external names and <unique> for internal names. <type> is one of the following:

info          Info table
srt           Static reference table
srtd          Static reference table descriptor
entry         Entry code (function, closure)
slow          Slow entry code (if any)
ret           Direct return address
vtbl          Vector table
n_alt         Case alternative (tag n)
dflt          Default case alternative
btm           Large bitmap vector
closure       Static closure
con_entry     Dynamic Constructor entry code
con_info      Dynamic Constructor info table
static_entry  Static Constructor entry code
static_info   Static Constructor info table
sel_info      Selector info table
sel_entry     Selector entry code
cc            Cost centre
ccs           Cost centre stack

Many of these distinctions are only for documentation reasons. For example, _ret is only distinguished from _entry to make it easy to tell whether a code fragment is a return point or a closure/function entry.
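As a rough illustration of the <name>_<type> scheme, the following sketch renders a label from its parts. It is not the real CLabel code: LabelName, LabelType, suffix and renderLabel are hypothetical names, only a handful of suffixes are covered, and the Z-encoding that real symbol names undergo is ignored.

```haskell
-- Toy rendering of the <name>_<type> convention; not the real CLabel code.
-- LabelName, LabelType, suffix and renderLabel are hypothetical names.

-- External names carry their module; internal names are just a unique.
data LabelName
  = External String String   -- module, name
  | Internal String          -- unique, already rendered as text

-- A few of the label types from the table above.
data LabelType = Info | Entry | Ret | Closure | Srt

suffix :: LabelType -> String
suffix Info    = "info"
suffix Entry   = "entry"
suffix Ret     = "ret"
suffix Closure = "closure"
suffix Srt     = "srt"

-- Join the name part and the type suffix with an underscore.
renderLabel :: LabelName -> LabelType -> String
renderLabel nm ty = base nm ++ "_" ++ suffix ty
  where
    base (External m n) = m ++ "_" ++ n
    base (Internal u)   = u
```

For example, renderLabel (External "Foo" "bar") Info gives "Foo_bar_info", while an internal name only has its unique in front of the suffix.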

Modules

CodeGen

Top level, only exports codeGen.

Called from HscMain for each module that needs to be converted from Stg to Cmm.

For each such module codeGen does three things:

  • cgTopBinding for the StgBinding
  • cgTyCon for the TyCon (these are constructors, not constructor calls).
  • mkModuleInit for the module

mkModuleInit generates several boilerplate initialization functions that:

  • register the module,
  • create an Hpc table,
  • set up its profiling info (InitConstCentres, code coverage info initHpc), and
  • call the initialization functions of the modules it imports.

If neither SCC profiling nor HPC is used, then the initialization code short-circuits to return.

If the module has already been initialized, the initialization function just returns.

The GHC.TopHandler and GHC.Prim modules get special treatment.

cgTopBinding is a small wrapper around cgTopRhs, which in turn dispatches to:

  • cgTopRhsCons for StgRhsCons (these are bindings of constructor applications not constructors themselves) and
  • cgTopRhsClosure for StgRhsClosure.

cgTopRhsCons and cgTopRhsClosure are located in CgCon and CgClosure which are the primary modules called by CodeGen.

CgCon

TODO

CgClosure

TODO

CgMonad

The monad in which most of codeGen operates. Its main ingredients are:

  • Reader
  • State
  • (could be Writer?)
  • fork
  • flatten
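To make the Reader and State ingredients concrete, here is a minimal monad that threads a read-only environment and a mutable state through a computation. It is a sketch under assumptions, not the real CgMonad: Code, Env, St, askEnv, getSt, putSt and freshLabel are hypothetical names, the environment and state are reduced to a String and an Int, and fork and flatten are not modelled.

```haskell
-- A minimal Reader-plus-State monad; a sketch, not the real CgMonad.
-- All names here are hypothetical stand-ins.

type Env = String   -- stand-in for read-only info, e.g. the module name
type St  = Int      -- stand-in for mutable state, e.g. a label counter

newtype Code a = Code { runCode :: Env -> St -> (a, St) }

instance Functor Code where
  fmap f (Code g) = Code $ \env st ->
    let (a, st') = g env st in (f a, st')

instance Applicative Code where
  pure a = Code $ \_ st -> (a, st)
  Code f <*> Code g = Code $ \env st ->
    let (h, st1) = f env st
        (a, st2) = g env st1
    in (h a, st2)

instance Monad Code where
  Code g >>= k = Code $ \env st ->
    let (a, st') = g env st
    in runCode (k a) env st'

-- Reader part: read the environment without changing the state.
askEnv :: Code Env
askEnv = Code $ \env st -> (env, st)

-- State part: read and replace the state.
getSt :: Code St
getSt = Code $ \_ st -> (st, st)

putSt :: St -> Code ()
putSt st = Code $ \_ _ -> ((), st)

-- Use both parts: build a fresh label from the module name and a counter.
freshLabel :: Code String
freshLabel = do
  m <- askEnv
  n <- getSt
  putSt (n + 1)
  pure (m ++ "_" ++ show n)
```

Running two freshLabel actions in sequence with environment "Foo" and initial state 0 yields two distinct labels and leaves the counter at 2; the environment is shared, the state is threaded.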

CgExpr

Called by CgClosure and CgCon.

Since everything in STG is an expression, almost everything branches off from here.

This module exports only one function cgExpr, which for the most part just dispatches to other functions to handle each specific constructor in StgExpr.

Here are the core functions that each constructor is dispatched to (though some may also call small helper functions in addition to the core function):

StgApp          Calls to cgTailCall in CgTailCall
StgConApp       Calls to cgReturnDataCon in CgCon
StgLit          Calls to cgLit in CgUtils and performPrimReturn in CgTailCall
StgOpApp        Is a bit more complicated, see below.
StgCase         Calls to cgCase in CgCase
StgLet          Calls to cgRhs in CgExpr
StgLetNoEscape  Calls to cgLetNoEscapeBindings in CgExpr, but with a little bit of wrapping by nukeDeadBindings and saveVolatileVarsAndRegs.
StgSCC          Calls to emitSetCCC in CgProf
StgTick         Calls to cgTickBox in CgHpc
StgLam          Does not have a case because it is only for CoreToStg's work.
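The dispatch table above can be modelled as a single pattern match. This is a toy model, not the real cgExpr: the actual StgExpr constructors carry arguments, which are dropped here, and ExprKind and handlerFor are hypothetical names. The function returns only the name of the main worker, and uses Nothing for the two constructors that have no single worker.

```haskell
-- Toy model of cgExpr's dispatch; not the real code generator.
-- ExprKind and handlerFor are hypothetical names.

-- The StgExpr constructors from the table, with their arguments dropped.
data ExprKind
  = StgApp | StgConApp | StgLit | StgOpApp | StgCase
  | StgLet | StgLetNoEscape | StgSCC | StgTick | StgLam
  deriving (Show, Eq)

-- Name of the main worker function for each constructor, per the table.
handlerFor :: ExprKind -> Maybe String
handlerFor StgApp         = Just "cgTailCall"
handlerFor StgConApp      = Just "cgReturnDataCon"
handlerFor StgLit         = Just "cgLit"   -- followed by performPrimReturn
handlerFor StgOpApp       = Nothing        -- several sub-cases, no single worker
handlerFor StgCase        = Just "cgCase"
handlerFor StgLet         = Just "cgRhs"
handlerFor StgLetNoEscape = Just "cgLetNoEscapeBindings"
handlerFor StgSCC         = Just "emitSetCCC"
handlerFor StgTick        = Just "cgTickBox"
handlerFor StgLam         = Nothing        -- never reaches the code generator
```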

Some of these cases call functions defined in CgExpr itself. This is because they need a little bit of wrapping and processing before calling out to their main worker function.

cgRhs
  • For StgRhsCon calls out to buildDynCon in CgCon.
  • For StgRhsClosure calls out to mkRhsClosure. In turn, mkRhsClosure calls out to cgStdRhsClosure for selectors and thunks, and calls out to cgRhsClosure in the default case. Both these are defined in CgClosure.
cgLetNoEscapeBindings
  • Wraps a call to cgLetNoEscapeRhs with addBindsC depending on whether it is called on a recursive or a non-recursive binding. In turn cgLetNoEscapeRhs wraps cgLetNoEscapeClosure defined in CgLetNoEscapeClosure.

StgOpApp has a number of sub-cases.

  • StgFCallOp
  • StgPrimOp of a TagToEnumOp
  • StgPrimOp that is primOpOutOfLine
  • StgPrimOp that returns Void
  • StgPrimOp that returns a single primitive
  • StgPrimOp that returns an unboxed tuple
  • StgPrimOp that returns an enumeration type

(It appears that non-foreign-call, inline PrimOps are not allowed to return complex data types (e.g. a Maybe), but this needs to be verified.)

Each of these cases centers around one of these three core calls:

  • emitForeignCall in CgForeignCall
  • tailCallPrimOp in CgTailCall
  • cgPrimOp in CgPrimOp

There is also a little bit of argument and return marshalling with the following functions:

Argument marshalling   shimForeignCallArg, getArgAmods
Return marshalling     dataReturnConvPrim, primRepToCgRep, newUnboxedTupleRegs
Performing the return  emitReturnInstr, performReturn, returnUnboxedTuple, ccallReturnUnboxedTuple

In summary the modules that get called in order to handle a specific expression case are:

Also called for top level bindings by CodeGen

CgCon for StgConApp and the StgRhsCon part of StgLet
CgClosure for the StgRhsClosure part of StgLet

Core code generation

CgTailCall for StgApp, StgLit, and StgOpApp
CgPrimOp for StgOpApp
CgLetNoEscapeClosure for StgLetNoEscape
CgCase for StgCase

Profiling and Code coverage related

CgProf for StgSCC
CgHpc for StgTick

Utility modules that happen to have the functions for code generation

CgForeignCall for StgOpApp
CgUtils for cgLit

Note that the first two are the same modules that are called for top level bindings by CodeGen, and the last two are really utility modules, but they happen to have the functions needed for those code generation cases.

Memory and Register Management

CgBindery   Module for CgBindings, which maps variable names to all the volatile or stable locations where they are stored (e.g. register, stack slot, computed from other expressions, etc.). Provides the addBindC, modifyBindC and getCgIdInfo functions for adding, modifying and looking up bindings.
CgStackery  Mostly utility functions for allocating and freeing stack slots, but also code for setting up update frames.
CgHeapery   Functions for allocating objects that appear on the heap, such as closures and constructors. Also includes code for stack and heap checks and emitSetDynHdr.
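The role of CgBindery can be pictured as a finite map from variable names to locations, with operations to add, modify and look up entries. The sketch below is an assumption-laden simplification, not the real module: Location, Bindings, addBind, modifyBind and lookupBind are hypothetical names, variables are plain strings, and each variable has exactly one location, whereas the text above says a variable may have several.

```haskell
-- Toy model of a bindings environment; not the real CgBindery.
-- All names here are hypothetical.
import qualified Data.Map.Strict as Map

-- Where a variable's value currently lives (illustrative constructors).
data Location
  = InRegister Int   -- register number
  | OnStack Int      -- stack slot number
  deriving (Show, Eq)

type Bindings = Map.Map String Location

-- Record where a variable lives.
addBind :: String -> Location -> Bindings -> Bindings
addBind = Map.insert

-- Update an existing binding; unknown variables are left alone.
modifyBind :: (Location -> Location) -> String -> Bindings -> Bindings
modifyBind = Map.adjust

-- Find a variable's location, if it has one.
lookupBind :: String -> Bindings -> Maybe Location
lookupBind = Map.lookup

-- Example: x starts in register 1 and is then spilled to stack slot 0.
demo :: Bindings
demo = modifyBind (const (OnStack 0)) "x" (addBind "x" (InRegister 1) Map.empty)
```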

Function Calls and Parameter Passing

(Note: these will largely go away once CPS conversion is fully implemented.)

CgPrimOp, CgTailCall, CgForeignCall  Handle different types of calls.
CgCallConv                           Used by the others in this category to determine liveness and to select the registers and stack locations in which arguments and return values are stored.

Misc utilities

Bitmap       Utility functions for making bitmaps (e.g. mkBitmap with type [Bool] -> Bitmap)
ClosureInfo  Stores info about closures and bindings. Includes information about memory layout, how to call a binding (LambdaFormInfo) and information used to build the info table (ClosureInfo).
SMRep        Storage manager representation of closures. Part of ClosureInfo but kept separate to "keep nhc happy."
CgUtils      TODO
CgInfoTbls   TODO
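As an illustration of what a function of type [Bool] -> Bitmap might do, the sketch below packs a list of flags into machine words. It is not the real mkBitmap: the name mkBitmapSketch, the fixed 64-bit word size and the least-significant-bit-first order are all assumptions of this example, and the real Bitmap module may differ in each of them.

```haskell
-- Toy bitmap packer; not the real Bitmap module.
-- mkBitmapSketch, the word size and the bit order are assumptions.
import Data.Bits (setBit)
import Data.Word (Word64)

-- Pack flags into 64-bit words, least significant bit first.
-- Element i of each 64-element chunk sets bit i of the matching word.
mkBitmapSketch :: [Bool] -> [Word64]
mkBitmapSketch [] = []
mkBitmapSketch bs = packWord chunk : mkBitmapSketch rest
  where
    (chunk, rest) = splitAt 64 bs
    packWord flags = foldl set 0 (zip [0 ..] flags)
    set w (i, True)  = setBit w i
    set w (_, False) = w
```

With this packing, [True, False, True] sets bits 0 and 2 of a single word, and a list of 65 flags spills into a second word.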

Special runtime support

CgTicky     Ticky-ticky profiling
CgProf      Cost-centre profiling
CgHpc       Support for the Haskell Program Coverage (hpc) toolkit, inside GHC.
CgParallel  Code generation for GranSim (GRAN) and parallel (PAR). All the functions are dead stubs except granYield and granFetchAndReschedule.