... | ... | @@ -85,16 +85,32 @@ data GenCmmTop d h g |
|
|
Section -- Type
|
|
|
[d] -- Data
|
|
|
|
|
|
data BlockId = BlockId Unique
|
|
|
data GenBasicBlock i = BasicBlock BlockId [i]
|
|
|
type CmmBasicBlock = GenBasicBlock CmmStmt
|
|
|
|
|
|
newtype ListGraph i = ListGraph [GenBasicBlock i]
|
|
|
|
|
|
type RawCmmTop = GenCmmTop CmmStatic [CmmStatic] (ListGraph CmmStmt)
|
|
|
-- newtype RawCmm = Cmm [RawCmmTop] : a list of RawCmmTop. The actual code is different, but it is effectively this.
|
|
|
```
|
|
|
|
|
|
|
|
|
That is, it consists of two types, static data and functions. Each can largely be handled separately. Just enough information is needed such that pointers can be constructed to them and in many cases this information can be gathered from assumptions and constraints on Cmm.
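
The split into static data and functions can be illustrated with a minimal sketch. The `Top` type and `splitTops` below are hypothetical, simplified stand-ins rather than the real `GenCmmTop` definitions; statements and statics are reduced to strings so that the example is self-contained.

```haskell
-- Hypothetical, simplified stand-ins for the real Cmm types.
data Section = Text | Data | ReadOnlyData
  deriving (Eq, Show)

data Top
  = Proc String [String]      -- label and (stringified) statements
  | DataTop Section [String]  -- section and (stringified) statics
  deriving (Eq, Show)

-- Split a module into its static data and its procedures, keeping
-- the original order within each group.
splitTops :: [Top] -> ([Top], [Top])
splitTops tops =
  ( [d | d@(DataTop _ _) <- tops]
  , [p | p@(Proc _ _) <- tops]
  )
```

Each half can then be handed to its own code generator, which is the sense in which the two kinds of top-level entity can largely be handled separately.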
|
|
|
|
|
|
|
|
|
After all the polymorphic types are bound we get this:
|
|
|
|
|
|
```wiki
|
|
|
RawCmm = [
|
|
|
CmmProc [CmmStatic] CLabel [LocalReg] [BlockId [CmmStmt]]
|
|
|
| CmmData Section [CmmStatic]
|
|
|
]
|
|
|
|
|
|
data Section = Text | Data | ReadOnlyData | RelocatableReadOnlyData | UninitialisedData | ReadOnlyData16 | OtherSection String
|
|
|
```
|
|
|
|
|
|
|
|
|
The code generator lives in `llvmGen` with the driver being `llvmGen/LlvmCodeGen.lhs`.
|
|
|
|
|
|
|
... | ... | @@ -113,6 +129,8 @@ data CmmReg |
|
|
= CmmLocal LocalReg
|
|
|
| CmmGlobal GlobalReg
|
|
|
deriving( Eq, Ord )
|
|
|
|
|
|
data LocalReg = LocalReg Unique CmmType
|
|
|
```
|
|
|
|
|
|
|
... | ... | @@ -228,6 +246,8 @@ data CmmLit |
|
|
| CmmLabelDiffOff CLabel CLabel Int -- &l1 - &l2 + offset
|
|
|
| CmmBlock BlockId -- address of code label
|
|
|
| CmmHighStackMark -- max stack space used during a procedure
|
|
|
|
|
|
data Width = W8 | W16 | W32 | W64 | W80 | W128
|
|
|
```
|
|
|
|
|
|
|
... | ... | @@ -286,4 +306,103 @@ Where i32 is the pointer size. (i64 if on 64 bit). |
|
|
|
|
|
## CmmProc
|
|
|
|
|
|
TODO |
|
|
\ No newline at end of file |
|
|
|
|
|
A Cmm procedure is made up of a list of basic blocks, each of which consists of a list of CmmStmts.
|
|
|
|
|
|
|
|
|
Code generation takes place mainly in `llvmGen/LlvmCodeGen/CodeGen.hs`, driven by the main LLVM compiler driver, `llvmGen/LlvmCodeGen.lhs`.
|
|
|
|
|
|
|
|
|
While Cmm procedures include a specification for arguments and a return type, only one type is in fact used: a procedure that takes no arguments and returns void. The reason is that the STG registers are used instead for passing arguments and returning results. Another detail of the Cmm code produced by GHC is that it doesn’t contain any return statements. Instead, a style of code called continuation passing is used, in which control is explicitly passed in the form of a continuation, and all Cmm procedures produced by GHC are terminated by tail calls.
|
|
|
|
|
|
|
|
|
Below is the Haskell definition for Cmm statements and expressions.
|
|
|
|
|
|
```wiki
|
|
|
data CmmStmt
|
|
|
= CmmNop
|
|
|
| CmmComment FastString
|
|
|
| CmmAssign CmmReg CmmExpr
|
|
|
| CmmStore CmmExpr CmmExpr
|
|
|
| CmmCall CmmCallTarget HintedCmmFormals HintedCmmActuals CmmSafety CmmReturnInfo
|
|
|
| CmmBranch BlockId
|
|
|
| CmmCondBranch CmmExpr BlockId
|
|
|
| CmmSwitch CmmExpr [Maybe BlockId]
|
|
|
| CmmJump CmmExpr HintedCmmActuals
|
|
|
|
|
|
data CmmExpr
|
|
|
= CmmLit CmmLit
|
|
|
| CmmLoad CmmExpr CmmType
|
|
|
| CmmReg CmmReg
|
|
|
| CmmMachOp MachOp [CmmExpr]
|
|
|
| CmmStackSlot Area Int
|
|
|
| CmmRegOff CmmReg Int
|
|
|
|
|
|
type CmmFormals = [CmmFormal]
|
|
|
type CmmFormal = LocalReg
|
|
|
```
|
|
|
|
|
|
### CmmExpr
|
|
|
|
|
|
|
|
|
CmmExprs are handled in a relatively straightforward manner. The most interesting aspect of their compilation to LLVM is the return type of the functions in the LLVM back-end that compile CmmExprs. This gives an idea of the compilation process: while each expression must be handled differently, they all return the same type when compiled to LLVM code by the back-end.
|
|
|
|
|
|
```wiki
|
|
|
-- Return type of LLVM functions that compile CmmExpr's
|
|
|
type ExprData = (LlvmEnv , LlvmVar , LlvmStatements , [LlvmCmmTop] )
|
|
|
```
|
|
|
|
|
|
- **LlvmEnv**: During code generation for an expression, an external Cmm label may be encountered for the first time. An external reference for it will be created and returned as part of the \[LlvmCmmTop\] list. It is also added to the current environment.
|
|
|
- **LlvmVar**: All expressions share the property that their execution results in a single value which can be stored in a variable. This LLVM local variable holds the result of the CmmExpr, which allows statements to use the result of an expression very easily.
|
|
|
- **LlvmStatements**: A CmmExpr may require several LLVM statements to implement. They are returned in this list and must be executed before the LlvmVar is accessed.
|
|
|
- **\[LlvmCmmTop\]**: An externally declared Cmm Label can be encountered at any point as Cmm requires no external declaration. LLVM though requires that these labels do have an external declaration and in this list such declarations are returned. They add new global variables to the LLVM module.
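
The way the four components travel together can be sketched with a toy compiler. Everything below is a hypothetical simplification: the type synonyms only mirror the names above, `Expr` is not the real `CmmExpr`, names are generated by a hand-threaded counter, and the emitted text assumes 32-bit values with typed-pointer LLVM syntax.

```haskell
-- Hypothetical, simplified stand-ins for the back-end's types.
type LlvmEnv        = [String]   -- external labels already declared
type LlvmVar        = String     -- name of the variable holding the result
type LlvmStatements = [String]   -- statements to run before reading the variable
type LlvmCmmTop     = String     -- a new top-level declaration
type ExprData       = (LlvmEnv, LlvmVar, LlvmStatements, [LlvmCmmTop])

-- A toy expression language (not the real CmmExpr).
data Expr = Lit Int | Label String | Add Expr Expr

-- Compile an expression; the Int is a hand-threaded counter for fresh names.
genExpr :: LlvmEnv -> Int -> Expr -> (Int, ExprData)
genExpr env n (Lit i) = (n, (env, show i, [], []))
genExpr env n (Label l) = (n + 1, (env', var, [stmt], tops))
  where
    var  = "%t" ++ show n
    stmt = var ++ " = ptrtoint i32* @" ++ l ++ " to i32"
    seen = l `elem` env
    -- First sighting: remember the label and emit an external declaration.
    env' = if seen then env else l : env
    tops = if seen then [] else ["@" ++ l ++ " = external global i32"]
genExpr env n (Add a b) = (n2 + 1, (env2, var, sa ++ sb ++ [stmt], ta ++ tb))
  where
    (n1, (env1, va, sa, ta)) = genExpr env n a
    (n2, (env2, vb, sb, tb)) = genExpr env1 n1 b
    var  = "%t" ++ show n2
    stmt = var ++ " = add i32 " ++ va ++ ", " ++ vb
```

Note how the environment returned by the left operand is passed to the right operand, so a label seen twice is declared only once, while the statement lists are simply concatenated in evaluation order.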
|
|
|
|
|
|
### CmmStmt
|
|
|
|
|
|
|
|
|
Statements are also handled in a fairly straightforward manner. The process involved can be detailed most simply by studying the return type of the functions in the LLVM back-end that compile CmmStmts. Statements, just like expressions, all return the same basic type when compiled to LLVM code by the back-end. This type is shown below.
|
|
|
|
|
|
```wiki
|
|
|
type StmtData = (LlvmEnv , [ LlvmStatement ] , [LlvmCmmTop ] )
|
|
|
```
|
|
|
|
|
|
- **LlvmEnv**: As compiling a Cmm statement usually also involves compiling a Cmm expression, this LLVM environment serves the same purpose of returning an updated environment if new external Cmm labels have been encountered. That case updates the environment's global map, as a new global variable has been created. In the case of a CmmStore statement though, a Cmm local register may be encountered for the first time. It will be allocated on the stack and added to the local map of the environment.
|
|
|
- **LlvmStatements**: A CmmStmt is compiled to a list of LLVM statements.
|
|
|
- **\[LlvmCmmTop\]**: Serves the same purpose as it does for Cmm expression code generation.
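
A minimal sketch of how such a triple might be threaded through a block of statements follows. The record fields `globals` and `locals`, the toy `Stmt` type, and the functions `genStmt` and `genStmts` are all hypothetical names chosen for illustration; the real environment and statement types are richer.

```haskell
-- Hypothetical, simplified stand-ins for the back-end's types.
data LlvmEnv = LlvmEnv
  { globals :: [String]  -- external labels already declared
  , locals  :: [String]  -- local registers that already have a stack slot
  } deriving (Eq, Show)

type LlvmStatement = String
type LlvmCmmTop    = String
type StmtData      = (LlvmEnv, [LlvmStatement], [LlvmCmmTop])

-- A toy statement language: write a literal to a named local register.
data Stmt = WriteLocal String Int | Nop

genStmt :: LlvmEnv -> Stmt -> StmtData
genStmt env Nop = (env, [], [])
genStmt env (WriteLocal r i)
  | r `elem` locals env = (env, [store], [])
  -- First sighting of the register: give it a stack slot and record it.
  | otherwise = (env { locals = r : locals env }, [alloca, store], [])
  where
    slot   = "%" ++ r
    alloca = slot ++ " = alloca i32"
    store  = "store i32 " ++ show i ++ ", i32* " ++ slot

-- Thread the environment through a whole block of statements.
genStmts :: LlvmEnv -> [Stmt] -> StmtData
genStmts env = foldl step (env, [], [])
  where
    step (e, ss, ts) s =
      let (e', ss', ts') = genStmt e s
      in (e', ss ++ ss', ts ++ ts')
```

The fold makes the dependency explicit: each statement is compiled against the environment left behind by the previous one, so the stack slot for a register is created exactly once.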
|
|
|
|
|
|
### Handling LLVM's SSA Form
|
|
|
|
|
|
|
|
|
One of the main differences between Cmm and LLVM Assembly is the requirement that LLVM Assembly be in static single assignment (SSA) form. Thankfully, this is quite easy to handle. LLVM allows data to be explicitly allocated on the stack using its `alloca` instruction, which provides an alternative to producing code in SSA form. If a mutable variable is needed, it is allocated on the stack with `alloca`. The value returned from this instruction is a pointer to the stack memory, and this memory location can be read from and written to just like any other memory location in LLVM by using the `load` and `store` instructions respectively. While this initially places all these variables on the stack and doesn’t use any registers, LLVM includes an optimisation pass called `mem2reg` which is designed to correct this, changing explicit stack allocation into SSA form, which can use machine registers when compiled to native code. This approach to handling LLVM’s SSA form is in fact the method that the LLVM developers themselves recommend.
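
As an illustration of the pattern described above, the following sketch emits the text for a mutable 32-bit variable that lives in a stack slot and is incremented in place. The function names `declare` and `increment` are hypothetical, and the `load` syntax shown is the older typed-pointer form; newer LLVM releases spell the `load` differently, so treat the exact text as an assumption.

```haskell
-- Emit LLVM-style text for a mutable 32-bit variable kept in a stack slot.
-- Assumption: older typed-pointer syntax ("load i32* %x").

-- Reserve a stack slot for the variable.
declare :: String -> [String]
declare v = ["%" ++ v ++ " = alloca i32"]

-- Compile "v := v + k" using temporaries numbered from n.
-- Returns the next free temporary number and the emitted statements.
increment :: String -> Int -> Int -> (Int, [String])
increment v k n =
  ( n + 2
  , [ tmp n ++ " = load i32* %" ++ v
    , tmp (n + 1) ++ " = add i32 " ++ tmp n ++ ", " ++ show k
    , "store i32 " ++ tmp (n + 1) ++ ", i32* %" ++ v
    ]
  )
  where
    tmp i = "%t" ++ show i
```

Every temporary is assigned exactly once, so the output satisfies the SSA requirement even though the variable itself is updated repeatedly through memory.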
|
|
|
|
|
|
### Handling Registerised Code
|
|
|
|
|
|
|
|
|
Handling registerised Cmm Code involves handling the pinning of the STG virtual registers and the TABLES_NEXT_TO_CODE optimisation.
|
|
|
|
|
|
|
|
|
To handle the TABLES_NEXT_TO_CODE optimisation, the LLVM back-end simply disables it. This can be done independently of enabling or disabling registerised mode as a whole, by putting the following in your build.mk:
|
|
|
|
|
|
```wiki
|
|
|
GhcEnableTablesNextToCode = NO
|
|
|
```
|
|
|
|
|
|
|
|
|
To handle the pinning of the STG registers, the LLVM back-end uses a custom calling convention that passes the first n arguments of a function call in the specific hardware registers that the STG registers should be pinned to. Then, whenever there is a function call, the LLVM back-end generates a call with the correct STG virtual registers as the first n arguments to that call. Why does this work? It guarantees that on entry to any function, the STG registers are stored in the correct hardware registers. It also guarantees this on function exit, since all Cmm functions that GHC generates are exited by tail calls. In the function itself, the STG registers can be treated just like normal variables, read and written to at will.
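
The shape of the generated code can be sketched as follows. The register list `stgRegs`, the `_Arg` suffix, the `i32` width, and the helper names are illustrative assumptions and not the back-end's actual register set or naming scheme; the only detail taken from this page is the use of calling convention number 10.

```haskell
import Data.List (intercalate)

-- Hypothetical subset of the STG virtual registers, in argument order.
stgRegs :: [String]
stgRegs = ["Base", "Sp", "Hp", "R1"]

-- Render the STG registers as an argument list, with an optional name suffix.
args :: String -> String
args suffix = intercalate ", " ["i32 %" ++ r ++ suffix | r <- stgRegs]

-- Function header: the STG registers arrive as the first arguments.
funHeader :: String -> String
funHeader name = "define cc 10 void @" ++ name ++ "(" ++ args "_Arg" ++ ")"

-- Function exit: a tail call that passes the current register values along.
tailCall :: String -> [String]
tailCall target =
  [ "tail call cc 10 void @" ++ target ++ "(" ++ args "" ++ ")"
  , "ret void"
  ]
```

Because both the header and the tail call list the registers in the same order under the same calling convention, whichever hardware register holds a value on exit from one function holds it on entry to the next.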
|
|
|
|
|
|
|
|
|
The new calling convention was included by the LLVM developers in LLVM 2.7. It uses calling convention number 10. At the moment it supports x86-32/64.
|
|
|
|
|
|
## After Code Generation
|
|
|
|
|
|
|
|
|
After code generation there are three more stages, each of which is simply a call to one of the LLVM tools:
|
|
|
|
|
|
- **LLVM Assembler**: This is a very simple stage in which the human-readable text version of LLVM assembly code is translated to the binary bitcode format. This is done by simply invoking the LLVM `llvm-as` tool on the stage input file.
|
|
|
- **LLVM Optimisation**: In this stage a range of LLVM’s optimisations are applied to the bitcode file, resulting in a new optimised bitcode file. This is done by simply invoking the LLVM `opt` tool on the stage input file. The optimisations are selected using the standard optimisation groups `-O1`, `-O2` and `-O3` provided by `opt`, depending on the level of optimisation requested by the user when they invoked GHC.
|
|
|
- **LLVM Compiler**: This is the final stage, in which the input LLVM bitcode file is compiled to native assembly for the target machine. This is done by simply invoking the LLVM `llc` tool on the stage input file.
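
The three stages above can be sketched as a list of tool invocations. This is only an illustration: the `Command` type, the `pipeline` function, the file extensions, and the clamping of the optimisation level are assumptions made for the example, and the commands are built as data rather than executed.

```haskell
-- One external tool invocation: program name and argument list.
type Command = (String, [String])

-- Build the three post-code-generation steps for a module called `base`.
-- File names and extensions are illustrative assumptions.
pipeline :: Int -> FilePath -> [Command]
pipeline optLevel base =
  [ ("llvm-as", [base ++ ".ll", "-o", base ++ ".bc"])
  , ("opt",     [base ++ ".bc", optFlag, "-o", base ++ ".opt.bc"])
  , ("llc",     [base ++ ".opt.bc", "-o", base ++ ".s"])
  ]
  where
    -- Assumption of this sketch: clamp the level to the -O1..-O3 groups.
    optFlag = "-O" ++ show (max 1 (min 3 optLevel))
```

Each stage consumes the output file of the previous one, so the list can be run in order by whatever process-spawning mechanism the driver uses.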