|
|
# I know kung fu: learning STG by example
|
|
|
|
|
|
|
|
|
The STG machine is an essential part of GHC, the world's leading
|
|
|
Haskell compiler. It defines how the Haskell evaluation model
|
|
|
should be efficiently implemented on standard hardware. Despite
|
|
|
this key role, it is generally poorly understood amongst GHC users.
|
|
|
This document aims to provide an overview of the STG machine in
|
|
|
it's modern, eval/apply-based, pointer-tagged incarnation by a
|
|
|
series of simple examples showing how Haskell source code is
|
|
|
compiled.
|
|
|
|
|
|
## What is STG, exactly?
|
|
|
|
|
|
|
|
|
Haskell code being sucked through GHC has a complex lifecycle.
|
|
|
Broadly speaking, it transitions between five representations:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
+---------+
|
|
|
LLVM backend /--->| LLVM IR |--\
|
|
|
| +---------+ | LLVM
|
|
|
| v
|
|
|
+------------+ Desugar +------+ STGify +-----+ CodeGen +-----+ | NCG +----------+
|
|
|
| Parse tree |--------->| Core |-------->| STG |--------->| C-- |----+-------->| Assembly |
|
|
|
+------------+ +------+ +-----+ +-----+ | +----------+
|
|
|
| ^
|
|
|
| +---+ | GCC
|
|
|
C backend \---->| C |--/
|
|
|
+---+
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
The path from C-- to assembly varies: the three possible backends
|
|
|
are C (`-fvia-c`), LLVM (`-fllvm`), and the default backend -- the
|
|
|
native code genarator (or NCG), which generates assembly directly
|
|
|
from the GHC-internal C-- data type.
|
|
|
|
|
|
|
|
|
STG is a simple functional language, rather like the more famous
|
|
|
Core language. It differs in the following main respects:
|
|
|
|
|
|
1. In its current incarnation, it isn't typed in the Haskell sense,
|
|
|
though it does know about *representation* types
|
|
|
1. It is in administrative normal form (ANF), which is where every
|
|
|
subexpression is given a name
|
|
|
1. Every $\\lambda$, constructor application, and primitive operator
|
|
|
is $\\eta$-expanded
|
|
|
1. It is annotated with a ton of information that the code
|
|
|
generator is interested in knowing
|
|
|
|
|
|
|
|
|
STG expressions can be one of the following:
|
|
|
|
|
|
1. Atoms (i.e. literals and variables)
|
|
|
1. `let`-bindings (both recursive and non-recursive) over another
|
|
|
expression, where let-bound things are one of:
|
|
|
|
|
|
- A function value with explicit lambdas
|
|
|
- An unsaturated application
|
|
|
- A constructor applied to atoms
|
|
|
- A thunk (i.e. any expression not fitting into one of the above
|
|
|
categories)
|
|
|
|
|
|
1. Saturated primitive application of a primitive to variables
|
|
|
1. Application of a variable to one or more atoms
|
|
|
1. Case deconstruction of an expression, where each branch may also
|
|
|
be an expression
|
|
|
|
|
|
|
|
|
The job of the *STG machine* is to evaluate these expressions in
|
|
|
a way which is efficiently implementable on standard hardware. This
|
|
|
document will look at how exactly this is achieved by looking at
|
|
|
real examples of the C-- code GHC generates for various Haskell
|
|
|
expressions.
|
|
|
|
|
|
|
|
|
This document will take a very low-level view of the machine, so if
|
|
|
you want to get comfortable with how the STG machine executes at a
|
|
|
more abstract level before reading this document, you might want to
|
|
|
read the paper
|
|
|
[ How to make a fast curry: push/enter vs. eval/apply](http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/).
|
|
|
It presents the STG machine without reference to an explicit stack
|
|
|
or registers, but instead as a transition system. This transition
|
|
|
system has also been implemented as a Haskell program called
|
|
|
[ ministg](http://hackage.haskell.org/package/ministg) by
|
|
|
[ Bernie Pope](http://ww2.cs.mu.oz.au/~bjpop/), for those who wish to
|
|
|
see it in action on some simple examples.
|
|
|
|
|
|
## An overview of the STG machine
|
|
|
|
|
|
|
|
|
Before we dive in, a note: this document will describe the STG
|
|
|
machine as it is implemented on x86-style architectures. I will use
|
|
|
the terms "the STG machine" and "the STG machine as implemented on
|
|
|
x86 by GHC" interchangeably. The implementation is somewhat
|
|
|
different on x64, not least due to the greater number of available
|
|
|
registers.
|
|
|
|
|
|
|
|
|
This overview section is rather bare. Readers might be able to fill
|
|
|
in any gaps in my explanation by using some of the following
|
|
|
sources:
|
|
|
|
|
|
- [ The Haskell Execution Model](http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution)
|
|
|
- [ Storage](http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage)
|
|
|
- [ The Spineless Tagless G-machine](http://research.microsoft.com/en-us/um/people/simonpj/Papers/spineless-tagless-gmachine.ps.gz)
|
|
|
|
|
|
- now sadly rather out of date
|
|
|
- [ Faster laziness through dynamic pointer tagging](http://research.microsoft.com/en-us/um/people/simonpj/papers/ptr-tag/ptr-tagging.pdf)
|
|
|
|
|
|
### Components of the machine
|
|
|
|
|
|
|
|
|
In its bare essentials, the STG machine consists of three parts:
|
|
|
|
|
|
1. The STG registers:
|
|
|
|
|
|
- There are rather a lot of registers here: more than can be
|
|
|
practicably stored in actual available processor registers on most
|
|
|
architectures.
|
|
|
- To deal with the lack of processor registers, most of the STG
|
|
|
registers are actually kept on the stack in a block of memory
|
|
|
pointed to by a special STG register called the "base register" (or
|
|
|
`BaseReg`). To get or set values of registers which are not kept in
|
|
|
processor registers, the STG machine generates an instruction to
|
|
|
load or store from an address relative to the `BaseReg`.
|
|
|
- The most important four registers are the `BaseReg`, the stack
|
|
|
pointer (`Sp`), the heap pointer (`Hp`), and the general purpose
|
|
|
register `R1` which is used for intermediate values, as well as for
|
|
|
returning evaluated values when unwinding the stack. These are the
|
|
|
four registers which are assigned actual processor registers when
|
|
|
implementing the STG machine on x86.
|
|
|
- The STG stack:
|
|
|
|
|
|
- Stores function arguments and continuations (i.e. the stack
|
|
|
frames which are executed when a function returns)
|
|
|
- Grows downwards in memory
|
|
|
- The top of the stack is pointed to by the STG register `Sp`, and
|
|
|
the maximum available stack pointer is stored in `SpLim`. There is
|
|
|
no frame pointer.
|
|
|
|
|
|
- The heap:
|
|
|
|
|
|
- Used to store many different sorts of heap object: notably
|
|
|
functions, thunks and data constructors
|
|
|
- Grows upwards in memory, towards the stack
|
|
|
- All allocation occurs using a bump-allocator: the heap pointer is
|
|
|
simply incremented by the number of bytes desired (subject to to a
|
|
|
check that this does not exhaust available memory). The garbage
|
|
|
collector is responsible for moving objects out of the area of the
|
|
|
heap managed by the bump allocator and into the care of its
|
|
|
generational collector.
|
|
|
- The last address in the bump-allocated part of the heap that has
|
|
|
been used is pointed to by the STG register `Hp`, with `HpLim`
|
|
|
holding the maximum address available for bump-allocation.
|
|
|
|
|
|
### Important concepts in the machine
|
|
|
|
|
|
|
|
|
Some of the key concepts in the STG machine include *closures*,
|
|
|
*info tables* and *entry code*. We tackle them in reverse
|
|
|
order:
|
|
|
|
|
|
<table><tr><th>Entry code</th>
|
|
|
<td>
|
|
|
The actual machine code that the STG machine will execute upon
|
|
|
"entry". Entry means different things for different heap objects.
|
|
|
</td></tr></table>
|
|
|
|
|
|
- For *thunks*, entry is when the thunk is forced by some demand
|
|
|
for its value, such as a `case` expression scrutinising it
|
|
|
- For *functions*, entry is when the function is applied to as
|
|
|
many arguments as are demanded by the arity recorded in its info
|
|
|
table
|
|
|
- For *continuations*, entry occurs when a value is returned from
|
|
|
a nested call, and hence the need arises to consume the value and
|
|
|
continue evaluation
|
|
|
|
|
|
<table><tr><th>Info table</th>
|
|
|
<td>
|
|
|
A block of memory allocated statically, which contains metadata
|
|
|
about a closure. The most important fields for our purposes are the
|
|
|
entry code pointer and the arity information (if this is the info
|
|
|
table for a thunk, function or partial application)
|
|
|
</td></tr></table>
|
|
|
|
|
|
<table><tr><th>Closure</th>
|
|
|
<td>
|
|
|
Essentially a heap-allocated pair of the free variables of some
|
|
|
code, and a pointer to its info table (i.e. its info pointer).
|
|
|
</td></tr></table>
|
|
|
|
|
|
|
|
|
For an example of how these parts work together, consider the
|
|
|
following code
|
|
|
|
|
|
```wiki
|
|
|
my_fun x zs = map (\y -> y + x) zs
|
|
|
```
|
|
|
|
|
|
|
|
|
The nested lambda will give rise to all of the above objects.
|
|
|
|
|
|
|
|
|
The closure will store a pointer to `x`'s closure (as it is a free
|
|
|
variable of the lambda), along with a pointer to an info table.
|
|
|
That info table will contain information relevant to a function
|
|
|
value, recording information such as the fact that it has an arity
|
|
|
of 1 (i.e. the binding for `y`), and the pointer to the entry code
|
|
|
for the function `\y -> y + x` itself. This entry code will
|
|
|
implement the addition by combining the closure for the free
|
|
|
variable `x` (taken from the closure) with the stack-passed `y`
|
|
|
variable's closure.
|
|
|
|
|
|
|
|
|
Upon entry to some code, pointers to closures are made available in
|
|
|
`R1`. That is to say, before entry code is jumped to, `R1` is set
|
|
|
up to point to the associated closure, so that the entry code can
|
|
|
access free variables (if any).
|
|
|
|
|
|
|
|
|
Closures for code which contain no free variables (such as the
|
|
|
closure for `True` and `False`, and functions applied to no
|
|
|
arguments such as `(:)` and `id`) are allocated statically by the
|
|
|
compiler in the same manner as info tables are.
|
|
|
|
|
|
### Overview of execution model of the machine
|
|
|
|
|
|
|
|
|
This will be covered in more detail in the examples below, so I
|
|
|
will use this section to make some general points.
|
|
|
|
|
|
|
|
|
The goal of the STG machine is to reduce the current expression to
|
|
|
a value. When it has done so, it:
|
|
|
|
|
|
1. Stores a tagged pointer to evaluated closure in the STG register
|
|
|
`R1`
|
|
|
1. Jumps to the entry code of the info table pointed to by the
|
|
|
value at the top of the STG stack
|
|
|
|
|
|
- This may also be called the info table of the *continuation* of
|
|
|
the expression
|
|
|
|
|
|
|
|
|
The continuation code is responsible for popping its info pointer
|
|
|
(and stack-allocated free variables, if any) from the stack before
|
|
|
returning.
|
|
|
|
|
|
|
|
|
Arguments are passed on the stack, and are popped by the callee.
|
|
|
Upon a jump to the entry code for a function, there are always
|
|
|
precisely as many arguments on the stack as the (statically known)
|
|
|
arity of that function, and those arguments will be followed by the
|
|
|
info pointer of a continuation.
|
|
|
|
|
|
## Saturated application to known functions
|
|
|
|
|
|
|
|
|
Handling application in the STG machine is a big topic, and so in
|
|
|
this first section we only look at the case of *saturated*
|
|
|
applications to *known* functions - i.e. those functions that the
|
|
|
compiler statically knows information such as the entry code
|
|
|
pointer and arity for.
|
|
|
|
|
|
### Example 1: function application with sufficient stack space
|
|
|
|
|
|
|
|
|
Application of functions is the bread and butter of the STG
|
|
|
machine. Correspondingly, this first Haskell program
|
|
|
|
|
|
```wiki
|
|
|
{-# NOINLINE known_fun #-}
|
|
|
known_fun :: a -> a
|
|
|
known_fun x = x
|
|
|
|
|
|
known_app :: () -> Int
|
|
|
known_app _ = known_fun 10
|
|
|
```
|
|
|
|
|
|
|
|
|
compiles to very simple C-- code
|
|
|
|
|
|
```
|
|
|
Main_knownzuapp_entry(){cl3:
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
jump Main_knownzufun_entry();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_knownzuapp_entry:
|
|
|
Lcl3:
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
addl $209,%eax
|
|
|
movl %eax,(%ebp)
|
|
|
jmp _Main_knownzufun_entry
|
|
|
```
|
|
|
|
|
|
|
|
|
The STG machine passes arguments to functions on the STG stack, and
|
|
|
a pointer to the stack top is stored in the STG register `Sp`.
|
|
|
Furthermore, because GHC currently uses the eval/apply variant of
|
|
|
the STG machine, exactly as many arguments as the function expects
|
|
|
to receive are guaranteed to present on the stack.
|
|
|
|
|
|
|
|
|
Therefore, upon entry to the `known_app` function, we are
|
|
|
guaranteed that the STG stack has a pointer to a closure of type
|
|
|
`()` on top of it. In order to call `known_fun`, we just modify the
|
|
|
top of the stack to replace that pointer with a pointer to the
|
|
|
statically allocated closure for the literal `10`, and then
|
|
|
tail-call into the entry code of `known_fun`.
|
|
|
|
|
|
### Example 2: function application that needs to grow the stack
|
|
|
|
|
|
|
|
|
This Haskell code is apparently little more complicated than the
|
|
|
previous example
|
|
|
|
|
|
```wiki
|
|
|
{-# NOINLINE known_fun_2 #-}
|
|
|
known_fun_2 :: a -> a -> a
|
|
|
known_fun_2 x _ = x
|
|
|
|
|
|
known_app_2 :: () -> Int
|
|
|
known_app_2 _ = known_fun_2 10 10
|
|
|
```
|
|
|
|
|
|
|
|
|
however, it generates radically different C-- code:
|
|
|
|
|
|
```
|
|
|
Main_knownzuappzu2_entry(){clE:if(Sp -4< SpLim)goto clH;
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
I32[Sp -4]= stg_INTLIKE_closure+209;
|
|
|
Sp = Sp -4;
|
|
|
jump Main_knownzufunzu2_entry();clH:
|
|
|
R1 = Main_knownzuappzu2_closure;
|
|
|
jump stg_gc_fun();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_knownzuappzu2_entry:
|
|
|
LclE:
|
|
|
leal -4(%ebp),%eax
|
|
|
cmpl 84(%ebx),%eax
|
|
|
jb LclH
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
addl $209,%eax
|
|
|
movl %eax,(%ebp)
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
addl $209,%eax
|
|
|
movl %eax,-4(%ebp)
|
|
|
addl $-4,%ebp
|
|
|
jmp _Main_knownzufunzu2_entry
|
|
|
LclH:
|
|
|
movl $_Main_knownzuappzu2_closure,%esi
|
|
|
jmp *-4(%ebx)
|
|
|
```
|
|
|
|
|
|
|
|
|
As before, upon entry the STG stack is guaranteed to have a single
|
|
|
closure pointer at its top. However, in order to call into
|
|
|
known_fun_2 we need at least two free stack slots at the top for
|
|
|
arguments, which means that we have to grow the stack by one word
|
|
|
before we can make the call.
|
|
|
|
|
|
#### Checking for sufficient stack space
|
|
|
|
|
|
|
|
|
First, we check to see if growing the stack would overflow
|
|
|
allocated stack space, by comparing the STG stack pointer register
|
|
|
`Sp` with the stack limit register `SpLim`:
|
|
|
|
|
|
```
|
|
|
if(Sp -4< SpLim)goto clH;
|
|
|
```
|
|
|
|
|
|
|
|
|
(The stack grows downwards, hence the *subtraction* of 4 from the
|
|
|
current `Sp`). If the stack check fails, we branch to `clH`:
|
|
|
|
|
|
```
|
|
|
clH:
|
|
|
R1 = Main_knownzuappzu2_closure;
|
|
|
jump stg_gc_fun();
|
|
|
```
|
|
|
|
|
|
|
|
|
This stores the closure of the current function in `R1`, and then
|
|
|
jumps into the hand-written garbage collector code to force it to
|
|
|
grow the stack. After the stack has been grown, the collector will
|
|
|
call back into `Main_knownzuappzu2_entry` by using the information
|
|
|
stored in the (statically-allocated) `Main_knownzuappzu2_closure`
|
|
|
closure pointed to by `R1`, and the stack check will be run again -
|
|
|
hopefully succeeding this time!
|
|
|
|
|
|
#### Making the known call
|
|
|
|
|
|
|
|
|
Given that the stack check succeeds, it is easy to make the actual
|
|
|
call we are after. We simply grow the stack by the required amount,
|
|
|
and write the two arguments to `known_fun_2` into the top two stack
|
|
|
slots (overwriting our own first argument in the process, of
|
|
|
course):
|
|
|
|
|
|
```
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
I32[Sp -4]= stg_INTLIKE_closure+209;
|
|
|
Sp = Sp -4;
|
|
|
```
|
|
|
|
|
|
|
|
|
A simple tail call to the new function finishes us off:
|
|
|
|
|
|
```
|
|
|
jump Main_knownzufunzu2_entry();
|
|
|
```
|
|
|
|
|
|
## Example 3: Unsaturated applications to known functions
|
|
|
|
|
|
|
|
|
Despite describing an undersaturated call, this Haskell code
|
|
|
|
|
|
```wiki
|
|
|
{-# NOINLINE known_fun_2 #-}
|
|
|
known_fun_2 :: a -> a -> a
|
|
|
known_fun_2 x _ = x
|
|
|
|
|
|
known_undersaturated_app :: () -> Int -> Int
|
|
|
known_undersaturated_app _ = known_fun_2 10
|
|
|
```
|
|
|
|
|
|
|
|
|
compiles to straightforward C-- as follows
|
|
|
|
|
|
```
|
|
|
Main_knownzuundersaturatedzuapp_entry(){cmd:
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
jump Main_knownzufunzu2_entry();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_knownzuundersaturatedzuapp_entry:
|
|
|
Lcmd:
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
addl $209,%eax
|
|
|
movl %eax,(%ebp)
|
|
|
jmp _Main_knownzufunzu2_entry
|
|
|
```
|
|
|
|
|
|
|
|
|
The reason that there is no special magic to deal with
|
|
|
undersaturated applications to known functions is simple: GHC
|
|
|
simply gives `known_undersaturated_app` an arity of 2, so by the
|
|
|
time we jump to the entry code the stack must already contain any
|
|
|
arguments required by `known_fun_2`.
|
|
|
|
|
|
## Example 4: Applications to unknown functions
|
|
|
|
|
|
|
|
|
We aren't going to tackle oversaturated calls to known functions
|
|
|
until we've considered happens to calls to statically-unknown
|
|
|
functions. To see what these look like, we are going to use the
|
|
|
following Haskell code
|
|
|
|
|
|
```wiki
|
|
|
unknown_app :: (Int -> Int) -> Int -> Int
|
|
|
unknown_app f x = f x
|
|
|
```
|
|
|
|
|
|
|
|
|
Which compiles to this C-- function
|
|
|
|
|
|
```
|
|
|
Main_unknownzuapp_entry(){cnO:
|
|
|
R1 = I32[Sp +0];
|
|
|
Sp = Sp +4;
|
|
|
jump stg_ap_p_fast();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_unknownzuapp_entry:
|
|
|
Lcn0:
|
|
|
movl (%ebp),%esi
|
|
|
addl $4,%ebp
|
|
|
jmp _stg_ap_p_fast
|
|
|
```
|
|
|
|
|
|
|
|
|
Unlike the previous cases we have looked at, we are compiling an
|
|
|
application where we don't statically know either the arity or the
|
|
|
info pointer of the function being applied. To deal with such
|
|
|
cases, the STG machine uses several pre-compiled "generic apply"
|
|
|
functions which inspect the info-table for the function in question
|
|
|
and decide how the available arguments should be applied to it.
|
|
|
|
|
|
### Dealing with generic application
|
|
|
|
|
|
|
|
|
There are three cases the generic apply functions have to deal
|
|
|
with:
|
|
|
|
|
|
1. The function's arity (recorder in the function closure's info
|
|
|
table) exactly matches the number of arguments available on the
|
|
|
stack
|
|
|
|
|
|
- This is the best case. In this case, the generic apply function
|
|
|
simply makes a tail call into the function's entry code
|
|
|
|
|
|
1. The function's arity is greater than the number of arguments
|
|
|
available on the stack
|
|
|
|
|
|
- In this case, the generic apply code allocates a PAP (partial
|
|
|
application) closure which closes over both the new arguments and
|
|
|
the function pointer, and returns that value, in the normal STGish
|
|
|
way, to the continuation on the top of the stack
|
|
|
|
|
|
1. The function's arity is less than the number of arguments
|
|
|
available on the stack
|
|
|
|
|
|
- In this case, a number of arguments matching the arity are pushed
|
|
|
on top of the stack, followed by a continuation which uses another
|
|
|
of the generic apply functions to apply the remaining arguments.
|
|
|
The code for the original function is then entered
|
|
|
- Eventually the code for the continuation is entered and another
|
|
|
generic apply function will be tail-called to deal with the
|
|
|
result
|
|
|
|
|
|
|
|
|
Potentially, one generic apply function is required for every
|
|
|
"argument pattern". Some example argument patterns are:
|
|
|
|
|
|
|
|
|
Because the number of patterns is large (actually unbounded,
|
|
|
because functions might be of any arity), GHC only generates
|
|
|
generic apply functions for enough patterns so that 99.9% of all
|
|
|
calls observed in practice have a generic apply function. Generic
|
|
|
apply functions for calls of larger arity can be simulated by
|
|
|
chaining together several smaller generic apply functions, in a
|
|
|
similar manner as when dealing with oversaturated function
|
|
|
applications.
|
|
|
|
|
|
### Making the call to the generic application code
|
|
|
|
|
|
|
|
|
Let's remind ourselves of the original code:
|
|
|
|
|
|
```
|
|
|
R1 = I32[Sp +0];
|
|
|
Sp = Sp +4;
|
|
|
jump stg_ap_p_fast();
|
|
|
```
|
|
|
|
|
|
|
|
|
Knowing about generic apply functions, the call itself is easy to
|
|
|
understand. We pop the top of the stack (the function argument)
|
|
|
into `R1` and then jump into the generic application code for the
|
|
|
case where the stack contains a single pointer argument, which
|
|
|
deals with all the cases for `f` described above.
|
|
|
|
|
|
## Example 5: oversaturated applications to known functions
|
|
|
|
|
|
|
|
|
This Haskell code
|
|
|
|
|
|
```wiki
|
|
|
{-# NOINLINE known_fun_2 #-}
|
|
|
known_fun_2 :: a -> a -> a
|
|
|
known_fun_2 x _ = x
|
|
|
|
|
|
known_oversat_app :: () -> Int
|
|
|
known_oversat_app _ = known_fun_2 id id 10
|
|
|
```
|
|
|
|
|
|
|
|
|
compiles to the following C-- function
|
|
|
|
|
|
```
|
|
|
Main_knownzuoversatzuapp_entry(){cmj:if(Sp -12< SpLim)goto cmm;
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
I32[Sp -4]= stg_ap_p_info;
|
|
|
I32[Sp -8]= base_GHCziBase_id_closure;
|
|
|
I32[Sp -12]= base_GHCziBase_id_closure;
|
|
|
Sp = Sp -12;
|
|
|
jump Main_knownzufunzu2_entry();cmm:
|
|
|
R1 = Main_knownzuoversatzuapp_closure;
|
|
|
jump stg_gc_fun();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_knownzuoversatzuapp_entry:
|
|
|
Lcmj:
|
|
|
leal -12(%ebp),%eax
|
|
|
cmpl 84(%ebx),%eax
|
|
|
jb Lcmm
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
addl $209,%eax
|
|
|
movl %eax,(%ebp)
|
|
|
movl L_stg_ap_p_info$non_lazy_ptr,%eax
|
|
|
movl %eax,-4(%ebp)
|
|
|
movl $_base_GHCziBase_id_closure,-8(%ebp)
|
|
|
movl $_base_GHCziBase_id_closure,-12(%ebp)
|
|
|
addl $-12,%ebp
|
|
|
jmp _Main_knownzufunzu2_entry
|
|
|
Lcmm:
|
|
|
movl $_Main_knownzuoversatzuapp_closure,%esi
|
|
|
jmp *-4(%ebx)
|
|
|
```
|
|
|
|
|
|
|
|
|
As you might see, despite being a call to a known function, this
|
|
|
code makes use of the generic apply functions we discussed in the
|
|
|
last section. Let's pick the function apart and see how it works.
|
|
|
|
|
|
|
|
|
First, we do the usual stack check. What differs from the last time
|
|
|
we saw this check is that we are not only allocating space for
|
|
|
arguments on the stack, but also for a *continuation*. We set up
|
|
|
these new stack entries as follows:
|
|
|
|
|
|
```
|
|
|
I32[Sp +0]= stg_INTLIKE_closure+209;
|
|
|
I32[Sp -4]= stg_ap_p_info;
|
|
|
I32[Sp -8]= base_GHCziBase_id_closure;
|
|
|
I32[Sp -12]= base_GHCziBase_id_closure;
|
|
|
Sp = Sp -12;
|
|
|
```
|
|
|
|
|
|
|
|
|
i.e. the final stack looks as follows (note that the code
|
|
|
overwrites the old pointer to a closure of type ()):
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
/----\ +---------------------------+
|
|
|
| Sp |--->| base_GHCziBase_id_closure |
|
|
|
\----/ +---------------------------+
|
|
|
| base_GHCziBase_id_closure |
|
|
|
+---------------------------+
|
|
|
| stg_ap_p_info |
|
|
|
+---------------------------+
|
|
|
| stg_INTLIKE_closure+209 |
|
|
|
+---------------------------+
|
|
|
| ... |
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
Because `known_fun_2` is of arity 2, when we jump to it's entry
|
|
|
code, it will only consume the top two arguments from the stack:
|
|
|
i.e. the two pointers to `base_GHCziBase_id_closure`. It will then
|
|
|
evaluate to some sort of value and transfer control to the entry
|
|
|
code for `stg_ap_p_info`.
|
|
|
|
|
|
|
|
|
This is where the magic happens: the entry code for `stg_ap_p_info`
|
|
|
will apply the function value that was returned from `known_fun_2`
|
|
|
to the (pointer) argument in the "free variable" of its (stack
|
|
|
allocated) closure -- and we have arranged that that is
|
|
|
`stg_INTLIKE_closure+209`, i.e. the closure for the `Int` literal
|
|
|
`10`. This code is shared with the generic application functions
|
|
|
for calls to unknown functions, so this will make use of the
|
|
|
`stg_ap_p_fast` function we saw before.
|
|
|
|
|
|
|
|
|
Finally, control will be transferred back to the caller for
|
|
|
`known_oversat_app`, and all will be well.
|
|
|
|
|
|
## Example 6: allocation of thunks and data
|
|
|
|
|
|
|
|
|
Something that happens all the time in Haskell is allocation. There
|
|
|
are three principal types of thing that get allocated: function
|
|
|
closures, thunks, and data. These are all treated pretty much the
|
|
|
same in the STG machine for the simple reason that they share many
|
|
|
common characteristics:
|
|
|
|
|
|
- Entry code which the STG machine jumps to, in order to evaluate
|
|
|
them
|
|
|
|
|
|
- Note that for constructors, the entry code is trivial, as they
|
|
|
are always already evaluated! In this case, control will be
|
|
|
transferred directly back to the caller's continuation.
|
|
|
|
|
|
- Free variables stored in a closure
|
|
|
|
|
|
- For data, these "free variables" will be the values in the fields
|
|
|
of the particular data constructor
|
|
|
|
|
|
- Info-tables containing various miscellaneous metadata about the
|
|
|
heap object, such as function arity
|
|
|
|
|
|
|
|
|
Let us look at how a thunk and a data constructor get allocated in
|
|
|
a simple setting:
|
|
|
|
|
|
```wiki
|
|
|
build_data :: Int -> Maybe Int
|
|
|
build_data x = Just (x + 1)
|
|
|
```
|
|
|
|
|
|
|
|
|
This compiles into the following C--:
|
|
|
|
|
|
```
|
|
|
Main_buildzudata_entry(){clE:
|
|
|
Hp = Hp +20;if(Hp > HpLim)goto clH;
|
|
|
I32[Hp -16]= slk_info;
|
|
|
I32[Hp -8]= I32[Sp +0];
|
|
|
I32[Hp -4]= base_DataziMaybe_Just_con_info;
|
|
|
I32[Hp +0]= Hp -16;
|
|
|
R1 = Hp -2;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[I32[Sp +0]])();clI:
|
|
|
R1 = Main_buildzudata_closure;
|
|
|
jump stg_gc_fun();clH:
|
|
|
HpAlloc =20;goto clI;}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_buildzudata_entry:
|
|
|
LclE:
|
|
|
addl $20,%edi
|
|
|
cmpl 92(%ebx),%edi
|
|
|
ja LclH
|
|
|
movl $_slk_info,-16(%edi)
|
|
|
movl (%ebp),%eax
|
|
|
movl %eax,-8(%edi)
|
|
|
movl $_base_DataziMaybe_Just_con_info,-4(%edi)
|
|
|
leal -16(%edi),%eax
|
|
|
movl %eax,(%edi)
|
|
|
leal -2(%edi),%esi
|
|
|
addl $4,%ebp
|
|
|
movl (%ebp),%eax
|
|
|
jmp *(%eax)
|
|
|
LclH:
|
|
|
movl $20,112(%ebx)
|
|
|
LclI:
|
|
|
movl $_Main_buildzudata_closure,%esi
|
|
|
jmp *-4(%ebx)
|
|
|
```
|
|
|
|
|
|
|
|
|
Let's break this function down slowly.
|
|
|
|
|
|
### Checking for sufficient heap space
|
|
|
|
|
|
|
|
|
Any function that needs to allocate memory might find that the heap
|
|
|
has been exhausted. If that happens, it needs to call into the
|
|
|
garbage collector in order to get the heap cleaned up and
|
|
|
(possibly) enlarged.
|
|
|
|
|
|
|
|
|
Hence, the first thing any such function does is check to see if
|
|
|
enough memory is available for its purposes:
|
|
|
|
|
|
```
|
|
|
clE:
|
|
|
Hp = Hp +20;if(Hp > HpLim)goto clH;...clI:
|
|
|
R1 = Main_buildzudata_closure;
|
|
|
jump stg_gc_fun();clH:
|
|
|
HpAlloc =20;goto clI;
|
|
|
```
|
|
|
|
|
|
|
|
|
This is simple enough. The function needs to allocate 20 bytes (the
|
|
|
data constructor takes up 2 words, and the thunk will take up 3),
|
|
|
so it speculatively increments Hp and then checks the STG registers
|
|
|
`Hp` and `HpLim` (the pointer to the top of the available heap
|
|
|
space) against each other.
|
|
|
|
|
|
|
|
|
If memory is insufficient (i.e. we have moved `Hp` past the top of
|
|
|
the available heap), the code deals with it by setting the
|
|
|
`HpAlloc` register to the number of bytes needed and `R1` to the
|
|
|
closure for the function in which the heap check failed, before
|
|
|
jumping into the hand-written garbage collector code for the
|
|
|
cleanup. The garbage collector will resume execution of the code by
|
|
|
using the information from `R1`, after it has freed up enough
|
|
|
memory.
|
|
|
|
|
|
|
|
|
Side note: I believe that the line setting `R1` is unnecessary
|
|
|
here, because `R1` should anyway always be set to the address of
|
|
|
the closure when executing the closure entry code. I could be
|
|
|
wrong, though.
|
|
|
|
|
|
### Performing the actual allocation
|
|
|
|
|
|
|
|
|
Once the heap check succeeds, we will be able to enter the body of
|
|
|
the function proper. Since the `Hp` has already been incremented,
|
|
|
we can just construct the new heap objects directly:
|
|
|
|
|
|
```
|
|
|
I32[Hp -16]= slk_info;
|
|
|
I32[Hp -8]= I32[Sp +0];
|
|
|
I32[Hp -4]= base_DataziMaybe_Just_con_info;
|
|
|
I32[Hp +0]= Hp -16;
|
|
|
```
|
|
|
|
|
|
|
|
|
So we get something like this:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
| ... |
|
|
|
+--------------------------------+
|
|
|
| slk_info |<-\ Pointer to thunk info table
|
|
|
+--------------------------------+ |
|
|
|
| (undefined) | |
|
|
|
+--------------------------------+ |
|
|
|
| x | | The "x + 1" thunk's free variable
|
|
|
+--------------------------------+ |
|
|
|
| base_DataziMaybe_Just_con_info | | Pointer to Just info table
|
|
|
/----\ +--------------------------------+ |
|
|
|
| Hp |--->| |--/ Free variable of Just constructor
|
|
|
\----/ +--------------------------------+
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
The bottom two words are the allocated `Just` value, and the three
|
|
|
above that correspond to the `x + 1` closure.
|
|
|
|
|
|
### Returning an allocated value to the caller
|
|
|
|
|
|
|
|
|
Now that we have allocated the data we entered the function in
|
|
|
order to construct, we need to return it to the caller. This is
|
|
|
achieved by the following code:
|
|
|
|
|
|
```
|
|
|
R1 = Hp -2;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[I32[Sp +0]])();
|
|
|
```
|
|
|
|
|
|
|
|
|
To return, the STG machine:
|
|
|
|
|
|
1. Sets `R1` to the pointer to the result of evaluation
|
|
|
1. Pops all the arguments to the function from the stack
|
|
|
1. Jumps to the entry code for the continuation. This is always
|
|
|
found at the top of the STG stack, logically below any arguments
|
|
|
that were pushed to make the call.
|
|
|
|
|
|
|
|
|
This is indeed exactly what happens here, with two interesting
|
|
|
points: pointer tagging, and the double-deference of the stack
|
|
|
pointer. These will be discussed in the next two subsections.
|
|
|
|
|
|
#### Pointer tagging
|
|
|
|
|
|
|
|
|
One exciting feature is that the code setting `R1`, i.e.
|
|
|
`R1 = Hp - 2`. This is setting `R1` to point to the `Just`, we just
|
|
|
allocated, but simultaneously tagging that pointer with the value
|
|
|
|
|
|
1. The fact that the tag is non-zero indicates to users of the
|
|
|
|
|
|
|
|
|
pointer that the thing pointed to is already evaluated.
|
|
|
Furthermore, because `Maybe` has only two constructors, we are able
|
|
|
to use the pointer tags to record which constructor it evaluated
|
|
|
to: in this case, the 2 indicates the `Just` constructor.
|
|
|
|
|
|
|
|
|
It is compulsory to tag pointers before jumping to the address of
|
|
|
the continuation entry code: the entry code can and will rely on
|
|
|
those tags being present!
|
|
|
|
|
|
#### `TABLES_NEXT_TO_CODE`
|
|
|
|
|
|
|
|
|
Because I have compiled GHC without `TABLES_NEXT_TO_CODE`, the
|
|
|
entry code for the continuation is found by dereferencing the
|
|
|
pointer to the info table we found at the top of the STG stack -
|
|
|
i.e. a double-dereference.
|
|
|
|
|
|
|
|
|
The layout of heap objects without `TABLES_NEXT_TO_CODE` is as
|
|
|
follows:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
Closure Info table Entry code
|
|
|
+--------------+ +--------------+ +------------+
|
|
|
x --->| Info pointer |--->| Code pointer |-->| ... |
|
|
|
+--------------+ +--------------+
|
|
|
| .. FVs .. | | ... |
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
With `TABLES_NEXT_TO_CODE` on, the situation looks more like this:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
Closure | .. Info table .. |
|
|
|
+--------------+ +------------------+
|
|
|
x --->| Info pointer |--->| .. Entry code .. |
|
|
|
+--------------+ | |
|
|
|
| .. FVs .. |
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
The `TABLES_NEXT_TO_CODE` optimisation removes the need for that
|
|
|
second dereference during the return, because the entry code is
|
|
|
always right next to the info table. However, it requires special
|
|
|
support from the backend for ensuring that data (i.e. the info
|
|
|
table) and code are contiguous in memory, so it cannot always be
|
|
|
used.
|
|
|
|
|
|
## Example 7: `case` expressions
|
|
|
|
|
|
|
|
|
Let us now examine how `case` expressions are handled. Compiling
|
|
|
the following Haskell
|
|
|
|
|
|
```wiki
|
|
|
case_scrut :: Maybe Int -> Int
|
|
|
case_scrut x = case x of Just x -> x; Nothing -> 10
|
|
|
```
|
|
|
|
|
|
|
|
|
Produces this C-- code
|
|
|
|
|
|
```
|
|
|
Main_casezuscrut_entry(){ccx:
|
|
|
R1 = I32[Sp +0];
|
|
|
I32[Sp +0]= scj_info;if(R1 &3!=0)goto ccA;
|
|
|
jump (I32[I32[R1]])();ccA:
|
|
|
jump (I32[scj_info])();}
|
|
|
|
|
|
scj_ret(){cct:
|
|
|
_ccu::I32 = R1 &3;if(_ccu::I32 >=2)goto ccv;
|
|
|
R1 = stg_INTLIKE_closure+209;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[I32[Sp +0]])();ccv:
|
|
|
R1 = I32[R1 +2];
|
|
|
Sp = Sp +4;
|
|
|
R1 = R1 &(-4);
|
|
|
jump (I32[I32[R1]])();}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_Main_casezuscrut_entry:
|
|
|
Lccx:
|
|
|
movl (%ebp),%esi
|
|
|
movl $_scj_info,(%ebp)
|
|
|
testl $3,%esi
|
|
|
jne LccA
|
|
|
movl (%esi),%eax
|
|
|
jmp *(%eax)
|
|
|
LccA:
|
|
|
jmp *_scj_info
|
|
|
|
|
|
_scj_ret:
|
|
|
Lcct:
|
|
|
movl %esi,%eax
|
|
|
andl $3,%eax
|
|
|
cmpl $2,%eax
|
|
|
jae Lccv
|
|
|
movl L_stg_INTLIKE_closure$non_lazy_ptr,%eax
|
|
|
leal 209(%eax),%esi
|
|
|
addl $4,%ebp
|
|
|
movl (%ebp),%eax
|
|
|
jmp *(%eax)
|
|
|
Lccv:
|
|
|
movl 2(%esi),%esi
|
|
|
addl $4,%ebp
|
|
|
andl $-4,%esi
|
|
|
movl (%esi),%eax
|
|
|
jmp *(%eax)
|
|
|
```
|
|
|
|
|
|
|
|
|
Notice that GHC has generated *two* functions:
|
|
|
`Main_casezuscrut_entry` and `scj_ret` correspond to the code for
|
|
|
forcing the argument to the `case`, and for the *continuation* of
|
|
|
the `case` respectively. Let's pick them apart and see how they
|
|
|
work!
|
|
|
|
|
|
### Forcing the scrutinee of the `case`
|
|
|
|
|
|
|
|
|
When we first call the `case_scrut` function, its entry code begins
|
|
|
executing:
|
|
|
|
|
|
```
|
|
|
ccx:
|
|
|
R1 = I32[Sp +0];
|
|
|
I32[Sp +0]= scj_info;if(R1 &3!=0)goto ccA;
|
|
|
jump (I32[I32[R1]])();ccA:
|
|
|
jump (I32[scj_info])();
|
|
|
```
|
|
|
|
|
|
|
|
|
This is a function of arity 1 (i.e. with a single argument), so
|
|
|
upon entry the machine state looks like this:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
/----\ +--------------------------+
|
|
|
| R1 |---->| Main_casezuscrut_closure |
|
|
|
\----/ +--------------------------+
|
|
|
|
|
|
/----\ +------------+
|
|
|
| Sp |---->| x |
|
|
|
\----/ +------------+
|
|
|
| ... |
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
Because this is a top level function, the closure is statically
|
|
|
allocated and contains no free variables. However, as discussed
|
|
|
previously, the single argument to the function is guaranteed to be
|
|
|
present at the top of the stack.
|
|
|
|
|
|
|
|
|
The code starts off by saving this argument (the `x`) temporarily
|
|
|
into `R1`:
|
|
|
|
|
|
```
|
|
|
ccx:
|
|
|
R1 = I32[Sp +0];
|
|
|
```
|
|
|
|
|
|
|
|
|
The next thing the code does is overwrites this argument on the
|
|
|
stack with a pointer to the info-table of the continuation code.
|
|
|
This is the code that will be invoked after `x` has been evaluated
|
|
|
into WHNF, and which will do the test to decide whether to continue
|
|
|
as the `Nothing` or as the `Just` branch of the case:
|
|
|
|
|
|
```
|
|
|
I32[Sp +0]= scj_info;
|
|
|
```
|
|
|
|
|
|
|
|
|
As we saw earlier, any time that the STG machine decides that it
|
|
|
has a value in it's hand, it will continue evaluation by
|
|
|
tail-calling the entry code found by dereferencing the info-table
|
|
|
pointer at the top of the stack. So by putting the address of our
|
|
|
continuation in here, we ensure that the entry code for `scj_info`
|
|
|
is executed after `x` becomes a value.
|
|
|
|
|
|
|
|
|
Now, what we need to do is to start the evaluation of `x`. We could
|
|
|
just jump into `x`'s entry code and hope for the best, but thanks
|
|
|
to GHC's pointer tagging we can sometimes avoid doing this indirect
|
|
|
branch.
|
|
|
|
|
|
|
|
|
So, instead, we test to see if the `x` pointer has a tag. If it is
|
|
|
tagged, then we know that it is already evaluated and hence jump
|
|
|
directly to the code for the continuation. If it is not tagged, we
|
|
|
are forced to make the jump into the entry code for `x`. This
|
|
|
choice is embodied by the following code:
|
|
|
|
|
|
```
|
|
|
if(R1 &3!=0)goto ccA;
|
|
|
jump (I32[I32[R1]])();ccA:
|
|
|
jump (I32[scj_info])();
|
|
|
```
|
|
|
|
|
|
|
|
|
Note the test `R1 & 3 != 0`: this reflects the fact that pointer
|
|
|
tags are stored in the lower 2 bits of the pointer on 32 bit
|
|
|
machines. Another interesting feature is how the `jump`
|
|
|
instructions find the entry code: again, we see a deference of the
|
|
|
info pointer because `TABLES_NEXT_TO_CODE` is turned off.
|
|
|
|
|
|
|
|
|
As we saw, the `case` scrutinisation code ended with one of two
|
|
|
things happening: 1. A direct call into the continuation code
|
|
|
`scj_ret` if the scrutinee was already evaluated 2. A call into the
|
|
|
entry code for the scrutinee, if the scrutinee was not evaluated
|
|
|
(or it *was* evaluated, but the pointer was somehow not tagged
|
|
|
with that information) - Because we pushed `scj_info` onto the STG
|
|
|
stack, control will eventually return to `scj_ret` after the
|
|
|
evaluation of `x` has finished
|
|
|
|
|
|
|
|
|
It is now time to examine the continuation code to see what happens
|
|
|
after `x` becomes a value.
|
|
|
|
|
|
### Dealing with the forced scrutinee
|
|
|
|
|
|
|
|
|
The continuation code is a little more complicated:
|
|
|
|
|
|
```
|
|
|
cct:
|
|
|
_ccu::I32 = R1 &3;if(_ccu::I32 >=2)goto ccv;
|
|
|
R1 = stg_INTLIKE_closure+209;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[I32[Sp +0]])();ccv:
|
|
|
R1 = I32[R1 +2];
|
|
|
Sp = Sp +4;
|
|
|
R1 = R1 &(-4);
|
|
|
jump (I32[I32[R1]])();
|
|
|
```
|
|
|
|
|
|
|
|
|
Whenever the STG machine evaluates to a value it will return the
|
|
|
value by jumping to the entry point at the top of the stack. In
|
|
|
this case, `R1` is guaranteed to be a (tagged) pointer to the thing
|
|
|
that was just evaluated. Because we are scrutinising a `Maybe` type
|
|
|
(which has fewer than 4 constructors) the code for the `case`
|
|
|
continuation is able to use the tag bits on the returned pointer to
|
|
|
decide which of the two branches to take:
|
|
|
|
|
|
```
|
|
|
cct:
|
|
|
_ccu::I32 = R1 &3;if(_ccu::I32 >=2)goto ccv;
|
|
|
```
|
|
|
|
|
|
|
|
|
If we were scrutinising a data type with more constructors, the tag
|
|
|
bits would only tell us that the thing was evaluated, not which
|
|
|
constructor it was evaluated to. In this case, we would have to
|
|
|
read the constructor tag by dereferencing `R1` and testing the
|
|
|
resulting info table pointer against all possibilities.
|
|
|
|
|
|
|
|
|
If the tag was greater than or equal to 2, we go to the `ccv`
|
|
|
branch, which deals with what happens if we had a `Just`. In this
|
|
|
case, we need to continue by forcing the thunk inside the `Just`
|
|
|
and returning that value to our caller, which is what these lines
|
|
|
are doing:
|
|
|
|
|
|
```
|
|
|
ccv:
|
|
|
R1 = I32[R1 +2];
|
|
|
Sp = Sp +4;
|
|
|
R1 = R1 &(-4);
|
|
|
jump (I32[I32[R1]])();
|
|
|
```
|
|
|
|
|
|
|
|
|
To access the thing inside the `Just`, the code assumes that the
|
|
|
`R1` pointer is tagged with the 2 that indicates a `Just`
|
|
|
constructor, and hence finds the first free variable (stored 4
|
|
|
bytes into the closure) using `I32[R1 + 2]`, which is then saved
|
|
|
into `R1`. It pops the address of `scj_info` that was pushed onto
|
|
|
the stack in `Main_casezuscrut_entry` by moving `Sp` up 4 bytes
|
|
|
(remember that the STG stack grows downwards) and then untags and
|
|
|
jumps into the entry code for the `R1` thunk, using the same
|
|
|
double-dereference pattern discussed earlier.
|
|
|
|
|
|
|
|
|
There seems to be a small missed opportunity here: the code could
|
|
|
check the pointer tag on `R1`, and then return directly if it is
|
|
|
set. I imagine that this isn't being done in order to reduce
|
|
|
possible code bloat.
|
|
|
|
|
|
## Example 8: thunks and thunk update
|
|
|
|
|
|
|
|
|
You might be wondering how the `x + 1` thunk we saw allocated in a
|
|
|
previous section will behave when it is actually forced. To remind
|
|
|
you, the thunk we saw was constructed by the following Haskell
|
|
|
code:
|
|
|
|
|
|
```wiki
|
|
|
build_data :: Int -> Maybe Int
|
|
|
build_data x = Just (x + 1)
|
|
|
```
|
|
|
|
|
|
|
|
|
So how does the `x + 1` thunk work? An excellent question! Let's
|
|
|
take a look at the C-- for its entry code and find out:
|
|
|
|
|
|
```
|
|
|
slk_entry(){cph:if(Sp -12< SpLim)goto cpj;
|
|
|
I32[Sp -8]= stg_upd_frame_info;
|
|
|
I32[Sp -4]= R1;
|
|
|
R1 = I32[R1 +8];
|
|
|
I32[Sp -12]= soN_info;
|
|
|
Sp = Sp -12;if(R1 &3!=0)goto cpk;
|
|
|
jump (I32[I32[R1]])();cpj: jump stg_gc_enter_1 ();cpk: jump (I32[soN_info])();}
|
|
|
|
|
|
soN_ret(){cp7:
|
|
|
Hp = Hp +8;if(Hp > HpLim)goto cpc;
|
|
|
_soL::I32 = I32[R1 +3]+1;
|
|
|
I32[Hp -4]= ghczmprim_GHCziTypes_Izh_con_info;
|
|
|
I32[Hp +0]= _soL::I32;
|
|
|
R1 = Hp -3;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[stg_upd_frame_info])();cpd: jump stg_gc_enter_1 ();cpc:
|
|
|
HpAlloc =8;goto cpd;}
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
_sst_entry:
|
|
|
Lcph:
|
|
|
leal -12(%ebp),%eax
|
|
|
cmpl 84(%ebx),%eax
|
|
|
jb Lcpj
|
|
|
movl L_stg_upd_frame_info$non_lazy_ptr,%eax
|
|
|
movl %eax,-8(%ebp)
|
|
|
movl %esi,-4(%ebp)
|
|
|
movl 8(%esi),%esi
|
|
|
movl $_soN_info,-12(%ebp)
|
|
|
addl $-12,%ebp
|
|
|
testl $3,%esi
|
|
|
jne Lcpk
|
|
|
movl (%esi),%eax
|
|
|
jmp *(%eax)
|
|
|
Lcpj:
|
|
|
jmp *-8(%ebx)
|
|
|
Lcpk:
|
|
|
jmp *_soN_info
|
|
|
|
|
|
_soN_ret:
|
|
|
Lcp7:
|
|
|
addl $8,%edi
|
|
|
cmpl 92(%ebx),%edi
|
|
|
ja Lcpc
|
|
|
movl 3(%esi),%eax
|
|
|
incl %eax
|
|
|
movl $_ghczmprim_GHCziTypes_Izh_con_info,-4(%edi)
|
|
|
movl %eax,(%edi)
|
|
|
leal -3(%edi),%esi
|
|
|
addl $4,%ebp
|
|
|
movl L_stg_upd_frame_info$non_lazy_ptr,%eax
|
|
|
jmp *(%eax)
|
|
|
Lcpc:
|
|
|
movl $8,112(%ebx)
|
|
|
Lcpd:
|
|
|
jmp *-8(%ebx)
|
|
|
```
|
|
|
|
|
|
|
|
|
The original Haskell code read `x + 1`, but GHC has inlined the
|
|
|
actual code for the addition operation on `Int`s, which looks
|
|
|
something like:
|
|
|
|
|
|
```wiki
|
|
|
plusInt (I# a) (I# b) = I# (a + b)
|
|
|
```
|
|
|
|
|
|
|
|
|
The second pattern match (to get `b`) has been performed statically
|
|
|
by GHC, obtaining the machine literal 1, which shows up directly in
|
|
|
the generated code. Therefore, the code only need to evaluate and
|
|
|
case-decompose the unknown free variable `x` of our closure, to get
|
|
|
the `a` argument to `plusInt`.
|
|
|
|
|
|
### Thunk entry point
|
|
|
|
|
|
|
|
|
This evaluation is what is being done by the thunk entry code
|
|
|
`slk_entry`. Ignoring the stack check, the C-- begins thusly:
|
|
|
|
|
|
```
|
|
|
I32[Sp -8]= stg_upd_frame_info;
|
|
|
I32[Sp -4]= R1;
|
|
|
R1 = I32[R1 +8];
|
|
|
I32[Sp -12]= soN_info;
|
|
|
Sp = Sp -12;
|
|
|
```
|
|
|
|
|
|
|
|
|
Remembering that upon entry to the thunk entry code, `R1` points to
|
|
|
the thunk's closure, the new stack looks as follows:
|
|
|
|
|
|
```wiki
|
|
|
|
|
|
/----\ +---------------------+
|
|
|
| Sp |--->| soN_info |
|
|
|
\----/ +---------------------+
|
|
|
| stg_upd_frame_info |
|
|
|
+---------------------+ +-------------+
|
|
|
| (thunk closure ptr) |---->| slk_info |
|
|
|
+---------------------+ +-------------+
|
|
|
| ... | | (undefined) |
|
|
|
+-------------+
|
|
|
| x | Free variable of thunk
|
|
|
+-------------+
|
|
|
```
|
|
|
|
|
|
|
|
|
The C-- statement `R1 = I32[R1 + 8]` is pulling out the pointer to
|
|
|
the free variable of the thunk (which was set up in
|
|
|
`Main_buildzudata_entry`) into `R1`.
|
|
|
|
|
|
|
|
|
Finally, the entry code evaluates that free variable (checking the
|
|
|
tag bits of the pointer first, as usual):
|
|
|
|
|
|
```
|
|
|
if(R1 &3!=0)goto cpk;
|
|
|
jump (I32[I32[R1]])();cpk: jump (I32[soN_info])();
|
|
|
```
|
|
|
|
|
|
|
|
|
Because we put `soN_info` at the top of the stack, when evaluation
|
|
|
of `x` is complete the STG machine will continue by executing the
|
|
|
`soN_ret` code.
|
|
|
|
|
|
|
|
|
The most interesting feature of this code is the extra stuff that
|
|
|
has been pushed onto the stack below `soN_ret`: an info pointer
|
|
|
called `stg_upd_frame_info`, and a pointer to the thunk currently
|
|
|
being evaluated.
|
|
|
|
|
|
|
|
|
This is all part of the STG machine's thunk update mechanism. When
|
|
|
the `soN_ret` continuation returns, it will transfer control
|
|
|
*not* to the code forcing the thunk, but to some code which
|
|
|
overwrites the contents of the current thunk closure with a closure
|
|
|
representing an "indirection". The entry code for such an
|
|
|
indirection closure is trivial: it immediately returns a pointer to
|
|
|
the thing that was returned from the `soN_ret` continuation in
|
|
|
`R1`.
|
|
|
|
|
|
|
|
|
These indirections are the mechanism which ensures that the STG
|
|
|
machine never repeats the work of evaluating a thunk more than
|
|
|
once: after the first evaluation, any code forcing the thunk jumps
|
|
|
into the indirection entry code rather than `slk_entry`.
|
|
|
|
|
|
|
|
|
That being said, let us look at how the continuation responsible
|
|
|
for actually finding the value of `x + 1` works:
|
|
|
|
|
|
### Continuation of the thunk
|
|
|
|
|
|
|
|
|
Upon entry to the continuation code, we have the evaluated `x` in
|
|
|
`R1`: it now needs to do the addition and allocate a `I#`
|
|
|
constructor to hold the result of the addition. Because of the
|
|
|
allocation, `soN_ret` begins with a heap check. Ignoring that
|
|
|
check, we have the following code:
|
|
|
|
|
|
```
|
|
|
_soL::I32 = I32[R1 +3]+1;
|
|
|
I32[Hp -4]= ghczmprim_GHCziTypes_Izh_con_info;
|
|
|
I32[Hp +0]= _soL::I32;
|
|
|
R1 = Hp -3;
|
|
|
Sp = Sp +4;
|
|
|
jump (I32[stg_upd_frame_info])();
|
|
|
```
|
|
|
|
|
|
|
|
|
This is mostly standard stuff. Because the `R1` pointer is
|
|
|
guaranteed tagged, and there is only one possible constructor, the
|
|
|
tag must be 1 and so the `Int#` value inside the `Int` is pulled
|
|
|
out using `I32[R1 + 3]`. This is then put into a newly
|
|
|
heap-allocated `I#` constructor, which is returned in `R1` after we
|
|
|
pop the `soN_info` pointer from the stack.
|
|
|
|
|
|
|
|
|
The only interesting point is where we return to: rather than
|
|
|
dereference `Sp` to find the info pointer at the top of the STG
|
|
|
stack, GHC has generated code that takes advantage of the fact that
|
|
|
the `Sp` is guaranteed to point to `stg_upd_frame_info`. This
|
|
|
avoids one pointer dereference.
|
|
|
|
|
|
## Conclusion
|
|
|
|
|
|
|
|
|
This document has left much of the detail of how STG is implemented
|
|
|
out: notable omissions include CAFs, and the precise behaviour of
|
|
|
the garbage collector. Nonetheless, my hope is that it has helped
|
|
|
you to gain some more insight into the weird and wonderful way the
|
|
|
Haskell evaluation model is implemented. |