GHC creates redundant constructor wrappers.
In GHC we have the Hoopl `Block` type:
```haskell
data Block n e x where
  BlockCO :: n C O -> Block n O O -> Block n C O
  BlockCC :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC :: Block n O O -> n O C -> Block n O C
  BNil    :: Block n O O
  BMiddle :: n O O -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O -> Block n O O
  BCons   :: n O O -> Block n O O -> Block n O O
```
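To experiment with this outside of GHC's own tree, the type can be copied into a standalone module. This is only a sketch: the `Node` type and the `middles` function are hypothetical helpers made up for illustration; only `Block` follows the definition above. Because the GADT return type `Block n O O` fixes the `e` and `x` indices, I'd expect `ghc -O -ddump-simpl` on this module to produce a `$WBMiddle` wrapper of the same shape as the one shown for Hoopl.

```haskell
{-# LANGUAGE GADTs #-}

-- Shape indices: O = open, C = closed (empty types, used only at the type level).
data O
data C

-- Standalone copy of the Hoopl Block type.
data Block n e x where
  BlockCO :: n C O -> Block n O O -> Block n C O
  BlockCC :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC :: Block n O O -> n O C -> Block n O C
  BNil    :: Block n O O
  BMiddle :: n O O -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O -> Block n O O
  BCons   :: n O O -> Block n O O -> Block n O O

-- Hypothetical node type: entry label, middle node, exit jump.
data Node e x where
  Label :: Int -> Node C O
  Mid   :: Int -> Node O O
  Jump  :: Int -> Node O C

-- Hypothetical helper: payloads of the middle nodes, left to right.
middles :: Block Node e x -> [Int]
middles blk = case blk of
  BlockCO _ b     -> middles b
  BlockCC _ b _   -> middles b
  BlockOC b _     -> middles b
  BNil            -> []
  BMiddle (Mid i) -> [i]
  BCat a b        -> middles a ++ middles b
  BSnoc b (Mid i) -> middles b ++ [i]
  BCons (Mid i) b -> i : middles b
```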
For `BMiddle` we end up with this wrapper:
```
-- RHS size: {terms: 4, types: 12, coercions: 2, joins: 0/0}
Hoopl.Block.$WBMiddle [InlPrag=INLINE[0]]
  :: forall (n :: * -> * -> *). n O O -> Block n O O
[GblId[DataConWrapper],
 Arity=1,
 Caf=NoCafRefs,
 Str=<L,U>m5,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (@ (n_ayl :: * -> * -> *))
                 (dt_a2Cs [Occ=Once] :: n_ayl O O) ->
                 Hoopl.Block.BMiddle
                   @ n_ayl
                   @ O
                   @ O
                   @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
                   @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
                   dt_a2Cs}]
Hoopl.Block.$WBMiddle
  = \ (@ (n_ayl :: * -> * -> *)) (dt_a2Cs [Occ=Once] :: n_ayl O O) ->
      Hoopl.Block.BMiddle
        @ n_ayl
        @ O
        @ O
        @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
        @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
        dt_a2Cs
```
There is really only one argument here which remains at runtime. As I understand it, `coercionToken#`s are really void arguments, so in STG the wrapper takes one argument and passes it on to the worker `BMiddle`:
```
Hoopl.Block.$WBMiddle [InlPrag=INLINE[0]]
  :: forall (n :: * -> * -> *).
     n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n Hoopl.Block.O Hoopl.Block.O
[GblId[DataConWrapper],
 Arity=1,
 Caf=NoCafRefs,
 Str=<L,U>m5,
 Unf=OtherCon []] =
    \r [dt_s4tZ]
        Hoopl.Block.BMiddle [GHC.Prim.coercionToken#
                             GHC.Prim.coercionToken#
                             dt_s4tZ];
```
We can see this in the definition of the worker (again STG code):
```
Hoopl.Block.BMiddle
  :: forall (n :: * -> * -> *) e x.
     (e GHC.Prim.~# Hoopl.Block.O)
     -> (x GHC.Prim.~# Hoopl.Block.O)
     -> n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n e x
[GblId[DataCon],
 Arity=3,
 Caf=NoCafRefs,
 Str=<L,U><L,U><L,U>m5,
 Unf=OtherCon []] =
    \r [void_0E void_0E eta_B1] Hoopl.Block.BMiddle [eta_B1];
```
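At the source level, the worker's type corresponds to a constructor written with explicit equality constraints. The sketch below uses a hypothetical cut-down type (`BMiddleEq`, `BLastEq`, `mkMiddle`, `closeCheck` and `payload` are all made up) to show the two sides: the caller supplies the equalities, which are trivially reflexive at known indices, and pattern matching gets them back. I'm not claiming this compiles to the identical worker; an explicit context may be stored as boxed equality evidence rather than the unboxed `~#` coercions in the dump, so treat it as a type-level analogy.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

data O
data C

-- Hypothetical cut-down block type written with explicit equality
-- constraints instead of a refined return type. BMiddleEq's signature
-- mirrors the worker type above: one equality per index, then the field.
data Block e x where
  BMiddleEq :: (e ~ O, x ~ O) => Int -> Block e x
  BLastEq   :: (e ~ O, x ~ C) => Int -> Block e x

-- With known indices the constraints are discharged by reflexivity
-- (compare the <O>_N coercions that $WBMiddle passes to the worker).
mkMiddle :: Int -> Block O O
mkMiddle = BMiddleEq

-- Matching brings the equalities back into scope: in the first equation
-- x ~ O holds, so b can be returned at type Block e O.
closeCheck :: Block e x -> Maybe (Block e O)
closeCheck b@(BMiddleEq _) = Just b
closeCheck (BLastEq _)     = Nothing

payload :: Block e x -> Int
payload (BMiddleEq i) = i
payload (BLastEq i)   = i
```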
Now, as @osa1 explained it to me (I might have understood it wrong!):
- We always (try to?) inline the wrapper.
- We only use the worker if the constructor is not fully applied; otherwise we turn the application into an `StgConApp`, which means the constructor is allocated inline.
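The saturated/unsaturated distinction in the second bullet, in source terms. This is a hypothetical cut-down `Block` (the names `single`, `many` and `size` are made up); going by the explanation above, only the saturated use is a candidate for inline allocation, which one could check by compiling with `-O` and looking at the STG dump.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical cut-down Block: BMiddle fixes both indices, so, like
-- Hoopl's BMiddle, it presumably gets a $W wrapper supplying coercions.
data O

data Block e x where
  BNil    :: Block O O
  BMiddle :: Int -> Block O O

-- Saturated: BMiddle is applied to its only value argument. Once the
-- wrapper is inlined, this can become a constructor application
-- (StgConApp) with the closure allocated inline.
single :: Int -> Block O O
single n = BMiddle n

-- Unsaturated: BMiddle is passed to map as a function value, so a
-- function closure for the constructor is referenced at this site.
many :: [Int] -> [Block O O]
many = map BMiddle

-- Number of middle nodes, used to check the results.
size :: Block e x -> Int
size BNil        = 0
size (BMiddle _) = 1
```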
This is the Cmm for the wrapper, which is identical to the code generated for the worker:
```
[Hoopl.Block.$WBMiddle_entry() { // [R2]
         { info_tbls: [(c4NF,
                        label: Hoopl.Block.$WBMiddle_info
                        rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 5} }
                        srt: Nothing)]
           stack_info: arg_space: 8 updfr_space: Just 8
         }
     {offset
       c4NF: // global
           Hp = Hp + 16; // CmmAssign
           if (Hp > HpLim) (likely: False) goto c4NJ; else goto c4NI; // CmmCondBranch
       c4NJ: // global
           HpAlloc = 16; // CmmAssign
           R2 = R2; // CmmAssign
           R1 = Hoopl.Block.$WBMiddle_closure; // CmmAssign
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; // CmmCall
       c4NI: // global
           // allocHeapClosure
           I64[Hp - 8] = Hoopl.Block.BMiddle_con_info; // CmmStore
           P64[Hp] = R2; // CmmStore
           R1 = Hp - 7; // CmmAssign
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8; // CmmCall
     }
 },
```
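My reading of the allocation arithmetic in block `c4NI`, assuming GHC's convention that after the bump `Hp` points at the last allocated word (writing $\mathtt{Hp}'$ for the bumped value):

$$
\begin{aligned}
\mathtt{Hp}' &= \mathtt{Hp} + 16 && \text{(two 8-byte words: info pointer and one field)}\\
\mathtt{I64}[\mathtt{Hp}' - 8] &= \mathtt{BMiddle\_con\_info} && \text{(header word, i.e. the start of the closure)}\\
\mathtt{P64}[\mathtt{Hp}'] &= \mathtt{R2} && \text{(the single field)}\\
\mathtt{R1} &= (\mathtt{Hp}' - 8) + 1 = \mathtt{Hp}' - 7 && \text{(closure address plus pointer tag 1)}
\end{aligned}
$$

If I have the tagging rules right, the tag is 1 rather than `BMiddle`'s constructor index 5 (the `m5` in `Str=<L,U>m5`) because `Block` has eight constructors, more than the seven tag values available on a 64-bit target, so in this GHC version the tag only marks the closure as evaluated.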
Now, in quite a few cases we actually call the wrapper (so it's not inlined), even though it is fully applied in STG. For example, in `blockSnoc`:
```
Rec {
Hoopl.Block.blockSnoc [Occ=LoopBreaker]
  :: forall (n :: * -> * -> *) e.
     Hoopl.Block.Block n e Hoopl.Block.O
     -> n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n e Hoopl.Block.O
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><L,U>,
 Unf=OtherCon []] =
    \r [b_s4uI n1_s4uJ]
        case b_s4uI of wild_s4uK [Occ=Once*] {
          Hoopl.Block.BlockCO f_s4uL [Occ=Once] b1_s4uM [Occ=Once] -> ...
          Hoopl.Block.BNil -> Hoopl.Block.$WBMiddle n1_s4uJ;
          Hoopl.Block.BMiddle _ [Occ=Dead] -> ...
          ...
        };
end Rec }
```
The alternative `Hoopl.Block.$WBMiddle n1_s4uJ;` here is translated into a call to the wrapper in Cmm:
```
c4SL: // global
    R2 = _s4uJ::P64; // CmmAssign
    Sp = Sp + 16; // CmmAssign
    call Hoopl.Block.$WBMiddle_info(R2) args: 8, res: 0, upd: 8; // CmmCall
```
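For anyone trying to reproduce this, here is a hypothetical source-level `blockSnoc` with the shape the STG suggests. Only the `BNil -> BMiddle n` alternative is taken from the dump; the other alternatives, the cut-down `Block` (without `BlockCC`/`BlockOC`, which close on exit and so can't have type `Block n e O`) and the `middleNodes` helper are my assumptions, not GHC's actual Hoopl code.

```haskell
{-# LANGUAGE GADTs #-}

data O
data C

-- Cut-down copy of Hoopl's Block without the constructors that close on
-- exit, since those can never be the argument of blockSnoc.
data Block n e x where
  BlockCO :: n C O -> Block n O O -> Block n C O
  BNil    :: Block n O O
  BMiddle :: n O O -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O -> Block n O O
  BCons   :: n O O -> Block n O O -> Block n O O

-- Hypothetical blockSnoc. The BNil alternative corresponds to the STG
-- `Hoopl.Block.BNil -> Hoopl.Block.$WBMiddle n1_s4uJ`; the rest are guesses.
blockSnoc :: Block n e O -> n O O -> Block n e O
blockSnoc b n = case b of
  BlockCO f b' -> BlockCO f (blockSnoc b' n)  -- recursive call (cf. the Rec group)
  BNil         -> BMiddle n                   -- saturated use of BMiddle
  BMiddle _    -> BSnoc b n
  BCat _ _     -> BSnoc b n
  BSnoc _ _    -> BSnoc b n
  BCons _ _    -> BSnoc b n

-- Flatten the middle nodes left to right, to check results.
middleNodes :: Block n e x -> [n O O]
middleNodes blk = case blk of
  BlockCO _ b -> middleNodes b
  BNil        -> []
  BMiddle m   -> [m]
  BCat a c    -> middleNodes a ++ middleNodes c
  BSnoc a m   -> middleNodes a ++ [m]
  BCons m a   -> m : middleNodes a
```

Compiling this with `-O` and dumping Cmm would show whether the `BNil` alternative also ends up as a call to `$WBMiddle` outside of GHC's own build.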
What I think we would WANT to happen is for `$WBMiddle` to get inlined into `blockSnoc`. This would expose the fully saturated worker, and as a consequence we would allocate the constructor inline, like this:
```
I64[Hp - 8] = Hoopl.Block.BMiddle_con_info; // CmmStore
P64[Hp] = R2; // CmmStore
R1 = Hp - 7; // CmmAssign
call (P64[Sp])(R1) args: 8, res: 0, upd: 8; // CmmCall
```
What actually happens is that the wrapper doesn't get inlined. So instead of allocating inline, we generate a call to the wrapper, which then allocates the constructor.
It's not horrible, but the call has an overhead that allocating the constructor directly wouldn't have.
Judging by the runtime representation alone, we shouldn't need a wrapper at all, though I assume that is harder to determine at the Core level.
Either way I came across this by chance and thought I should write it down.