Pointless heap allocation for local binders binding functions without free variables.
Given this function:
{-# OPTIONS_GHC -fno-full-laziness -fno-stg-lift-lams #-}
module M where
foo x =
let g y = y + 1 :: Int
in (g, x)
GHC allocates one word for the local binder g
, in which we store a pointer to the actual (and static) function g
.
Here is the Stg + cmm:
M.foo :: forall {b}. b -> (GHC.Types.Int -> GHC.Types.Int, b)
[GblId, Arity=1, Str=<L>, Cpr=1, Unf=OtherCon []] =
{} \r [x_sDi]
let {
sat_sDn [Occ=Once1] :: GHC.Types.Int -> GHC.Types.Int
[LclId] =
{} \r [y_sDj]
case y_sDj of {
GHC.Types.I# x1_sDl [Occ=Once1] ->
case +# [x1_sDl 1#] of sat_sDm [Occ=Once1] {
__DEFAULT -> GHC.Types.I# [sat_sDm];
};
};
} in (,) [sat_sDn x_sDi];
[sat_sDn_entry() { // [R2, R1]
{ info_tbls: [(cDz,
label: block_cDz_info
rep: StackRep []
srt: Nothing),
(cDC,
label: sat_sDn_info
rep: HeapRep 1 nonptrs { Fun {arity: 1 fun_type: ArgSpec 5} }
srt: Nothing)]
stack_info: arg_space: 8
}
{offset
cDC: // global
if ((Sp + -8) < SpLim) (likely: False) goto cDJ; else goto cDK;
cDJ: // global
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
cDK: // global
I64[Sp - 8] = cDz;
R1 = R2;
Sp = Sp - 8;
if (R1 & 7 != 0) goto cDz; else goto cDA;
cDA: // global
call (I64[R1])(R1) returns to cDz, args: 8, res: 8, upd: 8;
cDz: // global
Hp = Hp + 16;
if (Hp > HpLim) (likely: False) goto cDN; else goto cDM;
cDN: // global
HpAlloc = 16;
call stg_gc_unpt_r1(R1) returns to cDz, args: 8, res: 8, upd: 8;
cDM: // global
_sDm::I64 = I64[R1 + 7] + 1;
I64[Hp - 8] = GHC.Types.I#_con_info;
I64[Hp] = _sDm::I64;
R1 = Hp - 7;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
},
M.foo_entry() { // [R2]
{ info_tbls: [(cDP,
label: M.foo_info
rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 5} }
srt: Nothing)]
stack_info: arg_space: 8
}
{offset
cDP: // global
Hp = Hp + 40;
if (Hp > HpLim) (likely: False) goto cDT; else goto cDS;
cDT: // global
HpAlloc = 40;
R1 = M.foo_closure;
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
cDS: // global
I64[Hp - 32] = sat_sDn_info;
I64[Hp - 16] = (,)_con_info;
P64[Hp - 8] = Hp - 31;
P64[Hp] = R2;
R1 = Hp - 15;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
}
The interesting part here is this sequence in Cmm:
//g = pointer-to-g
I64[Hp - 32] = sat_sDn_info;
// build the tuple
I64[Hp - 16] = (,)_con_info;
P64[Hp - 8] = Hp - 31; //g is the first element of the tuple
P64[Hp] = R2; // and `x` the second
But we might as well emit:
// build the tuple
I64[Hp - 16] = (,)_con_info;
P64[Hp - 8] = sat_sDn_info
P64[Hp] = R2; // and `x` the second
I can see how the indirection via a heap object arises (we simply treat all function-binding lets the same, heap allocated or not). But it seems rather pointless for the case where there are no free variables.
But at least both -ffull-laziness and -fstg-lift-lams will fix this so it rarely happens in optimized code.