Skip to content

Pointless heap allocation for local binders binding functions without free variables.

Given this function:

{-# OPTIONS_GHC -fno-full-laziness -fno-stg-lift-lams #-}
module M where

foo x =
    let g y = y + 1 :: Int
    in (g, x)

GHC allocates one word for the local binder g, in which we store a pointer to the actual (and static) function g.

Here is the Stg + cmm:

M.foo :: forall {b}. b -> (GHC.Types.Int -> GHC.Types.Int, b)
[GblId, Arity=1, Str=<L>, Cpr=1, Unf=OtherCon []] =
    {} \r [x_sDi]
        let {
          sat_sDn [Occ=Once1] :: GHC.Types.Int -> GHC.Types.Int
          [LclId] =
              {} \r [y_sDj]
                  case y_sDj of {
                  GHC.Types.I# x1_sDl [Occ=Once1] ->
                  case +# [x1_sDl 1#] of sat_sDm [Occ=Once1] {
                  __DEFAULT -> GHC.Types.I# [sat_sDm];
                  };
                  };
        } in  (,) [sat_sDn x_sDi];
[sat_sDn_entry() { //  [R2, R1]
         { info_tbls: [(cDz,
                        label: block_cDz_info
                        rep: StackRep []
                        srt: Nothing),
                       (cDC,
                        label: sat_sDn_info
                        rep: HeapRep 1 nonptrs { Fun {arity: 1 fun_type: ArgSpec 5} }
                        srt: Nothing)]
           stack_info: arg_space: 8
         }
     {offset
       cDC: // global
           if ((Sp + -8) < SpLim) (likely: False) goto cDJ; else goto cDK;
       cDJ: // global
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
       cDK: // global
           I64[Sp - 8] = cDz;
           R1 = R2;
           Sp = Sp - 8;
           if (R1 & 7 != 0) goto cDz; else goto cDA;
       cDA: // global
           call (I64[R1])(R1) returns to cDz, args: 8, res: 8, upd: 8;
       cDz: // global
           Hp = Hp + 16;
           if (Hp > HpLim) (likely: False) goto cDN; else goto cDM;
       cDN: // global
           HpAlloc = 16;
           call stg_gc_unpt_r1(R1) returns to cDz, args: 8, res: 8, upd: 8;
       cDM: // global
           _sDm::I64 = I64[R1 + 7] + 1;
           I64[Hp - 8] = GHC.Types.I#_con_info;
           I64[Hp] = _sDm::I64;
           R1 = Hp - 7;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
     }
 },
 M.foo_entry() { //  [R2]
         { info_tbls: [(cDP,
                        label: M.foo_info
                        rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 5} }
                        srt: Nothing)]
           stack_info: arg_space: 8
         }
     {offset
       cDP: // global
           Hp = Hp + 40;
           if (Hp > HpLim) (likely: False) goto cDT; else goto cDS;
       cDT: // global
           HpAlloc = 40;
           R1 = M.foo_closure;
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
       cDS: // global
           I64[Hp - 32] = sat_sDn_info;
           I64[Hp - 16] = (,)_con_info;
           P64[Hp - 8] = Hp - 31;
           P64[Hp] = R2;
           R1 = Hp - 15;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
     }
 }

The interesting part here is this sequence in Cmm:

//g = pointer-to-g
I64[Hp - 32] = sat_sDn_info;
// build the tuple
I64[Hp - 16] = (,)_con_info;
P64[Hp - 8] = Hp - 31; //g is the first element of the tuple
P64[Hp] = R2; // and `x` the second

But we might as well emit:

// build the tuple
I64[Hp - 16] = (,)_con_info;
P64[Hp - 8] = sat_sDn_info
P64[Hp] = R2; // and `x` the second

I can see how the indirection via a heap object arises (we simply treat all function-binding lets the same, heap allocated or not). But it seems rather pointless for the case where there are no free variables.

But at least both -ffull-laziness and -fstg-lift-lams will fix this so it rarely happens in optimized code.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information