
GHC creates redundant constructor wrappers.

In GHC we have the Hoopl Block type:

data Block n e x where
  BlockCO  :: n C O -> Block n O O          -> Block n C O
  BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC  ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O
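To look at the dumps discussed in this issue without building GHC itself, a cut-down standalone module along these lines might be enough. This is only a sketch: the `Node` type, `unMid`, `middles` and `main` are hypothetical helpers added for illustration, and the alternatives of `blockSnoc` that are elided in the STG dump are filled in here as an assumption. Whether the wrapper call shows up in a small module like this may depend on the GHC version and optimisation flags.

```haskell
{-# LANGUAGE GADTs #-}

-- Open/closed shape indices, as used by Hoopl.
data O
data C

data Block n e x where
  BlockCO :: n C O -> Block n O O          -> Block n C O
  BlockCC :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O

-- Append a middle node. Only the BlockCO and BNil alternatives are
-- visible in the issue; the remaining ones are an assumption.
blockSnoc :: Block n e O -> n O O -> Block n e O
blockSnoc b n = case b of
  BlockCO f b' -> BlockCO f (b' `blockSnoc` n)
  BNil         -> BMiddle n
  BMiddle {}   -> b `BSnoc` n
  BCat {}      -> b `BSnoc` n
  BSnoc {}     -> b `BSnoc` n
  BCons {}     -> b `BSnoc` n

-- Hypothetical node type, only here so the module can be run.
data Node e x where
  Mid :: Int -> Node O O

unMid :: Node O O -> Int
unMid (Mid i) = i

-- Hypothetical helper: the middle nodes of an open/open block, in order.
middles :: Block n O O -> [n O O]
middles BNil        = []
middles (BMiddle n) = [n]
middles (BCat a b)  = middles a ++ middles b
middles (BSnoc b n) = middles b ++ [n]
middles (BCons n b) = n : middles b

main :: IO ()
main = print (map unMid (middles (blockSnoc (blockSnoc BNil (Mid 1)) (Mid 2))))
```

Compiling this with `-O -ddump-simpl -ddump-cmm` plus the STG dump flag for your GHC version (`-ddump-stg` or `-ddump-stg-final`) should produce output comparable to the listings here.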

For BMiddle we end up with this wrapper:

-- RHS size: {terms: 4, types: 12, coercions: 2, joins: 0/0}
Hoopl.Block.$WBMiddle [InlPrag=INLINE[0]]
  :: forall (n :: * -> * -> *). n O O -> Block n O O
[GblId[DataConWrapper],
 Arity=1,
 Caf=NoCafRefs,
 Str=<L,U>m5,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (@ (n_ayl :: * -> * -> *))
                 (dt_a2Cs [Occ=Once] :: n_ayl O O) ->
                 Hoopl.Block.BMiddle
                   @ n_ayl
                   @ O
                   @ O
                   @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
                   @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
                   dt_a2Cs}]
Hoopl.Block.$WBMiddle
  = \ (@ (n_ayl :: * -> * -> *)) (dt_a2Cs [Occ=Once] :: n_ayl O O) ->
      Hoopl.Block.BMiddle
        @ n_ayl
        @ O
        @ O
        @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
        @~ (<O>_N :: O ghc-prim-0.6.1:GHC.Prim.~# O)
        dt_a2Cs

Only one of these arguments remains at runtime. As I understand it, the coercionToken# arguments are really void arguments, so in STG the wrapper takes a single argument and passes it on to the worker BMiddle:

Hoopl.Block.$WBMiddle [InlPrag=INLINE[0]]
  :: forall (n :: * -> * -> *).
     n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n Hoopl.Block.O Hoopl.Block.O
[GblId[DataConWrapper],
 Arity=1,
 Caf=NoCafRefs,
 Str=<L,U>m5,
 Unf=OtherCon []] =
    \r [dt_s4tZ]
        Hoopl.Block.BMiddle [GHC.Prim.coercionToken#
                             GHC.Prim.coercionToken#
                             dt_s4tZ];

We can see this in the definition of the worker (again STG code):

Hoopl.Block.BMiddle
  :: forall (n :: * -> * -> *) e x.
     (e GHC.Prim.~# Hoopl.Block.O)
     -> (x GHC.Prim.~# Hoopl.Block.O)
     -> n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n e x
[GblId[DataCon],
 Arity=3,
 Caf=NoCafRefs,
 Str=<L,U><L,U><L,U>m5,
 Unf=OtherCon []] =
    \r [void_0E void_0E eta_B1] Hoopl.Block.BMiddle [eta_B1];
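The two equality arguments in the worker's type come from how a GADT constructor with a refined result type is represented. As a rough source-level illustration (a sketch, not what GHC literally generates), the same constructor can be written with explicit equality constraints. The names `BlockG`, `BlockE`, `MiddleG`, `MiddleE`, `toE`, `toG` and `Node` are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

data O
data C

-- GADT syntax, as in the issue: the result type fixes e and x to O.
data BlockG n e x where
  MiddleG :: n O O -> BlockG n O O

-- The same constructor with the equalities spelled out. Its type
--   forall n e x. (e ~ O, x ~ O) => n O O -> BlockE n e x
-- mirrors the shape of the worker's type shown above.
data BlockE n e x = (e ~ O, x ~ O) => MiddleE (n O O)

-- Both directions typecheck, so the two declarations carry the same
-- information. Matching on either constructor brings e ~ O and x ~ O
-- into scope.
toE :: BlockG n e x -> BlockE n e x
toE (MiddleG n) = MiddleE n

toG :: BlockE n e x -> BlockG n e x
toG (MiddleE n) = MiddleG n

-- Hypothetical node type with phantom shape indices.
newtype Node e x = Node Int

main :: IO ()
main = case toG (toE (MiddleG (Node 7))) of
  MiddleG (Node i) -> print i
```

The equality evidence has no runtime representation, which is why the STG worker binds two void arguments and stores only the node.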

Now as @osa1 explained it to me (I might have understood it wrong!) it's the case that:

  • We always (try to?) inline the wrapper
  • We only call the worker as a function if it's not fully applied; otherwise we turn the application into an StgConApp, which means the constructor will be allocated inline.

This is the Cmm code for the wrapper, which is actually identical to the code generated for the worker:

[Hoopl.Block.$WBMiddle_entry() { //  [R2]
         { info_tbls: [(c4NF,
                        label: Hoopl.Block.$WBMiddle_info
                        rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 5} }
                        srt: Nothing)]
           stack_info: arg_space: 8 updfr_space: Just 8
         }
     {offset
       c4NF: // global
           Hp = Hp + 16;   // CmmAssign
           if (Hp > HpLim) (likely: False) goto c4NJ; else goto c4NI;   // CmmCondBranch
       c4NJ: // global
           HpAlloc = 16;   // CmmAssign
           R2 = R2;   // CmmAssign
           R1 = Hoopl.Block.$WBMiddle_closure;   // CmmAssign
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;   // CmmCall
       c4NI: // global
           // allocHeapClosure
           I64[Hp - 8] = Hoopl.Block.BMiddle_con_info;   // CmmStore
           P64[Hp] = R2;   // CmmStore
           R1 = Hp - 7;   // CmmAssign
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;   // CmmCall
     }
 },

Now in quite a few cases we actually call the wrapper (so it's not inlined) despite the fact that it is fully applied in STG.

Rec {
Hoopl.Block.blockSnoc [Occ=LoopBreaker]
  :: forall (n :: * -> * -> *) e.
     Hoopl.Block.Block n e Hoopl.Block.O
     -> n Hoopl.Block.O Hoopl.Block.O
     -> Hoopl.Block.Block n e Hoopl.Block.O
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><L,U>,
 Unf=OtherCon []] =
    \r [b_s4uI n1_s4uJ]
        case b_s4uI of wild_s4uK [Occ=Once*] {
          Hoopl.Block.BlockCO f_s4uL [Occ=Once] b1_s4uM [Occ=Once] -> ...
          Hoopl.Block.BNil -> Hoopl.Block.$WBMiddle n1_s4uJ;
          Hoopl.Block.BMiddle _ [Occ=Dead] -> ...
          ...
        };
end Rec }

The alternative Hoopl.Block.$WBMiddle n1_s4uJ; here is translated to a call to the wrapper in Cmm:

       c4SL: // global
           R2 = _s4uJ::P64;   // CmmAssign
           Sp = Sp + 16;   // CmmAssign
           call Hoopl.Block.$WBMiddle_info(R2) args: 8, res: 0, upd: 8;   // CmmCall

What I think we would WANT to happen is that $WBMiddle gets inlined into blockSnoc. This exposes the worker, fully saturated. As a consequence we would allocate the constructor inline like this:

           I64[Hp - 8] = Hoopl.Block.BMiddle_con_info;   // CmmStore
           P64[Hp] = R2;   // CmmStore
           R1 = Hp - 7;   // CmmAssign
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;   // CmmCall

What does happen is that the wrapper doesn't get inlined. So instead of allocating inline we generate a function call into the wrapper, which then allocates the constructor.

It's not horrible, but the call does have an overhead that allocating the constructor directly wouldn't have.

At least based on the runtime representation we shouldn't need a wrapper at all, although I assume that is harder to determine at the Core level.

Either way I came across this by chance and thought I should write it down.

Edited by Andreas Klebinger