Skip to content

GHC 8.10 allocates heap memory for uses of constant GADT constructors

There appears to be a codegen regression around GADTs in GHC 8.10-rc1 (also present on HEAD). Here’s a program that illustrates the issue:

{-# OPTIONS_GHC -O2 -ddump-stg #-}
{-# LANGUAGE GADTs #-}
module M1 where

data T a where
  C :: T ()

f :: (T () -> IO ()) -> IO ()
f g = g C >> g C

When compiling this on GHC 8.8.2, the STG output shows that both references to C are compiled to a single statically-allocated closure, as I would expect:

M1.$WC = CCS_DONT_CARE M1.C! [];

M1.f1 =
    \r [g_s1rK void_0E]
        case g_s1rK M1.$WC GHC.Prim.void# of {
          Unit# _ -> g_s1rK M1.$WC GHC.Prim.void#;
        };

But on GHC 8.10, things go wrong, and GHC allocates two entirely new closures on the heap!

M1.f1 =
    \r [g_sHU void_0E]
        let { sat_sHW = CCCS M1.C! [];
        } in
          case g_sHU sat_sHW GHC.Prim.void# of {
          Unit# _ ->
          let { sat_sI0 = CCCS M1.C! []; } in  g_sHU sat_sI0 GHC.Prim.void#;
          };

Examining the output of -ddump-cmm confirms that these really are two heap allocations:

       cIq: // global
           Hp = Hp + 16;
           if (Hp > HpLim) (likely: False) goto cIs; else goto cIr;
       cIs: // global
           HpAlloc = 16;
           goto cIp;
       cIp: // global
           R2 = _sHU::P64;
           R1 = M1.f1_closure;
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
       cIr: // global
           I64[Hp - 8] = M1.C_con_info;
           I64[Sp - 16] = cIl;
           R2 = Hp - 7;
           R1 = _sHU::P64;
           P64[Sp - 8] = _sHU::P64;
           Sp = Sp - 16;
           call stg_ap_pv_fast(R2,
                               R1) returns to cIl, args: 8, res: 8, upd: 8;
       cIl: // global
           Hp = Hp + 16;
           if (Hp > HpLim) (likely: False) goto cIv; else goto cIu;
       cIv: // global
           HpAlloc = 16;
           R1 = R1;
           call stg_gc_unpt_r1(R1) returns to cIl, args: 8, res: 8, upd: 8;
       cIu: // global
           I64[Hp - 8] = M1.C_con_info;
           R2 = Hp - 7;
           R1 = P64[Sp + 8];
           Sp = Sp + 16;
           call stg_ap_pv_fast(R2, R1) args: 8, res: 0, upd: 8;

But that’s absurd, since C is a constant! Compare that to the far superior output on GHC 8.8:

       c1se: // global
           I64[Sp - 16] = c1sa;
           _s1rM::P64 = R2;
           R2 = M1.$WC_closure+1;
           R1 = _s1rM::P64;
           P64[Sp - 8] = _s1rM::P64;
           Sp = Sp - 16;
           call stg_ap_pv_fast(R2,
                               R1) returns to c1sa, args: 8, res: 8, upd: 8;
       c1sa: // global
           R2 = M1.$WC_closure+1;
           R1 = P64[Sp + 8];
           Sp = Sp + 16;
           call stg_ap_pv_fast(R2, R1) args: 8, res: 0, upd: 8;

This change hits extensible effects libraries particularly hard, since those libraries often define GADTs like these:

data Reader r a where
  Ask :: Reader r r

data State s a where
  Get :: State s s
  Put :: s -> State s ()

It’s definitely unexpected that each use of Ask and Get would generate a separate heap allocation!

Edited by Alexis King
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information