GHC 8.10 allocates heap memory for uses of constant GADT constructors
There appears to be a codegen regression around GADTs in GHC 8.10-rc1 (also present on HEAD). Here’s a program that illustrates the issue:
{-# OPTIONS_GHC -O2 -ddump-stg #-}
{-# LANGUAGE GADTs #-}
module M1 where
data T a where
  C :: T ()
f :: (T () -> IO ()) -> IO ()
f g = g C >> g C
When compiling this on GHC 8.8.2, the STG output shows that both references to C are compiled to a single statically-allocated closure, as I would expect:
M1.$WC = CCS_DONT_CARE M1.C! [];

M1.f1 =
    \r [g_s1rK void_0E]
        case g_s1rK M1.$WC GHC.Prim.void# of {
          Unit# _ -> g_s1rK M1.$WC GHC.Prim.void#;
        };
But on GHC 8.10, things go wrong, and GHC allocates two entirely new closures on the heap!
M1.f1 =
    \r [g_sHU void_0E]
        let { sat_sHW = CCCS M1.C! [];
        } in
          case g_sHU sat_sHW GHC.Prim.void# of {
            Unit# _ ->
                let { sat_sI0 = CCCS M1.C! []; } in g_sHU sat_sI0 GHC.Prim.void#;
          };
Examining the output of -ddump-cmm confirms that these really are two heap allocations:
cIq: // global
    Hp = Hp + 16;
    if (Hp > HpLim) (likely: False) goto cIs; else goto cIr;
cIs: // global
    HpAlloc = 16;
    goto cIp;
cIp: // global
    R2 = _sHU::P64;
    R1 = M1.f1_closure;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
cIr: // global
    I64[Hp - 8] = M1.C_con_info;
    I64[Sp - 16] = cIl;
    R2 = Hp - 7;
    R1 = _sHU::P64;
    P64[Sp - 8] = _sHU::P64;
    Sp = Sp - 16;
    call stg_ap_pv_fast(R2,
                        R1) returns to cIl, args: 8, res: 8, upd: 8;
cIl: // global
    Hp = Hp + 16;
    if (Hp > HpLim) (likely: False) goto cIv; else goto cIu;
cIv: // global
    HpAlloc = 16;
    R1 = R1;
    call stg_gc_unpt_r1(R1) returns to cIl, args: 8, res: 8, upd: 8;
cIu: // global
    I64[Hp - 8] = M1.C_con_info;
    R2 = Hp - 7;
    R1 = P64[Sp + 8];
    Sp = Sp + 16;
    call stg_ap_pv_fast(R2, R1) args: 8, res: 0, upd: 8;
But that’s absurd, since C is a constant! Compare that to the far superior output on GHC 8.8:
c1se: // global
    I64[Sp - 16] = c1sa;
    _s1rM::P64 = R2;
    R2 = M1.$WC_closure+1;
    R1 = _s1rM::P64;
    P64[Sp - 8] = _s1rM::P64;
    Sp = Sp - 16;
    call stg_ap_pv_fast(R2,
                        R1) returns to c1sa, args: 8, res: 8, upd: 8;
c1sa: // global
    R2 = M1.$WC_closure+1;
    R1 = P64[Sp + 8];
    Sp = Sp + 16;
    call stg_ap_pv_fast(R2, R1) args: 8, res: 0, upd: 8;
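For anyone who wants to cross-check the Cmm reading at runtime, here is a rough sketch that measures how much the current thread allocates while calling f in a loop. It uses getAllocationCounter from System.Mem in base. The names consume and measure are hypothetical helpers, not part of the original program, and the NOINLINE pragmas are an assumption meant to keep the call to f from being optimised away. The loop itself may allocate, so the absolute number is not meaningful on its own; the interesting part is the difference between the same build on 8.8 and on 8.10.

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

import Control.Monad (replicateM_)
import Data.Int (Int64)
import System.Mem (getAllocationCounter)

data T a where
  C :: T ()

-- Same definition as in the report. NOINLINE is an assumption: it is only
-- here so that the body of f is what actually runs inside the loop.
f :: (T () -> IO ()) -> IO ()
f g = g C >> g C
{-# NOINLINE f #-}

-- Hypothetical consumer: matches on the constructor and does nothing else.
consume :: T () -> IO ()
consume C = pure ()
{-# NOINLINE consume #-}

-- Hypothetical helper: bytes allocated by this thread over n calls to f.
-- The allocation counter counts down as the thread allocates, so the
-- result is (before - after).
measure :: Int -> IO Int64
measure n = do
  before <- getAllocationCounter
  replicateM_ n (f consume)
  after <- getAllocationCounter
  pure (before - after)

main :: IO ()
main = do
  bytes <- measure 1000000
  putStrLn ("bytes allocated over 1000000 calls: " ++ show bytes)
```

Compiling this with -O2 under both compilers and comparing the printed totals should show whether the two Hp = Hp + 16 bumps from the 8.10 dump are really executed on every call.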
This change hits extensible effects libraries particularly hard, since those libraries often define GADTs like these:
data Reader r a where
Ask :: Reader r r
data State s a where
Get :: State s s
Put :: s -> State s ()
It’s definitely unexpected that each use of Ask and Get would generate a separate heap allocation!
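To make the impact concrete, here is a deliberately tiny sketch of how such a library ends up mentioning these constructors at every request site. Prog, send, bind, run and incr are hypothetical names made up for this illustration and are not taken from any particular effects library; real libraries are considerably more elaborate.

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

data State s a where
  Get :: State s s
  Put :: s -> State s ()

-- Hypothetical program type: a chain of State requests with continuations.
data Prog s a where
  Done :: a -> Prog s a
  Step :: State s x -> (x -> Prog s a) -> Prog s a

-- Issue a single request.
send :: State s a -> Prog s a
send op = Step op Done

-- Sequence two programs.
bind :: Prog s a -> (a -> Prog s b) -> Prog s b
bind (Done a)     k = k a
bind (Step op k') k = Step op (\x -> bind (k' x) k)

-- Interpret a program against an initial state.
run :: Prog s a -> s -> (a, s)
run (Done a)          s = (a, s)
run (Step Get k)      s = run (k s) s
run (Step (Put s') k) _ = run (k ()) s'

-- Each use of incr mentions the constant constructor Get.
incr :: Prog Int ()
incr = send Get `bind` \n -> send (Put (n + 1))

main :: IO ()
main = print (run (incr `bind` \_ -> incr `bind` \_ -> send Get) 0)
-- prints (2,2)
```

Every send Get here is a reference to a nullary GADT constructor, which is exactly the shape that the 8.10 output above compiles to a fresh heap allocation rather than a pointer to a static closure.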