Unnecessary reboxing
This problem came up when I was debugging regressions in some variants of !7029.
Problem
Consider this function:
f x = (g, g)
where
g :: Int -> IO Int
g y = do { if y>2 then print x else return ()
; foogle y }
foogle :: Int -> IO Int
{-# NOINLINE foogle #-}
foogle n = return n
We get this code for f, compiling with GHC 9.2 and -O:
Foo.$wf :: forall {a}. Show a => a -> (# Int -> IO Int, Int -> IO Int #)
Foo.$wf
= \ (@a_s1bJ) ($dShow_s1bK :: Show a_s1bJ) (x_s1bL :: a_s1bJ) ->
let {
lvl1_s1bo :: String
[LclId]
lvl1_s1bo = show @a_s1bJ $dShow_s1bK x_s1bL } in
let {
$wg_s1bH [InlPrag=[2], Dmd=LCL(C1(!L))]
:: GHC.Prim.Int#
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
$wg_s1bH
= \ (ww_s1bC :: GHC.Prim.Int#)
(s_s1bE :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.># ww_s1bC 2# of {
__DEFAULT -> Foo.f1 (GHC.Types.I# ww_s1bC) s_s1bE;
1# ->
case GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout lvl1_s1bo GHC.Types.True s_s1bE
of
{ (# ipv_a13o, ipv1_a13p #) ->
Foo.f1 (GHC.Types.I# ww_s1bC) ipv_a13o
}
} } in
let {
g_s1bj [InlPrag=[2]]
:: Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
g_s1bj
= \ (y_s1bA :: Int)
(s_s1bE :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case y_s1bA of { GHC.Types.I# ww_s1bC ->
$wg_s1bH ww_s1bC s_s1bE
} } in
(# g_s1bj `cast` <blah blah>
g_s1bj `cast` <blah blah> #)
-
Problem 1. We worker/wrapper
geven though it is never applied to anything. That makes no sense; a caller of$wfwill get back a pair of functions; when it applies that function, it'll callgwhich will call$wg. we have allocated two closures instead of one, and slowed down calls a bit. Not great. -
Problem 2. The wrapper
g_s1bjunboxes the argumenty, and passes the contents to$wgwhich reboxes it in every control path! Now things are really worse; the caller passes a boxedInt;gunboxes it; and$wgreboxes it.
Diagnosis for Problem 2
At the demand analysis stage, f looks like this:
f = \ (@a_aUw)
($dShow_a12O [Dmd=MP(A,MCM(L),A)] :: Show a_aUw)
(x_auq :: a_aUw) ->
let {
lvl_s1bo :: String
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
lvl_s1bo = show @a_aUw $dShow_a12O x_auq } in
let {
g_s1bj
:: Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
[LclId,
Arity=2,
Str=<1!L><L> {s1bo->M!L},
Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [20 0] 121 0}]
g_s1bj
= \ (y_aus [Dmd=1!L] :: Int)
(s_a13m :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case y_aus of wild_a13t [Dmd=L!L] { GHC.Types.I# x_a13u ->
case GHC.Prim.># x_a13u 2# of {
__DEFAULT -> foogle_s13h wild_a13t s_a13m;
1# ->
case GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout lvl_s1bo GHC.Types.True s_a13m
of
{ (# ipv_a13o, ipv1_a13p [Dmd=A] #) ->
foogle_s13h wild_a13t ipv_a13o
}
}
} } in
(g_s1bj `cast` .., g_s1bj `cast` ..)
So g is strict in y (since it case-analyses it immediately) but both branches use it boxed, via wild_a13t. Usually a "bothDmd" between a boxed demand (from the branches) and an unboxed demand (from the scrutinee) woul dyield a boxed demand, and w/w would not unbox it.
BUT the case hPutStr2 ... of ... messes it up. It calls deferAfterPreciseException on the demand from branches (a boxed demand)
deferAfterPreciseException :: DmdType -> DmdType
deferAfterPreciseException = lubDmdType exnDmdType
Now exnDmdType behaves like an unboxed demand, and lub betwen boxed and unboxed is (currently) unboxed. Result we give an unboxed demand oy y.
But that's obviously stupid. This deferAfterPreciseException should make things lazy, but no less boxed.
Cure for Problem 2
After some discussion with @sgraf812, we decided to make a speicialised version of lubDmdType for the call in deferAfterPreciseExeption, which makes boxed-ness win.
Cure for Problem 1
Problem 1 is quite separate. One possible cure is this:
- In the last run of the simplifier, discard the stable unfolding on wrappers for non-recursive, non-exported Ids.
If we discarded g's stable unfolding, there would be only one remaining call to $wg, namely in g's RHS, so the worker would inline back into the wrapper, effectively undoing w/w in the cases where it turned out to have no benefit.
Why non-exported? Because for exported Ids we may well get external calls that beneft from the wrapper.
Why non-recursive? Because if the worker is recursive, we won't be able to inline it into the wrapper anyway.
Where this came up
All this showed up when looking at a substantial regression in nofib/shootout/reverse-complement. That ultimately turned out to be extra allocation in GHC.IO.Handle.Text.hGetBuf. I saw this code
GHC.IO.Handle.Text.$whGetBuf
= \ (@a_s4L9)
(h_s4La :: Handle)
(ww_s4Ld :: Addr#)
(ww1_s4Lh :: Int#)
(eta_s4Lj [OS=OneShot] :: State# RealWorld) ->
case ww1_s4Lh of wild_X1 {
__DEFAULT ->
let {
wild1_a3XQ :: Int
[LclId, Unf=OtherCon []]
wild1_a3XQ = GHC.Types.I# wild_X1 } in
case <# wild_X1 0# of {
__DEFAULT ->
let {
ptr_X0 :: Ptr a_s4L9
[LclId, Unf=OtherCon []]
ptr_X0 = GHC.Ptr.Ptr @a_s4L9 ww_s4Ld } in
let {
$wact1_s4L7 [InlPrag=[2], Dmd=LCL(C1(C1(C1(C1(C1(C1(C1(!L))))))))]
:: forall {dev} {enc_state} {dec_state}.
(RawIO.RawIO dev, Buffered.BufferedIO dev) =>
dev
-> MutVar# RealWorld (Buffer Word8)
-> MutVar# RealWorld (dec_state, Buffer Word8)
-> MutVar# RealWorld (Buffer CharBufElem)
-> Maybe (GHC.IO.Encoding.Types.TextDecoder dec_state)
-> State# RealWorld
-> (# State# RealWorld, Int #)
[LclId,
Arity=8,
Str=<LP(LCL(C1(C1(C1(C1(P(L,1P(1L))))))),A,A,A)><LP(A,LCL(C1(C1(P(L,1P(1P(1L),L))))),A,A,A,A)><L><L><L><L><ML><L>,
Unf=OtherCon []]
$wact1_s4L7
= \ (@dev_s4KJ)
(@enc_state_s4KK)
(@dec_state_s4KL)
(ww2_s4KM :: RawIO.RawIO dev_s4KJ)
(ww3_s4KO :: Buffered.BufferedIO dev_s4KJ)
(ww4_s4KQ
:: dev_s4KJ
Unf=OtherCon [])
(ww5_s4KS :: MutVar# RealWorld (Buffer Word8))
(ww6_s4KU :: MutVar# RealWorld (dec_state_s4KL, Buffer Word8))
(ww7_s4KV :: MutVar# RealWorld (Buffer CharBufElem))
(ww8_s4KY
:: Maybe (GHC.IO.Encoding.Types.TextDecoder dec_state_s4KL))
(eta1_s4L4 [OS=OneShot] :: State# RealWorld) ->
case GHC.IO.Handle.Internals.$wflushCharReadBuffer
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww5_s4KS
ww6_s4KU
ww7_s4KV
ww8_s4KY
eta1_s4L4
of ww17_a4cj
{ __DEFAULT ->
case readMutVar# @RealWorld @(Buffer Word8) ww5_s4KS ww17_a4cj of
{ (# ipv_a3VH, ipv1_a3VI #) ->
case ipv1_a3VI of wild2_X5
{ Buffer bx_d3HU bx1_d3HV ds_d3pV bx2_d3HW bx3_d3HX bx4_d3HY
bx5_d3HZ ->
case ==# bx4_d3HY bx5_d3HZ of {
__DEFAULT ->
GHC.IO.Handle.Text.hGetBuf15
(GHC.IO.Handle.Types.Handle__
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
(GHC.IO.Handle.Text.hGetBuf14 @dev_s4KJ)
ww3_s4KO
(GHC.IO.Handle.Text.hGetBuf13 @dev_s4KJ)
ww4_s4KQ
GHC.IO.Handle.Text.hGetBuf12
ww5_s4KS
GHC.IO.Handle.Text.hGetBuf11
ww6_s4KU
ww7_s4KV
(RUBBISH(UnliftedRep)
@(MutVar# RealWorld (BufferList CharBufElem)))
(GHC.IO.Handle.Text.hGetBuf10 @enc_state_s4KK)
ww8_s4KY
GHC.IO.Handle.Text.hGetBuf9
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf7)
wild2_X5
(ptr_X0
`cast` ((Ptr Univ(phantom phantom <*>_N :: a_s4L9, Word8))_R
:: Ptr a_s4L9 ~R# Ptr Word8))
GHC.IO.Handle.Text.hGetBuf6
wild1_a3XQ
ipv_a3VH;
1# ->
GHC.IO.Handle.Text.hGetBuf5
(GHC.IO.Handle.Types.Handle__
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
(GHC.IO.Handle.Text.hGetBuf14 @dev_s4KJ)
ww3_s4KO
(GHC.IO.Handle.Text.hGetBuf13 @dev_s4KJ)
ww4_s4KQ
GHC.IO.Handle.Text.hGetBuf12
ww5_s4KS
GHC.IO.Handle.Text.hGetBuf11
ww6_s4KU
ww7_s4KV
(RUBBISH(UnliftedRep)
@(MutVar# RealWorld (BufferList CharBufElem)))
(GHC.IO.Handle.Text.hGetBuf10 @enc_state_s4KK)
ww8_s4KY
GHC.IO.Handle.Text.hGetBuf9
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf7)
wild2_X5
(ptr_X0
`cast` ((Ptr Univ(phantom phantom <*>_N :: a_s4L9, Word8))_R
:: Ptr a_s4L9 ~R# Ptr Word8))
GHC.IO.Handle.Text.hGetBuf6
wild1_a3XQ
ipv_a3VH
}
}
}
} } in
let {
act1_s4gq [InlPrag=[2], Dmd=LCL(C1(L))]
:: Handle__ -> State# RealWorld -> (# State# RealWorld, Int #)
[LclId,
Arity=2,
Str=<1!P(LP(LCL(C1(C1(C1(C1(P(L,1P(1L))))))),A,A,A),A,LP(A,LCL(C1(C1(P(L,1P(1P(1L),L))))),A,A,A,A),A,L,A,L,A,L,L,A,A,ML,A,A,A,A)><L>,
Unf=Unf{Src=InlineStable, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (h__s4KH [Occ=Once1!] :: Handle__)
(eta1_s4L4 [Occ=Once1, OS=OneShot] :: State# RealWorld) ->
case h__s4KH of
{ Handle__ @dev_s4KJ @enc_state_s4KK @dec_state_s4KL
ww2_s4KM [Occ=Once1] _ [Occ=Dead] ww4_s4KO [Occ=Once1]
_ [Occ=Dead] ww6_s4KQ [Occ=Once1] _ [Occ=Dead]
ww8_s4KS [Occ=Once1] _ [Occ=Dead] ww10_s4KU [Occ=Once1]
ww11_s4KV [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead]
ww14_s4KY [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead]
_ [Occ=Dead] _ [Occ=Dead] ->
$wact1_s4L7
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
ww4_s4KO
ww6_s4KQ
ww8_s4KS
ww10_s4KU
ww11_s4KV
ww14_s4KY
eta1_s4L4
}}]
act1_s4gq
= \ (h__s4KH :: Handle__)
(eta1_s4L4 [OS=OneShot] :: State# RealWorld) ->
case h__s4KH of
{ Handle__ @dev_s4KJ @enc_state_s4KK @dec_state_s4KL ww2_s4KM
ww3_s4KN ww4_s4KO ww5_s4KP ww6_s4KQ ww7_s4KR ww8_s4KS ww9_s4KT
ww10_s4KU ww11_s4KV ww12_s4KW ww13_s4KX ww14_s4KY ww15_s4KZ
ww16_s4L0 ww17_s4L1 ww18_s4L2 ->
$wact1_s4L7
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
ww4_s4KO
ww6_s4KQ
ww8_s4KS
ww10_s4KU
ww11_s4KV
ww14_s4KY
eta1_s4L4
} } in
case h_s4La of wild2_a457 {
FileHandle ds_a459 bx_a45a ->
GHC.IO.Handle.Internals.$wwantReadableHandle'
@Int
GHC.IO.Handle.Text.hGetBuf3
wild2_a457
bx_a45a
(act1_s4gq
`cast` (<Handle__>_R
%<'Many>_N ->_R Sym (GHC.Types.N:IO[0] <Int>_R)
:: (Handle__ -> State# RealWorld -> (# State# RealWorld, Int #))
~R# (Handle__ -> IO Int)))
eta_s4Lj;
DuplexHandle ds_a45c bx_a45d bx1_a45e ->
GHC.IO.Handle.Internals.$wwantReadableHandle'
@Int
GHC.IO.Handle.Text.hGetBuf3
wild2_a457
bx_a45d
(act1_s4gq
`cast` (<Handle__>_R
%<'Many>_N ->_R Sym (GHC.Types.N:IO[0] <Int>_R)
:: (Handle__ -> State# RealWorld -> (# State# RealWorld, Int #))
~R# (Handle__ -> IO Int)))
eta_s4Lj
};
1# ->
GHC.IO.Handle.Text.hGetBuf2
@Int h_s4La GHC.IO.Handle.Text.hGetBuf3 wild1_a3XQ eta_s4Lj
};
0# -> (# eta_s4Lj, GHC.IO.Handle.Text.hGetBuf6 #)
}
Note that
-
act1_s4gqis passed to$wwantReadableHandle'; it is not applied -
act1_s4gqis a wrapper that unboxes its Handle argument - But the worker
$wact1_s4L7reboxes it, in every control path.