Unnecessary reboxing
This problem came up when I was debugging regressions in some variants of !7029.
Problem
Consider this function:
f x = (g, g)
where
g :: Int -> IO Int
g y = do { if y>2 then print x else return ()
; foogle y }
foogle :: Int -> IO Int
{-# NOINLINE foogle #-}
foogle n = return n
We get this code for f
, compiling with GHC 9.2 and -O:
Foo.$wf :: forall {a}. Show a => a -> (# Int -> IO Int, Int -> IO Int #)
Foo.$wf
= \ (@a_s1bJ) ($dShow_s1bK :: Show a_s1bJ) (x_s1bL :: a_s1bJ) ->
let {
lvl1_s1bo :: String
[LclId]
lvl1_s1bo = show @a_s1bJ $dShow_s1bK x_s1bL } in
let {
$wg_s1bH [InlPrag=[2], Dmd=LCL(C1(!L))]
:: GHC.Prim.Int#
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
$wg_s1bH
= \ (ww_s1bC :: GHC.Prim.Int#)
(s_s1bE :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.># ww_s1bC 2# of {
__DEFAULT -> Foo.f1 (GHC.Types.I# ww_s1bC) s_s1bE;
1# ->
case GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout lvl1_s1bo GHC.Types.True s_s1bE
of
{ (# ipv_a13o, ipv1_a13p #) ->
Foo.f1 (GHC.Types.I# ww_s1bC) ipv_a13o
}
} } in
let {
g_s1bj [InlPrag=[2]]
:: Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
g_s1bj
= \ (y_s1bA :: Int)
(s_s1bE :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case y_s1bA of { GHC.Types.I# ww_s1bC ->
$wg_s1bH ww_s1bC s_s1bE
} } in
(# g_s1bj `cast` <blah blah>
g_s1bj `cast` <blah blah> #)
-
Problem 1. We worker/wrapper
g
even though it is never applied to anything. That makes no sense; a caller of$wf
will get back a pair of functions; when it applies that function, it'll callg
which will call$wg
. we have allocated two closures instead of one, and slowed down calls a bit. Not great. -
Problem 2. The wrapper
g_s1bj
unboxes the argumenty
, and passes the contents to$wg
which reboxes it in every control path! Now things are really worse; the caller passes a boxedInt
;g
unboxes it; and$wg
reboxes it.
Diagnosis for Problem 2
At the demand analysis stage, f
looks like this:
f = \ (@a_aUw)
($dShow_a12O [Dmd=MP(A,MCM(L),A)] :: Show a_aUw)
(x_auq :: a_aUw) ->
let {
lvl_s1bo :: String
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
lvl_s1bo = show @a_aUw $dShow_a12O x_auq } in
let {
g_s1bj
:: Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
[LclId,
Arity=2,
Str=<1!L><L> {s1bo->M!L},
Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [20 0] 121 0}]
g_s1bj
= \ (y_aus [Dmd=1!L] :: Int)
(s_a13m :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case y_aus of wild_a13t [Dmd=L!L] { GHC.Types.I# x_a13u ->
case GHC.Prim.># x_a13u 2# of {
__DEFAULT -> foogle_s13h wild_a13t s_a13m;
1# ->
case GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout lvl_s1bo GHC.Types.True s_a13m
of
{ (# ipv_a13o, ipv1_a13p [Dmd=A] #) ->
foogle_s13h wild_a13t ipv_a13o
}
}
} } in
(g_s1bj `cast` .., g_s1bj `cast` ..)
So g
is strict in y
(since it case-analyses it immediately) but both branches use it boxed, via wild_a13t
. Usually a "bothDmd" between a boxed demand (from the branches) and an unboxed demand (from the scrutinee) woul dyield a boxed demand, and w/w would not unbox it.
BUT the case hPutStr2 ... of ...
messes it up. It calls deferAfterPreciseException
on the demand from branches (a boxed demand)
deferAfterPreciseException :: DmdType -> DmdType
deferAfterPreciseException = lubDmdType exnDmdType
Now exnDmdType
behaves like an unboxed demand, and lub betwen boxed and unboxed is (currently) unboxed. Result we give an unboxed demand oy y
.
But that's obviously stupid. This deferAfterPreciseException
should make things lazy, but no less boxed.
Cure for Problem 2
After some discussion with @sgraf812, we decided to make a speicialised version of lubDmdType
for the call in deferAfterPreciseExeption
, which makes boxed-ness win.
Cure for Problem 1
Problem 1 is quite separate. One possible cure is this:
- In the last run of the simplifier, discard the stable unfolding on wrappers for non-recursive, non-exported Ids.
If we discarded g's stable unfolding, there would be only one remaining call to $wg
, namely in g
's RHS, so the worker would inline back into the wrapper, effectively undoing w/w in the cases where it turned out to have no benefit.
Why non-exported? Because for exported Ids we may well get external calls that beneft from the wrapper.
Why non-recursive? Because if the worker is recursive, we won't be able to inline it into the wrapper anyway.
Where this came up
All this showed up when looking at a substantial regression in nofib/shootout/reverse-complement
. That ultimately turned out to be extra allocation in GHC.IO.Handle.Text.hGetBuf
. I saw this code
GHC.IO.Handle.Text.$whGetBuf
= \ (@a_s4L9)
(h_s4La :: Handle)
(ww_s4Ld :: Addr#)
(ww1_s4Lh :: Int#)
(eta_s4Lj [OS=OneShot] :: State# RealWorld) ->
case ww1_s4Lh of wild_X1 {
__DEFAULT ->
let {
wild1_a3XQ :: Int
[LclId, Unf=OtherCon []]
wild1_a3XQ = GHC.Types.I# wild_X1 } in
case <# wild_X1 0# of {
__DEFAULT ->
let {
ptr_X0 :: Ptr a_s4L9
[LclId, Unf=OtherCon []]
ptr_X0 = GHC.Ptr.Ptr @a_s4L9 ww_s4Ld } in
let {
$wact1_s4L7 [InlPrag=[2], Dmd=LCL(C1(C1(C1(C1(C1(C1(C1(!L))))))))]
:: forall {dev} {enc_state} {dec_state}.
(RawIO.RawIO dev, Buffered.BufferedIO dev) =>
dev
-> MutVar# RealWorld (Buffer Word8)
-> MutVar# RealWorld (dec_state, Buffer Word8)
-> MutVar# RealWorld (Buffer CharBufElem)
-> Maybe (GHC.IO.Encoding.Types.TextDecoder dec_state)
-> State# RealWorld
-> (# State# RealWorld, Int #)
[LclId,
Arity=8,
Str=<LP(LCL(C1(C1(C1(C1(P(L,1P(1L))))))),A,A,A)><LP(A,LCL(C1(C1(P(L,1P(1P(1L),L))))),A,A,A,A)><L><L><L><L><ML><L>,
Unf=OtherCon []]
$wact1_s4L7
= \ (@dev_s4KJ)
(@enc_state_s4KK)
(@dec_state_s4KL)
(ww2_s4KM :: RawIO.RawIO dev_s4KJ)
(ww3_s4KO :: Buffered.BufferedIO dev_s4KJ)
(ww4_s4KQ
:: dev_s4KJ
Unf=OtherCon [])
(ww5_s4KS :: MutVar# RealWorld (Buffer Word8))
(ww6_s4KU :: MutVar# RealWorld (dec_state_s4KL, Buffer Word8))
(ww7_s4KV :: MutVar# RealWorld (Buffer CharBufElem))
(ww8_s4KY
:: Maybe (GHC.IO.Encoding.Types.TextDecoder dec_state_s4KL))
(eta1_s4L4 [OS=OneShot] :: State# RealWorld) ->
case GHC.IO.Handle.Internals.$wflushCharReadBuffer
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww5_s4KS
ww6_s4KU
ww7_s4KV
ww8_s4KY
eta1_s4L4
of ww17_a4cj
{ __DEFAULT ->
case readMutVar# @RealWorld @(Buffer Word8) ww5_s4KS ww17_a4cj of
{ (# ipv_a3VH, ipv1_a3VI #) ->
case ipv1_a3VI of wild2_X5
{ Buffer bx_d3HU bx1_d3HV ds_d3pV bx2_d3HW bx3_d3HX bx4_d3HY
bx5_d3HZ ->
case ==# bx4_d3HY bx5_d3HZ of {
__DEFAULT ->
GHC.IO.Handle.Text.hGetBuf15
(GHC.IO.Handle.Types.Handle__
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
(GHC.IO.Handle.Text.hGetBuf14 @dev_s4KJ)
ww3_s4KO
(GHC.IO.Handle.Text.hGetBuf13 @dev_s4KJ)
ww4_s4KQ
GHC.IO.Handle.Text.hGetBuf12
ww5_s4KS
GHC.IO.Handle.Text.hGetBuf11
ww6_s4KU
ww7_s4KV
(RUBBISH(UnliftedRep)
@(MutVar# RealWorld (BufferList CharBufElem)))
(GHC.IO.Handle.Text.hGetBuf10 @enc_state_s4KK)
ww8_s4KY
GHC.IO.Handle.Text.hGetBuf9
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf7)
wild2_X5
(ptr_X0
`cast` ((Ptr Univ(phantom phantom <*>_N :: a_s4L9, Word8))_R
:: Ptr a_s4L9 ~R# Ptr Word8))
GHC.IO.Handle.Text.hGetBuf6
wild1_a3XQ
ipv_a3VH;
1# ->
GHC.IO.Handle.Text.hGetBuf5
(GHC.IO.Handle.Types.Handle__
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
(GHC.IO.Handle.Text.hGetBuf14 @dev_s4KJ)
ww3_s4KO
(GHC.IO.Handle.Text.hGetBuf13 @dev_s4KJ)
ww4_s4KQ
GHC.IO.Handle.Text.hGetBuf12
ww5_s4KS
GHC.IO.Handle.Text.hGetBuf11
ww6_s4KU
ww7_s4KV
(RUBBISH(UnliftedRep)
@(MutVar# RealWorld (BufferList CharBufElem)))
(GHC.IO.Handle.Text.hGetBuf10 @enc_state_s4KK)
ww8_s4KY
GHC.IO.Handle.Text.hGetBuf9
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf8
GHC.IO.Handle.Text.hGetBuf7)
wild2_X5
(ptr_X0
`cast` ((Ptr Univ(phantom phantom <*>_N :: a_s4L9, Word8))_R
:: Ptr a_s4L9 ~R# Ptr Word8))
GHC.IO.Handle.Text.hGetBuf6
wild1_a3XQ
ipv_a3VH
}
}
}
} } in
let {
act1_s4gq [InlPrag=[2], Dmd=LCL(C1(L))]
:: Handle__ -> State# RealWorld -> (# State# RealWorld, Int #)
[LclId,
Arity=2,
Str=<1!P(LP(LCL(C1(C1(C1(C1(P(L,1P(1L))))))),A,A,A),A,LP(A,LCL(C1(C1(P(L,1P(1P(1L),L))))),A,A,A,A),A,L,A,L,A,L,L,A,A,ML,A,A,A,A)><L>,
Unf=Unf{Src=InlineStable, TopLvl=False, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (h__s4KH [Occ=Once1!] :: Handle__)
(eta1_s4L4 [Occ=Once1, OS=OneShot] :: State# RealWorld) ->
case h__s4KH of
{ Handle__ @dev_s4KJ @enc_state_s4KK @dec_state_s4KL
ww2_s4KM [Occ=Once1] _ [Occ=Dead] ww4_s4KO [Occ=Once1]
_ [Occ=Dead] ww6_s4KQ [Occ=Once1] _ [Occ=Dead]
ww8_s4KS [Occ=Once1] _ [Occ=Dead] ww10_s4KU [Occ=Once1]
ww11_s4KV [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead]
ww14_s4KY [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead]
_ [Occ=Dead] _ [Occ=Dead] ->
$wact1_s4L7
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
ww4_s4KO
ww6_s4KQ
ww8_s4KS
ww10_s4KU
ww11_s4KV
ww14_s4KY
eta1_s4L4
}}]
act1_s4gq
= \ (h__s4KH :: Handle__)
(eta1_s4L4 [OS=OneShot] :: State# RealWorld) ->
case h__s4KH of
{ Handle__ @dev_s4KJ @enc_state_s4KK @dec_state_s4KL ww2_s4KM
ww3_s4KN ww4_s4KO ww5_s4KP ww6_s4KQ ww7_s4KR ww8_s4KS ww9_s4KT
ww10_s4KU ww11_s4KV ww12_s4KW ww13_s4KX ww14_s4KY ww15_s4KZ
ww16_s4L0 ww17_s4L1 ww18_s4L2 ->
$wact1_s4L7
@dev_s4KJ
@enc_state_s4KK
@dec_state_s4KL
ww2_s4KM
ww4_s4KO
ww6_s4KQ
ww8_s4KS
ww10_s4KU
ww11_s4KV
ww14_s4KY
eta1_s4L4
} } in
case h_s4La of wild2_a457 {
FileHandle ds_a459 bx_a45a ->
GHC.IO.Handle.Internals.$wwantReadableHandle'
@Int
GHC.IO.Handle.Text.hGetBuf3
wild2_a457
bx_a45a
(act1_s4gq
`cast` (<Handle__>_R
%<'Many>_N ->_R Sym (GHC.Types.N:IO[0] <Int>_R)
:: (Handle__ -> State# RealWorld -> (# State# RealWorld, Int #))
~R# (Handle__ -> IO Int)))
eta_s4Lj;
DuplexHandle ds_a45c bx_a45d bx1_a45e ->
GHC.IO.Handle.Internals.$wwantReadableHandle'
@Int
GHC.IO.Handle.Text.hGetBuf3
wild2_a457
bx_a45d
(act1_s4gq
`cast` (<Handle__>_R
%<'Many>_N ->_R Sym (GHC.Types.N:IO[0] <Int>_R)
:: (Handle__ -> State# RealWorld -> (# State# RealWorld, Int #))
~R# (Handle__ -> IO Int)))
eta_s4Lj
};
1# ->
GHC.IO.Handle.Text.hGetBuf2
@Int h_s4La GHC.IO.Handle.Text.hGetBuf3 wild1_a3XQ eta_s4Lj
};
0# -> (# eta_s4Lj, GHC.IO.Handle.Text.hGetBuf6 #)
}
Note that
-
act1_s4gq
is passed to$wwantReadableHandle'
; it is not applied -
act1_s4gq
is a wrapper that unboxes its Handle argument - But the worker
$wact1_s4L7
reboxes it, in every control path.