Silly worker/wrapper split
Cnnsider this program:
flags (Options f x)
= reverse (reverse (reverse (reverse (reverse (reverse (reverse (reverse x)))))))
`seq` f
data X = X String Bool Bool Bool Bool
data Options = Options !X [Int]
All the reverse
stuff is just a trick to make the function big, so that we will
get a worker/wapper split. (Small functions are not split.)
But basically flags
is just a record selector.
Now look at what we get:
$wflags_sxd :: String -> Bool -> Bool -> Bool -> Bool -> [Int]
-> (# String, Bool, Bool, Bool, Bool #)
[Str=<L,U><L,U><L,U><L,U><L,U><S,1*U>]
$wflags_sxd
= \ (ww_sx2 :: String)
(ww_sx3 :: Bool)
(ww_sx4 :: Bool)
(ww_sx5 :: Bool)
(ww_sx6 :: Bool)
(ww_sx8 :: [Int]) ->
case GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1
@Int
(GHC.List.reverse1 @Int ww_sx8 (GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int))
(GHC.Types.[] @Int)
of
{ __DEFAULT ->
(# ww_sx2, ww_sx3, ww_sx4, ww_sx5, ww_sx6 #)
}
flags [InlPrag=NOUSERINLINE[2]] :: Options -> X
[Str=<S(SS),U(U(U,U,U,U,U),U)>,]
flags
= \ (w_swW :: Options) ->
case w_swW of
{ Options ww_swZ [Dmd=<S,1*U(U,U,U,U,U)>] ww_sx8 [Dmd=<S,1*U>] ->
case ww_swZ of { X ww_sx2 ww_sx3 ww_sx4 ww_sx5 ww_sx6 ->
case $wflags_sxd ww_sx2 ww_sx3 ww_sx4 ww_sx5 ww_sx6 ww_sx8 of
{ (# ww_sxe, ww_sxf, ww_sxg, ww_sxh, ww_sxi #) ->
Bug.X ww_sxe ww_sxf ww_sxg ww_sxh ww_sxi
}
}
}
This is utterly bonkers. Why does the wrapper unpacking the X field of Options
, pass them
to the worker, which returns them, only for the wrapper to re-assemble the X value.
This came up when investigating the perf regression on record selectors described in !2873 (closed).
The culprit is this code in DmdAnal
:
mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand
mkRhsDmd env rhs_arity rhs =
case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of
Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss))
_ -> mkCallDmds rhs_arity cleanEvalDmd
cleanEvalDmd :: CleanDemand
cleanEvalDmd = JD { sd = HeadStr, ud = Used }
cleanEvalProdDmd :: Arity -> CleanDemand
cleanEvalProdDmd n = JD { sd = HeadStr, ud = UProd (replicate n useTop) }
This function constructs the "vanilla RHS demand" used by the demand analyser
for the RHS of a binding. But the case-expression arranges that when the
result type is a product we use cleanEvalProdDmd
. That is what gives the funny looking
strictness for flags
[Str=<S(SS),U( U(U,U,U,U,U), U)>,]
And it's that inner U(U,U,U,U,U)
that causes the extra unpacking.
This bit of code is justified in Note [Product demands for function body]
,
and that in turn was introduced seven years ago:
commit 99d4e5b4a0bd32813ff8c74e91d2dcf6b3555176
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Fri May 3 14:50:58 2013 +0100
Implement cardinality analysis
Now, it turns out that things have changed in the meantime. Simply removing the special case, to give
mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand
mkRhsDmd env rhs_arity rhs = mkCallDmds rhs_arity cleanEvalDmd
still correctly optimises the check
function in binary_trees
, which is
the function idenified in Note [Product demands for function body]
.
In other words, removing the special case still does the optimisation
we want -- but also stops the stupid w/w split above.
TL;DR. Let's try this new, simpler mkRhsDmd
.