Improve FloatIn for case expressions
Related tickets:
Consider this function:
f :: Int -> Int -> (Int,Int)
f z x = let y = x+1 in
case z of
0 -> (y,y)
1 -> (y,y+1)
_ -> (99,1)
It would be better if we floated that y
thunk into the two case branches, thus
f z x = let y = x+1 in
case z of
0 -> let y = x+1 in (y,y)
1 -> let y = x+1 in (y,y+1)
_ -> (99,1)
We get a bit of code duplication (just as we do with inlining) but in exchange we may never allocate that thunk, which may matter if the default branch is the hot path.
Also y
might be used strictly. Suppose one RHS was simply g y y
where g
is strict.
Then if we float the thunk into that RHS we can use call-by-value for it.
Currently we sort-of handle this case in postInlineUnconditionally
; see
this Note in GHC.Core.Opt.Simplify.Utils:
{- Note [Inline small things to avoid creating a thunk]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The point of examining occ_info here is that for *non-values* that
occur outside a lambda, the call-site inliner won't have a chance
(because it doesn't know that the thing only occurs once). The
pre-inliner won't have gotten it either, if the thing occurs in more
than one branch So the main target is things like
let x = f y in
case v of
True -> case x of ...
False -> case x of ...
This is very important in practice; e.g. wheel-seive1 doubles
in allocation if you miss this out. And bits of GHC itself start
to allocate more. An egregious example is test perf/compiler/T14697,
where GHC.Driver.CmdLine.$wprocessArgs allocated hugely more.
But it's very ad-hoc. It only applies if x
is used exactly once in each branch!
A better plan
A better plan would be for FloatIn to float y
inwards, perhaps entirely replacing this
stuff in postInlineUnconditionally
. This ticket records the idea.
A problem
Consider
foo1 z = let x = <thunk> in
join j y = .x..x... in
case z of
A -> j 1
B -> j 2
C -> x+1
foo2 z = let x = <thunk> in
join j y = ...x... in
case h x z of
A -> j 1
B -> j 2
C -> x+1
- In the case of
foo1
we can floatx
inwards, both into the RHS of the join pointj
, and into the C branch. - But we must not do that for
foo2
because it would lose sharing. We would evaluate<thunk>
twice, once in the callh <thunk z
, and once in the RHS ofj
Annoyingly, if we inlined the join point we would do the Right Thing in both cases.
How can we do the Right Thing even when the join point is there?
Note that no occurrence analysis (which pins a single OccInfo on each binder) can get this right. Consider this combination of foo1
and foo2
:
foo1 z = let x = <thunk> in
join j1 y = .x..x... in
case z of
A -> j1 1
B -> j1 2
C -> join j2 p = ...x...
case h x z of
A -> j2 1
B -> j2 2
C -> x+1
Here we can push x
into the RHS of j1
and into the C
branch; but not into the RHS of j2
for the same reason as foo2
.