Bad inlining cascade leads to slow optimisation
Consider this simple program
{-# INLINE [0] f #-}
f x y = case y of
True -> reverse x
False -> reverse (reverse x)
foo a b = let x = [a,a,a,a]
in f x b
If you compile with -DDEBUG you'll see
bash$ ghc-stage1 Foo.hs -O -fforce-recomp -c
WARNING: file compiler/simplCore/SimplCore.hs, line 670
Simplifier bailing out after 4 iterations [17, 1, 1, 1]
Size = {terms: 60, types: 49, coercions: 0, joins: 0/0}}}}
Yikes! One transformation per simplifier iteration! What is going on?
Before inlining f
we have
(f [a,a,a,a] b)
Then, we inline, beta-reduce, and build let bindings thus
let x1 = a : []
x2 = a : x1
x3 = a : x2
x4 = a : x3
in case b of
True -> reverse x4
False -> reverse (reverse x4)
So far so good. But then, bizarrely, we do postInlineUnconditionally
on x4
(see comments in that function). But not on x3
because its occ-info is "once inside lambda". Then in the next iteration we postInlineUnconditionally
x3
, and so on.
Terrible, terrible. My thought: revisit these comments in postInlineUnconditionally
:
OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt }
-- OneOcc => no code-duplication issue
-> smallEnoughToInline dflags unfolding -- Small enough to dup
-- ToDo: consider discount on smallEnoughToInline if int_cxt is true
--
-- NB: Do NOT inline arbitrarily big things, even if one_br is True
-- Reason: doing so risks exponential behaviour. We simplify a big
-- expression, inline it, and simplify it again. But if the
-- very same thing happens in the big expression, we get
-- exponential cost!
-- PRINCIPLE: when we've already simplified an expression once,
-- make sure that we only inline it if it's reasonably small.
&& (not in_lam ||
-- Outside a lambda, we want to be reasonably aggressive
-- about inlining into multiple branches of case
-- e.g. let x = <non-value>
-- in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... }
-- Inlining can be a big win if C3 is the hot-spot, even if
-- the uses in C1, C2 are not 'interesting'
-- An example that gets worse if you add int_cxt here is 'clausify'
What is particularly annoying in this case is that x
is used in all code paths, so all this inlining simply duplicates code while gaining nothing.
Moreover, if we had
let x = blah
in
case y of
A -> ...x...
B -> ..x..x...
C ->
then the multiple uses of x
in the B
branch would disable this entire preInlineUnconditionally
thing, even though it might be a good idea to push the allocation of x
into the A
and B
branches, to avoid the C
code path.
Trac metadata
Trac field | Value |
---|---|
Version | 8.0.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |