Skip to content

Bad inlining cascade leads to slow optimisation

Consider this simple program

{-# INLINE [0] f #-}
f x y = case y of
         True  -> reverse x
         False -> reverse (reverse x)

foo a b = let x = [a,a,a,a]
          in f x b

If you compile with -DDEBUG you'll see

bash$ ghc-stage1 Foo.hs -O -fforce-recomp -c
WARNING: file compiler/simplCore/SimplCore.hs, line 670
  Simplifier bailing out after 4 iterations [17, 1, 1, 1]
    Size = {terms: 60, types: 49, coercions: 0, joins: 0/0}}}}

Yikes! One transformation per simplifier iteration! What is going on?

Before inlining f we have

(f [a,a,a,a] b)

Then, we inline, beta-reduce, and build let bindings thus

let x1 = a : []
    x2 = a : x1
    x3 = a : x2
    x4 = a : x3
in case b of
  True -> reverse x4
  False -> reverse (reverse x4)

So far so good. But then, bizarrely, we do postInlineUnconditionally on x4 (see comments in that function). But not on x3 because its occ-info is "once inside lambda". Then in the next iteration we postInlineUnconditionally x3, and so on.

Terrible, terrible. My thought: revisit these comments in postInlineUnconditionally:

      OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt }
               -- OneOcc => no code-duplication issue
        ->     smallEnoughToInline dflags unfolding     -- Small enough to dup
                        -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
                        --
                        -- NB: Do NOT inline arbitrarily big things, even if one_br is True
                        -- Reason: doing so risks exponential behaviour.  We simplify a big
                        --         expression, inline it, and simplify it again.  But if the
                        --         very same thing happens in the big expression, we get
                        --         exponential cost!
                        -- PRINCIPLE: when we've already simplified an expression once,
                        -- make sure that we only inline it if it's reasonably small.

           && (not in_lam ||
                        -- Outside a lambda, we want to be reasonably aggressive
                        -- about inlining into multiple branches of case
                        -- e.g. let x = <non-value>
                        --      in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... }
                        -- Inlining can be a big win if C3 is the hot-spot, even if
                        -- the uses in C1, C2 are not 'interesting'
                        -- An example that gets worse if you add int_cxt here is 'clausify'

What is particularly annoying in this case is that x is used in all code paths, so all this inlining simply duplicates code while gaining nothing.

Moreover, if we had

let x = blah
in
case y of
  A -> ...x...
  B -> ..x..x...
  C -> 

then the multiple uses of x in the B branch would disable this entire preInlineUnconditionally thing, even though it might be a good idea to push the allocation of x into the A and B branches, to avoid the C code path.

Trac metadata
Trac field Value
Version 8.0.1
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information