Don't inline top level WHNFs; and preInlineUnconditionally
Consider a program like this which has a lot of static data:
x1 = a1 : []
x2 = a2 : x1
x3 = a3 : x2
h = f (g x3)
Currently GHC inlines x3, x2 and x1, which is silly. They probably ended up at top level because of floating, and there is no good reason to inline them. And if we are just going to re-inline them, it's pointless to float them out in the first place.
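To make the two shapes concrete, here is a minimal sketch of the program before and after inlining. The definitions of `a1`..`a3`, `f` and `g` are hypothetical stand-ins (the example leaves them abstract), and the sketch assumes `x2` is meant to refer to `x1`, so that the three bindings form one static list.

```haskell
-- Hypothetical stand-ins for a1..a3, f and g, which the example leaves abstract.
a1, a2, a3 :: Int
a1 = 1
a2 = 2
a3 = 3

f :: [Int] -> Int
f = sum

g :: [Int] -> [Int]
g = map (* 2)

-- Floated form: every cons cell is a top-level binding already in WHNF,
-- so it is a candidate for static allocation.
x1, x2, x3 :: [Int]
x1 = a1 : []
x2 = a2 : x1
x3 = a3 : x2

hFloated :: Int
hFloated = f (g x3)

-- Roughly what h looks like once x3, x2 and x1 are inlined: the cons
-- cells are now built inside h's right-hand side.
hInlined :: Int
hInlined = f (g (a3 : (a2 : (a1 : []))))
```

Both versions compute the same value; the difference is only where the cons cells live, which is the point of the complaint above.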
There is a mysterious comment in preInlineUnconditionally saying:
early_phase = case sm_phase mode of
                Phase 0 -> False
                _       -> True
-- If we don't have this early_phase test, consider
-- x = length [1,2,3]
-- The full laziness pass carefully floats all the cons cells to
-- top level, and preInlineUnconditionally floats them all back in.
-- Result is (a) static allocation replaced by dynamic allocation
-- (b) many simplifier iterations because this tickles
-- a related problem; only one inlining per pass
--
-- On the other hand, I have seen cases where top-level fusion is
-- lost if we don't inline top level thing (e.g. string constants)
-- Hence the test for phase zero (which is the phase for all the final
-- simplifications). Until phase zero we take no special notice of
-- top level things, but then we become more leery about inlining
-- them.
but it's from many years ago.
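As a rough illustration of what that test does, here is a toy model. The types `ToyPhase` and `Level` and the function `mayPreInline` are hypothetical, not GHC's real definitions, and the sketch assumes (from the last sentence of the comment) that the phase test only restricts top-level bindings.

```haskell
-- Toy stand-in for the simplifier phase; not GHC's actual type.
data ToyPhase = InitialPhase | Phase Int
  deriving (Eq, Show)

-- Hypothetical flag saying where a binding lives.
data Level = TopLevel | Nested
  deriving (Eq, Show)

-- Mirrors the quoted test: every phase except Phase 0 counts as early.
earlyPhase :: ToyPhase -> Bool
earlyPhase (Phase 0) = False
earlyPhase _         = True

-- Assumed gating: nested bindings are unaffected by the phase, while
-- top-level bindings are only pre-inlined in an early phase.
mayPreInline :: ToyPhase -> Level -> Bool
mayPreInline _     Nested   = True
mayPreInline phase TopLevel = earlyPhase phase
```

Under this model `mayPreInline (Phase 2) TopLevel` is `True` and `mayPreInline (Phase 0) TopLevel` is `False`, which matches the behaviour the comment describes: top-level things are inlined freely until phase zero.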
I think it'd be better to keep them at top level throughout.
On the other hand, keeping them at top level can give us a lot of let-bindings, which makes many passes slower. So an alternative would be:
- Don't float them to top level
- Do inline them
until late in compilation, when we can:
- Float them to top level (to get static allocation -- a bit like late lambda lifting)
- Don't inline them
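The options discussed so far can be compared with a small model. Everything here (`Stage`, `Policy`, and the three policy functions) is a hypothetical sketch of the ticket's wording, not a description of GHC's implementation; `currentPolicy` encodes the behaviour as this ticket describes it.

```haskell
-- Coarse split of the pipeline; "Late" stands for the final phases.
data Stage = Early | Late
  deriving (Eq, Show)

-- What we do with top-level WHNF bindings at a given stage.
data Policy = Policy
  { floatWhnfToTop :: Bool  -- float WHNF bindings out to top level?
  , inlineTopWhnf  :: Bool  -- inline top-level WHNF bindings back in?
  } deriving (Eq, Show)

-- Behaviour as described in this ticket: float early, then inline again.
currentPolicy :: Stage -> Policy
currentPolicy Early = Policy { floatWhnfToTop = True, inlineTopWhnf = True }
currentPolicy Late  = Policy { floatWhnfToTop = True, inlineTopWhnf = False }

-- First proposal: keep them at top level throughout.
keepAtTopPolicy :: Stage -> Policy
keepAtTopPolicy _ = Policy { floatWhnfToTop = True, inlineTopWhnf = False }

-- Alternative: keep them nested early, float out only late.
alternativePolicy :: Stage -> Policy
alternativePolicy Early = Policy { floatWhnfToTop = False, inlineTopWhnf = True }
alternativePolicy Late  = Policy { floatWhnfToTop = True, inlineTopWhnf = False }

-- Floating out and inlining back in at the same stage undoes the float.
selfDefeating :: Policy -> Bool
selfDefeating p = floatWhnfToTop p && inlineTopWhnf p
```

In this model only `currentPolicy Early` is self-defeating; both proposals avoid floating and re-inlining in the same stage, and they differ only in how many top-level bindings exist during the early passes.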
Notes:
- We should always float to top level if doing so escapes a value lambda; that makes inlining more likely to happen. And we won't inline them back in.
- Literal strings are worth thinking about specially.
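The first note can be illustrated with a small sketch. The names `lookupNested`, `lookupFloated` and `numberNames` are hypothetical, and the reading that "inlining more likely" refers to the enclosing function (whose body gets smaller) is an assumption.

```haskell
-- Before floating: the table is bound under the value lambda \k, so
-- without floating it would be rebuilt on each call.
lookupNested :: Int -> Maybe String
lookupNested = \k ->
  let table = [(1, "one"), (2, "two"), (3, "three")]
  in lookup k table

-- After floating: the table has escaped the lambda and is a top-level
-- WHNF, shared by every call.
numberNames :: [(Int, String)]
numberNames = [(1, "one"), (2, "two"), (3, "three")]

-- The function body is now a single call, which makes the function
-- itself a smaller inlining candidate.
lookupFloated :: Int -> Maybe String
lookupFloated = \k -> lookup k numberNames
```

The two functions agree on every argument; what changes is that the floated table is no longer inside the lambda, and under the proposal it would stay at top level rather than being inlined back.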