Late lambda lifting and multiple arguments
Consider
module LL where
f xs x a b c d e f = let g :: Bool -> [a] -> ([a], Int, Int, Int, Int, Int, Int)
                         g True ys = (ys, a, b, c, d, e, f)
                         g False ys = g x ys
                     in case g x xs of { (xs', _, _, _, _, _, _) ->
                        case g x xs' of { (xs'', _, _, _, _, _, _) ->
                        (xs', xs'') } }
Currently late lambda lifting does not lift out g, because the lifted function would take a lot of arguments: g's own two parameters plus its seven free variables (x, a, b, c, d, e, f), nine in total. But actually lifting g up to top level will reduce allocation (by not allocating a closure for g) in exchange for some stack shuffling. The wiki page and draft paper just mutter about register pressure.
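To make the trade-off concrete, here is a hand-written sketch of roughly what the lifted program would look like. The names gLifted and fLifted are hypothetical, and the real transformation happens on STG rather than on source Haskell, so this only illustrates the shape of the result: g's free variables become leading parameters, and each call site passes them explicitly instead of going through an allocated closure.

```haskell
-- Hand-written illustration only: GHC performs late lambda lifting on STG
-- and picks its own names. gLifted and fLifted are hypothetical.

-- The seven free variables of g (x, a, b, c, d, e, f) become extra
-- parameters in front of g's own two parameters, giving nine in total.
gLifted :: Bool -> Int -> Int -> Int -> Int -> Int -> Int
        -> Bool -> [t] -> ([t], Int, Int, Int, Int, Int, Int)
gLifted _ a b c d e f True  ys = (ys, a, b, c, d, e, f)
gLifted x a b c d e f False ys = gLifted x a b c d e f x ys

-- The original function, with every call to g replaced by a saturated
-- call to gLifted. No closure for g is allocated here; the cost is passing
-- nine arguments at each call.
fLifted :: [t] -> Bool -> Int -> Int -> Int -> Int -> Int -> Int -> ([t], [t])
fLifted xs x a b c d e f =
  case gLifted x a b c d e f x xs of
    (xs', _, _, _, _, _, _) ->
      case gLifted x a b c d e f x xs' of
        (xs'', _, _, _, _, _, _) -> (xs', xs'')
```

Like the original, this loops when x is False, so it is only meaningful to call it with x = True.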
My instinct is that lifting is a win, no matter how many arguments -- or at least that the threshold should be significantly larger than currently.
So this ticket is just to suggest that some experimentation with -fstg-lift-lams-non-rec-args and -fstg-lift-lams-rec-args could be profitable.
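One way to try this on the example above is to raise both thresholds per module with an OPTIONS_GHC pragma. This is a sketch under assumptions: the value 10 is arbitrary, chosen only to exceed the nine parameters the lifted g would need, and other heuristics in the lifting pass may still veto the lift. Compiling with -ddump-stg-final should show whether g ended up at top level.

```haskell
-- Assumption: 10 is an arbitrary threshold, picked to exceed the nine
-- parameters that the lifted g would take. -O2 turns on late lambda lifting.
{-# OPTIONS_GHC -O2 -fstg-lift-lams-rec-args=10 -fstg-lift-lams-non-rec-args=10 #-}

-- Same function as in the ticket. g is recursive, so the rec-args
-- threshold is the one that matters for it.
f xs x a b c d e f =
  let g :: Bool -> [a] -> ([a], Int, Int, Int, Int, Int, Int)
      g True ys = (ys, a, b, c, d, e, f)
      g False ys = g x ys
  in case g x xs of
       (xs', _, _, _, _, _, _) ->
         case g x xs' of
           (xs'', _, _, _, _, _, _) -> (xs', xs'')
```

Comparing allocation with and without the raised thresholds (for example via +RTS -s on a program that calls f in a loop) would be the actual experiment.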
I tripped over this when doing some performance debugging on another MR (!7847), in nofib/real/smallpt. For some unrelated reason, instead of
let g = \x y. blah in foldr g z [a,b,c]
we were unrolling the loop to
case g c z of r1 -> case g b r1 of r2 -> ...
In the former case, once foldr was inlined we could inline g, so in the end it was never allocated. But in the unrolled version g is called several times, so it is not inlined and its closure is allocated.