Late lambda lifting and multiple arguments
Consider
    module LL where

    f xs x a b c d e f = let g :: Bool -> [a] -> ([a], Int, Int, Int, Int, Int, Int)
                             g True ys  = (ys, a, b, c, d, e, f)
                             g False ys = g x ys
                         in case g x xs of { (xs', _, _, _, _, _, _) ->
                            case g x xs' of { (xs'', _, _, _, _, _, _) ->
                            (xs', xs'') } }
Currently late-lambda-lifting does not lift out g, because the lifted function has a lot of arguments. But actually lifting up to top level will reduce allocation (by not allocating g) in exchange for some stack shuffling. The wiki page and draft paper just mutter about register pressure.
My instinct is that lifting is a win, no matter how many arguments -- or at least that the threshold should be significantly larger than currently.
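To make the trade-off concrete, here is a hand-written sketch of roughly what lifting g to top level would amount to for the example above. The names gLifted and fLifted are hypothetical, and the code GHC actually generates at the STG level may differ in argument order and other details. The free variables x, a, b, c, d, e and f become extra leading parameters, so the lifted function takes nine arguments, but no closure for g needs to be allocated.

```haskell
module LLLifted where

-- Hypothetical hand-lifted version of the local function g.
-- Its former free variables (x, a .. f) are now explicit parameters,
-- followed by g's own two arguments.
gLifted :: Bool -> Int -> Int -> Int -> Int -> Int -> Int
        -> Bool -> [t] -> ([t], Int, Int, Int, Int, Int, Int)
gLifted _ a b c d e f True  ys = (ys, a, b, c, d, e, f)
gLifted x a b c d e f False ys = gLifted x a b c d e f x ys

-- The original function, now calling the top-level gLifted directly.
-- Every call site has to pass all seven former free variables.
fLifted :: [t] -> Bool -> Int -> Int -> Int -> Int -> Int -> Int -> ([t], [t])
fLifted xs x a b c d e f =
  case gLifted x a b c d e f x xs of
    (xs', _, _, _, _, _, _) ->
      case gLifted x a b c d e f x xs' of
        (xs'', _, _, _, _, _, _) -> (xs', xs'')
```

As in the original, passing False for x makes g loop, so only x = True is meaningful when trying this out.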
So this ticket is just to suggest that some experimentation with -fstg-lift-lams-non-rec-args and -fstg-lift-lams-rec-args could be profitable.
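One possible way to run such an experiment is sketched below. It assumes the thresholds can be set per module through an OPTIONS_GHC pragma, and the value 10 is an arbitrary choice for illustration. The driver in main is hypothetical, and GHC may well optimise this toy loop differently from the smallpt case.

```haskell
{-# OPTIONS_GHC -O2 -fstg-lift-lams-rec-args=10 -fstg-lift-lams-non-rec-args=10 #-}
-- Assumed workflow (adjust to taste):
--   ghc -rtsopts -ddump-stg-final LL.hs   -- check whether g was lifted
--   ./LL +RTS -s                          -- compare allocation totals
-- Re-run with different threshold values, or without the two
-- threshold flags, and compare the numbers.
module Main (main) where

import Data.List (foldl')

-- The function from this ticket, kept out of line so that it is
-- compiled on its own rather than inlined into the driver.
f :: [t] -> Bool -> Int -> Int -> Int -> Int -> Int -> Int -> ([t], [t])
f xs x a b c d e f = let g :: Bool -> [a] -> ([a], Int, Int, Int, Int, Int, Int)
                         g True ys  = (ys, a, b, c, d, e, f)
                         g False ys = g x ys
                     in case g x xs of { (xs', _, _, _, _, _, _) ->
                        case g x xs' of { (xs'', _, _, _, _, _, _) ->
                        (xs', xs'') } }
{-# NOINLINE f #-}

-- Hypothetical driver: call f many times so that any difference in
-- allocation shows up in the +RTS -s summary.
main :: IO ()
main = print (foldl' step 0 [1 .. 1000000 :: Int])
  where
    step :: Int -> Int -> Int
    step acc n = case f [n] True n n n n n n of
      (ys, zs) -> acc + length ys + length zs
```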
I tripped over this when doing some performance debugging on another MR (!7847 (closed)), in nofib/real/smallpt. For some unrelated reason, instead of
    let g = \x y. blah in foldr g z [a,b,c]
we were unrolling the loop to
    case g c z of r1 -> case g b r1 of r2 -> ...
In the former case, once foldr was inlined we could inline g so in the end it was never allocated. But in the unrolled version it was.
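For reference, the two shapes can be written out as ordinary Haskell. This is only a sketch: it assumes the folded function is g and the list is [a, b, c], the body of g is a hypothetical stand-in for the elided "blah", and scale is an invented free variable that makes g a closure. It shows that the two forms compute the same result; it does not by itself reproduce the allocation difference, which depends on the size of g and on GHC's inlining decisions.

```haskell
module FoldShapes where

-- Fold form: g is passed to foldr as an argument.
viaFoldr :: Int -> Int -> Int -> Int -> Int -> Int
viaFoldr scale a b c z =
  let g x y = scale * x + y   -- hypothetical stand-in for "blah"
  in foldr g z [a, b, c]

-- Unrolled form: g is called directly, once per list element,
-- starting from the last element as foldr does.
unrolled :: Int -> Int -> Int -> Int -> Int -> Int
unrolled scale a b c z =
  let g x y = scale * x + y   -- same stand-in body
  in case g c z of
       r1 -> case g b r1 of
         r2 -> g a r2
```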