Late lambda lifting and multiple arguments

Consider

module LL where

f xs x a b c d e f = let g :: Bool -> [a] -> ([a], Int, Int, Int, Int, Int, Int)
                         g True  ys = (ys, a, b, c, d, e, f)
                         g False ys = g x ys
                     in case g x xs of  { (xs',  _, _, _, _, _, _) ->
                        case g x xs' of { (xs'', _, _, _, _, _, _) ->
                        (xs', xs'') } }

Currently late-lambda-lifting does not lift out g, because the lifted function has a lot of arguments. But actually lifting up to top level will reduce allocation (by not allocating g) in exchange for some stack shuffling. The wiki page and draft paper just mutter about register pressure.

My instinct is that lifting is a win, no matter how many arguments -- or at least that the threshold should be significantly larger than currently.

So this ticket is just to suggest that some expermimentation with -fstg-lift-lams-non-rec-args and -fstg-lift-lams-rec-args could be profitable.

I tripped over this when doing some performance debugging on another MR (!7847 (closed)), in nofib/real/smallpt. For some unrelated reason, instead of

   let g = \xy. blah in foldr k z [a,b,a]

we were unrolling the loop to

   case g c z of r1 -> case g b r1 of r2 -> ...

In the former case, once foldr was inlined we could inline g so in the end it was never allocated. But in the unrolled version it was.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information