Commit dfe536be authored by batterseapower's avatar batterseapower

Give a unfolding argument discount proportional to the number of available arguments

Ensures that h1 gets inlined into its use sites in cases like:

"""
h1 k = k undefined undefined undefined
        undefined undefined undefined
        undefined undefined undefined
        undefined undefined undefined
        undefined undefined undefined
        undefined undefined undefined

a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
"""

I've benchmarked this on nofib (albeit recompiling only the
benchmarks, not the library) and it hardly shifts the numbers - binary
size is up by 0.1% at most (average 0.0%) and the worst-case
allocation increase is 0.2% (best case -0.1%, 0.0% average).

If you also rebuild the libraries with this change, the only further
change is a +0.2% allocation increase in cacheprof. So this looks like
a pretty low-risk change that will considerably benefit certain
programs.
parent 28f783f1
......@@ -502,6 +502,81 @@ sizeExpr bOMB_OUT_SIZE top_args expr
d2 -- Ignore d1
\end{code}
Note [Function application discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I noticed that the output of the supercompiler generates a lot of code
with this form:
"""
module Inlining where
h1 k = k undefined undefined undefined
undefined undefined undefined
undefined undefined undefined
undefined undefined undefined
undefined undefined undefined
undefined undefined undefined
a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
c = h1 (\_ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
d = h1 (\_ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
e = h1 (\_ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
f = h1 (\_ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ -> x)
g = h1 (\_ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ -> x)
h = h1 (\_ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ -> x)
i = h1 (\_ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ -> x)
j = h1 (\_ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ -> x)
k = h1 (\_ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ -> x)
l = h1 (\_ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ -> x)
m = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ -> x)
n = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ -> x)
o = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ -> x)
p = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ -> x)
q = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ -> x)
r = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x -> x)
"""
With GHC head the applications of h1 are not inlined, which hurts the
quality of the generated code a bit. I was wondering why h1 wasn't
getting inlined into each of "a" to "i" - after all, it has a manifest
lambda argument.
It turns out that the code in CoreUnfold gives a fixed discount of
opt_UF_FunAppDiscount to a function argument such as "k" if it applied
to any arguments. This is enough to ensure that h1 is inlined if the number
of arguments applied to k is below a certain limit, but if many arguments are
applied to k then the fixed discount can't overcome the size of the
chain of apps, and h1 is never inlined.
My proposed solution is to change CoreUnfold.funSize so that longer
chains of arguments being applied to a lambda-bound function give a
bigger discount. The motivation for this is that we would *generally*
expect that the lambda at the callsite has enough lambdas such that
all of the applications within the body can be beta-reduced away. This
change might lead to over eager inlining in cases like this, though:
{{{
h1 k = k x y z
{-# NOINLINE g #-}
g = ...
main = ... h1 (\x -> g x) ...
}}}
In this case we aren't able to beta-reduce away all of the
applications in the body of h1 because the lambda at the call site
only binds 1 argument, not the 3 allowed by the type. I don't expect
this case to be particularly common, however.
I chose the bonus to be (size - 20) so that application to 1 arg got
same bonus as the old fixed bonus (i.e. opt_UF_FunAppDiscount, which is 60).
If you have the bonus being (size - 40) then $fMonad[]_$c>>= with interesting
2nd arg doesn't inline in cryptarithm2 so we lose some deforestation, and
overall binary size hardly falls.
\begin{code}
-- | Finds a nominal size of a string literal.
litSize :: Literal -> Int
......@@ -541,13 +616,15 @@ funSize top_args fun n_val_args
where
some_val_args = n_val_args > 0
-- See Note [Function application discount]
arg_discount | some_val_args && fun `elem` top_args
= unitBag (fun, opt_UF_FunAppDiscount)
= unitBag (fun, opt_UF_FunAppDiscount + (size - 20))
| otherwise = emptyBag
-- If the function is an argument and is applied
-- to some values, give it an arg-discount
res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount
-- See Note [Function application discount]
res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount + (size - 20)
| otherwise = 0
-- If the function is partially applied, show a result discount
size | some_val_args = 10 * (1 + n_val_args)
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment