Skip to content

Draft: Prototype for deep inlining discounts.

Andreas Klebinger requested to merge wip/andreask/deep_discounts into master

The basic idea

See #21938 for the basic idea.

What does this MR change?

When computing unfolding guidance for a function f with an argument x and we see a case expression of the form

f x = 
  case x of
    Con1 c1_bndr1 c1_bndr1 -> rhs1
    Con2 c2_bndr2 c2_bndr2 -> rhs2

Then while traversing the rhss we not only look for uses of the top level arguments to give them a discount but we also look for uses of the constructor binders. If they are used in a interesting way we compute a discount for that use which we will only apply if the argument x is equal to the relevant constructor. And we do so recursively.

Here is a example produced with this branch:

-- Source definition
f :: Either Bool Int -> Int
f x = case x of
    Left l -> case l of
        True -> 1
        False -> sum [1..200] :: Int
    Right r -> case r of
        r -> r + succ r

Which gives this guidance

Guidance=IF_ARGS 
    [some_con:62,
      Left:43
        [some_con:30, False:10[], True:49[]],
      Right:70
        [some_con:20, GHC.Types.I#:11[disc:30]]
        ]
    111 0

Which gives us both a default discount (essentially equivalent to what we get now) and more specific discounts if the argument structure is known.

For example if we apply f (Left True) we add up Left:43 and True:49 to give us a discount of 92 which is almost as large as the function itself. Which makes sense since the argument will be merely the literal 1.

In order to make use of this structure when we compute argument summaries we now also turn core expressions into an abstraction of the AST. E.g we turn the argument in f (Left True) into ConArg "Left" [ConArg "True" []] which later on get's matched against the tree of discounts in the unfolding guidance.

There is a cutoff flag that controls how deep the unfolding guidance and argument abstractions are allowed to get to avoid compile time blowups.

Tradeoffs

This comes at a compile time cost of course. With the cutoff I chose the overhead is in the 0.7-1% range for compiler allocations and goes down to 0.4% if we limit nesting to a depth of one. The comments to the MR have more numbers, in particular !8778 (comment 447344)

Notes on not breaking existing code

I tried to ensure as much as possible that everything that currently inlines will also inline after this patch, while also inlining more in places where we can now tell that it makes sense. And it seems I managed surprisingly well to balance this since we improve both code size and runtime with this patch!

One peculiarity is that sometimes we can see that for a specific constructor we should give less discount than is the current minimum for a value argument. The current default discount for a seq on a argument is 20. However if we have code like case x of I# x' -> x the x being known to be I# only reduces the size by 11.

If we simple take this at face value this means going from f x with x being known to be a value but "unknown" constructor to f (I# x) results in a smaller discount even though it eliminates more work. So I just round up the discount to the default case.

One example where we might end up inlining less now despite our best efforts is code like

let a = ... (f (SomeCon SomeMoreCons)) ...
let b = ... a ...

We might now inline f since we can tell good things will happen if we do so. Which can stop a from inlining into b since a is larger now. But such is life.


The code isn't quite polished, some notes need to be written, tests need updating etc. But I would like to know if others think this rework is worthwhile before polishing it further.

Edited by Andreas Klebinger

Merge request reports