Making inlining heuristics work for functions matching on nested data structures like generics.

Currently GHC does a terrible job at inlining functions which take apart or branch on a known data structure in order to produce a small result.

The problem

Consider one of the simplest nested data structures, inductively defined numbers:

data Nat = Zero | Succ Nat

natToSmallInt :: Nat -> Int
natToSmallInt Zero = 0
natToSmallInt (Succ Zero) = 1
natToSmallInt (Succ (Succ Zero)) = 2
natToSmallInt (Succ (Succ (Succ Zero))) = 3
natToSmallInt (Succ (Succ (Succ (Succ Zero)))) = 4
natToSmallInt (Succ (Succ (Succ (Succ (Succ Zero))))) = 5
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))) = 6
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))) = 7
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))) = 8
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))) = 9
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))))) = 10
natToSmallInt _ = error "Not Small"


foo :: Int
foo = natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))))) -- 10 as argument

bar :: Int
bar = natToSmallInt Zero

We, as humans, can easily see that this function applied to any statically known argument will produce a small expression once we apply case of known constructor. However we don't inline natToSmallInt as the nested cases make it fairly large and we only apply a discount based on the argument once.

In a sense the issue is that GHC ignores how the function uses the components of it's argument, as well as GHC not considering the structure of the argument when computing the discount.

The end result is that we get:

-- RHS size: {terms: 12, types: 0, coercions: 0, joins: 0/0}
foo :: Int
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 120 0}]
foo
  = natToSmallInt
      (A.Succ
         (A.Succ
            (A.Succ
               (A.Succ
                  (A.Succ (A.Succ (A.Succ (A.Succ (A.Succ (A.Succ A.Zero))))))))))

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
bar :: Int
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
bar = natToSmallInt A.Zero

Where we would have expected foo = 10, bar = 0

In practice this pattern arises (at least) with generics like described here: #21457 (comment 445884) where we have a largish function taking apart a largish generic representation to select between a number of small rhs expressions.

Option A: Improve inlining to account for nested data structures.

The only way I can (currently) think of to make this general pattern work well would be to have discounts based on the structure of arguments if they are known.

That is for natToSmallInt instead of generating this unfolding info:

-- RHS size: {terms: 71, types: 25, coercions: 4, joins: 0/0}
natToSmallInt :: Nat -> Int
[Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [40] 420 110}]

We generate argument guidance based on the shape of the argument. So rather than just using [40] we could generate somthing like:

[(Default:10,
  Zero:190 [], 
  Succ:40 
    [ (Default:10
    , Zero:140
    , Succ:..)]
)]

Numbers are subject to change but in general the argument guidance could be computed by somthing along the lines of:

Default: The discount we apply simply for knowing it's an evaluated argument so just a small discount.
Zero: A known constructor which is a terminal for our discrimination.
- Apply a decent discount to account for the branch being eliminated.
- Apply an additional discount based on the size difference of the rhs if we apply case-of-known-constructor using this constructor. This allows e.g. natToSmallInt Zero to inline.
Succ: Similarly to the Zero case.
- But additionally compute a discount for the constructors arguments given the alt-rhs we chose for Succ.

When we then see an application like natToSmallInt (Succ Zero) we compute the discount by:

Apply the Default discount since the argument is evaluated.
Apply the relevant constructor discount if any.
Apply the relevant discounts for the constructor arguments recursively.
Add up all these discounts as usual and make a decision.

I think this should work. But there is of course concern for impact on compile time. In particular we likely don't want to compute individual argument guidance on cases with >100 alternatives, or types with >100 arguments.

But having a (somewhat arbitrary) cutoff to achieve this seems reasonable.

Option B: Solve this by making SpecConstr apply more liberally.

SpecConstr is actually very well equipped to handle cases like these already. Although it has a number of issues which prevent it from being a direct solution.

It's not always applied e.g. top level non-recursive definitions like natToSmallInt don't get any benefit currently (see #21457).
It's not enabled by default at -O since it can destroy sharing.
It can destroy sharing.
It is by default limited to 3 specializations of a function. For natToSmallInt we would want 10 at least. For more complex types like generic instances this number increases rapidly.
SpecConstr only considers the size of the input when deciding if something is worth specializing for. Although perhaps that can/should be fixed by applying a discount in a fashion similar to what I describe above for inlining.
It's only run once while we apply inlining often and very opportunistically.
It doesn't work across modules

So imo in order for SpecConstr to even be an option for addressing this we need to:

Make it work across module boundaries.
Have it work for arbitrarily large input bindings if we can tell that they will optimize to a small rhs when specialized. (E.g apply discounts similar to described above for inlining).
Remove the limit on number of specializations. However this likely also requires us to implement some sort of heuristic/cost model that prevents GHC from generating an insurmountable number of relatively useless specializations.
Improve the issue of it destroying sharing at least enough to make it safe to enable for -O

Overall this seems less doable than improving inlining.

Edited Aug 04, 2022 by Ben Gamari

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information