Improve inlining heuristics with case-of-known-constructor

Consider the following code:

module B where

data T = T Integer Integer Integer Integer Integer
       | Nought
       deriving (Eq,Ord)

module A where
import B

foo :: T -> Bool
foo x = Nought < x

We get the following unfolding for T's (<):

-- RHS size: {terms: 54, types: 19, coercions: 0, joins: 0/0}
B.$fOrdT_$c< :: T -> T -> Bool
[GblId,
 Arity=2,
 Str=<1L><1L>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [40 50] 320 100}]
B.$fOrdT_$c<
  = \ (a_aK1 :: T) (b_aK2 :: T) ->
      case a_aK1 of {
        T a1_aK3 a2_aK4 a3_aK5 a4_aK6 a5_aK7 ->
          case b_aK2 of {
            T b1_aK8 b2_aK9 b3_aKa b4_aKb b5_aKc ->
              case GHC.Num.Integer.integerCompare a1_aK3 b1_aK8 of {
                LT -> GHC.Types.True;
                EQ ->
                  case GHC.Num.Integer.integerCompare a2_aK4 b2_aK9 of {
                    LT -> GHC.Types.True;
                    EQ ->
                      case GHC.Num.Integer.integerCompare a3_aK5 b3_aKa of {
                        LT -> GHC.Types.True;
                        EQ ->
                          case GHC.Num.Integer.integerCompare a4_aK6 b4_aKb of {
                            LT -> GHC.Types.True;
                            EQ -> GHC.Num.Integer.integerLt a5_aK7 b5_aKc;
                            GT -> GHC.Types.False
                          };
                        GT -> GHC.Types.False
                      };
                    GT -> GHC.Types.False
                  };
                GT -> GHC.Types.False
              };
            Nought -> GHC.Types.True
          };
        Nought -> case b_aK2 of { __DEFAULT -> GHC.Types.False }
      }

The inlining guidance is IF_ARGS [40 50] 320 100 which means that the unfolding size is 320 and that if the first provided argument at a call site is a value (as it is in Nought < x) then a discount of 40 is applied (similarly a discount of 50 for the second argument but we don't care in this example).

Using -ddump-verbose-inlinings when compiling A we get:

Considering inlining: $fOrdT_$c<
  arg infos [ValueArg, TrivArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [40 50] 320 100
  case depth = 0
  depth based penalty = 0
  discounted size = 250
  ANSWER = NO -- won't be inlined

Discounted size is computed as follows (see GHC.Core.Unfold.computeDiscount):

250 =
  320    -- unfolding size
  - 10   -- because the result would replace a function call
  - 10*2 -- because 2 arguments are provided
  - 40   -- discount because the first argument is value

The discounted unfolding size is greater than the inlining use threshold (grep unfoldingUseThreshold) which defaults to 90 hence the unfolding isn't inlined.

However, if we force the inlining, case-of-known-constructor applies and the resulting expression for foo is very small:

foo
  = \ (x_a11L :: T) ->
      case x_a11L of { __DEFAULT -> GHC.Types.False }

The issue is that the discount per value argument doesn't take into account which value (which constructor/literal) is applied. While it would be beneficial to inline in foo above, it wouldn't be in bar below:

bar :: T -> Bool
bar x = T 1 2 3 4 5 < x

One way to allow this is to compute a more precise unfolding size at every call site:

when the existing heuristics failed to inline
when we know the constructor or literal for some of the arguments
when these arguments are cased on in the unfolding

1 is trivial
2 is easy because we already detect arguments that are values (cf ValueArg constructor in ArgSummary) so we just have to add a field to ValueArg indicating the DataCon/Literal (if any) of the argument
3 can easily be inferred from the existing discounts: if an argument discount isn't 0, it is used as a scrutinee in a case-expresssion in the unfolding.

The more precise unfolding size is then computed as usual except that the sizes of case alternatives that are sure not to be taken (because of the specific value arguments that are passed) and that will be removed by case-of-known-constructor if we inline aren't added to the total.

!6732 (closed) implements this and the inlining decision for foo becomes:

Considering inlining:
  id: $fOrdT_$c<
  arg infos: [ValueArg Nought, TrivArg] -- ValueArg now contains the arg DataCon
  interesting continuation: BoringCtxt
  some_benefit: True
  is exp: True
  is work-free: True
  guidance: IF_ARGS [40 50] 320 100
  case depth = 0
  orig depth based penalty = 0
  orig discounted size = 250
  final depth based penalty = 0
  final discounted size = -40
  final unfolding size = 10    -- more precise unfolding size
  ANSWER = YES

while the inlining decision for bar is:

Considering inlining:
  id: $fOrdT_$c<
  arg infos: [ValueArg T, TrivArg]
  interesting continuation: BoringCtxt
  some_benefit: True
  is exp: True
  is work-free: True
  guidance: IF_ARGS [40 50] 320 100
  case depth = 0
  orig depth based penalty = 0
  orig discounted size = 250
  final depth based penalty = 0
  final discounted size = 240
  final unfolding size = 290
  ANSWER = NO

A drawback of doing this is that we compute the more precise size of unfoldings more often. However:

we are only doing it if other heuristics failed and under certain conditions (see above)
unfoldings are usually quite small (bounded by the unfolding creation threshold in the defining module)
we could easily provide a flag to disable this heuristics per-module (not yet done in !6732 (closed))

This was found in !6696 (closed) which makes Integer's (<) unfolding bigger, leading to a bigger (<) unfolding for a datatype similar to T above, leading to a change in the inlining decision in a expression similar to foo's rhs.

Edited Oct 15, 2021 by Sylvain Henry

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information