Improve inlining heuristics with case-of-known-constructor
Consider the following code:
module B where
data T = T Integer Integer Integer Integer Integer
| Nought
deriving (Eq,Ord)
module A where
import B
foo :: T -> Bool
foo x = Nought < x
We get the following unfolding for T
's (<)
:
-- RHS size: {terms: 54, types: 19, coercions: 0, joins: 0/0}
B.$fOrdT_$c< :: T -> T -> Bool
[GblId,
Arity=2,
Str=<1L><1L>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [40 50] 320 100}]
B.$fOrdT_$c<
= \ (a_aK1 :: T) (b_aK2 :: T) ->
case a_aK1 of {
T a1_aK3 a2_aK4 a3_aK5 a4_aK6 a5_aK7 ->
case b_aK2 of {
T b1_aK8 b2_aK9 b3_aKa b4_aKb b5_aKc ->
case GHC.Num.Integer.integerCompare a1_aK3 b1_aK8 of {
LT -> GHC.Types.True;
EQ ->
case GHC.Num.Integer.integerCompare a2_aK4 b2_aK9 of {
LT -> GHC.Types.True;
EQ ->
case GHC.Num.Integer.integerCompare a3_aK5 b3_aKa of {
LT -> GHC.Types.True;
EQ ->
case GHC.Num.Integer.integerCompare a4_aK6 b4_aKb of {
LT -> GHC.Types.True;
EQ -> GHC.Num.Integer.integerLt a5_aK7 b5_aKc;
GT -> GHC.Types.False
};
GT -> GHC.Types.False
};
GT -> GHC.Types.False
};
GT -> GHC.Types.False
};
Nought -> GHC.Types.True
};
Nought -> case b_aK2 of { __DEFAULT -> GHC.Types.False }
}
The inlining guidance is IF_ARGS [40 50] 320 100
which means that the unfolding size is 320 and that if the first provided argument at a call site is a value (as it is in Nought < x
) then a discount of 40 is applied (similarly a discount of 50 for the second argument but we don't care in this example).
Using -ddump-verbose-inlinings
when compiling A
we get:
Considering inlining: $fOrdT_$c<
arg infos [ValueArg, TrivArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [40 50] 320 100
case depth = 0
depth based penalty = 0
discounted size = 250
ANSWER = NO -- won't be inlined
Discounted size is computed as follows (see GHC.Core.Unfold.computeDiscount
):
250 =
320 -- unfolding size
- 10 -- because the result would replace a function call
- 10*2 -- because 2 arguments are provided
- 40 -- discount because the first argument is value
The discounted unfolding size is greater than the inlining use threshold (grep unfoldingUseThreshold
) which defaults to 90 hence the unfolding isn't inlined.
However, if we force the inlining, case-of-known-constructor applies and the resulting expression for foo
is very small:
foo
= \ (x_a11L :: T) ->
case x_a11L of { __DEFAULT -> GHC.Types.False }
The issue is that the discount per value argument doesn't take into account which value (which constructor/literal) is applied. While it would be beneficial to inline in foo
above, it wouldn't be in bar
below:
bar :: T -> Bool
bar x = T 1 2 3 4 5 < x
One way to allow this is to compute a more precise unfolding size at every call site:
- when the existing heuristics failed to inline
- when we know the constructor or literal for some of the arguments
- when these arguments are cased on in the unfolding
- 1 is trivial
- 2 is easy because we already detect arguments that are values (cf
ValueArg
constructor inArgSummary
) so we just have to add a field toValueArg
indicating the DataCon/Literal (if any) of the argument - 3 can easily be inferred from the existing discounts: if an argument discount isn't 0, it is used as a scrutinee in a case-expresssion in the unfolding.
The more precise unfolding size is then computed as usual except that the sizes of case alternatives that are sure not to be taken (because of the specific value arguments that are passed) and that will be removed by case-of-known-constructor if we inline aren't added to the total.
!6732 (closed) implements this and the inlining decision for foo
becomes:
Considering inlining:
id: $fOrdT_$c<
arg infos: [ValueArg Nought, TrivArg] -- ValueArg now contains the arg DataCon
interesting continuation: BoringCtxt
some_benefit: True
is exp: True
is work-free: True
guidance: IF_ARGS [40 50] 320 100
case depth = 0
orig depth based penalty = 0
orig discounted size = 250
final depth based penalty = 0
final discounted size = -40
final unfolding size = 10 -- more precise unfolding size
ANSWER = YES
while the inlining decision for bar
is:
Considering inlining:
id: $fOrdT_$c<
arg infos: [ValueArg T, TrivArg]
interesting continuation: BoringCtxt
some_benefit: True
is exp: True
is work-free: True
guidance: IF_ARGS [40 50] 320 100
case depth = 0
orig depth based penalty = 0
orig discounted size = 250
final depth based penalty = 0
final discounted size = 240
final unfolding size = 290
ANSWER = NO
A drawback of doing this is that we compute the more precise size of unfoldings more often. However:
- we are only doing it if other heuristics failed and under certain conditions (see above)
- unfoldings are usually quite small (bounded by the unfolding creation threshold in the defining module)
- we could easily provide a flag to disable this heuristics per-module (not yet done in !6732 (closed))
This was found in !6696 (closed) which makes Integer's (<)
unfolding bigger, leading to a bigger (<)
unfolding for a datatype similar to T
above, leading to a change in the inlining decision in a expression similar to foo
's rhs.