Boring join points should not inline
In !9104 (comment 457018) I investigated how a different optimisation path in the Simplifier caused unnecessary code bloat. Here is a standalone reproducer:
{-# LANGUAGE BangPatterns #-}
module Lib where
data T = T (Maybe Bool) (Maybe Bool) (Maybe Bool) (Maybe Bool)
m :: Maybe a -> Maybe a -> Maybe a
m (Just v1) Nothing = Just v1
m _ mb = mb
{-# INLINE m #-}
f :: T -> T -> T
f (T a1 b1 c1 d1) (T a2 b2 c2 d2)
  = let j1 !a = let j2 !b = let j3 !c = let j4 !d = T a b c d
                                        in j4 (m d1 d2)
                            in j3 (m c1 c2)
                in j2 (m b1 b2)
    in j1 (m a1 a2)
{-# OPAQUE f #-}
(The use of OPAQUE is just so that we don't unbox.)
After inlining m, this is pretty much the optimal code; specifically, it doesn't make sense to inline the join points, for the following reasons:
- No simplification to be had with the return site, because otherwise we'd have pushed the context into the join point.
- No simplification to be had with concrete arguments, because they just end up in T's fields. The seq will be done at the call site thanks to tag inference.
- jump j4 ... is much smaller than T a b c d and is an unconditional direct jump, so avoiding the jump is also not a good reason.
Yet in HEAD, I see
f = \ ds ds1 ->
      case ds of { T a1 b1 c1 d1 ->
      case ds1 of { T a2 b2 c2 d2 ->
      join {
        $j a
          = case a of a4 { __DEFAULT ->
            join {
              $j1 b
                = case b of b4 { __DEFAULT ->
                  case c1 of wild2 {
                    Nothing ->
                      case c2 of c { __DEFAULT ->
                      case d1 of wild3 {
                        Nothing -> case d2 of d { __DEFAULT -> T a4 b4 c d };
                        Just v1 ->
                          case d2 of wild4 {
                            Nothing -> T a4 b4 c wild3;
                            Just ipv -> T a4 b4 c wild4
                          }
                      }
                      };
                    Just v1 ->
                      case c2 of wild3 {
                        Nothing ->
                          case d1 of wild4 {
                            Nothing -> case d2 of d { __DEFAULT -> T a4 b4 wild2 d };
                            Just v2 ->
                              case d2 of wild5 {
                                Nothing -> T a4 b4 wild2 wild4;
                                Just ipv -> T a4 b4 wild2 wild5
                              }
                          };
                        Just ipv ->
                          case d1 of wild4 {
                            Nothing -> case d2 of d { __DEFAULT -> T a4 b4 wild3 d };
                            Just v2 ->
                              case d2 of wild5 {
                                Nothing -> T a4 b4 wild3 wild4;
                                Just ipv1 -> T a4 b4 wild3 wild5
                              }
                          }
                      }
                  }
                  } } in
            case b1 of wild2 {
              Nothing -> jump $j1 b2;
              Just v1 ->
                case b2 of wild3 {
                  Nothing -> jump $j1 wild2;
                  Just ipv -> jump $j1 wild3
                }
            }
            } } in
      case a1 of wild2 {
        Nothing -> jump $j a2;
        Just v1 ->
          case a2 of wild3 {
            Nothing -> jump $j wild2;
            Just ipv -> jump $j wild3
          }
      }
      }
      }
Note that we inlined j3 and j4.
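To put a rough number on the bloat, here is a back-of-the-envelope sketch. It is not GHC code, and the names leavesPerM and copiesOfT are made up for illustration. It assumes that each inlined call to m contributes three leaves (as in the Core above) and that every inlined join point copies its body into each of those leaves.

```haskell
module Main where

-- Back-of-the-envelope model, not GHC code; all names here are made up.

-- After inlining, each call `m x y` has three leaves:
--   x = Nothing            -> y
--   x = Just, y = Nothing  -> x
--   x = Just, y = Just     -> y
leavesPerM :: Int
leavesPerM = 3

-- Copies of the innermost `T a b c d` when the innermost n join points
-- are inlined into the leaves of the enclosing `m` call.
copiesOfT :: Int -> Int
copiesOfT n = leavesPerM ^ n

main :: IO ()
main = mapM_ report [0 .. 4]
  where
    report n =
      putStrLn (show n ++ " join points inlined: " ++ show (copiesOfT n) ++ " copies of T")
```

For n = 2 (j3 and j4, as in HEAD) this gives 9, which matches the nine T a4 b4 ... expressions in the Core above. Inlining all four join points would give 81 under the same assumptions.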
My investigation so far pointed to Note [Inline small things to avoid creating a thunk], which causes us to postinline small join points in the final phase. But inlining join points doesn't avoid creating a thunk! So at the very least that Note should not apply to join points.
But even then, this is the output of -dinline-check \$j:
Considering inlining: $j_szM
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
ANSWER = YES
Considering inlining: $j_szM
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
ANSWER = YES
Considering inlining: $j_szM
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
ANSWER = YES
Considering inlining: $j_szN
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [20] 120 30
case depth = 1
depth based penalty = 0
discounted size = 80
ANSWER = YES
Considering inlining: $j_szN
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [20] 120 30
case depth = 2
depth based penalty = 0
discounted size = 80
ANSWER = YES
Considering inlining: $j_szN
arg infos [ValueArg]
interesting continuation BoringCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [20] 120 30
case depth = 2
depth based penalty = 0
discounted size = 80
ANSWER = YES
...
So calcUnfoldingGuidance seems to give some of these an ALWAYS_IF guidance, and boring contexts and unsaturated calls (!!) are OK, too. Note [INLINE for small functions] seems relevant there.
It's all a bit fishy; clearly j4 is not just a function call that is as small as the call to j4 itself. There is no call to j4 to begin with, just a jump, because it's a join point. The only time when inlining a join point may decrease code size is when its body is itself just a jump to another join point. Currently, condInline does not check for join points at all.
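The code-size argument can be made concrete with a toy model. This is only a sketch under stated assumptions, not how GHC measures size: a jump is assumed to cost 1 unit, the body sizes are invented, and sizeKept and sizeInlined are hypothetical names.

```haskell
module Main where

-- Toy size model, not GHC's: a jump costs 1 unit and a join point body
-- costs `bodySize` units. All names and numbers are hypothetical.

-- Join point kept: the body appears once, plus one jump per use site.
sizeKept :: Int -> Int -> Int
sizeKept bodySize jumps = bodySize + jumps

-- Join point inlined: the body is copied to every use site.
sizeInlined :: Int -> Int -> Int
sizeInlined bodySize jumps = bodySize * jumps

main :: IO ()
main = do
  -- Body is itself a single jump (size 1), three use sites.
  print (sizeKept 1 3, sizeInlined 1 3)  -- (4,3)
  -- Body builds a constructor (assumed size 5), three use sites.
  print (sizeKept 5 3, sizeInlined 5 3)  -- (8,15)
```

In this model inlining only wins when the body is as small as the jump it replaces; for anything larger with several use sites, such as j4 with its three jumps, keeping the join point is smaller.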
Still, it's wasteful to inline any of the join points.