
Boring join points should not inline

In !9104 (comment 457018) I investigated how a different optimisation path in the Simplifier caused unnecessary code bloat. Here is a standalone reproducer:

{-# LANGUAGE BangPatterns #-}

module Lib where

data T = T (Maybe Bool) (Maybe Bool) (Maybe Bool) (Maybe Bool)


m :: Maybe a -> Maybe a -> Maybe a
m (Just v1) Nothing = Just v1
m _         mb      = mb
{-# INLINE m #-}

f :: T -> T -> T
f (T a1 b1 c1 d1) (T a2 b2 c2 d2)
  = let j1 !a = let j2 !b = let j3 !c = let j4 !d = T a b c d
                                        in j4 (m d1 d2)
                            in j3 (m c1 c2)
                in j2 (m b1 b2)
    in j1 (m a1 a2)
{-# OPAQUE f #-}

(The use of OPAQUE is just so that we don't unbox.)
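For readers who want to check what the reproducer computes before looking at Core, here is a minimal standalone sketch. It is not part of the reproducer: mergeT is a hypothetical helper that gives the same result as f on fully defined inputs but without the bang patterns or the nested lets, and the deriving clause is added only so the results can be printed and compared.

```haskell
module Main where

data T = T (Maybe Bool) (Maybe Bool) (Maybe Bool) (Maybe Bool)
  deriving (Eq, Show) -- added for this example only

-- Same definition as in the reproducer: prefer the second argument,
-- unless it is Nothing and the first one is a Just.
m :: Maybe a -> Maybe a -> Maybe a
m (Just v1) Nothing = Just v1
m _         mb      = mb

-- Hypothetical helper (not in the reproducer): field-wise merge.
-- Lazier than f, because nothing forces the merged fields here.
mergeT :: T -> T -> T
mergeT (T a1 b1 c1 d1) (T a2 b2 c2 d2) =
  T (m a1 a2) (m b1 b2) (m c1 c2) (m d1 d2)

main :: IO ()
main = do
  print (m (Just True) Nothing)
  print (m (Just True) (Just False))
  print (m (Nothing :: Maybe Bool) Nothing)
  print (mergeT (T (Just True) Nothing Nothing (Just False))
                (T Nothing (Just True) Nothing (Just True)))
```

Each field of the result depends only on the corresponding pair of input fields, which is why the four join points in f have nothing to gain from each other's arguments.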

After inlining m, this is pretty much the optimal code; specifically, it doesn't make sense to inline the join points, for the following reasons:

  • No simplification to be had with the return site, because otherwise we'd have pushed the context into the join point
  • No simplification to be had with concrete arguments, because they just end up in T's fields. The seq will be done at the call site thanks to tag inference.
  • jump j4 ... is much smaller than T a b c d, and it is an unconditional direct jump, so saving the jump is not a good reason either.

Yet in HEAD, I see

f = \ ds ds1 ->
      case ds of { T a1 b1 c1 d1 ->
      case ds1 of { T a2 b2 c2 d2 ->
      join {
        $j a
          = case a of a4 { __DEFAULT ->
            join {
              $j1 b
                = case b of b4 { __DEFAULT ->
                  case c1 of wild2 {
                    Nothing ->
                      case c2 of c { __DEFAULT ->
                      case d1 of wild3 {
                        Nothing -> case d2 of d { __DEFAULT -> T a4 b4 c d };
                        Just v1 ->
                          case d2 of wild4 {
                            Nothing -> T a4 b4 c wild3;
                            Just ipv -> T a4 b4 c wild4
                          }
                      }
                      };
                    Just v1 ->
                      case c2 of wild3 {
                        Nothing ->
                          case d1 of wild4 {
                            Nothing -> case d2 of d { __DEFAULT -> T a4 b4 wild2 d };
                            Just v2 ->
                              case d2 of wild5 {
                                Nothing -> T a4 b4 wild2 wild4;
                                Just ipv -> T a4 b4 wild2 wild5
                              }
                          };
                        Just ipv ->
                          case d1 of wild4 {
                            Nothing -> case d2 of d { __DEFAULT -> T a4 b4 wild3 d };
                            Just v2 ->
                              case d2 of wild5 {
                                Nothing -> T a4 b4 wild3 wild4;
                                Just ipv1 -> T a4 b4 wild3 wild5
                              }
                          }
                      }
                  }
                  } } in
            case b1 of wild2 {
              Nothing -> jump $j1 b2;
              Just v1 ->
                case b2 of wild3 {
                  Nothing -> jump $j1 wild2;
                  Just ipv -> jump $j1 wild3
                }
            }
            } } in
      case a1 of wild2 {
        Nothing -> jump $j a2;
        Just v1 ->
          case a2 of wild3 {
            Nothing -> jump $j wild2;
            Just ipv -> jump $j wild3
          }
      }
      }
      }

Note that we inlined j3 and j4.

My investigation so far pointed to Note [Inline small things to avoid creating a thunk] which causes us to postinline small join points in the final phase. But inlining join points doesn't avoid creating a thunk! So at the very least that Note should not apply to join points.

But even then, this is the output of -dinline-check \$j:

Considering inlining: $j_szM
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
  ANSWER = YES
Considering inlining: $j_szM
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
  ANSWER = YES
Considering inlining: $j_szM
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
  ANSWER = YES
Considering inlining: $j_szN
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [20] 120 30
  case depth = 1
  depth based penalty = 0
  discounted size = 80
  ANSWER = YES
Considering inlining: $j_szN
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [20] 120 30
  case depth = 2
  depth based penalty = 0
  discounted size = 80
  ANSWER = YES
Considering inlining: $j_szN
  arg infos [ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [20] 120 30
  case depth = 2
  depth based penalty = 0
  discounted size = 80
  ANSWER = YES
...

So calcUnfoldingGuidance seems to give some of these an ALWAYS_IF guidance, under which boring contexts and even unsaturated calls (!!) are OK. Note [INLINE for small functions] seems relevant there.

It's all a bit fishy; clearly j4's body is not just a function call that is as small as the call to j4 itself. In fact, there is no call to j4 to begin with, just a jump, because it's a join point. The only time inlining a join point may decrease code size is when its body is itself just a jump to another join point. Currently, condInline does not check for join points at all.
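To make the suggested rule concrete, here is a toy model of the decision. It is only a sketch under stated assumptions: none of these types or functions exist in GHC, the names (Candidate, shouldInline, and so on) are hypothetical, and the sizes are made-up numbers rather than anything the Simplifier computes.

```haskell
module Main where

-- Toy model only: these are NOT GHC's data types or API.
data BinderKind = OrdinaryFunction | JoinPoint
  deriving (Eq, Show)

data Context = BoringCtxt | InterestingCtxt
  deriving (Eq, Show)

data Candidate = Candidate
  { kind           :: BinderKind
  , rhsIsJustAJump :: Bool -- body is nothing but a jump to another join point
  , rhsSize        :: Int  -- made-up size of the body
  , callSize       :: Int  -- made-up size of the call or jump
  }

-- Hypothetical rule: a join point in a boring context is inlined only
-- if its body merely forwards to another join point. Everything else
-- falls back to a plain "body no bigger than the call" comparison.
shouldInline :: Context -> Candidate -> Bool
shouldInline BoringCtxt c
  | kind c == JoinPoint = rhsIsJustAJump c
shouldInline _ c = rhsSize c <= callSize c

-- A join point like j4: the body allocates a T, so it is not a mere jump.
j4Like :: Candidate
j4Like = Candidate JoinPoint False 50 10

-- A join point whose body only jumps on to another join point.
forwarder :: Candidate
forwarder = Candidate JoinPoint True 10 10

main :: IO ()
main = do
  print (shouldInline BoringCtxt j4Like)
  print (shouldInline BoringCtxt forwarder)
```

In this model the j4-like candidate is rejected in a boring context, while a pure forwarding join point is still inlined, which is the only case where inlining cannot grow the code.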

Anyway, it's wasteful to inline any of the join points.
