Skip to content

Cross-module specialization of recursive function isn’t worker/wrappered

As the issue title says, cross-module specialization can defeat worker/wrappering in certain situations, and such situations probably miss out on other beneficial optimizations, too. Here’s a pair of modules that demonstrate the problem:

{-# LANGUAGE FlexibleContexts, LambdaCase #-}
module A where
import Control.Monad.State.Class

countdown :: MonadState Int m => m Int
countdown = do
  n <- get
  if n <= 0 then pure n
            else put (n - 1) >> countdown
{-# INLINABLE countdown #-}
module B where
import Control.Monad.State.Strict
import A

program :: IO ()
program = print (runState countdown 1000)

During the compilation of B, countdown will be specialized at State Int, leaving us with a tight loop that looks like this:

$scountdown :: Int -> (Int, Int)
$scountdown n = if n <= 0 then (n, n)
                          else $scountdown (n - 1)

We’d really like for this to be worker-wrappered so we can eliminate all the boxing/unboxing on each iteration of the loop. But if you look at the optimized core, you’ll find that never happens, and we really do all that wasteful allocation! What goes wrong?

Note [Specialising imported functions] in GHC.Core.Opt.OccurAnal hints at the root issue. When we specialize a (strong) loop breaker, we mark it NOINLINE to avoid running into any infinite simplifier loops. That alone is not a death sentence—NOINLINE functions can still be worker/wrappered—but after a few simplifier iterations, we actually end up with something like this:

Rec {
$scountdown1 :: Int -> (Int, Int)
$scountdown1 n = if n <= 0 then (n, n)
                           else $scountdown (n - 1)

$scountdown [InlPrag=NOUSERINLINE[~], Occ=LoopBreaker]
  :: State Int Int
$scountdown = $scountdown1 `cast` <Co:1>
end Rec }

The simplifier has smuggled the entire body of $scountdown out from under the cast into a separate binding that isn’t marked NOINLINE, and since $scountdown is marked NOINLINE, it is selected as the loop breaker. This doesn’t actually do anyone any good because nobody ever calls $scountdown1 directly—not even $scountdown1 itself—they all go through $scountdown, which will never be inlined.

Now we have a problem. We’d like to worker/wrapper $scountdown1, but we won’t, as it’s not a loop breaker and very small (see Note [Don't w/w inline small non-loop-breaker things] in GHC.Core.Opt.WorkWrap). But even if we did, that wouldn’t get us anywhere, since the recursive call is to $scountdown, which isn’t a function and therefore won’t be worker/wrappered either (see Note [Don't eta expand in w/w] in GHC.Core.Opt.WorkWrap).

So now we’re stuck. Who’s to blame? It isn’t immediately clear to me. There are two different places it seems plausible to point fingers:

  1. It’s not very nice that the specializer is putting NOINLINE pragmas on functions that don’t need them. The explicit NOINLINE pragma pins the LoopBreaker designation on $scountdown even after arbitrary amounts of code motion, which turns out poorly here. Without the pragma, the auxiliary $scountdown1 binding would eventually end up being selected as the LoopBreaker, and the $scountdown binding would be inlined everywhere.

  2. When $scountdown1 is floated out from under the cast, the NOINLINE pragma isn’t transferred to the new binding. This seems unhelpful to me. If the NOINLINE pragma were transferred to $scountdown1, then things would also go fine for the same reasons as in the previous point.

I don’t fully understand all the ways that things can go wrong if the NOINLINE pragma isn’t added to the specialized binding, so I don’t feel equipped to say whether it could somehow be avoided here or not. But point 2 seems pretty safe to me (and would benefit user-written uses of NOINLINE, too!), assuming I’m interpreting things correctly.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information