Cross-module specialization of recursive function isn’t worker/wrappered
As the issue title says, cross-module specialization can defeat worker/wrappering in certain situations, and such situations probably miss out on other beneficial optimizations, too. Here’s a pair of modules that demonstrate the problem:
{-# LANGUAGE FlexibleContexts, LambdaCase #-}
module A where
import Control.Monad.State.Class
countdown :: MonadState Int m => m Int
countdown = do
n <- get
if n <= 0 then pure n
else put (n - 1) >> countdown
{-# INLINABLE countdown #-}
module B where
import Control.Monad.State.Strict
import A
program :: IO ()
program = print (runState countdown 1000)
During the compilation of B
, countdown
will be specialized at State Int
, leaving us with a tight loop that looks like this:
$scountdown :: Int -> (Int, Int)
$scountdown n = if n <= 0 then (n, n)
else $scountdown (n - 1)
We’d really like for this to be worker-wrappered so we can eliminate all the boxing/unboxing on each iteration of the loop. But if you look at the optimized core, you’ll find that never happens, and we really do all that wasteful allocation! What goes wrong?
Note [Specialising imported functions]
in GHC.Core.Opt.OccurAnal
hints at the root issue. When we specialize a (strong) loop breaker, we mark it NOINLINE
to avoid running into any infinite simplifier loops. That alone is not a death sentence—NOINLINE
functions can still be worker/wrappered—but after a few simplifier iterations, we actually end up with something like this:
Rec {
$scountdown1 :: Int -> (Int, Int)
$scountdown1 n = if n <= 0 then (n, n)
else $scountdown (n - 1)
$scountdown [InlPrag=NOUSERINLINE[~], Occ=LoopBreaker]
:: State Int Int
$scountdown = $scountdown1 `cast` <Co:1>
end Rec }
The simplifier has smuggled the entire body of $scountdown
out from under the cast into a separate binding that isn’t marked NOINLINE
, and since $scountdown
is marked NOINLINE
, it is selected as the loop breaker. This doesn’t actually do anyone any good because nobody ever calls $scountdown1
directly—not even $scountdown1
itself—they all go through $scountdown
, which will never be inlined.
Now we have a problem. We’d like to worker/wrapper $scountdown1
, but we won’t, as it’s not a loop breaker and very small (see Note [Don't w/w inline small non-loop-breaker things]
in GHC.Core.Opt.WorkWrap
). But even if we did, that wouldn’t get us anywhere, since the recursive call is to $scountdown
, which isn’t a function and therefore won’t be worker/wrappered either (see Note [Don't eta expand in w/w]
in GHC.Core.Opt.WorkWrap
).
So now we’re stuck. Who’s to blame? It isn’t immediately clear to me. There are two different places it seems plausible to point fingers:
-
It’s not very nice that the specializer is putting
NOINLINE
pragmas on functions that don’t need them. The explicitNOINLINE
pragma pins theLoopBreaker
designation on$scountdown
even after arbitrary amounts of code motion, which turns out poorly here. Without the pragma, the auxiliary$scountdown1
binding would eventually end up being selected as theLoopBreaker
, and the$scountdown
binding would be inlined everywhere. -
When
$scountdown1
is floated out from under the cast, theNOINLINE
pragma isn’t transferred to the new binding. This seems unhelpful to me. If theNOINLINE
pragma were transferred to$scountdown1
, then things would also go fine for the same reasons as in the previous point.
I don’t fully understand all the ways that things can go wrong if the NOINLINE
pragma isn’t added to the specialized binding, so I don’t feel equipped to say whether it could somehow be avoided here or not. But point 2 seems pretty safe to me (and would benefit user-written uses of NOINLINE
, too!), assuming I’m interpreting things correctly.