Wrappers are inlined before cross-module specializations have a chance to fire

The worker/wrapper transformation takes some care to avoid generating wrappers that activate too early to avoid preventing RULES—and especially specializations—from firing, as described in Note [Wrapper activation] in GHC.Core.Opt.WorkWrap. Unfortunately, I’ve discovered that the current strategy usually fails to prevent this from happening. Here’s a pair of modules that reproduce the issue:

module A where

f :: Num b => b -> (b, b) -- note: recursive to prevent inlining
f x = (x + 1, snd (f x))  -- on such a small example
{-# SPECIALIZE f :: Int -> (Int, Int) #-}

module B (g') where
import A

g :: Num a => a -> a
g x = fst (f x)
{-# NOINLINE[99] g #-}

g' :: Int -> Int
g' = g

When compiling module B, I expect GHC to generate a specialization of g @Int, and I expect that specialization to use the specialized version of f. Unfortunately, this does not happen! Running ghc --make -c -O -ddump-simpl B.hs yields the following Core:

g' :: Int -> Int
g' = \ (x_aLf :: Int) ->
      case $wf @Int $fNumInt x_aLf of { (# ww_iMp, ww1_iMq #) -> ww_iMp }

This is a disastrous outcome: we’re passing a dictionary to $wf even though there’s a perfectly good f_$sf :: Int -> (Int, Int) specialization available.

The cause of the bug

What goes wrong? The issue arises from the following set of interactions:

When a binding is worker/wrappered, the wrapper is explicitly given an unfolding that activates in the earliest phase after any existing RULEs. If there are no RULEs, or if all RULEs are always active, the wrapper activates in phase 2.
By default, specializations are always active, even in the InitialPhase. This means f’s wrapper activates in phase 2, since the worker/wrapper logic assumes the RULE will have already fired in the InitialPhase.
But there’s a problem: the specializer runs after the InitialPhase simplifier, so in the above program, there’s no chance for the SPEC f RULE to fire (since the specialization of g hasn’t been generated yet). The first phase in which the RULE has a chance to fire is phase 2, but at that point, the wrapper’s unfolding is already active.

In other words, the worker/wrapper transformation assumes that nothing will happen between the initial phase and phase 2 that might cause more RULEs to fire, but this is wrong, and it causes this bug.

What to do about it

There are a few potential solutions to this issue, though none of them seem obviously right to me:

Delay wrapper activations to phase 1 at the earliest if there are any RULEs defined on the binding at all. This would avoid the problem by simply being more conservative, but it might lead to other missed optimizations due to the wrappers not being inlined as early.
Run the simplifier again (at the initial phase), after the specializer but before FloatOut. This would allow specialization rules to fire after additional opportunities have been revealed by the specializer, but it could make compilation times slower.
Somehow make the simplifier smarter so that the specialization rule can fire even though the wrapper is active. I have no idea what this would actually look like, but it seems to be what #20364 is about, so I am including it here for completeness.

Note that I have not included “do nothing” as an option, as I think the status quo is really quite unacceptably bad: cross-module specialization is currently so fragile that it’s essentially impossible to use properly. So I think something must be done, I’m just not sure what, at least not without further investigation.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information