## SpecConstr rule base woes

Consider the following stream fusion approach harnessing the type class specialiser (and also constructor-pattern specialisation for subsequently specialising `loop`

):

```
{-# LANGUAGE TypeFamilies, FlexibleContexts, BangPatterns #-}
data Step iter
= Yield !iter !(Item iter)
| Skip !iter
| Done
class Iterator iter where
type Item iter
next :: iter -> Step iter
data Singleton a = Empty' | Singleton !a
singletonS :: a -> Singleton a
singletonS = Singleton
instance Iterator (Singleton a) where
type Item (Singleton a) = a
next Empty' = Done
next (Singleton a) = Yield Empty' a
data EnumFromTo a = EnumFromTo !a !a
enumFromToS :: a -> a -> EnumFromTo a
enumFromToS = EnumFromTo
instance (Ord a, Num a) => Iterator (EnumFromTo a) where
type Item (EnumFromTo a) = a
next (EnumFromTo i high)
| i > high = Done
| otherwise = Yield (EnumFromTo (i + 1) high) i
sumS :: (Num (Item iter), Iterator iter) => iter -> Item iter
sumS = loop 0
where
loop !total iter1 =
case next iter1 of
Done -> total
Skip iter2 -> loop total iter2
Yield iter2 x -> loop (total + x) iter2
-- {-# NOINLINE[99] loop #-} -- comment in for 10x speedup
{-# INLINE sumS #-}
data ConcatMap i o = ConcatMap !(Maybe o) !(Item i -> o) !i
concatMapS :: (Item i -> o) -> i -> ConcatMap i o
concatMapS = ConcatMap Nothing
instance (Iterator i, Iterator o) => Iterator (ConcatMap i o) where
type Item (ConcatMap i o) = Item o
next (ConcatMap Nothing f i) = case next i of
Skip i' -> Skip (ConcatMap Nothing f i')
Yield i' x -> Skip (ConcatMap (Just (f x)) f i')
Done -> Done
next (ConcatMap (Just o) f i) = case next o of
Skip o' -> Skip (ConcatMap (Just o') f i)
Yield o' x -> Yield (ConcatMap (Just o') f i) x
Done -> Skip (ConcatMap Nothing f i)
ex1 :: Int -> Int
ex1
= sumS
. concatMapS (\a -> singletonS (a*a))
. enumFromToS 1
{-# NOINLINE ex1 #-}
main = print (ex1 99999999)
```

If the `{-# NOINLINE[99] loop #-}`

pragma is commented in, `ex1`

will compile down to this perfect bit of code:

```
Main.main_$s$wloop
= \ (sc_s4GJ :: GHC.Prim.Int#)
(sc1_s4GI :: GHC.Prim.Int#)
(sc2_s4GH :: GHC.Prim.Int#) ->
case GHC.Prim.># sc1_s4GI sc_s4GJ of {
__DEFAULT ->
Main.main_$s$wloop
sc_s4GJ
(GHC.Prim.+# sc1_s4GI 1#)
(GHC.Prim.+# sc2_s4GH (GHC.Prim.*# sc1_s4GI sc1_s4GI));
1# -> sc2_s4GH
}
```

Beautiful!

Without the `NOINLINE`

pragma, not so much, though. Here's the Core:

```
Main.main_$s$wloop [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> GHC.Prim.Int#
-> (Item (EnumFromTo Int) -> Singleton Int)
-> Singleton Int
-> GHC.Prim.Int#
-> GHC.Prim.Int#
[GblId,
Arity=5,
Caf=NoCafRefs,
Str=<S,U><S,U><L,C(U)><S,1*U><S,U>,
Unf=OtherCon []]
Main.main_$s$wloop
= \ (sc_s4FQ :: GHC.Prim.Int#)
(sc1_s4FP :: GHC.Prim.Int#)
(sc2_s4FO :: Item (EnumFromTo Int) -> Singleton Int)
(sc3_s4FN :: Singleton Int)
(sc4_s4FM :: GHC.Prim.Int#) ->
case sc3_s4FN of {
Empty' ->
case GHC.Prim.># sc1_s4FP sc_s4FQ of {
__DEFAULT ->
Main.main_$s$wloop
sc_s4FQ
(GHC.Prim.+# sc1_s4FP 1#)
sc2_s4FO
(sc2_s4FO
((GHC.Types.I# sc1_s4FP)
`cast` (Sub (Sym (Main.D:R:ItemEnumFromTo[0] <Int>_N))
:: Int ~R# Item (EnumFromTo Int))))
sc4_s4FM;
1# -> sc4_s4FM
};
Singleton a_a1z4 ->
case a_a1z4 of { GHC.Types.I# y_s4Dg ->
Main.main_$s$wloop
sc_s4FQ
sc1_s4FP
sc2_s4FO
(Main.Empty' @ Int)
(GHC.Prim.+# sc4_s4FM y_s4Dg)
}
}
```

Long story short, constructor specialisation failed to specialise `$wloop`

for some higher-order arguments (the function of type `Item (EnumFromTo Int) -> Singleton Int`

in particular). That results in a slow-down of a factor of 10 and it goes from constant space to allocating huge amounts of memory (still constant residency, though).

# Why?

**Edit: Don't bother reading this section yet, it's incoherent and inaccurate. Suffice it to say, the conclusion is accurate: Even though we specialise for the stronger call pattern, we don't rewrite to it because the rewrite rule is not attached to the strictly weaker specialisation.**

Looking at the difference in `-dverbose-core2core`

, the first real divergence is in the output from desugaring after optimisation, where the inner `loop`

is specialised for the type arguments and dictionaries of the outer `sumS`

function in the version without `NOINLINE`

.

Then after the initial simplifier phase we get the first digression because `loop`

was specialised for the particular types it was called at, whereas it wasn't in the `NOINLINE`

version (which makes sense).

Now, the critical moment is when `loop`

becomes too large to inline into `ex1`

in the version *without* `NOINLINE`

. But `ex1`

calls `loop`

with quite interesting arguments, those we want to specialise for (the particular `ConcatMap`

datum in particular)! In code:

```
f :: Int -> Singleton Int -- same function for both
f ...
inner :: Int -> EnumFromToInt Int -- the input stream from which ConcatMap draws elements
inner ...
-- without NOINLINE
loop (ConcatMap x f inner) = HUGE and recursive
ex1 n = loop (ConcatMap Nothing f (inner n))
-- with NOINLINE[99], loop gets inlined after the initial phase
ex1 n = joinrec loop' (ConcatMap x f (inner n)) = <HUGE and recursive> in loop' (ConcatMap Nothing f (inner n))
```

Then WW kicks in. `loop`

(and `loop'`

) is deeply strict in the `ConcatMap`

we pass it, but it refuses to 'unpack' (i.e. specialise) for `ConcatMap`

's `f`

. Note also that it will WW `loop`

, `ex1`

and `loop'`

:

```
-- without NOINLINE
$wloop x f <unpacked components from inner> = HUGE and recursive
loop (ConcatMap x f (.. unpack components from inner ...)) = $wloop x f <unpacked components from inner>
ex1 n = $wloop (ConcatMap Nothing f (inner n))
-- with NOINLINE
$wex1 n = joinrec $wloop' (ConcatMap x f inner) = <HUGE and recursive> in $wloop' (ConcatMap Nothing f (inner n))
ex1 (I# n) = $wex1 n
```

Note how in the NOINLINE case this separated the definition of `f`

from its usage by a lambda binder, so the subsequent SpecConstr pass is unable to figure out what to specialise for. It can't look through `$fw`

. I think this is the same or at least a similar effect that @nomeata experienced in #14068 and accounted for in #14844.

Or at least that's my working hypothesis, I'll edit this post accordingly tomorrow when I had time to investigate.