SpecConstr regression in NoFib's `spectral/ansi`

With a recent master GHC, I observe a perf regression in NoFib's spectral/ansi in the presence of -fspec-constr.

$ _build/stage1/bin/ghc -O Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 24321415416 bytes, 5880 GCs, 258273/364656 avg/max bytes residency (28 samples), 7M in use, 0.000 INIT (0.000 elapsed), 2.297 MUT (2.290 elapsed), 0.167 GC (0.171 elapsed) :ghc>>
$ _build/stage1/bin/ghc -O -fspec-constr Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 32211087200 bytes, 6471 GCs, 253717/401024 avg/max bytes residency (30 samples), 8M in use, 0.000 INIT (0.000 elapsed), 2.836 MUT (2.827 elapsed), 0.211 GC (0.215 elapsed) :ghc>>

(An aside: perhaps piping in ansi.stdout is not the correct way to run the benchmark, but we shouldn't regress either way.)

Note that the second run allocates about 32% more (32.2GB vs. 24.3GB). This is due to SpecConstr introducing reboxing.

Having a hunch, I reverted !11689 (closed) and got the following results:

$ _build/stage1/bin/ghc -O -fspec-constr Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 22238093736 bytes, 5407 GCs, 258627/350776 avg/max bytes residency (22 samples), 7M in use, 0.000 INIT (0.000 elapsed), 2.127 MUT (2.120 elapsed), 0.125 GC (0.129 elapsed) :ghc>>

So that improved; hence !11689 (closed) is introducing a roughly 45% allocation regression for spectral/ansi (32.2GB vs. 22.2GB, both with -fspec-constr). Perhaps we should re-evaluate that patch or find a way for it not to regress.
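
To pin down the percentages, the relative increase in allocation can be computed directly from the byte counts in the three `+RTS -t` lines above. This is only a throwaway sanity check; the names `baseline`, `withSpecConstr` and `withRevert` are made up here for illustration.

```haskell
module Main where

import Text.Printf (printf)

-- Bytes allocated, copied from the `+RTS -t` output of the three runs.
baseline, withSpecConstr, withRevert :: Double
baseline       = 24321415416  -- -O
withSpecConstr = 32211087200  -- -O -fspec-constr
withRevert     = 22238093736  -- -O -fspec-constr, with !11689 reverted

-- Relative increase of `new` over `old`, in percent.
pctIncrease :: Double -> Double -> Double
pctIncrease old new = (new - old) / old * 100

main :: IO ()
main = do
  -- -fspec-constr vs. plain -O
  printf "%.1f%%\n" (pctIncrease baseline withSpecConstr)
  -- with the patch vs. with the patch reverted, both with -fspec-constr
  printf "%.1f%%\n" (pctIncrease withRevert withSpecConstr)
```

This comes out at about 32.4% for -fspec-constr against plain -O, and about 44.8% for the patch against the revert.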

I began to diagnose.

Here's a diff of the specialisations we generate according to -ddump-spec-constr, first with !11689 (closed) reverted (i.e., OLD):

loop [Occ=LoopBreaker] :: Int -> [Char] -> Interact
[LclId,
 Arity=2,
 Str=<L><L>,
 Unf=Unf{Src=<vanilla>, TopLvl=True,
         Value=True, ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [60 30] 585 60},
 RULES: "SC:loop0"
            forall (sc :: GHC.Prim.Int#).
              loop (GHC.Types.I# sc) (GHC.Types.[] @Char)
              = $sloop sc
        "SC:loop1"
            forall (sc :: Char) (sc :: [Char]) (sc :: Int).
              loop sc (GHC.Types.: @Char sc sc)
              = $sloop sc sc sc
        "SC:loop2"
            forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc]

and now with !11689 (closed) (i.e., NEW):

loop [Occ=LoopBreaker] :: Int -> [Char] -> Interact
[LclId,
 Arity=2,
 Str=<L><L>,
 Unf=Unf{Src=<vanilla>, TopLvl=True,
         Value=True, ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [60 30] 585 60},
 RULES: "SC:loop0"
            forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc
        "SC:loop1"
            forall (sc :: Char) (sc :: [Char]) (sc :: Int).
              loop sc (GHC.Types.: @Char sc sc)
              = $sloop sc sc sc]

Apparently, we lose the specialisation

 RULES: "SC:loop0"
            forall (sc :: GHC.Prim.Int#).
              loop (GHC.Types.I# sc) (GHC.Types.[] @Char)
              = $sloop sc

which IMO is an instance of the more general rule in OLD:

        "SC:loop2"
            forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc]
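
To make "is an instance of" concrete, here is a toy model of call patterns and a subsumption check between them. This is only a sketch of the idea and not how SpecConstr represents or compares patterns internally; the `Pat` type and the `subsumes` function are hypothetical, and the partially applied pattern `loop (I# sc)` is modelled by padding the missing argument with a variable.

```haskell
module Main where

-- Toy model of one argument of a call pattern: either a rule variable
-- (matches anything) or a constructor applied to sub-patterns.
data Pat = Var | Con String [Pat]

-- `subsumes general specific` holds when every argument matched by
-- `specific` is also matched by `general`.
subsumes :: Pat -> Pat -> Bool
subsumes Var _ = True
subsumes (Con c ps) (Con d qs) =
  c == d && length ps == length qs && and (zipWith subsumes ps qs)
subsumes (Con _ _) Var = False

-- "SC:loop2" in OLD: loop (I# sc), second argument left unconstrained.
loop2 :: [Pat]
loop2 = [Con "I#" [Var], Var]

-- "SC:loop0" in OLD: loop (I# sc) [].
loop0 :: [Pat]
loop0 = [Con "I#" [Var], Con "[]" []]

main :: IO ()
main = do
  print (and (zipWith subsumes loop2 loop0))  -- True: loop0 is an instance of loop2
  print (and (zipWith subsumes loop0 loop2))  -- False: loop2 is strictly more general
```

In this model the `[]` pattern is strictly more specific, so a call `loop (I# x) []` matches both rules while a call with a cons cell or an unknown list matches only the general one.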

With !11689 (closed), we never generate the first one, only the second one. But the $sloop of the second one needs to rebox its I# sc, resulting in the huge regression, whereas the $sloop for the [] specialisation does not.
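
For readers unfamiliar with reboxing, here is a minimal hand-written sketch of the effect. The `loop` below is a made-up function, not the one from spectral/ansi, and `sloop` is my approximation of what a specialisation on `I# sc` looks like, not actual SpecConstr output.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

-- Hypothetical loop that stores its boxed Int argument in the result.
-- The box passed in by the caller can be put into the cons cell as-is.
loop :: Int -> [Char] -> [Int]
loop _ []       = []
loop n (_ : cs) = n : loop (n + 1) cs

-- Hand-written approximation of a specialisation for calls of the form
-- `loop (I# sc) cs`: the Int arrives unboxed, but the body still needs
-- a boxed Int for the cons cell, so it has to build a fresh `I#` box.
sloop :: Int# -> [Char] -> [Int]
sloop _  []       = []
sloop sc (_ : cs) = I# sc : sloop (sc +# 1#) cs  -- reboxing happens here

main :: IO ()
main = do
  print (loop 0 "abc")    -- [0,1,2]
  print (sloop 0# "abc")  -- [0,1,2]
```

Both versions compute the same result. The difference is only in allocation: when the caller already holds the boxed `Int`, the specialised version throws that box away and then allocates a new one wherever the body needs it boxed again.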

Although I'm tempted to accept this regression because it is ultimately a result of a lack of awareness of reboxing in SpecConstr, I wonder why we so easily discard the specialisation for [].
