GHC 9.4.6 fails to specialize `recip` from `Fractional` given a fully specific `SPECIALIZE` pragma
[Edit: I've made a mistake: in some of my tests (not the ones that reproduced the unsoundness) I added `-fpolymorphic-specialisation` as a flag to the cabal invocation instead of `--ghc-options=-fpolymorphic-specialisation`. When I use the proper `--ghc-options`, I can't find any `recip` at all in the emitted Core, so this repro exhibits nothing new. I think it's ripe for closing, since it's full of mistaken claims by me (only some edited out) and since all findings can now be explained by older tickets as follows:
- This repro shows a case where `-fno-polymorphic-specialisation` leads to a non-specialized `recip` (and consequently to a 25% slowdown, which is unreliable data, because the code is not specialized enough, so there's an order of magnitude of noise), which is just one more documented performance regression in addition to those mentioned in #23559.
- This repro shows a case where `-fpolymorphic-specialisation` leads to unsoundness in GHC >= 9.6, which is again one more case added to what's already in #23559.
- This repro shows how GHC 9.4.6 doesn't specialize `recip` as it should, despite `SPECIALIZE`. This is already fixed for many cases in GHC >= 9.6 (and possibly for this one, can't tell), so we should not focus on 9.4, and it might have been mentioned in some other tickets.
- With GHC 9.8.1-alpha1 there's no occurrence of `recip` in the Core emitted with the correct `--ghc-options`, which proves absolutely nothing, because there's no easy way to discover whether `recip` gets specialized or not in another module from which it's used transitively (#23881). ]
----- Old description below this line
Note that I can't rule out that the unsoundness messes up the specialiser, even though I doubt it. Definitely, the benchmark in question fails at runtime due to the unsoundness.
This specialize pragma makes GHC 9.4.6 emit Core for `recip` below that does not contain any mention of the `IsPrimal` class. OTOH, in GHC 9.8.1-alpha1, both with and without `-fpolymorphic-specialisation`, I'm getting non-specialized code for this case (`lvl298_s1EhY` is a (specialized) dictionary for `IsPrimal`, as can be seen earlier in the Core).
of
{ D ww_alzs [Dmd=1L] ww1_alzt [Dmd=1L] ww2_alzu [Dmd=1L] ->
case HordeAd.Core.DualNumber.$w$crecip
@GHC.Num.Natural.Natural
@(AstRanked PrimalSpan)
@Double
@0
(GHC.Real.$p2RealFrac
@(AstRanked PrimalSpan Double 0)
(GHC.Float.$p1RealFloat
@(AstRanked PrimalSpan Double 0)
(HordeAd.Core.TensorADVal.$fIfFNaturalADVal_$s$fRankedTensorAstRanked22
@Double
@0
(HordeAd.Core.Engine.rev4
`cast` (Sym (HordeAd.Core.Types.N:GoodScalar[0] <Double>_N)
:: (HordeAd.Core.Types.GoodScalarConstraint
Double :: Constraint)
~R# (GoodScalar Double :: Constraint)))
$dKnownNat_alqv
(ghc-prim:GHC.Classes.$p0(%,%)
@(RealFloat Double)
@(RealFloat (Data.Vector.Storable.Vector Double))
$d(%,%)_alzh)
(ghc-prim:GHC.Classes.$p1(%,%)
@(RealFloat Double)
@(RealFloat (Data.Vector.Storable.Vector Double))
$d(%,%)_alzh))))
lvl298_s1EhY
ww_alzs
ww1_alzt
ww2_alzu
of
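For readers who want something smaller to experiment with, here is a minimal sketch of the general shape of the problem: a `Fractional` instance for a dual-number-like type, carrying a fully specific `SPECIALIZE instance` pragma. It is not the horde-ad code; the type `D`, its fields and the derivative rule for `recip` are hypothetical stand-ins chosen only for illustration, and this reduced module is not claimed to reproduce the behaviour described in this ticket.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main (main) where

-- Hypothetical stand-in for a dual number: primal value and derivative.
-- This is NOT the horde-ad type, only an illustration.
data D a = D a a
  deriving (Eq, Show)

instance Num a => Num (D a) where
  D u u' + D v v' = D (u + v) (u' + v')
  D u u' * D v v' = D (u * v) (u * v' + u' * v)
  negate (D u u') = D (negate u) (negate u')
  abs (D u u')    = D (abs u) (u' * signum u)
  signum (D u _)  = D (signum u) 0
  fromInteger n   = D (fromInteger n) 0

instance Fractional a => Fractional (D a) where
  -- Ask GHC for a copy of the whole instance at one concrete type,
  -- so that methods such as recip need no dictionary argument there.
  {-# SPECIALIZE instance Fractional (D Double) #-}
  recip (D u u') = let r = recip u in D r (negate (u' * r * r))
  fromRational q = D (fromRational q) 0

main :: IO ()
main = print (recip (D 2 1 :: D Double))  -- prints: D 0.5 (-0.25)
```

Compiling such a module with `ghc -O2 -ddump-simpl -dsuppress-all` and checking whether the code for `recip` at `D Double` still takes a `Fractional` dictionary argument is one way to see if the pragma took effect within a single module. As noted in the edit above, that does not settle what happens when the method is used transitively from another module.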
Repro: check out commit https://github.com/Mikolaj/horde-ad/commit/5393574a36292d133aa92ee2b11a223785a72ad9, uncomment `-ddump-stranal` in file bench/common/BenchProdTools.hs, run
cabal bench longProdBench --enable-optimization --benchmark-options='-n0 -m prefix "1e7/rev"' -fpolymorphic-specialisation >& log
observe the offending Core (two copies, because `-ddump-stranal` prints Core twice)
grep -50 HordeAd.Core.DualNumber.\$w\$crecip log