Specialiser creates bogus specialisations
See also
- #21229 (closed)
- #23445 (closed)
- #23469
- #23866
- #23923 (closed) <--------- Apparently this is the one to fix first
- #24606
- #24765 (closed) (lack of polymorphic specialisation)
- #20897
- #25804
-
https://github.com/haskell/core-libraries-committee/issues/3 proposal to make
Eq
a unary class.
Original report
I'd like to ask for help in what is most probably a case of a bad port (by yours truly) of a typing plugin (GHC.TypeLits.Normalise) that fails in GHC 9.6.1, but worked fine in GHC 9.4.4 and 9.2.7 (and works fine in 9.6.1 with -O0). The typing breakage is described below. The bad port to GHC 9.6 is in https://github.com/clash-lang/ghc-tcplugins-extra/pull/24 and https://github.com/clash-lang/ghc-typelits-natnormalise/pull/72. I tried to follow the migration guide, but I didn't understand much and just randomly wibbled until it type-checked.
To reproduce, clone https://github.com/Mikolaj/horde-ad/tree/repro-rank-mismatch and run (the linters are included only to show they don't produce any info at all for the part of the codebase that is being compiled)
cabal test simplifiedOnlyTest --test-options='-p 3concatBuild' --allow-newer --enable-optimization --ghc-options='-dcore-lint -dstg-lint -dcmm-lint -dtag-inference-checks'
Normally it should pass fine, but instead you would be getting the following error
Exception: fromVector: rank mismatch (1,32)
where the value 32 comes from type variable n
, from a different test than the one being run. If you change number 33 in this other test, you get a different error message. If you remove the other test, the error disappears. With -O0 and with older GHCs the problem does not appear. In larger versions of this reproducer, the error differed when run with profiling mode and without.
If it helps, I can try to minimize further, but the corruption is very fragile and tiny changes eliminate the error or at least change it to a form that is not as obviously wrong (does not mention a number from the code it should not depend on).
cc: @christiaanb
- Show closed items
Relates to
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Mikolaj Konarski added needs triage label
added needs triage label
- Maintainer
Sam will try to have a look.
1 - Ben Gamari added Tquestion label and removed needs triage label
added Tquestion label and removed needs triage label
- Ben Gamari assigned to @sheaf
assigned to @sheaf
- Ben Gamari added typechecker plugins label
added typechecker plugins label
- Developer
I think the typechecker plugin is more of a red herring, or at the very most simply creating a situation that's tripping up the simplifier.
Doing:
--- a/test/simplified/TestHighRankSimplified.hs +++ b/test/simplified/TestHighRankSimplified.hs @@ -56,6 +56,7 @@ nestedBuildMap r = tmap0N id $ tkonst0N (takeShape @n @(114 - n) (2 :$ 4 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ 2 :$ 4 :$ 2 :$ 1 :$ 3 :$ 2 :$ ZS)) (tindex (tkonst 1 r) (0 :. ZI)) +{-# NOINLINE nestedBuildMap #-} testNestedBuildMap7 :: Assertion testNestedBuildMap7 =
is suffient to make the testsuite pass
1 - Developer
at the very most simply creating a situation that's tripping up the simplifier.
If you think there is a bug in the simplifier, could you offer a repro case? Thanks!
Collapse replies - Author Developer
@simonpj: I don't know where the bug is, but if it's in the simplifier, the original repro case from the ticket description very likely triggers it. Does it suffice? If not, how can I improve it?
- Author Developer
I'm now hit by this issue really hard. Over a hundred of my tests are affected. (I only guess it's the same issue --- it shows up exclusively in GHC 9.6.1 with with -O1 and hours of debugging haven't provided any alternative explanation; however, I haven't checked if the errors display the values of type applications from unrelated code, as before). To see these new cases, run
cabal test simplifiedOnlyTest --enable-optimization -ftest_seq --allow-newer
on this commit
https://github.com/Mikolaj/horde-ad/commit/cd5d64ef14710ff53052e9b882a14fe6ddd7988c
I tried
-fno-cse -fno-full-laziness
globally for the whole program, but it doesn't help and I can't see a-fno-inline
option that I could use per-module instead of marking two hundred functions withNOINLINE
.Edit: I've tried
{-# OPTIONS_GHC -funfolding-creation-threshold=0 #-}
but it doesn't seem to help and
{-# OPTIONS_GHC -funfolding-use-threshold=0 #-}
which helps a bit and NOINLINE, but these two help only for a few tests. So there doesn't seem to be a simple workarounds this time.
Edited by Mikolaj Konarski Collapse replies - Developer
I couldn't figure why but
-fno-specialise
seems to avoid the issue. - Author Developer
Oh, interesting. Unfortunately,
-fno-specialise
specified only in the test files fixes only a dozen tests. Removing-fexpose-all-unfoldings -fspecialise-aggressively
from the cabal file was similarly ineffective. Your global-fno-specialise
fixes the problem completely, but slows down the test run by 50%. BTW, I couldn't deduce from the GHC manual whether it disables explicit SPECIALIZE pragmas, which the codebase has a lot of, so I don't know if 50% is the whole benefit of specialization for this code path.
- Mikolaj Konarski changed the description
Compare with previous version changed the description
- Author Developer
May I ask for some help here? I'd discover many more GHC 9.6 bugs (or I'd confirm it's rock-solid if all bugs are already fixed) if I was able to use it, which I'm not able to for two months now.
- sheaf assigned to @mpickering
assigned to @mpickering
- Developer
This specialisation at least appears like one bug. It replaces a call to
typeNatTypeRep
with a specific one (ignoring what the value should actually be)."SPEC/TestHighRankSimplified typeNatTypeRep @_" forall (@(ipv_acze :: Nat)) ($dKnownNat_scAz :: KnownNat ipv_acze). base:Data.Typeable.Internal.typeNatTypeRep @ipv_acze $dKnownNat_scAz = TestHighRankSimplified.$s$fTensorADVal0_$stypeNatTypeRep @ipv_acze
1 - Developer
-- RHS size: {terms: 14, types: 11, coercions: 0, joins: 0/0} TestHighRankSimplified.$s$fTensorADVal0_$stypeNatTypeRep :: forall {ipv :: Nat}. base:Data.Typeable.Internal.TypeRep ipv [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 120 0}] TestHighRankSimplified.$s$fTensorADVal0_$stypeNatTypeRep = \ (@(ipv_acze :: Nat)) -> case ghc-prim:GHC.Magic.noinline @([GHC.Fingerprint.Type.Fingerprint] -> GHC.Fingerprint.Type.Fingerprint) GHC.Fingerprint.fingerprintFingerprints (TestHighRankSimplified.$s$fTensorADVal17 @ipv_acze) of { GHC.Fingerprint.Type.Fingerprint bx_abEC bx1_abED -> base:Data.Typeable.Internal.mkTrCon_$smkTrCon @ipv_acze bx_abEC bx1_abED base:Data.Typeable.Internal.typeCharTypeRep3 (TestHighRankSimplified.$s$fTensorADVal15 @ipv_acze) 0# base:Data.Typeable.Internal.typeNatTypeRep1 (ghc-prim:GHC.Types.[] @base:Data.Typeable.Internal.SomeTypeRep) }
- Developer
I am struggling to work out how we end up with this core, it seems that we are missing applying a substitution or something along those lines.
1 - Developer
I had a nice call with @simonpj where we(/he) identified the site of the bad specialisation. He will post a comment explaining what the issue is.
1 - Developer
@simonpj Did you have time to think about this more and a possible solution? It would be good to write down at least what the problem is before it is paged out.
- Developer
Here is what happens, in gory detail
-
In
test/simplified/TestHigherRankSimplified
we callHordeAd.Core.ADValTensor.$w$s$ctbuild3
. -
The specialiser generates a specialised version of that imported function:
$s$w$s$ctbuild3 :: ShapeInt (32 + 0) -> (IndexOf 32 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> TensorOf (32 + 0) (ADVal ADModeGradient Double)
-
In the body of
$s$w$s$ctbuild3
we see a local recursive definitionbuildSh_scxv [Occ=LoopBreaker] :: forall (m1 :: Nat). KnownNat m1 => ShapeInt m1 -> (IndexOf m1 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> TensorOf (m1 + 0) (ADVal ADModeGradient Double) buildSh_scxv = \ (@(m1_acwR :: Nat)) ($dKnownNat2_acwS [Occ=Once1] :: KnownNat m1_acwR) (ds_acwT [Occ=Once1!] :: Shape m1_acwR Int) (f_acwU [Occ=OnceL2!] :: IndexOf m1_acwR (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> case ds_acwT |> co of { ... HordeAd.Internal.SizedList.::: @ipv_acwZ ipv1_acx0 _ [Occ=Dead] ipv3_acx2 [Occ=Once1] ipv4_acx3 [Occ=OnceL1] -> let { $dKnownNat3_scxx :: Natural $dKnownNat3_scxx = GHC.Num.Natural.naturalSubThrow ($dKnownNat2_acwS `cast` (GHC.TypeNats.N:KnownNat[0] ipv1_acx0 ; GHC.TypeNats.N:SNat[0] <1 + ipv_acwZ>_P :: KnownNat m1_acwR ~R# Natural)) HordeAd.Core.SizedIndex.$m:$2 } in ... buildSh_scxv @ipv_acwZ ($dKnownNat3_scxx `cast` (Sym (GHC.TypeNats.N:SNat[0] <ipv_acwZ>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <ipv_acwZ>_N) :: Natural ~R# KnownNat ipv_acwZ)) ...
Note the recursive call to
buildSh_scxv
passing locally-bound dictionary$dKnownNat3_scxx :: Natural
. -
In the body of
$s$w$s$ctbuild3
, from a call tobuildSh_scxv
in the body of thelet
, we make a specialised copy ofbuildSh_scxv
, at type32
, thus:"SPEC buildSh @32" forall ($dKnownNat2_scxy :: KnownNat 32). buildSh_scxv @32 $dKnownNat2_scxy = $sbuildSh_scxB]
where
$sbuildSh_scxB :: Shape 32 Int -> (IndexOf 32 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> TensorOf (32 + 0) (ADVal ADModeGradient Double)
-
So the body of
$sbuildSh_scxB
will look like this, instantiatingbuildSh_scxv
withm1
=32
, and$dKnownNat2_acwS
as follows:$sbuildSh_scxB = \ (ds_acwT [Occ=Once1!] :: Shape 32 Int) (f_acwU [Occ=OnceL2!] :: IndexOf 32 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> let $dKnownNat2_acwS :: KnownNat 32 $dKnownNat2_acwS = $dKnownNat_saDl `cast` (Sym (GHC.TypeNats.N:SNat[0] <32>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <32>_N) :: Natural ~R# KnownNat 32)) case ds_acwT |> co of HordeAd.Internal.SizedList.::: @ipv_acwZ ipv1_acx0 _ [Occ=Dead] ipv3_acx2 [Occ=Once1] ipv4_acx3 [Occ=OnceL1] -> let { $dKnownNat3_scxx :: Natural $dKnownNat3_scxx = GHC.Num.Natural.naturalSubThrow ($dKnownNat2_acwS `cast` (GHC.TypeNats.N:KnownNat[0] ipv1_acx0 ; GHC.TypeNats.N:SNat[0] <1 + ipv_acwZ>_P :: KnownNat 32 ~R# Natural)) HordeAd.Core.SizedIndex.$m:$2 } in ... ....(buildSh_scxv @ipv_acwZ ($dKnownNat3_scxx `cast` (Sym (GHC.TypeNats.N:SNat[0] <ipv_acwZ>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <ipv_acwZ>_N) :: Natural ~R# KnownNat ipv_acwZ))
Here
$dKnownNat3_saDl
is a top-level defn$dKnownNat_saDl :: Natural $dKnownNat_saDl = GHC.Num.Natural.NS 32##
-
Now inline that local
$dKnownNat2_awcS
into its single use site inscxx
:$sbuildSh_scxB = \ (ds_acwT [Occ=Once1!] :: Shape 32 Int) (f_acwU [Occ=OnceL2!] :: IndexOf 32 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> case ds_acwT |> co of HordeAd.Internal.SizedList.::: @ipv_acwZ ipv1_acx0 _ [Occ=Dead] ipv3_acx2 [Occ=Once1] ipv4_acx3 [Occ=OnceL1] -> let { $dKnownNat3_scxx :: Natural $dKnownNat3_scxx = GHC.Num.Natural.naturalSubThrow ($dKnownNat2_saDl `cast` (Sym (GHC.TypeNats.N:SNat[0] <32>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <32>_N) :: Natural ~R# KnownNat 32) `cast` (GHC.TypeNats.N:KnownNat[0] ipv1_acx0 ; GHC.TypeNats.N:SNat[0] <1 + ipv_acwZ>_P :: KnownNat 32 ~R# Natural)) HordeAd.Core.SizedIndex.$m:$2 } in ... ....(buildSh_scxv @ipv_acwZ ($dKnownNat3_scxx `cast` (Sym (GHC.TypeNats.N:SNat[0] <ipv_acwZ>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <ipv_acwZ>_N) :: Natural ~R# KnownNat ipv_acwZ))
-
Now the two casts cancel out, thereby dropping all mention of
ipv1_acx0 :: 32 ~ (ipv_acw+1)
.$sbuildSh_scxB = \ (ds_acwT [Occ=Once1!] :: Shape 32 Int) (f_acwU [Occ=OnceL2!] :: IndexOf 32 (ADVal ADModeGradient Double) -> TensorOf 0 (ADVal ADModeGradient Double)) -> let $dKnownNat2_acwS :: KnownNat 32 $dKnownNat2_acwS = $dKnownNat_saDl case ds_acwT |> co of HordeAd.Internal.SizedList.::: @ipv_acwZ ipv1_acx0 _ [Occ=Dead] ipv3_acx2 [Occ=Once1] ipv4_acx3 [Occ=OnceL1] -> let { $dKnownNat3_scxx :: Natural $dKnownNat3_scxx = GHC.Num.Natural.naturalSubThrow $dKnownNat2_saDl HordeAd.Core.SizedIndex.$m:$2 } in ... ....(buildSh_scxv @ipv_acwZ ($dKnownNat3_scxx `cast` (Sym (GHC.TypeNats.N:SNat[0] <ipv_acwZ>_P) ; Sym (GHC.TypeNats.N:KnownNat[0] <ipv_acwZ>_N) :: Natural ~R# KnownNat ipv_acwZ))
-
BLAM! Now we use
Note [Specialising polymorphic dictionaries]
in Specialise.hs, to float- The evidence
let $dKnownNat3_scxx
- The call
(buildSh_scxv @ipv_acwZ ($dKnownNat3_scxx
cast<coercion mentioning ipv_acwZ>)
- The evidence
-
And then we utterly-bogusly specialise
buildSh_scxv
to giveRULES: "SPEC buildSh @_" forall (@(ipv_acwZ :: Nat)) ($dKnownNat2_scxC :: KnownNat ipv_acwZ). buildSh_scxv @ipv_acwZ $dKnownNat2_scxC = $sbuildSh_scxF @ipv_acwZ
1 -
- Simon Peyton Jones mentioned in commit f0221b7b
mentioned in commit f0221b7b
- Developer
Do we have any plan on how to fix this @simonpj? It seems like if there is no immediate plan we should aim to revert the changes to polymorphic specialisation as this is a very serious bug.