{-# LANGUAGE MonoLocalBinds #-}moduleMain(main)whereclassCawhereop::a->Stringinstance{-# OVERLAPPABLE #-}Cawhereop_="Generic"{-# NOINLINE op #-}instance{-# INCOHERENT #-}C()whereop_="Specific"{-# NOINLINE op #-}{-# NOINLINE large #-}-- | Inhibit inlining, but keep specialize-abilitylarge::a->alargex=xbar::Ca=>a->Stringbarx=large(large(large(large(large(large(large(large(large(large(large(large(large(large(opx))))))))))))))spec::()->Stringspec=bargen::a->Stringgen=barmain::IO()main=doputStrLn$"spec () == "<>spec()putStrLn$"gen () == "<>gen()
Because gen has no typeclass constraint on C a, the only way to resolve the constraint on bar is to choose the generic instance. On the other hand, in spec, we have a more specific instance for C (), and so that one should be picked.
Compiling and running without optimizations, this is indeed the behaviour one can observe:
$ ./_build/stage1/bin/ghc -fforce-recomp --make input/overlap3.hs && ./input/overlap3[1 of 2] Compiling Main ( input/overlap3.hs, input/overlap3.o )[2 of 2] Linking input/overlap3 [Objects changed]spec () == Specificgen () == Generic
However, if we then enable specialisation, something goes wrong. How exactly, depends on the version of GHC.
$ ./_build/stage1/bin/ghc -fforce-recomp -fspecialise -fenable-rewrite-rules --make input/overlap3.hs && ./input/overlap3[1 of 2] Compiling Main ( input/overlap3.hs, input/overlap3.o )[2 of 2] Linking input/overlap3 [Objects changed]spec () == Genericgen () == Generic
But from 5a997e16 onwards, we get the, perhaps even more baffling:
$ ./_build/stage1/bin/ghc -fforce-recomp -fspecialise -fenable-rewrite-rules --make input/overlap3.hs && ./input/overlap3[1 of 2] Compiling Main ( input/overlap3.hs, input/overlap3.o )[2 of 2] Linking input/overlap3 [Objects changed]spec () == Specificgen () == Specific
The change between these two commits is an overhaul of the specialiser implementation by @simonpj , so it's at least understandable that the behaviour changed at this point.
I think the clue {-# INCOHERENT #-}. It says "all bets are off". And indeed so it proves.
The specialiser operates under the assumption that if you need a dictionary of type C () then any value of that type will do. But not so here, by design. If you want {-# INCOHERENT #-} you may need to switch off specialisation. Your example demonstrates that very nicely, thank you.
Without {-# INCOHERENT #-} GHC will reject the defn of gen, on the grounds that it can't find a (coherent) soution for C a. You neeed to say gen :: C a => a -> String.
The specialiser operates under the assumption that if you need a dictionary of type C () then any value of that type will do.
Is this really true? I thought the specialiser was careful never to come up with evidence on its own, and instead, taking all evidence from its input and just shuffling them around. Are you saying that it is OK, or even expected, for the specialiser to be making decisions on typeclass resolution?
My coworker's waiting for his haskell.org account to be verified, but he sent me the following detailed derivation:
From the GHC manual, section 6.8.8.4:
A more precise specification is as follows. The willingness to be
overlapped or incoherent is a property of the instance declaration
itself, controlled as follows:
An instance is incoherent if: it has an INCOHERENT pragma; or if
the instance has no pragma and it appears in a module compiled
with IncoherentInstances.
An instance is overlappable if: it has an OVERLAPPABLE or OVERLAPS
pragma; or if the instance has no pragma and it appears in a
module compiled with OverlappingInstances; or if the instance is
incoherent.
An instance is overlapping if: it has an OVERLAPPING or
OVERLAPS pragma; or if the instance has no pragma and it appears
in a module compiled with OverlappingInstances; or if the
instance is incoherent.
Now suppose that, in some client module, we are searching for an
instance of the target constraint (C ty1 .. tyn). The search works
like this:
Find all instances I that match the target constraint; that is,
the target constraint is a substitution instance of I. These
instance declarations are the candidates.
If no candidates remain, the search fails
Eliminate any candidate IX for which there is another candidate IY
such that both of the following hold:
IY is strictly more specific than IX. That is, IY is a
substitution instance of IX but not vice versa.
Either IX is overlappable, or IY is overlapping. (This
"either/or" design, rather than a "both/and" design, allow a
client to deliberately override an instance from a library,
without requiring a change to the library.)
If all the remaining candidates are incoherent, the search succeeds, returning an arbitrary surviving candidate.
If more than one non-incoherent candidate remains, the search fails.
Otherwise there is exactly one non-incoherent candidate; call it the "prime candidate".
Now find all instances, or in-scope given constraints, that unify
with the target constraint, but do not match it. Such
non-candidate instances might match when the target constraint is
further instantiated. If all of them are incoherent top-level
instances, the search succeeds, returning the prime
candidate. Otherwise the search fails.
I don't see any mention of inlining having an effect in the
specification above. According to the manual, gen should be
specialised with the C a generic instance and spec should be
specialised with the C () specific instance. See derivations below.
Resolving the target constraint C a inside of gen is documented to
follow these steps:
There is only one candidate matching, which is the generic instance C a
There is no elimination of candidates (C () is more specific than
C a, with C a overlappable and C () overlapping because
incoherent, but C () is not in the set of candidates)
We are in the case where there is exactly one non-incoherent
instance: C a is the "prime candidate"
The final check is performed, C () is one such non-candidate
instance, and it is incoherent.
Therefore the search succeeds, returning the prime candidate C a
Resolving the target constraint C () inside of spec is documented to follow these steps:
There are two candidates matching, the generic instance C a and
the specific instance C ()
C a is eliminated because C () is more specific than C a,
with C a overlappable and C () overlapping because incoherent
Remaining candidates, which is only C (), are all incoherent: the
search succeeds returning an arbitrary surviving candidate, which
here can only be C ()
Your colleague's reasoning is correct. That's why the program typechecks, and is rejected if you remove {-# INCOHERENT #-}.
But {-# INCOHERENT #-} is called that way for a reason. Perhaps it should be {-# VERY_BAD_THINGS_CAN_HAPPEN_BUT_I_KNOW_WHAT_I_AM_DOING #-}. It's a no-seatbelts pragma.
If you really want incoherence -- but you don't want the specialiser to mess with the incoherent choices made by the type checker -- then use -fno-specialise
But the manual, as quoted above, does NOT say that incoherence is all-bets-are-off. It describes exactly an algorithm to compute instance resolution. This instance resolution algorithm is in fact the one that seems to be implemented by the typechecker, the part of GHC that I would expect to make any instance resolution decisions at all.
In contrast, the specialiser is an optimization, and as such, it should have no impact on a program whose behaviour is specified by the user-visible algorithm described in the manual.
I agree that the manual should say what I say in this thread: that incoherence is extremely dangerous and may affect the optimiser. If you could offer a patch that would be great.
I agree that it would be lovely if the specialiser did not affect the meaning of incoherent program, but I don't know a way to do that except by switching it off. And that seems too drastic because we know that the specialiser is crucial to performance.
Any better ideas would be great. But that's how it is right now, and I think the best we can do is to explain it in the manual.
Documentation for unsafeCoerce is in base:Unsafe.Coerce
{- Note [Implementing unsafeCoerce]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~The implementation of unsafeCoerce is surprisingly subtle.This Note describes the moving parts. You will find morebackground in MR !1869 and ticket #16893.
Idea: tag incoherently-originating dictionaries in the typechecker's output with an application of GHC.Magic.nospec, so that the specializer can avoid them (see GHC.Tc.Instance.Class for details).
This gives us a way to keep the current well-specified semantics of incoherent instances without giving up specialization for a whole module.
Idea: tag incoherently-originating dictionaries in the typechecker's output with an application of GHC.Magic.nospec, so that the specializer can avoid them (see GHC.Tc.Instance.Class for details).
This is a promising idea to reduce the damage from {-# INCOHERENT #-} instances.
We have an existing precedent: see Note [withDict] in GHC.Tc.Instance.Class, wrinkle (WD6). withDict is the archetypal example of incoherence!
OK so I finally have some time to look into this. To make it more concrete, for the original example in this ticket, currently we generate the following code:
Getting the "we arrived at the $fMyC_a decision in a potentially incoherent way" flag from instance lookup to match_one is straightforward. However, in match_one, we are only generating the code for the right-hand side of $dC_a's binding. The code for passing $dC_a to bar @a is done by the HsWrapper made in instCallConstraints, which doesn't yet know what the result of solving the just-emitted Wanted for C a is going to be.
Idea: Life would be much simpler if instead we could generate
because then, the wrapper doesn't need to know if we need nospec or not, only the definition of $cps_dC_a (and of course, if it was coherent, we'd just generate $cps_dC_a = \@b k -> k ($fMyC_a @a)); and we would get the desired core in both cases after a single round of inlining (which already happens before specialization).
New idea: handle this during desugaring. At that point, all the evidence variables have already been filled, so we should be able to look at their coherency, stored in a new field of EvBind.
The tricky part is that sometimes evidence variables are bound in one HsWrapper and used in another, so we'll need to pass the coherence information between the two in DsM's local env.
I think the ship has sailed, but I still say I'm pretty unconvinced by the motivation here.
I think part of the problem is that we're conflating two properties:
Coherence. A type inference algorithm is coherent if the result of type inference is reliable -- no arbitrary decisions are made during solving. Because our type inference algorithm depends on the instances that are in scope, we can describe a set of instances as coherent or not.
Canonicity. A dictionary type is canonical if every instantiation of that type -- both fully concrete ones and abstract ones containing variables -- has exactly one inhabitant. By "exactly one inhabitant", we really mean that if d1 :: ty and d2 :: ty, then d1 and d2 behave the same in all reasonable contexts. By "reasonable", I'm excluding trickery like reallyUnsafePtrEq# and similar.
Rule IL6 in the note describes throwing away some instances and choosing a "prime candidate". Rule IL6 thus does not threaten coherence -- the chosen candidate is uniquely determined. However, rule IL6 does threaten canonicity, to the consternation of @cactus. (On the other hand, rule IL4 hurts both coherence and canonicity. On the other hand, type inference rules like Note [Instance and Given overlap] threaten coherence but not, I think, canonicity.)
According to this analysis, the careful explanation in the manual of our resolution algorithm does suggest that, under the conditions of our main example, GHC's algorithm is coherent: the result of type inference is reliable, and does not depend on the phase of the moon. However, it says nothing about canonicity -- a property that we have never promised to hold if there are any incoherent instances around.
The patch associated with this ticket, !9411 (merged), is an approach to make the specializer behave well even on non-canonical dictionary types. But at what cost? I will explore more in the patch, but I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.
OK I had not really distinguished canonicity from coherence. But yes, this MR is all about canonicity.
I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.
Well, the specialiser relies on canonicity. And {-# INCOHERENT #-} is largely about losing canonicity and only a bit about losing coherence I think. So if you say {-# INCOHERENT #-} anywhere, you are risking arbitrarily weird results if you also run the specialiser. This MR fixes that.
I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.
What performance are you worried about? CI seems stable.
From my perspective, it is {-# OVERLAPPABLE #-} that signals "please don't specialize to this instance." Perhaps {-# INCOHERENT #-}sometimes carries the same meaning due to the "prefer non-incoherent instances" rule, but I would expect (without much evidence) that it is primarily used for its "choose arbitrarily among several incoherent instances" feature. And the latter feature is much less useful when the choice of instances actually matters.
So if you say {-# INCOHERENT #-} anywhere, you are risking arbitrarily weird results if you also run the specialiser.
Not arbitrary weird, right? Where two incoherent instances exist for the same type, you get no guarantee as to which dictionary is picked after specialisation, but if the programmer defines instances appropriately that may be acceptable?
I think there are two reasonable interpretations of INCOHERENT:
The current approach: the programmer is obliged to define instances such that the selected dictionary does not matter, permitting more specialisation.
The proposed approach: allow the programmer to define instances with varying behaviour, with the type inference algorithm giving some guarantees on which will be picked (albeit with arbitrary choices in some cases); restrict specialisation so that optimisation does not change whatever semantics was determined by type-checking.
The latter seems useful, but I don't think we should silently switch from 1 to 2. User code may well depend on specialisation for good performance. Perhaps we need another pragma, or a flag -fno-specialise-incoherent-instances?
I would define "the current approach" as whatever is specified in the documentation, and any difference to this is a bug in the implementation. So this ticket (and !9411 (merged)) does not propose any new interpretation of INCOHERENT, it merely brings the implementation in line with the specification. The specification that no one have had any complaints about.
That assumes the documentation is a complete specification, which is sadly far from true.
The (hypothetical future) complaint I'm concerned about is that a library might rely on INCOHERENT instances and "enough" specialisation, and so would mysteriously regress in performance after this change. Do we have evidence that such a situation won't arise? (Though I realise that mysterious specialisation-related regressions are unfortunately not that uncommon...)
commit b56025f448646de40446a133f140f62c8a49cabfAuthor: Gergő Érdi <gergo@erdi.hu>Date: Mon Nov 21 02:49:17 2022 +0000 Don't specialise incoherent instance applications Using incoherent instances, there can be situations where two occurrences of the same overloaded function at the same type use two different instances (see #22448). For incoherently resolved instances, we must mark them with `nospec` to avoid the specialiser rewriting one to the other. This marking is done during the desugaring of the `WpEvApp` wrapper. Fixes #22448 Metric Increase: T15304 compiler/GHC/Core/InstEnv.hs | 255 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------- compiler/GHC/HsToCore.hs | 6 +- compiler/GHC/HsToCore/Arrows.hs | 4 +- compiler/GHC/HsToCore/Binds.hs | 222 ++++++++++++++++++++++++++++++++++++++++++++++---------------------- compiler/GHC/HsToCore/Binds.hs-boot | 2 +- compiler/GHC/HsToCore/Expr.hs | 36 ++++++------ compiler/GHC/HsToCore/Match.hs | 6 +- compiler/GHC/HsToCore/Match/Constructor.hs | 14 ++--- compiler/GHC/HsToCore/Monad.hs | 18 +++++- compiler/GHC/HsToCore/Pmc/Desugar.hs | 4 +- compiler/GHC/HsToCore/Quote.hs | 6 +- compiler/GHC/HsToCore/Types.hs | 4 ++ compiler/GHC/Tc/Errors.hs | 4 +- compiler/GHC/Tc/Instance/Class.hs | 131 +++++++++++++++++++++-------------------- compiler/GHC/Tc/Solver.hs | 8 +-- compiler/GHC/Tc/Solver/Canonical.hs | 16 ++--- compiler/GHC/Tc/Solver/Interact.hs | 45 +++++++------- compiler/GHC/Tc/Solver/Monad.hs | 16 ++--- compiler/GHC/Tc/TyCl/Instance.hs | 2 +- compiler/GHC/Tc/Types/Evidence.hs | 33 +++++++---- docs/users_guide/9.8.1-notes.rst | 6 ++ testsuite/tests/simplCore/should_run/T22448.hs | 30 ++++++++++ testsuite/tests/simplCore/should_run/T22448.stdout | 2 + testsuite/tests/simplCore/should_run/all.T | 1 + 24 files changed, 592 insertions(+), 279 deletions(-)
Disabling specialization for incoherent instances unconditionally seems to me like an extremely blunt hammer that makes it impossible to use them in a performant manner.
FWIW, the motivating example from OP that led to all this is easily fixable by putting an INLINE pragma on bar.