Specialiser changes program's behaviour

I think the clue {-# INCOHERENT #-}. It says "all bets are off". And indeed so it proves.

The specialiser operates under the assumption that if you need a dictionary of type C () then any value of that type will do. But not so here, by design. If you want {-# INCOHERENT #-} you may need to switch off specialisation. Your example demonstrates that very nicely, thank you.

Without {-# INCOHERENT #-} GHC will reject the defn of gen, on the grounds that it can't find a (coherent) soution for C a. You neeed to say gen :: C a => a -> String.

The specialiser operates under the assumption that if you need a dictionary of type C () then any value of that type will do.

Is this really true? I thought the specialiser was careful never to come up with evidence on its own, and instead, taking all evidence from its input and just shuffling them around. Are you saying that it is OK, or even expected, for the specialiser to be making decisions on typeclass resolution?

added specialisation label

Is this really true?

I think so. Suppose f :: Num a => [a] -> a. If it sees two calls

   f @T d1
   f @T d2

it feels free to

Specialise f for d1, making $sf :: [T] -> T
Replace both calls by $sf

Of course that is wrong if d1 :: Num T does not have the same value as d2 :: Num T

My coworker's waiting for his haskell.org account to be verified, but he sent me the following detailed derivation:

From the GHC manual, section 6.8.8.4:

A more precise specification is as follows. The willingness to be overlapped or incoherent is a property of the instance declaration itself, controlled as follows:

An instance is incoherent if: it has an INCOHERENT pragma; or if the instance has no pragma and it appears in a module compiled with IncoherentInstances.

An instance is overlappable if: it has an OVERLAPPABLE or OVERLAPS pragma; or if the instance has no pragma and it appears in a module compiled with OverlappingInstances; or if the instance is incoherent.

An instance is overlapping if: it has an OVERLAPPING or OVERLAPS pragma; or if the instance has no pragma and it appears in a module compiled with OverlappingInstances; or if the instance is incoherent.

Now suppose that, in some client module, we are searching for an instance of the target constraint (C ty1 .. tyn). The search works like this:

Find all instances I that match the target constraint; that is, the target constraint is a substitution instance of I. These instance declarations are the candidates.

If no candidates remain, the search fails

Eliminate any candidate IX for which there is another candidate IY such that both of the following hold:

IY is strictly more specific than IX. That is, IY is a substitution instance of IX but not vice versa.

Either IX is overlappable, or IY is overlapping. (This "either/or" design, rather than a "both/and" design, allow a client to deliberately override an instance from a library, without requiring a change to the library.)

If all the remaining candidates are incoherent, the search succeeds, returning an arbitrary surviving candidate.

If more than one non-incoherent candidate remains, the search fails.

Otherwise there is exactly one non-incoherent candidate; call it the "prime candidate".

Now find all instances, or in-scope given constraints, that unify with the target constraint, but do not match it. Such non-candidate instances might match when the target constraint is further instantiated. If all of them are incoherent top-level instances, the search succeeds, returning the prime candidate. Otherwise the search fails.

I don't see any mention of inlining having an effect in the specification above. According to the manual, gen should be specialised with the C a generic instance and spec should be specialised with the C () specific instance. See derivations below.

Resolving the target constraint C a inside of gen is documented to follow these steps:

There is only one candidate matching, which is the generic instance C a
There is no elimination of candidates (C () is more specific than C a, with C a overlappable and C () overlapping because incoherent, but C () is not in the set of candidates)
We are in the case where there is exactly one non-incoherent instance: C a is the "prime candidate"
The final check is performed, C () is one such non-candidate instance, and it is incoherent.
Therefore the search succeeds, returning the prime candidate C a

Resolving the target constraint C () inside of spec is documented to follow these steps:

There are two candidates matching, the generic instance C a and the specific instance C ()
C a is eliminated because C () is more specific than C a, with C a overlappable and C () overlapping because incoherent
Remaining candidates, which is only C (), are all incoherent: the search succeeds returning an arbitrary surviving candidate, which here can only be C ()

Your colleague's reasoning is correct. That's why the program typechecks, and is rejected if you remove {-# INCOHERENT #-}.

But {-# INCOHERENT #-} is called that way for a reason. Perhaps it should be {-# VERY_BAD_THINGS_CAN_HAPPEN_BUT_I_KNOW_WHAT_I_AM_DOING #-}. It's a no-seatbelts pragma.

If you really want incoherence -- but you don't want the specialiser to mess with the incoherent choices made by the type checker -- then use -fno-specialise

But the manual, as quoted above, does NOT say that incoherence is all-bets-are-off. It describes exactly an algorithm to compute instance resolution. This instance resolution algorithm is in fact the one that seems to be implemented by the typechecker, the part of GHC that I would expect to make any instance resolution decisions at all.

In contrast, the specialiser is an optimization, and as such, it should have no impact on a program whose behaviour is specified by the user-visible algorithm described in the manual.

I agree that the manual should say what I say in this thread: that incoherence is extremely dangerous and may affect the optimiser. If you could offer a patch that would be great.

I agree that it would be lovely if the specialiser did not affect the meaning of incoherent program, but I don't know a way to do that except by switching it off. And that seems too drastic because we know that the specialiser is crucial to performance.

Any better ideas would be great. But that's how it is right now, and I think the best we can do is to explain it in the manual.

changed the description

Documentation for unsafeCoerce is in base:Unsafe.Coerce

{- Note [Implementing unsafeCoerce]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation of unsafeCoerce is surprisingly subtle.
This Note describes the moving parts.  You will find more
background in MR !1869 and ticket #16893.

Idea: tag incoherently-originating dictionaries in the typechecker's output with an application of GHC.Magic.nospec, so that the specializer can avoid them (see GHC.Tc.Instance.Class for details).

This gives us a way to keep the current well-specified semantics of incoherent instances without giving up specialization for a whole module.

Idea: tag incoherently-originating dictionaries in the typechecker's output with an application of GHC.Magic.nospec, so that the specializer can avoid them (see GHC.Tc.Instance.Class for details).

This is a promising idea to reduce the damage from {-# INCOHERENT #-} instances.

We have an existing precedent: see Note [withDict] in GHC.Tc.Instance.Class, wrinkle (WD6). withDict is the archetypal example of incoherence!

withDict is the archetypal example of incoherence!

Missed opportunity to call it "the canonical example of non-canonicity"!

changed the description

added Pnormal Tbug labels

assigned to @cactus

OK so I finally have some time to look into this. To make it more concrete, for the original example in this ticket, currently we generate the following code:

gen :: forall a. a -> String
gen = \@a -> 
  let $dC_a :: C a = $fMyC_a @a in 
  bar @a $dC_a

instead, we would like to generate:

gen :: forall a. a -> String
gen = \@a -> 
  let $dC_a :: C a = $fMyC_a @a in 
  nospec @(C a => a -> String) (bar @a) $dC_a

Getting the "we arrived at the $fMyC_a decision in a potentially incoherent way" flag from instance lookup to match_one is straightforward. However, in match_one, we are only generating the code for the right-hand side of $dC_a's binding. The code for passing $dC_a to bar @a is done by the HsWrapper made in instCallConstraints, which doesn't yet know what the result of solving the just-emitted Wanted for C a is going to be.

Idea: Life would be much simpler if instead we could generate

gen = \@a ->
  let $cps_dC_a [InlPrag=INLINE] :: forall b. (C a => b) -> b
      $cps_dC_a = \@b k -> nospec @(C => b) k ($fMyC_a @a)
  in $cps_dC_a @(a -> String) (bar @a)

because then, the wrapper doesn't need to know if we need nospec or not, only the definition of $cps_dC_a (and of course, if it was coherent, we'd just generate $cps_dC_a = \@b k -> k ($fMyC_a @a)); and we would get the desired core in both cases after a single round of inlining (which already happens before specialization).

New idea: handle this during desugaring. At that point, all the evidence variables have already been filled, so we should be able to look at their coherency, stored in a new field of EvBind.

The tricky part is that sometimes evidence variables are bound in one HsWrapper and used in another, so we'll need to pass the coherence information between the two in DsM's local env.

mentioned in merge request !9402 (closed)

mentioned in merge request !9411 (merged)

I think the ship has sailed, but I still say I'm pretty unconvinced by the motivation here.

I think part of the problem is that we're conflating two properties:

Coherence. A type inference algorithm is coherent if the result of type inference is reliable -- no arbitrary decisions are made during solving. Because our type inference algorithm depends on the instances that are in scope, we can describe a set of instances as coherent or not.

Canonicity. A dictionary type is canonical if every instantiation of that type -- both fully concrete ones and abstract ones containing variables -- has exactly one inhabitant. By "exactly one inhabitant", we really mean that if d1 :: ty and d2 :: ty, then d1 and d2 behave the same in all reasonable contexts. By "reasonable", I'm excluding trickery like reallyUnsafePtrEq# and similar.

Rule IL6 in the note describes throwing away some instances and choosing a "prime candidate". Rule IL6 thus does not threaten coherence -- the chosen candidate is uniquely determined. However, rule IL6 does threaten canonicity, to the consternation of @cactus. (On the other hand, rule IL4 hurts both coherence and canonicity. On the other hand, type inference rules like Note [Instance and Given overlap] threaten coherence but not, I think, canonicity.)

According to this analysis, the careful explanation in the manual of our resolution algorithm does suggest that, under the conditions of our main example, GHC's algorithm is coherent: the result of type inference is reliable, and does not depend on the phase of the moon. However, it says nothing about canonicity -- a property that we have never promised to hold if there are any incoherent instances around.

The patch associated with this ticket, !9411 (merged), is an approach to make the specializer behave well even on non-canonical dictionary types. But at what cost? I will explore more in the patch, but I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.

OK I had not really distinguished canonicity from coherence. But yes, this MR is all about canonicity.

I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.

Well, the specialiser relies on canonicity. And {-# INCOHERENT #-} is largely about losing canonicity and only a bit about losing coherence I think. So if you say {-# INCOHERENT #-} anywhere, you are risking arbitrarily weird results if you also run the specialiser. This MR fixes that.

I'd be hesitant to worsen performance for other users who care not about canonicity to satisfy one user who does care.

What performance are you worried about? CI seems stable.

From my perspective, it is {-# OVERLAPPABLE #-} that signals "please don't specialize to this instance." Perhaps {-# INCOHERENT #-} sometimes carries the same meaning due to the "prefer non-incoherent instances" rule, but I would expect (without much evidence) that it is primarily used for its "choose arbitrarily among several incoherent instances" feature. And the latter feature is much less useful when the choice of instances actually matters.

So if you say {-# INCOHERENT #-} anywhere, you are risking arbitrarily weird results if you also run the specialiser.

Not arbitrary weird, right? Where two incoherent instances exist for the same type, you get no guarantee as to which dictionary is picked after specialisation, but if the programmer defines instances appropriately that may be acceptable?

I think there are two reasonable interpretations of INCOHERENT:

The current approach: the programmer is obliged to define instances such that the selected dictionary does not matter, permitting more specialisation.
The proposed approach: allow the programmer to define instances with varying behaviour, with the type inference algorithm giving some guarantees on which will be picked (albeit with arbitrary choices in some cases); restrict specialisation so that optimisation does not change whatever semantics was determined by type-checking.

The latter seems useful, but I don't think we should silently switch from 1 to 2. User code may well depend on specialisation for good performance. Perhaps we need another pragma, or a flag -fno-specialise-incoherent-instances?

I would define "the current approach" as whatever is specified in the documentation, and any difference to this is a bug in the implementation. So this ticket (and !9411 (merged)) does not propose any new interpretation of INCOHERENT, it merely brings the implementation in line with the specification. The specification that no one have had any complaints about.

That assumes the documentation is a complete specification, which is sadly far from true.

The (hypothetical future) complaint I'm concerned about is that a library might rely on INCOHERENT instances and "enough" specialisation, and so would mysteriously regress in performance after this change. Do we have evidence that such a situation won't arise? (Though I realise that mysterious specialisation-related regressions are unfortunately not that uncommon...)

Perhaps we need another pragma, or a flag -fno-specialise-incoherent-instances?

That would be fine with me ... a small addition to what Gergo has done.

closed with merge request !9411 (merged)

Fixed by

commit b56025f448646de40446a133f140f62c8a49cabf
Author: Gergő Érdi <gergo@erdi.hu>
Date:   Mon Nov 21 02:49:17 2022 +0000

    Don't specialise incoherent instance applications
    
    Using incoherent instances, there can be situations where two
    occurrences of the same overloaded function at the same type use two
    different instances (see #22448). For incoherently resolved instances,
    we must mark them with `nospec` to avoid the specialiser rewriting one
    to the other. This marking is done during the desugaring of the
    `WpEvApp` wrapper.
    
    Fixes #22448
    
    Metric Increase:
        T15304

 compiler/GHC/Core/InstEnv.hs                       | 255 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 compiler/GHC/HsToCore.hs                           |   6 +-
 compiler/GHC/HsToCore/Arrows.hs                    |   4 +-
 compiler/GHC/HsToCore/Binds.hs                     | 222 ++++++++++++++++++++++++++++++++++++++++++++++----------------------
 compiler/GHC/HsToCore/Binds.hs-boot                |   2 +-
 compiler/GHC/HsToCore/Expr.hs                      |  36 ++++++------
 compiler/GHC/HsToCore/Match.hs                     |   6 +-
 compiler/GHC/HsToCore/Match/Constructor.hs         |  14 ++---
 compiler/GHC/HsToCore/Monad.hs                     |  18 +++++-
 compiler/GHC/HsToCore/Pmc/Desugar.hs               |   4 +-
 compiler/GHC/HsToCore/Quote.hs                     |   6 +-
 compiler/GHC/HsToCore/Types.hs                     |   4 ++
 compiler/GHC/Tc/Errors.hs                          |   4 +-
 compiler/GHC/Tc/Instance/Class.hs                  | 131 +++++++++++++++++++++--------------------
 compiler/GHC/Tc/Solver.hs                          |   8 +--
 compiler/GHC/Tc/Solver/Canonical.hs                |  16 ++---
 compiler/GHC/Tc/Solver/Interact.hs                 |  45 +++++++-------
 compiler/GHC/Tc/Solver/Monad.hs                    |  16 ++---
 compiler/GHC/Tc/TyCl/Instance.hs                   |   2 +-
 compiler/GHC/Tc/Types/Evidence.hs                  |  33 +++++++----
 docs/users_guide/9.8.1-notes.rst                   |   6 ++
 testsuite/tests/simplCore/should_run/T22448.hs     |  30 ++++++++++
 testsuite/tests/simplCore/should_run/T22448.stdout |   2 +
 testsuite/tests/simplCore/should_run/all.T         |   1 +
 24 files changed, 592 insertions(+), 279 deletions(-)

mentioned in commit b56025f4

mentioned in issue #23287 (closed)

As @adamgundry suspected, !9411 (merged) introduced significant performance regressions in the optics library (details in #23287 (closed)).

Disabling specialization for incoherent instances unconditionally seems to me like an extremely blunt hammer that makes it impossible to use them in a performant manner.

FWIW, the motivating example from OP that led to all this is easily fixable by putting an INLINE pragma on bar.

FWIW, the motivating example from OP that led to all this is easily fixable by putting an INLINE pragma on bar.

That was not meant to be a motivating example, it was meant to be a minimal demonstrator of the problem.

mentioned in issue #23172 (closed)

mentioned in merge request !10342 (closed)

mentioned in issue #23429

mentioned in issue #23988 (closed)

mentioned in issue #24924

Specialiser changes program's behaviour

Child items 0

Activity