Skip to content

Encode refutable shape information in `PmOracle`

Sebastian Graf requested to merge wip/pmcheck-ncon into master

Previously, we had an elaborate mechanism for selecting the warnings to generate in the presence of different COMPLETE matching groups that, albeit finely-tuned, produced wrong results from an end user's perspective in some cases (#13363 (closed)).

The underlying issue is that at the point where the ConVar case has to commit to a particular COMPLETE group, there's not enough information to do so and the status quo was to just enumerate all possible complete sets nondeterministically. The getResult function would then pick the outcome according to metrics defined in accordance to the user's guide. But crucially, it lacked knowledge about the order in which affected clauses appear, leading to the surprising behavior in #13363 (closed).

In !1010 (closed) we taught the term oracle to reason about literal values a variable can certainly not take on. This MR extends that idea to ConLikes and thereby fixes #13363 (closed): Instead of committing to a particular COMPLETE group in the ConVar case, we now split off the matching constructor incrementally and record the newly covered case as a refutable shape in the oracle. Whenever the set of refutable shapes covers any COMPLETE set, the oracle recognises vacuosity of the uncovered set.

This patch goes a step further: Since at this point the information in value abstractions is merely a cut down representation of what the oracle knows, value abstractions degenerate to a single Id, the semantics of which is determined by the oracle state Delta. Value vectors become lists of [Id] given meaning to by a single Delta, value set abstractions (of which the uncovered set is an instance) correspond to a union of Deltas which instantiate the same [Id] (akin to models of formula).


Reviewer's guide

Best followed in the order points appear.

PmExpr

  • In analogy to AltCon, I introduced PmAltCon to subsume PmExprCon and PmExprLit of PmExpr in a single case.
  • PmExpr is gone. We encode the remaining two cases (PmExprCon, PmExprVar) in PmOracle.TmCt instead.
  • The equality check between literals/ConLike is actually undecidable in some cases (see eqPmAltCons and Note [Undecidable Equality for PmAltCons]), and rather than just returning False in those cases (like what eqPmLit did), we really want to make explicit when we are sure when things are different. So: Equal :: PmEquality means definitely equal, Disjoint means definitely different. PossiblyOverlap means unsure.
  • Correctly handling undecidable equality revealed a bug in the fix for #14546 (closed). I added T14546d as a regression test demonstrating how we lossily convert some literals to overloaded literals on HEAD for no reason.
  • Since Check.translatePat now works by desugaring to Core (see the later section in Check for why) and then recovering possible literals and constructor applications, we need a function literalToPmLit that can turn Core Literals into PmLits. RHS of PmGrds are now plain CoreExprs, which we make sense of by feeding them to the oracle with addVarCoreCt.
  • Now, we could do type PmLit = Literal. But that doesn't work, because Literal has no notion of overloaded literals or String literals and the literals are all defined in terms of primitive types (Int#) instead of the user-facing types (Int). So: Define a clone of Literal, sitting between Source Haskell and Core and un-desugar Literals in literalToPmLit.

PmOracle

  • Due to various reasons (for example, deciding whether a COMPLETE set is overlapped needs type info in addition to which constructors were already matched), most operations on the term oracle also need access to the type oracle now. That's why I merged the type oracle and many operations related to void type from Check into TmOracle and renamed it to PmOracle.
  • Central to this patch are the changes to the term oracle. Ergo, understanding the representation of its state TmState is a good starting point. It stores everything it knows about a variable ('value abstraction') in a VarInfo, so it's basically just a (shared) IdEnv VarInfo. The shared bit (see SharedDIdEnv) comes from the fact that we want to represent i.e. x and y by the same VarInfo if we get to know that x ~ y.
  • VarInfo carries two essential fields: One for positive information (i.e., x is Just 5) and one for negative information (i.e., x can't be Left, because we matched on it earlier).
  • Now is a good time to understand Note [The Pos/Neg invariant], then Note [TmState invariants]. This should be enough to grok the section Term oracle unification procedure and addRefutableAltCon
  • Appreciate the new definition of addVarVarCt and addVarConCt (formerly unify), where all cases are exhaustive now.
  • Note that adding new type constraints (for example in a GADT match) makes it necessary to prove that every VarInfo is still inhabited by at least one constructor. That's what the call to ensureAllPossibleMatchesInhabited in tyIsSatisfiable is about.
  • The 3 independent tests in pmIsSatisfiable (ty oracle, term oracle and strict fields) are now split into their own SatisfiabilityCheck newtype, which different call sites compose as they see fit.
  • The mechanic surrounding vi_cache is worth mentioning. Whenever we test whether a COMPLETE set is completely overlapped, we have to take the set difference with vi_neg. When we do this repeatedly, it becomes pretty expensive for large enums, like in #14667 (closed). So we cache the thinned out COMPLETE sets in VarInfo, which is always a superset of the possible matches. Hence we call the associated data structure PossibleMatches. The only invariant surrounding it is that it has no overlap with vi_neg. Note that this is purely a performance optimisation.

Check

  • The first thing to note is that the PmNLit case of PmPat is gone: We encode that knowledge in the oracle now. Also, PmPat is used for patterns exclusively now, since value abstractions are only variables. This means type ValVec = [Id] and also that these variables never change throughout the checking process, it's just the oracle state Delta that is gradually split and refined. So the uncovered set becomes a union of Deltas, type Uncoverd = [Delta].
  • Since we detect an overlapped COMPLETE set immediately when it is overlapped (by vi_neg), we generate much better warnings and don't need to commit to a particular COMPLETE set non-deterministically when encountering the first constructor, like we used to do. Consequently, we don't need ListT anymore and Provenance of a COMPLETE set is irrelevant.
  • We found that negative information (i.e. failed to match x where x is not one of {True}) is not so great from a UX perspective. In addition, it's hard to tell if a Delta is still satisfiable without searching for an inhabitant (which basically is a model delta where every variable reachable from a given value vector only has positive information attached to it). So we turn a delta into N positive inhabitants by calling PmOracle.provideEvidenceForEquation.
  • There are two performance hacks for coping with guards (See item 6 of Note [Guards and Approximation]) and exponential behavior resulting from ConVar splitting on pattern synonyms (see Note [Limit the number of refinements])
  • PmPpr is for user-facing warning messages. Consequently, we can use the Outputable instances for debugging purposes.
Edited by Sebastian Graf

Merge request reports