gkaracha · 8223100e
--- a/pattern-match-check-implementation.md
+++ b/pattern-match-check-implementation.md
-# Exhaustiveness/Redundancy check implementation
+# Exhaustiveness/Redundancy check implementation \[THE PAGE IS CURRENTLY UNDER CONSTRUCTION\]


 This page describes how the exhaustiveness and redundancy cheker is implemented
-in GHC. If you are looking for a general description of the problem and an overview
-of our approach see [PatternMatchCheck](pattern-match-check).
-
-
-See also
-
- Peter Sestoft's paper for negative patterns [ ML's pattern matching compilation and partial evaluation](http://lambda.csail.mit.edu/~chet/papers/others/s/sestoft/sestoft96ml.pdf)
+in GHC. If you are looking for a general description of the problem and an
+overview of our approach see [PatternMatchCheck](pattern-match-check).

 ## The `PmPat` datatype and friends


-The `PmPat` data type is defined in `deSugar/Check.lhs` as:
+The `PmPat` data type is defined in `deSugar/Check.hs` as:

 ```wiki
-data PmPat id = PmGuardPat PmGuard
-              | PmVarPat { pm_ty :: Type, pm_var :: id }
-              | PmConPat { pm_ty :: Type, pm_pat_con :: DataCon, pm_pat_args :: [PmPat id] }
-              | PmLitPat { pm_ty :: Type, pm_lit :: PmLit id }
-              | PmLitCon { pm_ty :: Type, pm_var :: id, pm_not_lit :: [PmLit id] }
+data PmPat p = PmCon { pm_con_con     :: DataCon
+                     , pm_con_arg_tys :: [Type]  -- The univeral arg types, 1-1 with the universal
+                                                 --   tyvars of the constructor
+                     , pm_con_tvs     :: [TyVar] -- Existentially bound type variables (tyvars only)
+                     , pm_con_dicts   :: [EvVar] -- Ditto *coercion variables* and *dictionaries*
+                     , pm_con_args    :: [p] }
+             | PmVar { pm_var_id      :: Id }
+             | PmLit { pm_lit_lit     :: PmLit } -- See NOTE [Literals in PmPat]
 ```


-where
+where type `PmLit` (defined in `deSugar/PmExpr.hs`) is:

 ```wiki
-type PmGuard  = CanItFail
-
-data PmLit id = PmLit HsLit
-              | PmOLit (HsOverLit id) (Maybe (SyntaxExpr id)) (SyntaxExpr id)
+data PmLit = PmSLit HsLit                -- simple
+           | PmOLit Bool  (HsOverLit Id) -- overloaded
 ```


-Instead of using the more verbose type `Pat`, type `PmPat` represents only
-Guards, Term variables, Constructor patterns, Literal patterns and Negative literal patterns.
-Yet, note that `PmPat` is more liberal than `Pat`, in the sense that we allow guards
-to be interleaved with normal patterns. See next section for the translation from `Pat` to `PmPat`.
-Also, note that, since we allow arbitrary many guards to appear in the middle of the pattern vectors,
-we lift the restriction that all clauses must have the same length (Nevertheless, the number of actual
-patterns in every clause must be the same).
-
-
-Even though we could represent negative literals using guards, instead we use negative patterns (see
-Peter Sestoft's paper above), represented using `PmLitCon` (Literal Constraint). A `PmLitCon ty x lits`
-represents all literals `x` with type `ty`, except the ones that are in `lits`.
+Literal patterns (constructed using datacon `PmLit`) are not essential for the
+algorithm to work, but, as we explain below in section `Translation`, it is
+much more efficient to treat them more like constructors by matching against
+them eagerly than generating equality constraints to feed the term oracle with
+(as we do in the paper). `PmCon` contains several fields, effectively copying
+constructor `ConPatOut` of type `Pat` (defined in `hsSyn/HsPat.hs`). Since the
+algorithm runs post-typechecking, this information is crucial for the treatment
+of GADTS.

+### Value Abstractions

-Note that for now `PmGuard`s are simply represented by `CanItFail`. Yet, since we (try to) keep this type
-opaque, it can be extended in the future to support better reasoning for guards than we have now.

-## Transforming `Pat` to `PmPat`
-
-
-The main function for viewing `Pat Id` into our domain (`PmPat Id`) is
+A value abstraction looks exactly like in the paper (with the exception of
+literal patterns). It is essentially a `PmPat`, we simply tie the recursive
+knot with a newtype declaration:

 ```wiki
-mViewPat :: Pat Id -> PmM [PmPat Id]
+newtype ValAbs = VA (PmPat ValAbs) -- Value Abstractions
 ```

-
-It never generates `PmLitCon`s but may introduce guards in the middle of the
-clause in the following cases:
-
- view patterns:  `(f -> p) ==> [x , p <- f x]` (a variable and a guard)
- `n+k` patterns: `(n+k)    ==> [n', n'>=k, let n = n'-k]` (a variable and two guards)
-
-
-Overloaded lists and pattern synonyms must be handled too but for now are just
-introducing a variable and a guard that can fail (`CanFail`).
+### Patterns


-Literal patterns are desugared using functions (in `deSugar/MatchLit.lhs`):
+A pattern is either a simple pattern, or a guard pattern. Hence, it lifts the `PmPat` type:

 ```wiki
-pmTidyLitPat :: HsLit -> Pat Id
-pmTidyNPat   :: HsOverLit Id -> Maybe (SyntaxExpr Id) -> SyntaxExpr Id -> Pat Id
+data Pattern = PmGuard PatVec PmExpr      -- Guard Patterns
+             | NonGuard (PmPat Pattern)   -- Other Patterns
 ```


-The difference between these and the *proper*`tidyLitPat` and `tidyNPat` is
-that we do not wrap the literals in constructors. This way we do not expose their
-internal representation when we issue the warnings. Nevertheless, we unfold strings
-as lists of characters.
+Note that the `PmGuard` constructor takes a pattern vector and **NOT** a single
+pattern. As we explain in the extended version of the paper (Appendix A, B,
+give a link here), during source pattern translation, a source pattern may give
+rise to a list of core patterns, by the introduction of more constraints (in
+the form of guard patterns).

+### Pattern and Value Vectors

-Instead of calling `mViewPat` on every pattern, we define function
+
+A pattern vector `PatVec` and a value vector abstraction `ValVecAbs` is just a
+type synonym for a list of the respective element type:

 ```wiki
-preprocess_match :: EquationInfo -> PmM [PmPat Id]
+type PatVec    = [Pattern] -- Pattern Vectors
+type ValVecAbs = [ValAbs]  -- Value Vector Abstractions
 ```

+## Value Set Abstractions

-That
-
- Transforms all patterns using `mViewPat`
- Extracts the `CaItFail` value out of the `MatchResult`
- Puts them all together in a list of `PmPat Id`.
-
-## Main algorithm

+After `PmPat`, the most important data type of the checker is `ValSetAbs`, used to represent value set
+abstractions. In the formalisation we treat a value set abstraction as a list of value vector abstractions
+but since the vectors we generate usually have a common prefix, we save space and time by representing
+them as a prefix tree. This means that we have a choice to either add the constraint set Delta only at the
+leaves or have constraints in internal nodes as well. To make the most out of our representation we chose
+the latter. Notionally, every path from the root to a leaf represents a value vector abstraction, where the
+set Delta is the union of all constraints in the path. This means that we have the following invariant not
+represented in our type:

-Instead of keeping the eliminated vectors (as we do in the paper), we use the
-type `Forces` to keep track if a clause forces the evaluation of an argument
-or not:
+**INVARIANT VsaArity**: The number of `Cons`s in any path to a leaf is the same. We'll refer to this as the
+arity of the value set abstraction.

 ```wiki
-data Forces = Forces | DoesntForce
+data ValSetAbs
+  = Empty                               -- {}
+  | Union ValSetAbs ValSetAbs           -- S1 u S2
+  | Singleton                           -- { |- empty |> empty }
+  | Constraint [PmConstraint] ValSetAbs -- Extend Delta
+  | Cons ValAbs ValSetAbs               -- map (ucon u) vs
 ```


-The rest of the types that are heavily used in the main algorithm are the following:
+where `PmConstraint`s and `PmExpr`s (defined in `deSugar/PmExpr.hs`) take the following form:

 ```wiki
-data Delta = Delta { delta_evvars :: Bag EvVar, delta_guards :: PmGuard }
- 
-type InVec = [PmPat Id]
-type CoveredVec   = (Delta, [PmPat Id])
-type UncoveredVec = (Delta, [PmPat Id])
-
-data PmTriple = PmTriple { pmt_covered   :: [CoveredVec]
-                         , pmt_uncovered :: [UncoveredVec]
-                         , pmt_forces    :: Forces }
+data PmConstraint = TmConstraint Id PmExpr -- TermEq    : x ~ e
+                  | TyConstraint [EvVar]   -- TypeEq    : ...
+                  | BtConstraint Id        -- StrictVar : x ~ _|_
+
+data PmExpr = PmExprVar   Id
+            | PmExprCon   DataCon [PmExpr]
+            | PmExprLit   PmLit
+            | PmExprEq    PmExpr PmExpr  -- Syntactic equality
+            | PmExprOther (HsExpr Id)    -- NOTE [PmExprOther in PmExpr]
 ```

-`Delta` keeps track of the constraints that correspond to each clause, and includes
-both type constraints `EvVar` and term constraints (guards) `PmGuard`.
-`InVec` is simply a list of `PmPat`s where guards may appear anywhere in the
-vector. Both `CoveredVec` and `UncoveredVec` represent intermediate vectors generated
-by the algorithm where the vectors are guard-free and all constraints are accumulated
-in `Delta`.
-Finally, `PmTriple`, as shown in the paper, is the information we get for every clause.
-It contains information about the cases it covers, the cases that leaves uncovered and
-about forcing.

+Additionally, instead of using constructors `Union`, `Constraint` and `Cons` to
+construct value set abstractions, we use the smart constructors instead
+`mkUnion`, `mkConstraint` and `mkCons` that ensure that an empty value set
+abstraction is represented only by `Empty`. We refer to this as the
+**INVARIANT VsaInvariant**.

-The heavy worker is function `one_step` which is function `alg` from the paper:
-
-```wiki
-one_step      :: UncoveredVec   -> InVec -> PmM PmTriple
-one_full_step :: [UncoveredVec] -> InVec -> PmM PmTriple
-```

-`one_full_step` simply calls `one_step` on every uncovered vector and combines
-the results.
+Why do we need type `PmExpr`? The term oracle (`tmOracle`, defined in
+`deSugar/TmOracle.hs`) cannot handle all kinds of expressions (e.g. it doesn't
+handle function applications). Hence, there is no need to work on the extremely
+big `HsExpr` type (over 40 data constructors). The forms that are not handled
+by the term oracle are wrapped in a `PmExprOther` and ignored by the oracle
+(and the pretty-printer). Why do we keep them around then? An expression we do
+not inspect can always diverge so keeping them around (or at least and
+indication of failure) is essential for the laziness check.


-Note that `one_step` also generates type constraints while traversing the pattern
-vectors (that's the reason we have annotated every data constructor of `PmPat` with
-the respective type) and stores them in `Delta` as it goes deeper.
+Note that data constructor `PmExprEq` does not represent function `(==)` but is
+generated by our algorithm, when we KNOW that two things are equal. E.g. when a
+simple literal (say `5 :: Int`) is matched, we KNOW that `x` where
+`PmExprEq False (PmExprEq x 5)` remains uncovered. So, `PmExprEq` represents
+structural equality and NOT an `Eq` instance.

-## The `PmM` monad
+## Translation from Pat to Pattern


-All the algorithm lives inside `PmM` monad which is nothing else than `DsM` with
-a possibility of failure (`Maybe`).
+The main functions that are responsible for the translation of the type
+`(Pat Id)` to our core pattern type `Pattern` are the following:

 ```wiki
-newtype Pm gbl lcl a = PmM { unPmM :: TcRnIf gbl lcl (Maybe a) }
-type PmM a = Pm DsGblEnv DsLclEnv a
+translatePat    :: Pat Id -> UniqSM PatVec
+translatePatVec :: [Pat Id] -> UniqSM [PatVec]
+translateGuards :: [GuardStmt Id] -> UniqSM PatVec
+translateMatch  :: LMatch Id (LHsExpr Id) -> UniqSM (PatVec,[PatVec])
 ```


-The only reason we had to lift the algorithm in the `TcRnIf` is because we need
-fresh term and type variables but, apart from that, the main part of the algorithm
-is still purely syntactic.
+The worker functions are `translatePat` which translates a single pattern and
+`translateGuards` which translate a list of guards. The implementations are rather
+straightforward with ... exceptions:

-## Changes to `DsLclEnv`
+>
+> 1.

+>
+> 2.

-In order to handle correctly cases like [\#4139](https://gitlab.haskell.org//ghc/ghc/issues/4139), we have to collect constraints not
-only from the match we are currently checking but also from the context. Hence, we
-extend `DsLclEnv` with the additional field `ds_dicts :: Bag EvVar` where we accumulate
-constraints as we desugar the AST. Also, functions
-
-```wiki
-getDictsDs :: DsM (Bag EvVar)
-addDictsDs :: Bag EvVar -> DsM a -> DsM a
-```
+>
+> 3.

+**THE REST IS JUST DUMPED TEXT, I AM WORKING ON IT**

-are used for retrieving and adding more constraints in the context.

-## External interface
+For the different constructors of `PmExpr` see below (sections Translation and The Term Oracle).

-
-The main function that is exported is `checkpm`:
+## Translation

 ```wiki
-checkpm :: [Type] -> [EquationInfo] -> DsM (Maybe PmResult)
+translatePat    :: Pat Id -> UniqSM PatVec
+translatePatVec :: [Pat Id] -> UniqSM [PatVec]
+translateGuards :: [GuardStmt Id] -> UniqSM PatVec -- replace unhandled??
 ```


-Takes as arguments the types of the arguments and all matches and returns a
-`Maybe PmResult`. (Note that we expand all type synonyms). The `PmResult` type
-synonym is just:
+and the function that combines them all translate a whole match-clause (including guards):

 ```wiki
-type PmResult = ([EquationInfo], [EquationInfo], [UncoveredVec])
+translateMatch :: LMatch Id (LHsExpr Id) -> UniqSM (PatVec,[PatVec])
 ```


-That is
+Instead of using the (enormous) `HsExpr` type, we drop most syntactic sugar and have only 5 constructors:
+
+- `PmExprVar`: Variables or wildcards
+- `PmExprCon`: Constructor patterns, Records
+- `PmExprLit`: Overloaded and non-overloaded literals
+- `PmExprEq` : Syntactic equality (NOTE: Pattern matching, NOT a call to the equality function `(==)`!!)

- Redundant clauses
- Clauses with inaccessible right hand side
- Uncovered cases
+## Term Oracle
\ No newline at end of file