Exhaustiveness/Redundancy Check Implementation
This page describes how the exhaustiveness and redundancy cheker is implemented in GHC. If you are looking for a general description of the problem and an overview of our approach see PatternMatchCheck.
The most relevant modules are:

deSugar/PmExpr.hs
: Expressions (PmExpr
,PmLit
), term constraints (SimpleEq
,ComplexEq
) and utilities for the check and the term oracle. 
deSugar/TmOracle.hs
: The term oracle and utilities. Three (incremental) interfaces to the oracle are exported:tmOracle
,solveOneEq
andextendSubst
. 
deSugar/Check.hs
: The main algorithm. Main exports are functionscheckSingle
(check a single binding  let) andcheckMatches
(check aMatchGroup
 case expressions and functions).
We discuss each one separately below.
PmExpr
Datatype and Friends
The The module exports the following data types:
data PmExpr = PmExprVar Name
 PmExprCon DataCon [PmExpr]
 PmExprLit PmLit
 PmExprEq PmExpr PmExpr  Syntactic equality
 PmExprOther (HsExpr Id)  Note [PmExprOther in PmExpr]
data PmLit = PmSLit HsLit  simple
 PmOLit Bool { is it negated? } (HsOverLit Id)  overloaded
type SimpleEq = (Id, PmExpr)
type ComplexEq = (PmExpr, PmExpr)
Type PmExpr
represents Haskell expressions. Since the term oracle is minimal at the moment, only specific forms are supported
and the rest are wrapped in a PmExprOther
. Note that PmExprEq
is not function (==)
but represents structural equality and
is used for nonoverloaded literals. E.g. the constraint *x is not equal to 5* is represented as (False ~ (x ~ 5)):
PmExprEq (PmExprCon <False> []) (PmExprEq (PmExprVar <x>) (PmExprLit (PmSLit <5>)))
Type PmLit
represents literals (both overloaded and nonoverloaded). Literal
equality is defined with signature:
eqPmLit :: PmLit > PmLit > Bool
Strictly speaking, the result type should be Maybe Bool
, since if two overloaded literals look different it does not mean they actually are. Hence
equality in this case is inconclusive, unless we unfold the from
function (like fromInteger
). E.g. syntactic equality 5 ~ 6 is
not necessarily inconsistent, since it actually represents fromInteger 5 ~ fromInteger 6
which depends on the implementation of
function fromInteger
. On the other hand, fromInteger 5 ~ fromInteger 5
is always true, since fromInteger
is a pure function.
We could have treated overloaded literals as function applications (e.g. fromInteger 5
) but since the term oracle cannot handle
function applications, we would get poor error messages for overloaded literals. Instead, we took this *syntactic* approach (which
looks a bit hackish though).
Nevertheless, we close our eyes to the truth above and simply return a Bool
. This way, the warnings are
more comprehensible.
Term equalities take two forms: SimpleEq
and ComplexEq
. In the paper we generate only simple equalities,
yet this is not so performant. In many cases we would generate { x ~ e1, x ~ e2 }
(for a fresh variable x
)
instead of the much more direct e1 ~ e2
. Hence, the current implementation generates ComplexEq
s. SimpleEq
s
are still used for nested pattern matching (see below).
The Term Oracle
As explained in the paper, the algorithm depends on external solvers for checking the satisfiability of term and type constraints. For checking type constraints we simply call GHC's existing inference engine (OutsideIn(X)), using a simple interface:
tyOracle :: Bag EvVar > PmM Bool
For the satisfiability of term constraints, we have implemented a simple oracle
(solver). The term oracle lives in deSugar/TmOracle.hs
and has the following
signature(s):
solveOneEq :: TmState > ComplexEq > Maybe TmState
tmOracle :: TmState > [ComplexEq] > Maybe TmState
extendSubst :: Id > PmExpr > TmState > TmState
Function solveOneEq
as the name suggests solves a single term constraint (which is the most common case)
while tmOracle
takes a list of them. They are both designed to be called incrementally, hence they both take
a term oracle state as argument and return the resulting one (if consistent) as well. Function extendSubst
is
used when we generate a fresh variable (e.g. when checking against guards), where no actual solving takes place.
It is much more efficient than solveOneEq
, but is unsafe, since it does not check the invariant, that the
variable does not already exist in the state.
The state of the oracle
TmState
is represented as follows:
type PmVarEnv = Map.Map Name PmExpr
type TmOracleEnv = (Bool, PmVarEnv)
type TmState = ([ComplexEq], TmOracleEnv)
TmState
contains the following components:
 A list of complex constraints that cannot be solved until we have more information,
 A boolean (
True
iff at least one unhandled constraint has appeared),  A substitution from variables to
PmExpr
s (actual result of solving).
By unhandled we mean constraints that involve an PmExprOther
, for which we know that the oracle
cannot do anything. We could store these constraints instead of keeping just a boolean (would be better
if we were to print them for more precise warnings). This is not done though, so, for efficiency, we
just keep a boolean. But why store something in the first place? For the laziness check, we need to know
if a clause forces any of the arguments. Since we want to play on the safe side, a constraint we cannot
inspect can fail for all we know, so this boolean is used for the computation of the divergent set.
Stategy of the Oracle
The strategy of the term oracle is quite simplistic and proceeds as follows (for a single SimpleEq
):
 Apply existing substitution to the equality using
applySubstComplexEq :: PmVarEnv > ComplexEq > ComplexEq
.  Try to simplify the constraint using
simplifyComplexEq :: ComplexEq > ComplexEq
.  Try to solve the simplified complex equality using
solveComplexEq :: TmState > ComplexEq > Maybe TmState
. If the existing substitution is extended during solving, we apply the extended substitution to the constraints that could not progress before and recurse.
Function solveOneEq
implements the above:
solveOneEq :: TmState > ComplexEq > Maybe TmState
solveOneEq solver_env@(_,(_,env)) complex
= solveComplexEq solver_env
$ simplifyComplexEq
$ applySubstComplexEq env complex
tmOracle :: TmState > [ComplexEq] > Maybe TmState
tmOracle tm_state eqs = foldlM solveOneEq tm_state eqs
Note: We have not implemented an OccursCheck, since it is not needed at the moment. If extended though, the oracle will need an occurs check to ensure termination (e.g. if function applications are unfolded by the oracle).
The Actual Check
The check itself is implemented in module deSugar/Check.hs
. Exported functions:
 Issue warnings
dsPmWarn :: DynFlags > DsMatchContext > DsM PmResult > DsM ()
 Compute the covered/uncovered/divergent sets and
 issue the warnings if the respective flags are enabled
checkSingle :: DynFlags > DsMatchContext > Id > Pat Id > DsM ()  let
checkMatches :: DynFlags > DsMatchContext > [Id] > [LMatch Id (LHsExpr Id)] > DsM ()  case/functions
 Nested pattern matching
genCaseTmCs1 :: Maybe (LHsExpr Id) > [Id] > Bag SimpleEq
genCaseTmCs2 :: Maybe (LHsExpr Id)  Scrutinee
> [Pat Id]  LHS (should have length 1)
> [Id]  MatchVars (should have length 1)
> DsM (Bag SimpleEq)

checkSingle
performs the check on a single pattern (let bindings) 
checkMatches
performs the check on a list of matches (case expressions / function bindings) 
genCaseTmCs1
generates term constraints for nested pattern matching (see below) 
genCaseTmCs2
generates term constraints for nested pattern matching (see below)
Before explaining what each function does, we first discuss the types used.
PmPat
datatype and friends
The The PmPat
data type is defined in deSugar/Check.hs
as:
data PatTy = PAT  VA  Used only as a kind, to index PmPat
data PmPat :: PatTy > * where
PmCon :: { pm_con_con :: DataCon
, pm_con_arg_tys :: [Type]
, pm_con_tvs :: [TyVar]
, pm_con_dicts :: [EvVar]
, pm_con_args :: [PmPat t] } > PmPat t
 For PmCon arguments' meaning see @ConPatOut@ in GHC/Hs/Pat.hs
PmVar :: { pm_var_id :: Id } > PmPat t
PmLit :: { pm_lit_lit :: PmLit } > PmPat t  See Note [Literals in PmPat]
PmNLit :: { pm_lit_id :: Id
, pm_lit_not :: [PmLit] } > PmPat VA
PmGrd :: { pm_grd_pv :: PatVec
, pm_grd_expr :: PmExpr } > PmPat PAT
Literal patterns are not essential for the algorithm to work, but it is
more efficient to treat them more like constructors by matching against them eagerly than
generating equality constraints to feed the term oracle with (as we do in the paper). Additionally,
we use negative patterns (constructor PmNLit
), for efficiency also. PmCon
contains several
fields, effectively copying constructor ConPatOut
of type Pat
(defined in GHC/Hs/Pat.hs
).
Since the algorithm runs posttypechecking, we have a lot of information available for the
treatment of GADTs.
Value Abstractions & Patterns
Apart from the literal/negative literal patterns, both value abstractions and patterns look exactly
like in the paper, and are both implemented by the PmPat
type:
type Pattern = PmPat PAT  ^ Patterns
type ValAbs = PmPat VA  ^ Value Abstractions
Similarly, a pattern vector is just a list of patterns, while a value vector abstraction looks
like the paper, that is, it is a list of value abstractions, accompanied by a set of constraints Delta
:
type PatVec = [Pattern]  ^ Pattern Vectors
data ValVec = ValVec [ValAbs] Delta  ^ Value Vector Abstractions
data Delta = MkDelta { delta_ty_cs :: Bag EvVar
, delta_tm_cs :: TmState }
A big difference from the theory is that instead of storing term constraints, we store in Delta
the whole
term oracle state, that is, a much more efficient representation of the term constraints. This allows us to
have significantly better performance, especially since we have an incremental interface for the term oracle.
Value Set Abstractions (Covered, Uncovered and Divergent Sets)
Value set abstractions are just represented as a list of value vector abstractions:
type ValSetAbs = [ValVec]  ^ Value Set Abstractions
type Uncovered = ValSetAbs
NOTE: We have attempted to use a nicer representation of value set abstractions as prefix trees but we faced some performance issues. Hence, we created the task #11528 (closed) to further investigate whether such an approach is achievable.
Note that the only set we store is the uncovered set: For both the divergent and the covered set, we are only interested in whether they are empty or not. Hence, instead of creating the sets and checking whether they are empty or not afterwards, we check on the fly and use just a boolean for each.
General Strategy
The algorithm process as follows:

Translate a clause (which contains
Pat
s, defined inGHC/Hs/Pat.hs
) toPatVec
s (translatePat
is the workhorse):translatePat :: FamInstEnvs > Pat Id > PmM PatVec translateGuards :: FamInstEnvs > [GuardStmt Id] > PmM PatVec translateMatch :: FamInstEnvs > LMatch Id (LHsExpr Id) > PmM (PatVec,[PatVec])

Call function
pmcheck
on the transformed clause:pmcheck :: PatVec > [PatVec] > ValVec > PmM Triple pmcheckHd :: Pattern > PatVec > [PatVec] > ValAbs > ValVec > PmM Triple pmcheckGuards :: [PatVec] > ValVec > PmM Triple type Triple = (Bool, Uncovered, Bool)
Note that a processed clause has type (PatVec,[PatVec])
. Why? Because each clause may be accompanied by a list
of guardvectors. The formalization indicates that patVectProc
gets a clause but in order to achieve this in
the presence of more than one guardvector, we should duplicate the pattern vector (and possibly alpharename
it). That is:
(vec, [gv1, gv2, .., gvN]) == [vec ++ gv1
, vec ++ gv2
...
, vec ++ gvN
Since this is really bad for performance (significantly increases the number of clauses), we took a different
approach: Process the guards separately using pmcheckGuards
and plug the results at the end of the respective
clause. Yet, this comes with a price: We are not able to issue warnings perguard but only perclause. This was
the case already in GHC though (see ticket #8494 (closed)).
All three pmcheck
functions compute (as the signatures suggest) coverage, exhaustiveness and divergence at
once. pmcheck
is the main interface that takes a pattern vector, the guard vectors and the uncovered set and
returns all three results. It basically handles the guard case (see paper) and if both the value vector
abstraction and the pattern vector are in headtail form, calls pmcheckHd
to handle the rest of the cases.
When both the value vector abstraction and the pattern vector are empty, pmcheckGuards
is called, to handle
the guards.
Since GHC is a production compiler and the uncovered set can grow exponentially, the recursive calls had to be
bounded. Hence, three variants are used in the recursive positions: pmcheckI
, pmcheckHdI
and pmcheckGuardsI
(I
comes from "increment"). All three variants simply call the respective function, after increasing a counter
of the iterations elapsed. For example, pmcheckI
is defined as follows:
pmcheckI :: PatVec > [PatVec] > ValVec > PmM Triple
pmcheckI ps guards vva = incrCheckPmIterDs >> pmcheck ps guards vva
By default, the number of iterations for bailing is 10000000 (none of the tests we have tried seems to exceed that
yet) but it can be set with: fmaxpmcheckiterations=n
. Function warnPmIters
in deSugar/Check.hs
issues a
warning if the limit is exceeded.
Nested Pattern Matching
Since we want to handle nested pattern matching too, two more functions are exported by deSugar/Check.hs
which are used to generate constraints for case expressions:
genCaseTmCs1 :: Maybe (LHsExpr Id) > [Id] > Bag SimpleEq
genCaseTmCs2 :: Maybe (LHsExpr Id)  Scrutinee
> [Pat Id]  LHS (should have length 1)
> [Id]  MatchVars (should have length 1)
> DsM (Bag SimpleEq)

Function
genCaseTmCs1
takes a scrutinee of a case expressione
and a variablex
and generates the term equalityx ~ e
. Variablex
is the one used in the initially uncovered set so the generated constraint records the equality of the scrutinee with all clauses. 
Function
genCaseTmCs2
takes the scrutineee
, an LHSp
, and the initially uncovered variablex
and generates the set of constraints{ x ~ e, x ~ to_expr(p) }
. How is this useful? Consider the following example:f :: Bool > () f y = case y of x@True >  genCaseTmCs2 generates: y ~ True, y ~ x case x of True > () False > ()  genCaseTmCs1 generates: x ~ False (*) False > ()
This way we can detect that (*) is redundant. Note that the rightthingtodo would be to compute the whole
covered set for clause x@True
, and for every value vector abstraction in the set check the nested pattern
matching. Yet, this is too expensive so this hack saves us the trouble and quite cheaply.
Other Approximations (and hacks?)
The algorithm described in the paper is too expressive (especially when it comes to guards) so implementing it in GHC required several approximations and simplifications:
 Function
translateGuards
drops unhandled guards. That is, if there are guards involvingPmExprOther
that will make the algorithm branch but the oracle will not be able to reason about, it replaces them with a single mayfail guard. This way, we record the possibility of failure, without generating unreasonably big value set abstractions. It is a little troubling because we reject seemingly obvious cases like #15753 (closed), see also !699 (comment 191679).  Aspatterns are translated in reverse: An aspattern
x@p
should be formally translated to a variable patternx
and a guard patternx < translate p
. This has a big disadvantage though: pattern matching is transferred to the guard pattern so resolution is postponed until the term oracle. Consider the following example:
f :: Bool > ()
f x@True = ()
f y@False = ()
The uncovered set with the above translation is { x > {x ~ False, x ~ y, y ~ True} }
. Obviously this is
empty but we should call the term oracle to detect the inconsistency and deduce that the pattern match is
exhaustive. Instead, we translate aspatterns x@p
as follows: p (x < coercePmPat p)
. Function
coercePmPat
drops all the guard patterns from p
so it can be transformed to an expression. Even though
this translation is a bit unintuitive, it is efficient and equivalent to the formal one:
 Pattern
p
contains all the guards so no information is lost  Guard pattern
(x < coercePmPat p)
is bidirectional in the sense that an equalityx ~ coercePmPat p
will ultimately be generated, so no information is lost.
In summary, the following will be generated:
f :: Bool > ()
f True (x < True) = ()  uncovered: { False > { x ~ True } }
f False (y < False) = ()  uncovered: {}
 In general, when a guard is not going to contribute, during translation we transform it to a
can fail
pattern:
  A fake guard pattern (True < _) used to represent cases we cannot handle
fake_pat :: Pattern
fake_pat = PmGrd { pm_grd_pv = [truePattern]
, pm_grd_expr = PmExprOther EWildPat }
Later when we check (with pmcheck
), in case we reach a guard, we shortcircuit if it is a "fake" one.
This trick saves a lot of effort in practice (in terms of both memory consumption and performance):
pmcheck (p@(PmGrd pv e) : ps) guards vva@(ValVec vas delta)
 isFakeGuard pv e = forces . mkCons vva <$> pmcheckI ps guards vva
 otherwise = ...