Make the rewriter work only at toplevel
In a recent conversation, @simonpj and I schemed about another way to simplify the solver.
Key idea: Make the rewriter (previously called the flattener) work only at toplevel.
Right now (well, after !4149, which this ticket assumes has been merged), the rewriter does three things: 1) it zonks, 2) it applies the substitution in the inert set, and 3) it normalizes type family applications. It does all of these deeply, at any point in an input type. Yet, we never really need to deeply rewrite  at any given point, we only need to rewrite to a certain depth, and often only at top level. By making the rewriter work shallowly instead of deeply, we should be able to reduce compile time in the type checker significantly.
Concretely, I propose that the main entry point to the rewriter look like rewrite :: TcType > TcS (Maybe (TcType, TcCoercion))
, where the rewriter returns Nothing
if there is nothing to rewrite (much like e.g. tcView
). There will inevitably be extra details in the type (and other entry points), but this type is a good guiding principle.
To understand this better, let's look at the different places where the rewriter is invoked:

When canonicalizing class constraints (
canClass
): This is probably the trickiest case. Let's return to it below. 
When canonicalizing irreducible constraints (
canIrred
): These are constraints likec ty
orF ty1 ty2
. Here, we really only need to reduce at toplevel. Once we do so (if we can do so successfully), then we'll know whether the constraint is really a class constraint, or an equality, or some other predicate. Then, we proceed with that new predicate. Note that, today, we rewrite the irreducible only to call e.g.canClassNC
or canEqNC`, which rewrites it again, utterly redundantly. 
When canonicalizing quantified constraints (
canForall
): At the time of writing, I don't see why we invoke the rewriter here. Seems quite fishy, given that we always callsolveForAll
next. 
When no common structure is found while canonicalizing an equality (
can_eq_nc'
): We really only need to work at toplevel here. Once we reveal toplevel similarity between two types, then we'll decompose. Any deeper rewriting will happen when those decomposed equalities are processed. 
When canonicalizing an equality fails (
canEqFailure
andcanEqHardFailure
): This might actually be an exception; we might need to go deeply here. It would not be hard to write a function that uses the shallow rewriter to do a deep rewriting. Alternatively, we could just imagine rewriting (deeply) with respect to Givens only when printing errors. Note thatcanEqFailure
might process an equality with some hope (likea b ~R Maybe Int
); we'll have to make sure that the type is rewritten enough to get kicked out appropriately. The solution here might be to have a special form for a representational equality stuck on decomposing anAppTy
. 
When normalizing a type during patternmatch checking (
tcNormalise
): This may also need to be deep. I don't know enough about the patternmatch checker to have an opinion here. 
When simplifying a hole or wildcard (
simplifyHoles
): This may also need to be deep, to get good error messages.
It's clear from this review that we'll need to have some places with deep rewriting, but as observed above, that's easy to add. The important part is that deep rewriting will be rare, and will not be needed as part of any hot path in the solver.
Let's return to rewriting class constraints. Here are some examples:
instance C (Maybe Bool)
instance C [a]
[W] w1 :: C alpha
[W] w2 :: C beta
alpha := Maybe gamma
gamma := Bool
beta := [delta]
delta := Maybe epsilon
epsilon := F Bool
type instance F Bool = Int
When we're trying to solve w1
, we need to rewrite alpha
to Maybe gamma
. (Note that we stop there  we rewrite only at toplevel now!) Then we need to rewrite gamma
to get Bool
. We can then use the instance.
When we're trying to solve w2
, we need to rewrite beta
to [delta]
. But then we do not need to go further; the instance already applies.
Generalizing: before rewriting any class constraint, we try to use any instances and Givens that are in scope. This is done by lookupInstEnv
, which returns a list of matchers and a list of unifiers. If there is 1 matcher and no unifiers, we have a match (and no potential others) and should succeed. We might achieve this without rewriting at all! If we have multiple matchers, fail: learning more about the type won't help things. But if we have 0 matchers and some unifiers, then we have reason to rewrite: we want to learn more about the type to change the unifiers into matchers. Crucially, we can (easily) augment lookupInstEnv
to return the unifying substitutions (one per unifying instance). These unifying substitutions will have a domain of variables that need to be refined in order for the associated instance to become a matcher. Now, we simply have to rewrite each variable in the union of the domains of the substitutions. This will build up a rewriting substitution. (Examples below.) Apply the substitution (if it's nonempty) and try again. (We could be more clever, I think, and just compare the result of rewriting against what appears in the ranges of the unifying substitutions, but that may be an annoying, notquiteworthit optimization.)
Examples (continued from above):
Target: C alpha
Result: no matchers, two unifiers: {alpha > Maybe Bool}
, {alpha > [a]}
. We must rewrite alpha
, giving us {alpha > Maybe gamma}
. Apply that substitution to C alpha
and try again.
Target: C (Maybe gamma)
Result: no matchers, one unifier: {gamma > Bool}
. We must rewrite gamma
, giving us {gamma > Bool}
. Apply that substitution to C (Maybe gamma)
and try again. (The hard optimization would be noticing that the range of the unifying substitution matches the result of rewriting, trivially in this case. This might shortcircuit the need to do the general lookup.)
Target: C (Maybe Bool)
Result: one matcher, no unifiers. Success.
Target: C beta
Result: no matchers, two unifiers: {beta > Maybe Bool}
, {beta > [a]}
. We must rewrite beta
, giving us {beta > [delta]}
. Apply that substitution to C beta
and try again.
Target: C [delta]
Result one matcher, no unifiers. Success. NB: No further rewriting.
instance D a Bool
instance D Int b
Target: D alpha beta
Result: no matchers, two unifiers: {alpha > Int}
, {beta > Bool}
. We must rewrite both alpha
and beta
(the union of the domains). Note that rewriting only one will never get us a unique solution.
A proof that this is correct would be nice.
What about type families? They need to be subject to the same scheme as type variables. Therefore, we would need the pure unifier to work in terms of generalized substitutions, whose domains can include both type variables (as now) and exactlysaturated type family applications (this is new). Once we define such beasts (and e.g. functions to apply them), then the type family case is no different than the type variable case.
Note that this scheme does not require ever rewriting deeply. The rewriter will be called with domain elements from the generalized substitutions (henceforth, gsubsts), and then the results of rewriting will be put into a gsubst, which is then applied. Indeed, this in itself might be a very nice optimization, because it means that we rewrite e.g. F Int Bool
only once, not at each occurrence. There is repeated effort at analyzing and reanalyzing the toplevel structure of the target. If that becomes a problem, then we can just be cleverer about reusing the information in the ranges of the returned gsubsts from unification; I claim such an optimization is possible, though I don't wish to work out the details right now.
The recursive use of rewriting when normalizing type family applications would have to use a similar scheme, so that it only rewrites as much as necessary to get a type family to reduce.
Some salutary effects of making the rewriter work only at toplevel:
A. can_eq_nc'
does not need to track whether types have been rewritten or not. Right at the start, we would see if either type involved might make any progress at all when rewritten (viz, is either a type variable or type family application) and then proceed accordingly.
B. The unifier would now work with gsubsts, not ordinary substs. This means that the entire "core flattening" algorithm would no longer be needed. Hooray!
C. We can get rid of CycleBreakerTv
(and its 200line Note!), which exists only to prevent overlyeager deep rewriting.
D. Kicking out becomes much simpler: we kick out precisely when the rewriter makes a difference on a type. If we have an equality a ~ Maybe b
, learning more about b
cannot possibly help us. I believe that learning more about b
would kick that constraint out today. Thus: less kicking out, and less time spent solving. I believe the kickout criteria would also become dead simple, unlike its current hardwon state.
E. anyRewritableTyVar
would look only at toplevel. (CDictCan
constraints would want to record information about what type variables / type family applications are holding up getting solved; these would be their kickout criteria.)
F. Because rewriting isn't recursive, we would no longer need an occurscheck before adding a constraint to the inert set. That is, [G] a ~ Maybe a
wouldn't cause the solver to loop.
G. We still would need an occurscheck before unifying a metavariable. This is easy, though: just zonk the RHS and perform the occurs check. This nicely separates out the two different concerns around looping in the solver and infinite structures.
H. This makes type family reduction act properly lazily, not the arbitraryreduction plan we have right now.
I think this can be implemented in nicely separable pieces.

1. Rewrite the core unifier to use gsubsts, deleting the core flattener.

2. Make the rewriter toplevel, but provide a deep rewriter that runs the toplevel one recursively. Use the toplevel one in
can_eq_nc'
andcanIrred
. KillCycleBreakerTv
s. 
3. Implement the fancy scheme for rewriting class constraints.
Each of these should be wins independently, but work very nicely together.