Make the rewriter work only at top-level

In a recent conversation, @simonpj and I schemed about another way to simplify the solver.

Key idea: Make the rewriter (previously called the flattener) work only at top-level.

Right now (well, after !4149 (closed), which this ticket assumes has been merged), the rewriter does three things: 1) it zonks, 2) it applies the substitution in the inert set, and 3) it normalizes type family applications. It does all of these deeply, at any point in an input type. Yet, we never really need to deeply rewrite -- at any given point, we only need to rewrite to a certain depth, and often only at top level. By making the rewriter work shallowly instead of deeply, we should be able to reduce compile time in the type checker significantly.

Concretely, I propose that the main entry point to the rewriter look like rewrite :: TcType -> TcS (Maybe (TcType, TcCoercion)), where the rewriter returns Nothing if there is nothing to rewrite (much like e.g. tcView). There will inevitably be extra details in the type (and other entry points), but this type is a good guiding principle.

To understand this better, let's look at the different places where the rewriter is invoked:

When canonicalizing class constraints (canClass): This is probably the trickiest case. Let's return to it below.
When canonicalizing irreducible constraints (canIrred): These are constraints like c ty or F ty1 ty2. Here, we really only need to reduce at top-level. Once we do so (if we can do so successfully), then we'll know whether the constraint is really a class constraint, or an equality, or some other predicate. Then, we proceed with that new predicate. Note that, today, we rewrite the irreducible only to call e.g. canClassNC or canEqNC`, which rewrites it again, utterly redundantly.
When canonicalizing quantified constraints (canForall): At the time of writing, I don't see why we invoke the rewriter here. Seems quite fishy, given that we always call solveForAll next.
When no common structure is found while canonicalizing an equality (can_eq_nc'): We really only need to work at top-level here. Once we reveal top-level similarity between two types, then we'll decompose. Any deeper rewriting will happen when those decomposed equalities are processed.
When canonicalizing an equality fails (canEqFailure and canEqHardFailure): This might actually be an exception; we might need to go deeply here. It would not be hard to write a function that uses the shallow rewriter to do a deep rewriting. Alternatively, we could just imagine rewriting (deeply) with respect to Givens only when printing errors. Note that canEqFailure might process an equality with some hope (like a b ~R Maybe Int); we'll have to make sure that the type is rewritten enough to get kicked out appropriately. The solution here might be to have a special form for a representational equality stuck on decomposing an AppTy.
When normalizing a type during pattern-match checking (tcNormalise): This may also need to be deep. I don't know enough about the pattern-match checker to have an opinion here.
When simplifying a hole or wildcard (simplifyHoles): This may also need to be deep, to get good error messages.

It's clear from this review that we'll need to have some places with deep rewriting, but as observed above, that's easy to add. The important part is that deep rewriting will be rare, and will not be needed as part of any hot path in the solver.

Let's return to rewriting class constraints. Here are some examples:

instance C (Maybe Bool)
instance C [a]

[W] w1 :: C alpha
[W] w2 :: C beta

alpha := Maybe gamma
gamma := Bool

beta := [delta]
delta := Maybe epsilon
epsilon := F Bool
type instance F Bool = Int

When we're trying to solve w1, we need to rewrite alpha to Maybe gamma. (Note that we stop there -- we rewrite only at top-level now!) Then we need to rewrite gamma to get Bool. We can then use the instance.

When we're trying to solve w2, we need to rewrite beta to [delta]. But then we do not need to go further; the instance already applies.

Generalizing: before rewriting any class constraint, we try to use any instances and Givens that are in scope. This is done by lookupInstEnv, which returns a list of matchers and a list of unifiers. If there is 1 matcher and no unifiers, we have a match (and no potential others) and should succeed. We might achieve this without rewriting at all! If we have multiple matchers, fail: learning more about the type won't help things. But if we have 0 matchers and some unifiers, then we have reason to rewrite: we want to learn more about the type to change the unifiers into matchers. Crucially, we can (easily) augment lookupInstEnv to return the unifying substitutions (one per unifying instance). These unifying substitutions will have a domain of variables that need to be refined in order for the associated instance to become a matcher. Now, we simply have to rewrite each variable in the union of the domains of the substitutions. This will build up a rewriting substitution. (Examples below.) Apply the substitution (if it's non-empty) and try again. (We could be more clever, I think, and just compare the result of rewriting against what appears in the ranges of the unifying substitutions, but that may be an annoying, not-quite-worth-it optimization.)

Examples (continued from above):

Target: C alpha Result: no matchers, two unifiers: {alpha |-> Maybe Bool}, {alpha |-> [a]}. We must rewrite alpha, giving us {alpha |-> Maybe gamma}. Apply that substitution to C alpha and try again.

Target: C (Maybe gamma) Result: no matchers, one unifier: {gamma |-> Bool}. We must rewrite gamma, giving us {gamma |-> Bool}. Apply that substitution to C (Maybe gamma) and try again. (The hard optimization would be noticing that the range of the unifying substitution matches the result of rewriting, trivially in this case. This might short-circuit the need to do the general lookup.)

Target: C (Maybe Bool) Result: one matcher, no unifiers. Success.

Target: C beta Result: no matchers, two unifiers: {beta |-> Maybe Bool}, {beta |-> [a]}. We must rewrite beta, giving us {beta |-> [delta]}. Apply that substitution to C beta and try again.

Target: C [delta] Result one matcher, no unifiers. Success. NB: No further rewriting.

instance D a Bool
instance D Int b

Target: D alpha beta Result: no matchers, two unifiers: {alpha |-> Int}, {beta |-> Bool}. We must rewrite both alpha and beta (the union of the domains). Note that rewriting only one will never get us a unique solution.

A proof that this is correct would be nice.

What about type families? They need to be subject to the same scheme as type variables. Therefore, we would need the pure unifier to work in terms of generalized substitutions, whose domains can include both type variables (as now) and exactly-saturated type family applications (this is new). Once we define such beasts (and e.g. functions to apply them), then the type family case is no different than the type variable case.

Note that this scheme does not require ever rewriting deeply. The rewriter will be called with domain elements from the generalized substitutions (henceforth, gsubsts), and then the results of rewriting will be put into a gsubst, which is then applied. Indeed, this in itself might be a very nice optimization, because it means that we rewrite e.g. F Int Bool only once, not at each occurrence. There is repeated effort at analyzing and re-analyzing the top-level structure of the target. If that becomes a problem, then we can just be cleverer about reusing the information in the ranges of the returned gsubsts from unification; I claim such an optimization is possible, though I don't wish to work out the details right now.

The recursive use of rewriting when normalizing type family applications would have to use a similar scheme, so that it only rewrites as much as necessary to get a type family to reduce.

Some salutary effects of making the rewriter work only at top-level:

A. can_eq_nc' does not need to track whether types have been rewritten or not. Right at the start, we would see if either type involved might make any progress at all when rewritten (viz, is either a type variable or type family application) and then proceed accordingly.

B. The unifier would now work with gsubsts, not ordinary substs. This means that the entire "core flattening" algorithm would no longer be needed. Hooray!

C. We can get rid of CycleBreakerTv (and its 200-line Note!), which exists only to prevent overly-eager deep rewriting.

D. Kicking out becomes much simpler: we kick out precisely when the rewriter makes a difference on a type. If we have an equality a ~ Maybe b, learning more about b cannot possibly help us. I believe that learning more about b would kick that constraint out today. Thus: less kicking out, and less time spent solving. I believe the kick-out criteria would also become dead simple, unlike its current hard-won state.

E. anyRewritableTyVar would look only at top-level. (CDictCan constraints would want to record information about what type variables / type family applications are holding up getting solved; these would be their kick-out criteria.)

F. Because rewriting isn't recursive, we would no longer need an occurs-check before adding a constraint to the inert set. That is, [G] a ~ Maybe a wouldn't cause the solver to loop.

G. We still would need an occurs-check before unifying a meta-variable. This is easy, though: just zonk the RHS and perform the occurs check. This nicely separates out the two different concerns around looping in the solver and infinite structures.

H. This makes type family reduction act properly lazily, not the arbitrary-reduction plan we have right now.

I think this can be implemented in nicely separable pieces.

1. Rewrite the core unifier to use gsubsts, deleting the core flattener. (At this point, fix #19107 (closed).)
2. Make the rewriter top-level, but provide a deep rewriter that runs the top-level one recursively. Use the top-level one in can_eq_nc' and canIrred. Kill CycleBreakerTvs.
3. Implement the fancy scheme for rewriting class constraints.

Each of these should be wins independently, but work very nicely together.

Edited Jan 06, 2021 by Richard Eisenberg

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information