Distinguish "shallow" and "deep" free variables of a type
(See this comment in #17323 for the origin of this ticket.)
GHC decorates every binder (say in a forall) with its kind, of course, but it also decorates every occurrence with its kind.
Let's say that the "shallow" free variables of a type are like this:
shallowFvs (Var v) = {v}
shallowFvs (App t1 t2) = shallowFvs t1 `union` shallowFvs t2
shallowFVs (Forall a t) = shallowFvs (tyVarKind a) `union`
(shallowFvs t `deleteVar` a)
That is, we include the fvs of the kind of a binder, but not of occurrences.
Let's say that "deep" free vars of a type is the result of closing over kinds
deepFvs t = closeOverKinds (shallowFvs t)
At the moment tyCoVarsOfType
always does deep freevars. Its
implementation is not as above, but the above is a decent
specification.
We have discovered that for the substitition invariants
 see Note [The substitution invariant]
in TyCoSubst

"deep" is unnecessary. Consider this type.
forall k. forall (a:k). forall k. T (a:k)
Is this OK? Note that
 The occurrence of ‘a’ has exactly the same kind as its binding site (of course)
 But there is an intervening forall k, with the same k.
So yes, it is absolutely fine. Having kinds inside occurrences is a GHCspecific trick. It’s morally equivalent to not having kinds inside occurrences, but carrying a type environment Gamma mapping variables to their kinds. In particular, the inner forall k does not capture the ‘k’ in the kind of ‘a’. We should understand a’s kind in the context of its binding site, not its occurrence.
So "deep" is unncessary for substitution. And in the type checker, our current algorithm is wrong. Consider
T (a :: k1) (a :: k2)
where k1
zonks to k2
. This could conceivably happen during type inference, even if the type is well kinded. So then, it seems that k1 and k2 are in the "deep" free vars of this type? If so, we'd have to look in the kind of each occurrence of a
, even though we'd seen a
before.
This is terribly painful, slow, and never matters. (Conceivable exception: tidying before prettyprinting during debug tracing.)
But substitution needs only shallow free vars, which sidesteps this problem. So, when doing captureavoiding substitution, using the inscope set to avoid capture, we can use shallow free vars; i.e. ignore the kinds of occurrences.
So, in Note [The substitution invariant]
in TyCoSubst
,
we should speak only of shallow free variables.
Why does this matter?

Efficiency. In quite a few places we establish an inscope set by gathering free variables of the types involved. Shallow free variables is enough, and is less work.

During type inference cannot guarantee that every occurrence has the same kind. One occurrence might be
(a :: kappa)
while another is(a :: TYPE k)
, becausekappa
is a unification variable that has been unified withType
 but we have not yet zonked everything.Now asking for the deep free vars of such a type is problematic. Should it include both k and kappa? The current implementation assumes that once we've encountered
a
once we don't need to look its kind a second time  but here the two kinds differ!So it's probably bogus to ask for the deep free vars of a type during type inference, when different occurrences might have different kinds (albeit the same if we were to zonk). Yet that's just what we do when we are about to substitute! Richard and I encountered this bug (a substitntioninvariant ASSERT fail) when reducing the amount of zonking in the type inference engine.
Conclusion: when checking the substitution invariants, we should
consider only the “shallow” free variables of a type; that is,
ignoring the kinds of occurrences. (But including the kinds on
binders of course; see shallowFvs
above.)
Implemenattion considerations.
We don't really want to write two freevar finders. So the simple
approach is to write shallowFvs
and closeOverKinds
,and make both
efficient. The former is easy: use a an accumulating parameter
for the result, and a "exclude" set of variables not to include.
Thus:
shallowFvs ty = shallow_fvs emptyVarSet ty emptyVarSet
shallow_fvs :: VarSet  Do not include these in the result
> Type
> VarSet  Accumulator
> VarSet  Result
shallow_fvs excl (Var v) acc
 v `elemVarSet` excl = acc
 v `elemVarSet` acc = acc
 otherwise = acc `extendVarSet` v
shallow_fvs excl (App t1 t2) acc
= shallow_fvs excl t1 $
shallow_fvs excl t2 acc
shallow_fvs excl (Forall a t) acc
= shallow_fvs excl (tyVarKind a) $
shallow_fvs (excl `extendVarSet` a) t acc
Now Richard and I see no reason why closeOverKinds
should be inefficient.
Here's a possible implementation:
deepFvs :: Type > VarSet
deepFvs t = closeOverKinds (shallowFvs t)
closeOverKinds :: VarSet > VarSet
closeOverKinds vs
= go vs vs
where
go :: VarSet  Add the kinds of these
> VasSet  Accumulator
> VarSet  Result
go vs acc
 isEmptyVarsSet vs
= acc
 otherwise
= go kvs (acc `unionVarSet` kvs)
where
k v inner_acc = shallow_fvs acc (tyVarKind v) inner_acc
kvs = foldVarSet k emptyVarSet vs
And here is another one, which is simpler and maybe more efficient.
deepFvs :: Type > VarSet
deepFvs ty = deep_fvs ty emptyVarSet
deep_fvs :: Type
> VarSet  Accumulator, closed over kidns
> VarSet
deep_fvs ty acc = close_over_kinds (shallowFvs ty) acc
close_over_kinds vs acc
= foldVarSet k acc vs
where
k :: Var > VarSet > VarSet
k v vs  v `elemvVarSet` accc
= acc
 otherwise
= deep_fvs (tyVarKind v) $
acc `extendVarSet` v
Relevant variables
Not only should we distinguish shallow fvs and deep fvs, but we should also distinguish relevant fvs, which are a subset of the shallow ones.
Definition: The relevent fvs of a type are the shallow fvs that appear outside coercions.
Question: Should the relevant fvs of a type t :: k
also include the (relevant) fvs of k
? Richard thinks so. The property below holds true regardless. Should it also be closed over kinds? Richard thinks no. The property below holds true regardless. But these changes mean that the relevant variables are not a subset of the shallow variables. Richard thinks that's OK.
As #17655 explains, if t1
and t2
are eqType
, they will certainly have the same relevant fvs, but they may differ in their irrelevant fvs. This means that GHC should never accept or reject a program based on the presence of an irrelevant fv. As we go through places where fvs are collected, we should think about whether relevant fvs are sufficient. One example: in substitution, the relevant fvs are not sufficient, because we might end up capturing a covar if we look only for relevant fvs. But in the solver, we need to find a way to rely only on relevant fvs in any occurschecks.