Distinguishing Type and Term variables by using different data types to represent them
Making Var, Id, TyVar, TyCoVar, TcTyVar and friends different types
I think we should restructure the Var and its synonyms TyVar, Id, TcTyVar, and TyCoVar into multiple separate datatypes, to enforce their correct usage through the typesystem (and clean up a lot of assertions and lurking bugs).
The MR addressing this ticket should make all the above types distinct.
I started doing this over the weekend in !10544 (closed) and its turning out surprisingly well.
One could probably tackle #16982 simultaneously.
Pros
- Compile time guarantees about correct usage of e.g.
TyVarvsIdvsTyCoVar - No longer need to panic and rely on assertions everywhere when the argument despite being e.g. an
Idreally being something else - Easier to reason about
Cons
Highlighted by @AndreasK:
-
This is likely to break every GHC API program and plugin in existence.
- romes: I think this can be mitigated with pattern synonyms and the
VarLikeclass (see Design section)
- romes: I think this can be mitigated with pattern synonyms and the
-
type TyCoVar = Either TyVar CoVarseems like a performance trap. -
In terms of performance impact the new definition for Var introduces a indirection without any benefit. You should probably unbox the fields to avoid this. But if we do this there is no runtime benefit to making the distinguish via different data types so it might just be easier to introduce newtypes to distinguish value and type ids.
Design
Specifically, I propose (in !10544 (closed))
data Var
= TV TyVar
| TTV TcTyVar
| I Id
-- | Type or kind Variable
data TyVar = ...
-- | Type variable that might be a metavariable
data TcTyVar = ...
-- | Identifier
--
-- A term level identifier.
data Id = ...
-- | Coercion Variable
type CoVar = Id
-- | Type or Coercion Variable
type TyCoVar = Either TyVar CoVar
I further suggest having a typeclass VarLike with functions varType, realUnique and varName to replace the field selectors of the same name we used to have.
It is possibly worth considering having the Id constructor retain its meaning (that is, it constructs a Var rather than an Id, and the Id constructor is named something else). We could do this with a pattern synonym.
Newtype based alternative
We could express the same idea using newtypes in a manner like this.
data Var
= TyVar' { ... }
| TcTyVar' { ... }
| Id' { ... }
newtype Id = Id Var
newtype TyVar = TyVar Var
newtype TcTyVar = TcTyVar Var
- This would provide the same level of type safety to developers as separate data types.
- It would make it simpler to update existing code to this scheme compared to using data types.
- It would sidestep the need for a
VarLiketypeclass as functions operation on the general notion of a variable could still be written over theVartype just as we do now. - There is precedent for this with
NonVoidwhich serves a similar purpose and has worked very well in my opinion. - This change (in absence of compiler bugs) should have no affect on compiler performance at all.
- There might be some cases where specializing a function to a particular kind of
Varwould be beneficial, which is harder to achieve using this approach.