Distinguishing Type and Term variables by using different data types to represent them
Making Var, Id, TyVar, TyCoVar, TcTyVar and friends different types
I think we should restructure the Var
and its synonyms TyVar
, Id
, TcTyVar
, and TyCoVar
into multiple separate datatypes, to enforce their correct usage through the typesystem (and clean up a lot of assertions and lurking bugs).
The MR addressing this ticket should make all the above types distinct.
I started doing this over the weekend in !10544 and its turning out surprisingly well.
One could probably tackle #16982 simultaneously.
Pros
- Compile time guarantees about correct usage of e.g.
TyVar
vsId
vsTyCoVar
- No longer need to panic and rely on assertions everywhere when the argument despite being e.g. an
Id
really being something else - Easier to reason about
Cons
Highlighted by @AndreasK:
-
This is likely to break every GHC API program and plugin in existence.
- romes: I think this can be mitigated with pattern synonyms and the
VarLike
class (see Design section)
- romes: I think this can be mitigated with pattern synonyms and the
-
type TyCoVar = Either TyVar CoVar
seems like a performance trap. -
In terms of performance impact the new definition for Var introduces a indirection without any benefit. You should probably unbox the fields to avoid this. But if we do this there is no runtime benefit to making the distinguish via different data types so it might just be easier to introduce newtypes to distinguish value and type ids.
Design
Specifically, I propose (in !10544)
data Var
= TV TyVar
| TTV TcTyVar
| I Id
-- | Type or kind Variable
data TyVar = ...
-- | Type variable that might be a metavariable
data TcTyVar = ...
-- | Identifier
--
-- A term level identifier.
data Id = ...
-- | Coercion Variable
type CoVar = Id
-- | Type or Coercion Variable
type TyCoVar = Either TyVar CoVar
I further suggest having a typeclass VarLike
with functions varType
, realUnique
and varName
to replace the field selectors of the same name we used to have.
It is possibly worth considering having the Id
constructor retain its meaning (that is, it constructs a Var
rather than an Id
, and the Id
constructor is named something else). We could do this with a pattern synonym.
Newtype based alternative
We could express the same idea using newtypes in a manner like this.
data Var
= TyVar' { ... }
| TcTyVar' { ... }
| Id' { ... }
newtype Id = Id Var
newtype TyVar = TyVar Var
newtype TcTyVar = TcTyVar Var
- This would provide the same level of type safety to developers as separate data types.
- It would make it simpler to update existing code to this scheme compared to using data types.
- It would sidestep the need for a
VarLike
typeclass as functions operation on the general notion of a variable could still be written over theVar
type just as we do now. - There is precedent for this with
NonVoid
which serves a similar purpose and has worked very well in my opinion. - This change (in absence of compiler bugs) should have no affect on compiler performance at all.
- There might be some cases where specializing a function to a particular kind of
Var
would be beneficial, which is harder to achieve using this approach.