Video: Types and Classes (23'53")
Id
, TyVar
, TyCon
, DataCon
, and Class
Data types for Haskell entities: For each kind of Haskell entity (identifier, type variable, type constructor, data constructor, class) GHC has a data type to represent it. Here they are:
-
Type constructors are represented by the
TyCon
type (compiler/GHC/Core/TyCon.hs). -
Classes are represented by the
Class
type (compiler/GHC/Core/Class.hs). -
Data constructors are represented by the
DataCon
type (compiler/GHC/Core/DataCon.hs). -
Pattern synonyms are represented by the
PatSyn
type (compiler/GHC/Core/PatSyn.hs). -
Term variables
Id
and type variablesTyVar
are both represented by theVar
type (compiler/GHC/Types/Var.hs).
All of these entities have a Name
, but that's about all they have in common. However they are sometimes treated uniformly:
-
A
TyThing
(compiler/GHC/Core/TyCo/Rep.hs) is simply the sum of all four:data TyThing = AnId Id | AConLike ConLike | ATyCon TyCon | AClass Class data ConLike = RealDataCon DataCon | PatSynCon PatSyn
For example, a type environment is a map from
Name
toTyThing
. (The fact that aName
tells what name space it belongs to allow, for example, identically named values and types to sit in a single map.)
All these data types are implemented as a big record of information that tells you everything about the entity. For example, a TyCon
contains a list of its data constructors; a DataCon
contains its type (which mentions its TyCon
); a Class
contains the Id
s of all its method selectors; and an Id
contains its type (which mentions type constructors and classes).
So you can see that the GHC data structures for entities is a graph not tree: everything points to everything else. This makes it very convenient for the consumer, because there are accessor functions with simple types, such as idType :: Id -> Type
. But it means that there has to be some tricky almost-circular programming ("knot-tying") in the type checker, which constructs the entities. See tying the knot for more details on this process.
Type variables and term variables
Type variables and term variables are represented by a single data type, Var
, thus (compiler/GHC/Types/Var.hs):
type Id = Var
type TyVar = Var
It's incredibly convenient to use a single data type for both, rather than using one data type for term variables and one for type variables. For example:
- Finding the free variables of a term gives a set of variables (both type and term variables):
exprFreeVars :: CoreExpr -> VarSet
. - We only need one lambda constructor in Core:
Lam :: Var -> CoreExpr -> CoreExpr
.
The Var
type distinguishes the two sorts of variable; indeed, it makes somewhat finer distinctions (compiler/GHC/Types/Var.hs):
data Var
= TyVar {
varName :: !Name,
realUnique :: FastInt, -- Key for fast comparison
tyVarKind :: Kind,
isCoercionVar :: Bool }
| TcTyVar { -- Used only during type inference
varName :: !Name,
realUnique :: FastInt,
tyVarKind :: Kind,
tcTyVarDetails :: TcTyVarDetails }
| GlobalId { -- Used for imported Ids, dict selectors etc
varName :: !Name, -- Always an External or WiredIn Name
realUnique :: FastInt,
idType :: Type,
idInfo :: IdInfo,
gblDetails :: GlobalIdDetails }
| LocalId { -- Used for locally-defined Ids (see NOTE below)
varName :: !Name,
realUnique :: FastInt,
idType :: Type,
idInfo :: IdInfo,
lclDetails :: LocalIdDetails }
Every Var
has fields varName::Name
and a realUnique::FastInt
. The latter is identical to the Unique
in the former, but is cached in the Var
for fast comparison.
Here are some per-flavour notes:
-
TyVar
is self explanatory.
-
TcTyVar
is used during type-checking only. Once type checking is finished, there are no more
TcTyVar
s. -
LocalId
is used for term variables bound in the module being compiled. More specifically, a
LocalId
is bound either within an expression (lambda, case, local let), or at the top level of the module being compiled.-
The
IdInfo
of aLocalId
may change as the simplifier repeatedly bashes on it. -
A
LocalId
carries a flag saying whether it's exported. This is useful for knowing whether we can discard it if it is not used.data LocalIdDetails = NotExported -- Not exported; may be discarded as dead code. | Exported -- Exported; keep alive
-
-
GlobalId
is used for fixed, immutable, top-level term variables, notably ones that are imported from other modules. This means that, for example, the optimizer won't change its properties.
- Always has an
External
orWiredIn
Name, and hence has aUnique
that is globally unique across the whole of a GHC invocation. - Always bound at top level.
- The
IdInfo
of aGlobalId
is completely fixed. - All implicit Ids (data constructors, class method selectors, record selectors and the like) are are
GlobalId
s from birth, even the ones defined in the module being compiled. - When finding the free variables of an expression (
exprFreeVars
), we only collectLocalIds
and ignoreGlobalIds
.
- Always has an
All the value bindings in the module being compiled (whether top level or not) are LocalId
s until the CoreTidy phase. In the CoreTidy phase, all top-level bindings are made into GlobalId
s. This is the point when a LocalId
becomes "frozen" and becomes a fixed, immutable GlobalId
.
GlobalIdDetails
and implict Ids
GlobalId
s are further classified by their GlobalIdDetails
. This type is defined in compiler/GHC/Types/Id/Info.hs, because it mentions other structured types such as DataCon
. Unfortunately it is used in Var.hs so there's a hi-boot knot to get it there. Anyway, here's the declaration (elided a little):
data GlobalIdDetails
= VanillaGlobal -- Imported from elsewhere, a default method Id.
| RecordSelId { ... } -- Record selector
| DataConWorkId DataCon -- The Id for a data constructor *worker*
| DataConWrapId DataCon -- The Id for a data constructor *wrapper*
| ClassOpId Class -- An operation of a class
| PrimOpId PrimOp -- The Id for a primitive operator
| FCallId ForeignCall -- The Id for a foreign call
| NotGlobalId -- Used as a convenient extra return value from globalIdDetails
Some GlobalId
s are called implicit Id
s. These are Id
s that are defined by a declaration of some other entity (not just an ordinary variable binding). For example:
- The selectors of a record type
- The method selectors of a class
- The worker and wrapper Id for a data constructor
It's easy to distinguish these Ids, because the GlobalIdDetails
field says what kind of thing it is: GHC.Types.Id.isImplicitId :: Id -> Bool
.