Simon Peyton Jones · 87ea2213
--- a/commentary/compiler/entity-types.md
+++ b/commentary/compiler/entity-types.md
+
+
+# Data types for Haskell entities: `Id`, `TyVar`, `TyCon`, `DataCon`, and `Class`
+
+
+For each kind of Haskell entity (identifier, type variable, type constructor, data constructor, class) GHC has a data type to represent it.  Here they are:
+
+- **Type constructors** are represented by the `TyCon` type ([compiler/types/TyCon.lhs](/trac/ghc/browser/ghc/compiler/types/TyCon.lhs)).
+- **Classes** are represented by the `Class` type ([compiler/types/Class.lhs](/trac/ghc/browser/ghc/compiler/types/Class.lhs)).
+- **Data constructors** are represented by the `DataCon` type ([compiler/basicTypes/DataCon.lhs](/trac/ghc/browser/ghc/compiler/basicTypes/DataCon.lhs)).
+- **Term variables** (`Id`) and **type variables** (`TyVar`) are both represented by the `Var` type ([compiler/basicTypes/Var.lhs](/trac/ghc/browser/ghc/compiler/basicTypes/Var.lhs)).
+
+
+All of these entities have a `Name`, but that's about all they have in common.  However, they are sometimes treated uniformly:
+
+- A **TyThing** ([compiler/types/TypeRep.lhs](/trac/ghc/browser/ghc/compiler/types/TypeRep.lhs)) is simply the sum of four of them (`Id`, `DataCon`, `TyCon`, and `Class`):
+
+  ```wiki
+  data TyThing = AnId     Id
+               | ADataCon DataCon
+               | ATyCon   TyCon
+               | AClass   Class
+  ```
+
+  For example, a type environment is a map from `Name` to `TyThing`.
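+
+As a rough illustration of such a type environment, here is a minimal sketch. The entity types are toy stand-ins (plain wrappers around a `String` name, not GHC's real records), and `TypeEnv`, `lookupTyCon`, and `exampleEnv` are hypothetical names introduced only for this example.
+
+```haskell
+import qualified Data.Map as Map
+
+-- Toy stand-ins (assumptions): the real types are large records.
+type Name = String
+newtype Id      = Id Name      deriving Show
+newtype DataCon = DataCon Name deriving Show
+newtype TyCon   = TyCon Name   deriving Show
+newtype Class   = Class Name   deriving Show
+
+data TyThing
+  = AnId     Id
+  | ADataCon DataCon
+  | ATyCon   TyCon
+  | AClass   Class
+  deriving Show
+
+-- A type environment maps a Name to whatever entity it denotes.
+type TypeEnv = Map.Map Name TyThing
+
+-- Hypothetical helper: look up a Name, succeeding only if it denotes a TyCon.
+lookupTyCon :: TypeEnv -> Name -> Maybe TyCon
+lookupTyCon env n = case Map.lookup n env of
+  Just (ATyCon tc) -> Just tc
+  _                -> Nothing
+
+exampleEnv :: TypeEnv
+exampleEnv = Map.fromList
+  [ ("Maybe", ATyCon   (TyCon "Maybe"))
+  , ("Just",  ADataCon (DataCon "Just"))
+  , ("Eq",    AClass   (Class "Eq"))
+  , ("not",   AnId     (Id "not"))
+  ]
+```
+
+A single map can hold every kind of entity because `TyThing` wraps each one in its own constructor; the consumer pattern-matches to recover the kind it expects.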
+
+
+All these data types are implemented as a big record of information that tells you everything about the entity.  For example, a `TyCon` contains a list of its data constructors; a `DataCon` contains its type (which mentions its `TyCon`); a `Class` contains the `Id`s of all its method selectors; and an `Id` contains its type (which mentions type constructors and classes).  
+
+
+So you can see that the GHC data structures for entities form a *graph*, not a tree: everything points to everything else.  This is very convenient for the consumer, because there are accessor functions with simple types, such as `idType :: Id -> Type`.  But it means that the type checker, which constructs the entities, has to do some tricky almost-circular programming ("knot-tying").
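+
+The knot-tying idea can be sketched in miniature. The records below are toy assumptions with two fields each, nothing like the size of GHC's real `TyCon` and `DataCon`, and `siblings` is a hypothetical helper. The point is only that lazy evaluation lets mutually referring values be defined in terms of each other.
+
+```haskell
+-- Toy records (assumptions), far smaller than GHC's real ones.
+data TyCon = TyCon
+  { tyConName     :: String
+  , tyConDataCons :: [DataCon]  -- points "down" to its constructors
+  }
+
+data DataCon = DataCon
+  { dataConName  :: String
+  , dataConTyCon :: TyCon       -- points back "up" to its parent
+  }
+
+-- Tie the knot: boolTyCon and its constructors refer to each other.
+-- Laziness lets each definition mention the others before they are evaluated.
+boolTyCon :: TyCon
+boolTyCon = TyCon { tyConName = "Bool", tyConDataCons = [falseCon, trueCon] }
+
+falseCon, trueCon :: DataCon
+falseCon = DataCon { dataConName = "False", dataConTyCon = boolTyCon }
+trueCon  = DataCon { dataConName = "True",  dataConTyCon = boolTyCon }
+
+-- Consumers navigate the graph with plain accessor functions.
+siblings :: DataCon -> [String]
+siblings = map dataConName . tyConDataCons . dataConTyCon
+```
+
+Here the cycle is written out by hand at the top level; in the type checker the same shape has to be built from source declarations, which is where the trickiness comes from.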
+
+## Type variables and term variables
+
+
+Type variables and term variables are represented by a single data type, `Var`, thus ([compiler/basicTypes/Var.lhs](/trac/ghc/browser/ghc/compiler/basicTypes/Var.lhs)):
+
+```wiki
+type Id    = Var
+type TyVar = Var
+```
+
+
+It's incredibly convenient to use a single data type for both, rather than using one data type for term variables and one for type variables.  For example:
+
+- Finding the free variables of a term gives a set of variables (both type and term variables): `exprFreeVars :: CoreExpr -> VarSet`.
+- We only need one lambda constructor in Core: `Lam :: Var -> CoreExpr -> CoreExpr`.
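+
+Both points can be seen in a toy model. The sketch below is an assumption-laden miniature, not GHC's Core: `Var` has just two constructors carrying a `String`, and `Expr`, `TyArg`, `freeVars`, and `example` are hypothetical names.
+
+```haskell
+import qualified Data.Set as Set
+
+-- Toy variable type (assumption): one type covers both sorts of variable.
+data Var
+  = TyVar String   -- type variable
+  | Id    String   -- term variable
+  deriving (Eq, Ord, Show)
+
+-- Toy expression language, loosely in the spirit of Core.
+data Expr
+  = V Var
+  | TyArg Var          -- a type variable used as a type argument
+  | App Expr Expr
+  | Lam Var Expr       -- one constructor binds either sort of variable
+  deriving Show
+
+-- Free variables of both sorts come back in a single set.
+freeVars :: Expr -> Set.Set Var
+freeVars (V v)     = Set.singleton v
+freeVars (TyArg a) = Set.singleton a
+freeVars (App f x) = freeVars f `Set.union` freeVars x
+freeVars (Lam b e) = Set.delete b (freeVars e)
+
+-- /\a. \x. f a x y   (f and y are free; a and x are bound)
+example :: Expr
+example =
+  Lam (TyVar "a") (Lam (Id "x") body)
+  where
+    body = App (App (App (V (Id "f")) (TyArg (TyVar "a"))) (V (Id "x"))) (V (Id "y"))
+```
+
+With two separate variable types, `freeVars` would need to return a pair of sets and `Expr` would need a second lambda constructor.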
+
+
+The `Var` type distinguishes the two sorts of variable; indeed, it makes somewhat finer distinctions ([compiler/basicTypes/Var.lhs](/trac/ghc/browser/ghc/compiler/basicTypes/Var.lhs)):
+
+```wiki
+data Var
+  = TyVar {
+        varName    :: !Name,
+        realUnique :: FastInt,      -- Key for fast comparison
+                                    -- Identical to the Unique in the name,
+                                    -- cached here for speed
+        tyVarKind  :: Kind }
+
+  | TcTyVar {                       -- Used only during type inference
+        varName        :: !Name,
+        realUnique     :: FastInt,
+        tyVarKind      :: Kind,
+        tcTyVarDetails :: TcTyVarDetails }
+
+  | GlobalId {                      -- Used for imported Ids, dict selectors etc
+        varName    :: !Name,        -- Always an External or WiredIn Name
+        realUnique :: FastInt,
+        idType     :: Type,
+        idInfo     :: IdInfo,
+        gblDetails :: GlobalIdDetails }
+
+  | LocalId {                       -- Used for locally-defined Ids (see NOTE below)
+        varName    :: !Name,
+        realUnique :: FastInt,
+        idType     :: Type,
+        idInfo     :: IdInfo,
+        lclDetails :: LocalIdDetails }
+
+<table><tr><th>`TyVar`</th>
+<td>is self-explanatory.
+</td></tr></table>
+
+<table><tr><th>`TcTyVar`</th>
+<td>is used during type-checking only.  Once type checking is finished, there are no more `TcTyVar`s.
+</td></tr></table>
+
+<table><tr><th>`LocalId`</th>
+<td>is used for term variables bound *in the module being compiled*.   More specifically, a `LocalId` is bound either *within* an expression (lambda, case, local let), or at the top level of the module being compiled.
+
+- The `IdInfo` of a `LocalId` may change as the simplifier repeatedly bashes on it.
+- A `LocalId` carries a flag saying whether it's exported. This is useful for knowing whether we can discard it if it is not used.
+
+  ```wiki
+  data LocalIdDetails
+    = NotExported   -- Not exported; may be discarded as dead code.
+    | Exported      -- Exported; keep alive
+  ```
+
+</td></tr></table>
+
+<table><tr><th>`GlobalId`</th>
+<td>is used for fixed, immutable, top-level term variables, notably ones that are imported from other modules.
+
+- A `GlobalId` always has an `External` or `WiredIn` [Name](commentary/compiler/name-type), and hence has a `Unique` that is globally unique across the whole of a GHC invocation.
+- The `IdInfo` of a `GlobalId` is completely fixed.
+- All implicit Ids (data constructors, class method selectors, record selectors and the like) are `GlobalId`s from birth, even the ones defined in the module being compiled.
+- When finding the free variables of an expression (`exprFreeVars`), we only collect `LocalId`s and ignore `GlobalId`s.
+
+</td></tr></table>
+
+
+All the value bindings in the module being compiled (whether top level or not) are `LocalId`s until the CoreTidy phase. In the CoreTidy phase, all top-level bindings are made into `GlobalId`s. This is the point at which a `LocalId` is "frozen" and becomes a fixed, immutable `GlobalId`.
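+
+The local/global split and the CoreTidy "freeze" can be modelled very roughly as follows. Everything here is a toy assumption: `Scope`, `Bind`, `freeLocals`, `tidy`, and `program` are hypothetical names, an `Id` is just a name plus a flag, and the sketch assumes all binder names are distinct (GHC relies on `Unique`s for this).
+
+```haskell
+-- Toy model (assumption): only the distinction that matters here.
+data Scope = Local | Global deriving (Eq, Show)
+
+data Id = Id { idName :: String, idScope :: Scope } deriving (Eq, Show)
+
+data Expr
+  = Var Id
+  | App Expr Expr
+  | Lam Id Expr
+  deriving Show
+
+-- A top-level binding: binder plus right-hand side.
+type Bind = (Id, Expr)
+
+-- Collect free Local Ids only; Global Ids are ignored.
+freeLocals :: Expr -> [Id]
+freeLocals (Var v)
+  | idScope v == Local = [v]
+  | otherwise          = []
+freeLocals (App f x)   = freeLocals f ++ freeLocals x
+freeLocals (Lam b e)   = filter (/= b) (freeLocals e)
+
+-- Hypothetical "tidy" step: top-level binders, and their occurrences,
+-- become Global. Assumes binder names are distinct.
+tidy :: [Bind] -> [Bind]
+tidy binds = [ (globalise b, go rhs) | (b, rhs) <- binds ]
+  where
+    topNames    = map (idName . fst) binds
+    globalise v = v { idScope = Global }
+    go (Var v)
+      | idScope v == Local, idName v `elem` topNames = Var (globalise v)
+      | otherwise                                    = Var v
+    go (App f x) = App (go f) (go x)
+    go (Lam b e) = Lam b (go e)
+
+-- f = \x. g x ;  g = \y. y   (all Ids start out Local)
+program :: [Bind]
+program =
+  [ (f, Lam x (App (Var g) (Var x)))
+  , (g, Lam y (Var y))
+  ]
+  where
+    f = Id "f" Local
+    g = Id "g" Local
+    x = Id "x" Local
+    y = Id "y" Local
+```
+
+Before `tidy`, the right-hand side of `f` has `g` as a free local variable; afterwards `g` is global, so `freeLocals` no longer reports it, while the lambda-bound `x` and `y` stay local.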
+
+## `GlobalIdDetails` and implicit Ids
+
+`GlobalId`s are further classified by their `GlobalIdDetails`.  This type is defined in [compiler/basicTypes/IdInfo](/trac/ghc/browser/ghc/compiler/basicTypes/IdInfo), because it mentions other structured types such as `DataCon`. Unfortunately, it is *used* in `Var.lhs`, so there's a hi-boot knot to get it there. Anyway, here's the declaration (elided a little):
+
+```wiki
+data GlobalIdDetails
+  = VanillaGlobal               -- Imported from elsewhere, a default method Id.
+  | RecordSelId { ... }         -- Record selector
+  | DataConWorkId DataCon       -- The Id for a data constructor *worker*
+  | DataConWrapId DataCon       -- The Id for a data constructor *wrapper*
+  | ClassOpId Class             -- An operation of a class
+  | PrimOpId PrimOp             -- The Id for a primitive operator
+  | FCallId ForeignCall         -- The Id for a foreign call
+  | NotGlobalId                 -- Used as a convenient extra return value from globalIdDetails
+```
+
+
+Some `GlobalId`s are called **implicit `Id`s**. These are `Id`s that are defined by a declaration of some other entity (not just an ordinary variable binding).  For example:
+
+- The selectors of a record type
+- The method selectors of a class
+- The worker and wrapper `Id`s for a data constructor
+
+
+It's easy to distinguish these `Id`s, because the `GlobalIdDetails` field says what kind of thing each one is; the test is `Id.isImplicitId :: Id -> Bool`.
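+
+A test of this kind is just a case analysis on the details field. The sketch below is a toy version under stated assumptions: the constructor payloads are replaced by plain `String`s, `isImplicit` is a hypothetical name, and it covers only the three cases listed above. How the real `isImplicitId` treats the remaining constructors is not stated here.
+
+```haskell
+-- Toy version of the classification (assumption): payloads are plain Strings
+-- standing in for DataCon, Class, PrimOp, and so on.
+data GlobalIdDetails
+  = VanillaGlobal
+  | RecordSelId String
+  | DataConWorkId String
+  | DataConWrapId String
+  | ClassOpId String
+  | PrimOpId String
+  | FCallId String
+  | NotGlobalId
+  deriving Show
+
+-- Hypothetical predicate: True for the kinds of Id listed as implicit above.
+isImplicit :: GlobalIdDetails -> Bool
+isImplicit (RecordSelId _)   = True   -- record selector
+isImplicit (DataConWorkId _) = True   -- data constructor worker
+isImplicit (DataConWrapId _) = True   -- data constructor wrapper
+isImplicit (ClassOpId _)     = True   -- class method selector
+isImplicit _                 = False
+```
+
+For instance, `isImplicit (ClassOpId "==")` is `True`, while `isImplicit VanillaGlobal` is `False`.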