Video: Abstract Syntax Types (1hr03')
The HsSyn types
The program is initially parsed into "HsSyn", a collection of data types that describe the full abstract syntax of Haskell. HsSyn is a pretty big collection of types: there are 52 data types at last count. Many are pretty trivial, but a few have a lot of constructors (HsExpr has 40). HsSyn represents Haskell in its full glory, complete with all syntactic sugar.
The HsSyn modules live in the compiler/GHC/Hs directory. Each module declares a related group of data types and gives their pretty-printer.
- compiler/GHC/Hs.hs: the root module. It exports everything you need, and it's generally what you should import.
- compiler/GHC/Hs/Binds.hs: bindings.
- compiler/GHC/Hs/ImpExp.hs: imports and exports.
- compiler/GHC/Hs/Decls.hs: top-level declarations.
- compiler/GHC/Hs/Expr.hs: expressions, match expressions, comprehensions.
- compiler/GHC/Hs/Lit.hs: literals.
- compiler/GHC/Hs/Pat.hs: patterns.
- compiler/GHC/Hs/Type.hs: types.
- compiler/GHC/Hs/Utils.hs: utility functions (no data types).
There is significant mutual recursion between modules, and hence a couple of hs-boot files.
Decorating HsSyn with type information
The type checker adds type information to the syntax tree, otherwise leaving it as undisturbed as possible. This is done in the following ways:
- Some constructors have a field of type PostTcType, which is just a synonym for Type. For example:

      data HsExpr id = ...
                     | ExplicitList PostTcType [LHsExpr id]
                     | ...

      type PostTcType = Type

      placeHolderType :: PostTcType
      placeHolderType = panic "Evaluated the place holder for a PostTcType"

  An ExplicitList represents the explicit list construct in Haskell (e.g. "[2, 4, 1]"). The parser fills the PostTcType field with an error thunk, HsTypes.placeHolderType, and the renamer does not touch it. The typechecker figures out the type and fills in the value. So until the type checker has run, we cannot examine or print the PostTcType fields.

  The error thunks mean that we can't conveniently pretty-print the PostTcType fields, because the pretty-printer would poke the error thunks when run on pre-typechecked code. We could have defined PostTcType to be Maybe Type, but that would have meant unwrapping lots of Just constructors, which is messy. It would be nicer to parameterise HsSyn over the PostTcType fields. Thus:

      type RnHsBinds = HsBinds Name ()     -- After renaming
      type TcHsBinds = HsBinds Id   Type   -- After type checking

  This would be a Good Thing to do.
- In a few cases, the typechecker moves from one constructor to another. Example:

      data HsPat id = ...
                    | ConPatIn (Located id) (HsConDetails id (LPat id))
                    | ConPatOut (Located DataCon)
                                [TyVar]                      -- Existentially bound type variables
                                [id]                         -- Ditto dictionaries
                                (DictBinds id)               -- Bindings involving those dictionaries
                                (HsConDetails id (LPat id))
                                Type                         -- The type of the pattern
                    ...

  The parser and renamer use ConPatIn; the typechecker generates a ConPatOut. This naming convention is used consistently.

- There are a few constructors added by the type checker (rather than replacing an input constructor), particularly:
  - HsWrap, in the HsExpr type.
  - AbsBinds, in the HsBinds type.

  These are invariably to do with type abstraction and application, since Haskell source is implicitly generalized and instantiated, whereas GHC's intermediate form is explicitly generalized and instantiated.
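The parameterisation idea from the first point above can be illustrated with a small, self-contained sketch. Everything in it (Ty, Expr, RnExpr, TcExpr, typecheck, typeOf) is a hypothetical toy stand-in, not GHC's actual definitions. It only shows that a type parameter in place of PostTcType lets a pre-typechecked tree carry () rather than an error thunk, so the tree can be printed safely at any stage.

```haskell
-- Toy stand-in for GHC's Type; not the real definition.
data Ty = IntTy | ListTy Ty
  deriving (Eq, Show)

-- A tiny expression type parameterised over the annotation `ty`
-- that the type checker fills in (the role PostTcType plays above).
data Expr ty
  = IntLit Int
  | ExplicitList ty [Expr ty]
  deriving (Eq, Show)

type RnExpr = Expr ()   -- after renaming: no thunk to poke, safe to print
type TcExpr = Expr Ty   -- after type checking: annotation filled in

-- A deliberately naive "type checker" for the toy language: it replaces
-- each () annotation with the element type of the list. Empty lists
-- default to Int; a list whose elements disagree is rejected.
typecheck :: RnExpr -> Maybe TcExpr
typecheck (IntLit n) = Just (IntLit n)
typecheck (ExplicitList () es) = do
  es' <- mapM typecheck es
  elemTy <- case map typeOf es' of
    []       -> Just IntTy
    (t : ts) -> if all (== t) ts then Just t else Nothing
  Just (ExplicitList elemTy es')

-- The type of an already-checked expression.
typeOf :: TcExpr -> Ty
typeOf (IntLit _)         = IntTy
typeOf (ExplicitList t _) = ListTy t

-- The renamed form of [2, 4, 1].
example :: RnExpr
example = ExplicitList () [IntLit 2, IntLit 4, IntLit 1]
```

Because RnExpr and TcExpr are distinct types, a function such as typeOf cannot be applied to a tree that has not been type checked, which is the static guarantee the error-thunk approach lacks.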
Source Locations
HsSyn makes heavy use of the Located type (compiler/GHC/Types/SrcLoc.hs):
data Located e = L SrcSpan e
A Located t is just a pair of a SrcSpan (which describes the source location of t) and a syntax tree t. The module SrcLoc defines two other types:
- SrcLoc specifies a particular source location: (filename, line number, character position)
- SrcSpan specifies a range of source locations: (filename, start line number and character position, end line number and character position)
More details in compiler/GHC/Types/SrcLoc.hs.
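As a rough illustration of how these three types fit together, here is a simplified, self-contained sketch. The record layouts are assumptions made for the example (GHC's real SrcLoc and SrcSpan have more cases and different representations), and combineSpans and spanStartLoc are hypothetical helpers; unLoc and getLoc are written to mirror the accessor names used in GHC, but their definitions here are toy ones.

```haskell
-- Simplified stand-ins; GHC's real SrcLoc and SrcSpan have more cases.
data SrcLoc = SrcLoc
  { locFile :: FilePath
  , locLine :: Int
  , locCol  :: Int
  } deriving (Eq, Ord, Show)

-- A range within one file: start (line, column) and end (line, column).
data SrcSpan = SrcSpan
  { spanFile  :: FilePath
  , spanStart :: (Int, Int)
  , spanEnd   :: (Int, Int)
  } deriving (Eq, Show)

-- A syntax tree paired with the span it came from.
data Located e = L SrcSpan e
  deriving (Eq, Show)

-- Drop the location, keeping the syntax tree.
unLoc :: Located e -> e
unLoc (L _ e) = e

-- Drop the syntax tree, keeping the location.
getLoc :: Located e -> SrcSpan
getLoc (L l _) = l

-- Smallest span covering both arguments (assumes they share a file).
-- Pairs compare lexicographically, so min/max order by line, then column.
combineSpans :: SrcSpan -> SrcSpan -> SrcSpan
combineSpans a b =
  SrcSpan (spanFile a)
          (min (spanStart a) (spanStart b))
          (max (spanEnd a) (spanEnd b))

-- Start position of a span, as a single source location.
spanStartLoc :: SrcSpan -> SrcLoc
spanStartLoc s = SrcLoc (spanFile s) (fst (spanStart s)) (snd (spanStart s))
```

A helper like combineSpans is the kind of operation a parser needs when it builds a parent node whose span must cover all of its children.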
Naming convention within the code: "LHs" means located Haskell, e.g.
type LHsBinds n = Located (HsBinds n)
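To show how the "LHs" convention plays out in a recursive syntax tree, here is a minimal sketch with a toy expression type. The two-constructor HsExpr, the one-line SrcSpan, and the allSpans function are assumptions made for illustration and are far simpler than GHC's real definitions; the point is only that sub-expressions are stored in their located form, so every node in the tree carries its own span.

```haskell
-- Minimal stand-ins so the sketch is self-contained.
newtype SrcSpan = SrcSpan (Int, Int)   -- (start column, end column) on one line
  deriving (Eq, Show)

data Located e = L SrcSpan e
  deriving (Eq, Show)

-- Toy expression type: every sub-expression is a located one.
data HsExpr n
  = HsVar n
  | HsApp (LHsExpr n) (LHsExpr n)
  deriving (Eq, Show)

-- The "LHs" convention: the located version of HsExpr.
type LHsExpr n = Located (HsExpr n)

-- Collect the span of every node, outermost first.
allSpans :: LHsExpr n -> [SrcSpan]
allSpans (L l e) = l : case e of
  HsVar _   -> []
  HsApp f x -> allSpans f ++ allSpans x

-- The expression "f x", with f at column 1 and x at column 3.
example :: LHsExpr String
example =
  L (SrcSpan (1, 3))
    (HsApp (L (SrcSpan (1, 1)) (HsVar "f"))
           (L (SrcSpan (3, 3)) (HsVar "x")))
```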