Handling of Source Locations in Trees that Grow
This wikipage describes a design for putting source locations inside a new extension point of TTG.
The motivation is to allow GHC to sprinkle source locations across many nodes of the syntax tree,
without forcing every client of HsSyn
to do so. For example Template Haskell probably does not want to.
(NB: This wiki page was overhauled in Jan 2020. If you came here via a link, a very similar design to this was called 'Solution D' in the previous version: see the version history.)
Design
Recall that the TTG paper indexes each data type with a type index, and defines types with two extension points:
- Each data type has an extension constructor.
- Each data construcutor has an extension field.
For example:
data Expr x = Var (XVar x) (Var x)
| App (XApp x) (Expr x) (Expr x)
| Lam (XLam x) (Var x) (Expr x)
| XExpr (XXExpr x)
type family XVar x
type family XApp x
type family XLam x
type family XXExpr x
Here the type index is x
; the extension constructor is XExpr
; and
the extension fields are XVar x
, XApp x
, etc (i.e. the first field
of each constructor). The types XVar
, XApp
etc are type families,
that can be extended with new instances as you add new indexing types.
We add a third extension point:
- Selected fields, often the recursive uses, have an extension wrapper,
XRec
.
Thus:
data Expr x = Var (XVar x) (Var x)
| App (XApp x) (XRec x (Expr x)) (XRec x (Expr x))
| Lam (XLam x) (XRec x (Var x)) (XRec x (Expr x))
| XExpr (XXExpr x)
type family XRec x a
Again, XRec
is typically a type family. For some indices (say Vanilla
)
we can easily elide all these XRec
wrappers:
type instance XRec Vanilla a = a
But for GHC, with type index GhcPass p
, we can use it to add a source location for each XRec
:
type instance XRec (GhcPass p) a = Located a
Note: There is nothing inherently recursive about XRec
. It has that name
because it is often
used to wrap recursive fields, such as the arguments to App
above.
But it is also used to wrap non-recursive fields, such as the XRec x (Var x)
field of
Lam
.
For name bikeshedding, see #17587 (closed).)
Using XRec in GHC
Becoming specific to GHC, wherever we used Located
in the AST previously, we now use XRec pass
:
-type LConDecl pass = Located (ConDecl pass)
+type LConDecl pass = XRec pass (ConDecl pass)
data ConDecl pass
= ConDeclGADT { ... }
| ConDeclH98
{ ...
- , con_name :: Located (IdP pass)
+ , con_name :: XRec pass (IdP pass)
...
}
When we have an LHsDecl (GhcPass p)
this yields the same AST as before, which makes refactoring much easier.
The idea of this refactoring is that XRec pass
really just replaces the usage of Located
across the AST everywhere,
this can be inside the LHs*
type synonyms or record fields.
Remember: We have to replace every Located
in the AST to achieve the goal of moving SrcLoc
s to an extension point.
Here are some other usecases of XRec
across the AST:
type HsDeriving pass = XRec pass [LHsDerivingClause pass]
type LHsDerivingClause pass = XRec pass (HsDerivingClause pass)
-- essentially expanding to:
type HsDeriving pass = XRec pass [XRec pass (HsDerivingClause pass)]
(Located
could be interlieved with other functors (e.g. []
) before wrapping anything GHC AST again.)
data HsDataDefn pass
= HsDataDefn { ...
dd_cType :: Maybe (XRec pass CType),
...
}
(Located
was sometimes wrapped around non-AST data like CType
or simply Bool
)
Some of the thinking
- We want to have an AST that can be used for both TH and normal Hs, for example. This means we need to store source locations in an extension point of TTG (trees that grow).
- In #15495 we agreed that the currently available extension points in TTG don't suffice to annotate the AST with
SrcLoc
s in a satisfyingly type-safe manner: Something akin to 'Ping-pong' style is desireable. - Using a type family
XRec
instead ofLocated
enables the extension points to carry source locations, or not, or even something else, everywhere thatLocated
would be used today.
Applications for this include:
- Using the GHC TTG AST for TemplateHaskell. TH doesn't have any
SrcLoc
s attached to it, so it would usetype instance XRec TemplateHaskell a = a
- Attaching api annotations to the GHC TTG AST directly, instead of through the
pm_annotations
field inParsedModule
. This is outlined as one possible approach in the wiki page 'in tree api annotations'.
Changes
In GHC
GHC's functions' bodies will mostly not need to change (with some exceptions). This refactor pretty much only touches the types. For example:
-isForeignImport :: LForeignDecl pass -> Bool
+isForeignImport :: LForeignDecl (GhcPass p) -> Bool
isForeignImport (L _ (ForeignImport {})) = True
isForeignImport _ = False
This function uses locations (the pattern match on L
), so it is GHC-specific, and this fact is now reflected in its type LForeignDecl (GhcPass p) -> Bool
.
And lastly, some instance declarations that used TypeSynonymInstances
now need to be expanded, now that we're using type families inside those type synonyms.
In Haddock
Haddock doesn't use GhcPass p
, but it uses source locations and GHC's AST heavily. Luckily we can just define an XRec
instance for their pass datakind: DocNameI
:
type instance XRec DocNameI a = Located a
All functions that both GHC and Haddock depend on need to be polymorphic over the pass
type variable however. Luckily again, there are not many of these functions, they mostly live in Ghc/Hs/Utils.hs
. They will need another constraint: XRec pass (Match pass) ~ Located (Match pass)
, for example.
Appendix
Status
The design described here is not merged into GHC as of yet. Current status is:
- An older variant of the design (pre #17587 (closed)) is implemented and merged for
Pat.hs
. - The newer variant for
Pat.hs
is implemented in !2315 (closed) - The design is applied throughout GHC in !2315 (closed), but that is still WIP.
For previous status/discussions, see the older version of this file or the related issues.
Related Issues / MRs
- Initial discussion: #15495
-> Discussion settles in the design of !1970 (closed) (From now on referred to as the '
LPat
experiment') - Follow up ticket: #17587 (closed) -> Simplifies extension point design and makes it more flexible
- Follow up merge request: !2315 (closed) (Expanding the '
LPat
experiment' across the compiler) - Relevant applications for this:
'Ping-pong style'
Say we have an expression type Expr
. Ping-pong style refers to the recursion being made through a type synonym LExpr
, for example, which would allow adding source locations easily:
data Expr
= Add LExpr LExpr
| Mul LExpr LExpr
| Num Int
type LExpr = Located Expr
newtype Located a = ...
This way it is possible to distinguish an expression which has source locations attached and one which doesn't have source locations attached on the type level. LExpr
doesn't unify with Expr
.
If this were done with the constructor extension point of TTG, then one would lose some type safety: There would no longer be a guaruntee that there will always be a Located
layer between the Expr
layers in our huge expression sandwich.
There are two very relevant comments in #15495.