GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues
Updated: 2022-07-12T13:58:53Z

ApplicativeStmt is only introduced in the renamer, so can move into a TTG constructor extension
https://gitlab.haskell.org/ghc/ghc/-/issues/21834
Updated: 2022-07-12T13:58:53Z, Author: Alan Zimmerman

This is a proposed cleanup for TTG

Assignee: Rodrigo Mesquita

TTG: On GHC.Data.FastString
https://gitlab.haskell.org/ghc/ghc/-/issues/21628
Updated: 2024-01-30T18:29:57Z, Author: Rodrigo Mesquita

To keep following the TTG goal of a GHC-independent AST, we want to remove all imports of `GHC.*` from the `Language.Haskell.*` hierarchy. One of these imports is `FastString`.
~~After some discussion below we've decided to move `FastString` from the `GHC.Data` module hierarchy to the "independent" `Data` one.~~
~~This issue should be closed after `FastString` lives in this independent space, and all `Language.Haskell.*` imports of `FastString` then import `Data.FastString` rather than `GHC.Data.FastString`~~
Currently, `FastString` from `GHC.Data.FastString` is used multiple times in the client-independent AST living in `Language.Haskell.Syntax`.
The occurrences are:
```hs
data HsExpr id
  = ...
  | HsOverLabel (XOverLabel p) FastString
  ...
  | HsQuasiQuote        -- See Note [Quasi-quote overview] in GHC.Tc.Gen.Splice
      (XQuasiQuote id)
      (IdP id)          -- Splice point
      (IdP id)          -- Quoter
      SrcSpan           -- The span of the enclosed string
      FastString        -- The enclosed string
  ...

data HsLit id
  = ...
  | HsString (XHsString x) {- SourceText -} FastString

data OverLitVal
  = ...
  | HsIsString !SourceText !FastString -- ^ String-looking literals

data HsTyLit
  = ...
  | HsStrTy SourceText FastString

newtype ModuleName = ModuleName FastString
```
I am wondering how to get rid of `FastString` in `Language.Haskell.Syntax`. I am raising this issue to discuss possible solutions and to get started on implementing one.
Perhaps some strategy resembling `IdP`...
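
To make the `IdP`-like idea concrete, here is a minimal sketch. It assumes a hypothetical type family `StringP` (not something that exists in GHC) that lets each client choose its string representation, in the same way `IdP` lets each pass choose its identifier type. `HsLit` is cut down to one constructor, and `FastString` here is only a stand-in newtype, not the real `GHC.Data.FastString` type.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Hypothetical: the string representation is chosen per pass/client,
-- in the same way `IdP` chooses the identifier representation.
type family StringP p :: Type

-- Extension field family, simplified from the real AST.
type family XHsString p :: Type

-- Simplified literal type: the base AST no longer mentions FastString.
data HsLit p
  = HsString (XHsString p) (StringP p)

-- Stand-in for GHC.Data.FastString.FastString (not the real type).
newtype FastString = FastString String
  deriving (Eq, Show)

-- A GHC-like client picks FastString.
data Ghc
type instance StringP   Ghc = FastString
type instance XHsString Ghc = ()

-- Some other tool picks plain String and needs no GHC modules.
data Tool
type instance StringP   Tool = String
type instance XHsString Tool = ()

-- Functions written against one concrete client see the concrete type.
toolLitLength :: HsLit Tool -> Int
toolLitLength (HsString _ s) = length s

ghcLitText :: HsLit Ghc -> String
ghcLitText (HsString _ (FastString s)) = s
```

The cost of this design is one more type family that every client has to instantiate, and `ModuleName`, which has no pass parameter, would need separate treatment.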
Thank you,
romes
CC: @Ericson2314

TTG: the language-haskell package
https://gitlab.haskell.org/ghc/ghc/-/issues/21592
Updated: 2023-08-07T23:11:12Z, Author: Rodrigo Mesquita

This ticket describes the plan and follows the implementation of separating the Haskell AST and parsing stage to packages `haskell-syntax` and `haskell-parser` to achieve modularity and code reuse across clients.
We propose the following goals to reach this package separation:
1. A module hierarchy `Language.Haskell.Syntax` which contains the base Haskell AST. Just the data types, very few functions, since there is virtually nothing you can do without knowing *any* of the type family instances.
2. A module hierarchy `Language.Haskell.Parser` that defines `HsExpr Parsed`, and has the properties enumerated below. Presumably this package buys into the full `Anno` infrastructure to decorate the parse tree.
3. Rejig GHC to use `HsExpr Parsed` rather than `HsExpr (GhcPass Parsed)`.
4. Move the first two from `ghc` to a `haskell-syntax` package.
5. Figure out what needs to be done to create a `haskell-parser` package, later.
----
Properties:
1. A (pre-processed) string can be parsed into `HsExpr Parsed`. Let's call this operation `parse`. (By "pre-processed", we mean that there is no CPP left in the input string.)
2. `HsExpr Parsed` can be pretty-printed to a string. Let's call this operation `print`.
3. `parse` and `print` are exact inverses, in both directions.
4. `HsExpr Parsed` is a suitable data structure to use as the input to a compilation pipeline. It is structured and recursive.
5. `HsExpr Parsed` depends on a minimum of infrastructure. In particular, no aspects of e.g. type-checking or code generation should be depended on by any module defining the `HsExpr` type, nor the `parse` or `print` functions.
6. `HsExpr` is extensible, fitting with the TTG framework for extension in other phases; the unextended `HsExpr` contains the essence of the Haskell AST, with the parts that are useful to many syntax processors.
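
As a toy illustration of properties 1 to 3 (not of GHC's parser), here is a sketch for a made-up language of sums of decimal literals. Every name in it is hypothetical. The point it shows is that `parse` and `print` can only be exact inverses if the parse tree keeps the source spelling of literals and all whitespace, which is the role the annotation infrastructure plays for `HsExpr Parsed`.

```haskell
import Data.Char (isDigit, isSpace)

-- A literal keeps the whitespace before it and its exact source spelling
-- (so "02" is not normalised to "2").
data Lit = Lit String String
  deriving (Eq, Show)

-- A sum: a first literal, then (whitespace-before-plus, literal) pairs,
-- then any trailing whitespace.
data Expr = Expr Lit [(String, Lit)] String
  deriving (Eq, Show)

parseLit :: String -> Maybe (Lit, String)
parseLit s =
  let (ws, rest)  = span isSpace s
      (ds, rest') = span isDigit rest
  in if null ds then Nothing else Just (Lit ws ds, rest')

parseExpr :: String -> Maybe Expr
parseExpr s = do
  (l, rest) <- parseLit s
  go l [] rest
  where
    -- Consume "+ literal" pieces until the input is exhausted.
    go l acc rest =
      let (ws, rest') = span isSpace rest
      in case rest' of
           []      -> Just (Expr l (reverse acc) ws)
           '+' : r -> do
             (l', r') <- parseLit r
             go l ((ws, l') : acc) r'
           _       -> Nothing

printLit :: Lit -> String
printLit (Lit ws ds) = ws ++ ds

printExpr :: Expr -> String
printExpr (Expr l rest trailing) =
  printLit l ++ concatMap (\(ws, l') -> ws ++ "+" ++ printLit l') rest ++ trailing
```

For every input that parses, `printExpr <$> parseExpr s` gives back `Just s`. The other direction, `parseExpr (printExpr e) == Just e`, holds only for well-formed trees, i.e. those whose whitespace fields contain only whitespace and whose literal fields contain only digits.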
---
For (1) we need to remove all dependencies on GHC from all L.H.S modules. This is well underway: (move HsModule) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228, (move MatchGroup Origin field) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8301, (TTG for splices) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7821, (removing other dependencies) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8308, (dealing with FastString) https://gitlab.haskell.org/ghc/ghc/-/issues/21628. We need to keep reviewing these and enforcing one criterion: the base AST should represent the complexity of the source code, but doesn't need to keep e.g. annotations; that's a Parser extension to the AST.
For (2) we should start moving the parser modules to `Language.Haskell.Parser`. I don't think we should do this yet; (1) must be complete first.
For (3) we need (2)
For (4), we can have it before (3), but need (1,2).
Let's keep tuning this description with the plan.
----
For history, here is the original ticket description:
This issue follows a previous discussion in a TTG related MR.
I want to further this discussion, in particular because of some design decisions concerning my MR https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228 (see below)
Quoting @rae in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4782#note_415958:
> I want to add a small new thought: What if the result of parsing weren't GhcPs, but Parsed. That is, we have a client-independent phase indicator Parsed :: Type that lives in the client-independent space. GhcPass would just encompass GhcRn and GhcTc. For me, the fact that you can parse into and pretty-print from an AST is its fundamental property, and a great part to start any design discussion.
>
> One challenge is that, to really have the parse/pretty-print property, you need information about whitespace, comments, keyword spellings, unicode syntax, and such. This information is mostly useless after parsing. Since our TTG structure offers no way to remove fields, putting this information in the base tree might make it less usable downstream. That might be why we use extension points for this data today.
>
> So maybe I advocate for a compromise position: put Parser as a general, client-independent pass. All the source annotation stuff would also live in the client-independent space. But design the AST so that there are extensions for Parser, around annotations it is unlikely for clients to need. It would be a judgment call around what information is likely useless to clients (for example, one could argue that the distinction between (x `f`) and f x is "useless"), but I think we are likely to make mostly good judgments around this.
I'm currently working on moving HsModule to L.H.S in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228,
and when dealing with GHC.Hs.ImpExp, I arrive at this:
```hs
-- | Imported or exported entity.
data IE pass
  = IEVar (XIEVar pass) (LIEWrappedName pass (IdP pass))
  ...
  | IEGroup (XIEGroup pass) Int (LHsDoc pass) -- ^ Doc section heading
  | IEDoc (XIEDoc pass) (LHsDoc pass)         -- ^ Some documentation
  | IEDocNamed (XIEDocNamed pass) String      -- ^ Reference to named doc
```
The first reason why I brought this discussion up again is because of `HsDoc`.
I was trying to understand whether `IEGroup` and `IEDoc` should be moved to the GHC specific part, or whether `HsDoc` should be moved to the client independent stage.
In light of the comment advocating the independent `Parsed` stage which is also independent from the independent base AST, I would say that `HsDoc` would be a good example of something to be put in the independent `Parsed` stage, but not in the base AST.
For now I'll move the constructors depending on `HsDoc` to the GHC specific part, and if we later on pick up the `Parsed` idea, we can move them to there.
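
Moving the `HsDoc`-dependent constructors to the GHC-specific part could look roughly like the sketch below, which routes them through the TTG extension constructor. This is a simplified illustration and not the code from the MR: `IE` is cut down to two constructors, and `HsDoc`, `IEDocExt` and `Ghc` are stand-in names.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Extension families (simplified).
type family XIEVar pass :: Type
type family XXIE   pass :: Type

-- Base, client-independent AST: nothing here mentions documentation.
data IE pass
  = IEVar (XIEVar pass) String   -- imported or exported name (simplified to String)
  | XIE !(XXIE pass)             -- extension constructor

-- GHC-specific part (stand-in names).
newtype HsDoc = HsDoc String     -- stand-in for the real HsDoc

data IEDocExt
  = IEGroup Int HsDoc            -- doc section heading
  | IEDoc HsDoc                  -- some documentation

data Ghc
type instance XIEVar Ghc = ()
type instance XXIE   Ghc = IEDocExt

-- GHC-side code still sees all the constructors, one level deeper.
describeIE :: IE Ghc -> String
describeIE (IEVar _ name)                  = "entity " ++ name
describeIE (XIE (IEGroup level (HsDoc d))) = "heading (level " ++ show level ++ "): " ++ d
describeIE (XIE (IEDoc (HsDoc d)))         = "doc: " ++ d
```

A client with no use for documentation could instantiate `XXIE` to an uninhabited type, so the extra constructors never show up in its trees.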
I mixed in a bit of discussion from https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228 here, but only to motivate further discussion.
For comments on the specific TTG Module x HsDoc question, use the linked MR.
Thank you,
~romes

Assignee: Rodrigo Mesquita

TTG follow-up: Outputable orphans
https://gitlab.haskell.org/ghc/ghc/-/issues/21262
Updated: 2022-07-06T12:18:20Z, Author: Rodrigo Mesquita

Follow-up on cleaning up after separating the AST from GHC-Pass (!4778), regarding the orphan `Outputable` instances in `compiler/GHC/Hs/Expr.hs`:
> It makes me wonder whether all the Outputable instances should be in GHC.Hs.Extension.GhcPass. I think they wouldn't be orphans there.
> After we add more TTG parameters to allow using non-GHC non-tie-the-knot-oh-wait-hs-boots-everywhere, we can generalize these instances so they don't assume GhcPass.
> Outputable itself assumes some things about names and other GHC which would probably be lifted.
> So I don't know what's best in the short and medium terms, but I hope in the long term we can have some nice GHC-agnostic bring-your-own-configuration-record pretty printing. This will also help with #17957 and getting rid of the unsafe global cfg stuff.
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324909

Assignee: Rodrigo Mesquita

Enhancing COMPLETE pragma to support pattern synonyms with polymorphic (output) types
https://gitlab.haskell.org/ghc/ghc/-/issues/15885
Updated: 2021-02-22T21:15:08Z, Author: Shayan-Najd

On our work on the [new front-end AST for GHC](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow) based on [TTG](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/TreesThatGrowGuidance), we would like to use [a pattern synonym](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSourceLocations) similar to the following:
```hs
pattern LL :: HasSrcSpan a => SrcSpan -> SrcSpanLess a -> a
pattern LL s m <- (decomposeSrcSpan -> (m , s))
  where
    LL s m = composeSrcSpan (m , s)
```
We know that any match on an `LL` pattern makes the pattern matching total, as it uses a view pattern with a total output pattern (i.e., in `decomposeSrcSpan -> (m , s)`, the pattern `(m , s)` is total).
As far as I understand, currently COMPLETE pragmas cannot be used with such a polymorphic pattern synonym.
I believe we need to enhance COMPLETE pragmas to support such pattern synonyms.
This can be done either syntactically, or (preferably) type-directed.
For example, we should be able to write `{-# COMPLETE LL #-}` or `{-# COMPLETE LL :: HasSrcSpan a => a #-}`.
In the type-directed approach
a. the totality checker \*may\* need to track, at least, the set of required constraints of pattern synonyms mentioned in a COMPLETE pragma; and
b. the order of pattern synonyms mentioned in a pragma should be taken into account (as noted by \@carter).
For example, in the case of `LL`, `HasSrcSpan a` is a required type constraint.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.6.2 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.6.3

Handling Source Locations via TTG
https://gitlab.haskell.org/ghc/ghc/-/issues/15495
Updated: 2020-11-09T22:25:46Z, Author: Shayan-Najd

## Problem
The current implementation of [TTG HsSyn AST](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/TreesThatGrowGuidance) in GHC stores source locations for terms of a datatype `Exp` in a separate wrapper datatype `LExp` which is mutually recursive with `Exp` such that every recursive reference to `Exp` is done \*\*indirectly\*\*, via a reference to the wrapper datatype `LExp` (see the example code below). We refer to this style of storing source locations as the ping-pong style.
Besides the indirection and the resulting complications of the ping-pong style, there are two key problems with it:
* It bakes-in the source locations in the base TTG AST, forcing all instances to store source locations, even if they don't need them. For example, the TH AST does not carry source locations, and even within GHC there are generated terms without source locations.
* It results in a form of conceptual redundancy: source locations are tree decorations and they belong in the extension points.
These issues are discussed in
* [TTG wiki home page](https://gitlab.haskell.org/ghc/ghc/wikis/implementing-trees-that-grow), and its sub-pages, especially:
* [Handling source locations](https://gitlab.haskell.org/ghc/ghc/wikis/implementing-trees-that-grow/handling-source-locations)
* [TTG guidance](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/TreesThatGrowGuidance)
## Solution
We can move the source location decorations to a wrapper constructor and remove the ping-pong style.
This can be done smoothly, mechanically, and gradually by using getter/setter methods for source locations.
More details can be found at [the related wiki page](https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSourceLocations).
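
The proposed solution can be sketched on a toy expression type as follows. This is an illustration under assumptions, not the design from the wiki page: `Exp`, `Ps`, `LocWrap`, `getLoc` and `setLoc` are hypothetical names, and `SrcSpan` is a two-field stand-in.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

type family XLit  p :: Type
type family XAdd  p :: Type
type family XXExp p :: Type

-- Base AST: recursion is direct (Exp refers to Exp, with no LExp in
-- between), and no constructor mentions source locations.
data Exp p
  = Lit (XLit p) Int
  | Add (XAdd p) (Exp p) (Exp p)
  | XExp (XXExp p)               -- extension constructor

-- Stand-in for GHC's SrcSpan.
data SrcSpan = SrcSpan Int Int | NoSrcSpan
  deriving (Eq, Show)

-- Hypothetical pass index for a tree that carries locations.
data Ps

-- The location wrapper lives in the extension constructor.
data LocWrap = LocWrap SrcSpan (Exp Ps)

type instance XLit  Ps = ()
type instance XAdd  Ps = ()
type instance XXExp Ps = LocWrap

-- Getter: terms without a wrapper (e.g. generated ones) have no location.
getLoc :: Exp Ps -> SrcSpan
getLoc (XExp (LocWrap s _)) = s
getLoc _                    = NoSrcSpan

-- Setter: replace an existing wrapper or add a new one.
setLoc :: SrcSpan -> Exp Ps -> Exp Ps
setLoc s (XExp (LocWrap _ e)) = XExp (LocWrap s e)
setLoc s e                    = XExp (LocWrap s e)

-- Consumers that do not care about locations skip the wrapper.
eval :: Exp Ps -> Int
eval (Lit _ n)            = n
eval (Add _ a b)          = eval a + eval b
eval (XExp (LocWrap _ e)) = eval e
```

Code that only goes through `getLoc` and `setLoc` does not need to know where the location is stored, which is what would allow a gradual migration away from the ping-pong style.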
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ----------------------- |
| Version | |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | alanz, bgamari, simonpj |
| Operating system | |
| Architecture | |
</details>
Milestone: ⊥

syn_arg_wraps and syn_res_wrap are only populated after typechecking
https://gitlab.haskell.org/ghc/ghc/-/issues/15252
Updated: 2022-09-19T19:52:04Z, Author: Matthew Pickering

The definition for `SyntaxExpr` has two fields which are only populated after type checking. `SyntaxExpr` should have an extension point which contains these two fields.
```
data SyntaxExpr p = SyntaxExpr { syn_expr      :: HsExpr p
                               , syn_arg_wraps :: [HsWrapper]
                               , syn_res_wrap  :: HsWrapper }
```
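
One possible shape for such an extension point is sketched below. It is an assumption-laden illustration, not the change that was made in GHC: `XSyntaxExpr`, `syn_ext`, `SynWraps` and the pass indices are hypothetical, and `HsExpr` and `HsWrapper` are tiny stand-ins for the real types.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Stand-ins for the real GHC types.
data HsExpr p = HsVar String
data HsWrapper = WpHole | WpNamed String
  deriving (Eq, Show)

-- Hypothetical extension family for SyntaxExpr.
type family XSyntaxExpr p :: Type

data SyntaxExpr p = SyntaxExpr
  { syn_ext  :: XSyntaxExpr p    -- extension point
  , syn_expr :: HsExpr p
  }

-- The two wrapper fields, grouped so they can live in the extension point.
data SynWraps = SynWraps
  { syn_arg_wraps :: [HsWrapper]
  , syn_res_wrap  :: HsWrapper
  }

-- Hypothetical pass indices.
data Parsed
data Typechecked

-- Before typechecking there is nothing to store.
type instance XSyntaxExpr Parsed      = ()
type instance XSyntaxExpr Typechecked = SynWraps

mkSyntaxExpr :: HsExpr Parsed -> SyntaxExpr Parsed
mkSyntaxExpr e = SyntaxExpr { syn_ext = (), syn_expr = e }

-- Only typechecked syntax expressions can be asked for their wrappers.
resultWrapper :: SyntaxExpr Typechecked -> HsWrapper
resultWrapper = syn_res_wrap . syn_ext
```

With this shape, asking a parsed `SyntaxExpr` for its wrappers is a type error rather than a read of a placeholder value.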
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Remove constraint types from HsExtension, post TTG
https://gitlab.haskell.org/ghc/ghc/-/issues/14429
Updated: 2019-08-02T15:32:49Z, Author: Alan Zimmerman

Once Trees that Grow is landed on the hsSyn AST, remove the constraint types from `HsExtension.hs`
Hopefully `DataId`, `HasSourceText`, `OutputableX` etc can all go.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | sh.najd |
| Operating system | |
| Architecture | |
</details>
Assignee: Alan Zimmerman

Rework HsValBindsLR
https://gitlab.haskell.org/ghc/ghc/-/issues/14428
Updated: 2019-08-02T15:34:17Z, Author: Alan Zimmerman

Once Trees that Grow has been applied to the hsSyn AST, rework `HsValBindsLR` to simplify it.
From a comment on https://phabricator.haskell.org/D4147
> Nothing here gives any clue that this is intended for the output of the renamer. And typechecker I think.
> Plus I wonder if we'd be better served by
```hs
data HsValBindsLR idL idR
  = ValBinds
      [(RecFlag, LHsBindsLR idL idR)]
      [LSig GhcRn]
```
> Then the parser can generate a giant singleton Rec and the renamer can sort it out. Less fuss.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Assignee: Alan Zimmerman

Empty Haddock comments no longer occur in the AST as `HsDoc`
https://gitlab.haskell.org/ghc/ghc/-/issues/23935
Updated: 2023-09-14T19:20:21Z, Author: amesgen

## Summary
Consider the following two type signatures.
```haskell
foo :: {- |-} A -> B
bar :: {- | -} A -> B
```
Comparing the AST (with `-haddock`) of `foo` and `bar`, note that `foo` does not contain a `HsDoc` (search for `WithHsDocIdentifiers`), but `bar` does:
<table>
<tr><th>
`foo`</th><th>
`bar`</th></tr>
<tr>
<td>
```haskell
(L
 (SrcSpanAnn (EpAnn
   (Anchor
    { <interactive>:1:1-20 }
    (UnchangedAnchor))
   (AnnListItem
    [])
   (EpaComments
    [])) { <interactive>:1:1-20 })
 (SigD
  (NoExtField)
  (TypeSig
   (EpAnn
    (Anchor
     { <interactive>:1:1-3 }
     (UnchangedAnchor))
    (AnnSig
     (AddEpAnn AnnDcolon (EpaSpan { <interactive>:1:5-6 }))
     [])
    (EpaComments
     []))
   [(L
     (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:1-3 })
     (Unqual
      {OccName: foo}))]
   (HsWC
    (NoExtField)
    (L
     (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15-20 })
     (HsSig
      (NoExtField)
      (HsOuterImplicit
       (NoExtField))
      (L
       (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15-20 })
       (HsFunTy
        (EpAnn
         (Anchor
          { <interactive>:1:15 }
          (UnchangedAnchor))
         (NoEpAnns)
         (EpaComments
          []))
        (HsUnrestrictedArrow
         (L
          (TokenLoc
           (EpaSpan { <interactive>:1:17-18 }))
          (HsNormalTok)))
        (L
         (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15 })
         (HsTyVar
          (EpAnn
           (Anchor
            { <interactive>:1:15 }
            (UnchangedAnchor))
           []
           (EpaComments
            []))
          (NotPromoted)
          (L
           (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15 })
           (Unqual
            {OccName: A}))))
        (L
         (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:20 })
         (HsTyVar
          (EpAnn
           (Anchor
            { <interactive>:1:20 }
            (UnchangedAnchor))
           []
           (EpaComments
            []))
          (NotPromoted)
          (L
           (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:20 })
           (Unqual
            {OccName: B}))))))))))))
```
</td>
<td>
```haskell
(L
 (SrcSpanAnn (EpAnn
   (Anchor
    { <interactive>:1:1-21 }
    (UnchangedAnchor))
   (AnnListItem
    [])
   (EpaComments
    [])) { <interactive>:1:1-21 })
 (SigD
  (NoExtField)
  (TypeSig
   (EpAnn
    (Anchor
     { <interactive>:1:1-3 }
     (UnchangedAnchor))
    (AnnSig
     (AddEpAnn AnnDcolon (EpaSpan { <interactive>:1:5-6 }))
     [])
    (EpaComments
     []))
   [(L
     (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:1-3 })
     (Unqual
      {OccName: bar}))]
   (HsWC
    (NoExtField)
    (L
     (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16-21 })
     (HsSig
      (NoExtField)
      (HsOuterImplicit
       (NoExtField))
      (L
       (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16-21 })
       (HsFunTy
        (EpAnn
         (Anchor
          { <interactive>:1:16 }
          (UnchangedAnchor))
         (NoEpAnns)
         (EpaComments
          []))
        (HsUnrestrictedArrow
         (L
          (TokenLoc
           (EpaSpan { <interactive>:1:18-19 }))
          (HsNormalTok)))
        (L
         (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
         (HsDocTy
          (EpAnnNotUsed)
          (L
           (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
           (HsTyVar
            (EpAnn
             (Anchor
              { <interactive>:1:16 }
              (UnchangedAnchor))
             []
             (EpaComments
              []))
            (NotPromoted)
            (L
             (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
             (Unqual
              {OccName: A}))))
          (L
           { <interactive>:1:8-14 }
           (WithHsDocIdentifiers
            (NestedDocString
             (HsDocStringNext)
             (L
              { <interactive>:1:8-14 }
              (HsDocStringChunk
               " ")))
            []))))
        (L
         (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:21 })
         (HsTyVar
          (EpAnn
           (Anchor
            { <interactive>:1:21 }
            (UnchangedAnchor))
           []
           (EpaComments
            []))
          (NotPromoted)
          (L
           (SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:21 })
           (Unqual
            {OccName: B}))))))))))))
```
</td>
</tr>
</table>
Is there a particular reason for this? In GHC 8.10, the AST contained Haddock comments in both cases.
Concrete effects of this behavior:
- It makes the job of formatters like Ormolu (see issues [1068](https://github.com/tweag/ormolu/pull/1068), [1065](https://github.com/tweag/ormolu/issues/1065), [726](https://github.com/tweag/ormolu/issues/726)) that automatically check for AST discrepancies harder than necessary, as e.g. a natural rewrite from
```haskell
foo ::
  -- |
  --
  A ->
  B
```
to
```haskell
foo ::
  -- |
  A ->
  B
```
contains a Haddock comment in the AST in the first snippet, but not in the second.
- A nice Haddock trick by @tomjaguarpaw1 ([blog post](http://h2.jaguarpaw.co.uk/posts/improving-the-typed-process-documentation/), search for "Forced type signatures to wrap") [no longer works](https://github.com/tweag/ormolu/pull/1068#issuecomment-1707237587).
Ideally, the behavior would be changed back to what it was in 8.10; I could try to do that in case the current behavior is not intentional.
## Environment
* GHC version used: Any GHC since 9.0 (I think this change is due to !2377)

Reduce deps of AST, consider moving it into own package
https://gitlab.haskell.org/ghc/ghc/-/issues/19218
Updated: 2022-08-29T08:23:12Z, Author: John Ericson

## Motivation
The end goal of TTG is to share extensible data types for Haskell's syntax between GHC and other projects. But if we are to do that, we should have those data types live in a separate package.
- This is more robust at preventing entanglement than e.g. "the parser dependencies test"
- Eventually things might stabilize enough that the base extensible AST doesn't have a breaking change every GHC release. That means less churn on PVP bounds for consumers of just the AST and not the rest of GHC.
- Hopefully, this can be the beginning of modularizing GHC more broadly :).
The [relevant TTG wiki page is here](https://gitlab.haskell.org/ghc/ghc/-/wikis/implementing-trees-that-grow)
## Proposal
The big question is of course how we get there. The key trick is to do the work of getting ready to separate the AST into its own package before actually doing it, because it's too much of a refactor to do all at once.
- #18936 This talks about how to remove a major part of the GHC-specific stuff from the module
- In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_323748 @SPJ talks about using a different module hierarchy to indicate to other programmers the intent for these modules to be separate from GHC. I think that's a great idea.
## Specific tasks
```bash
git grep -h '^import' compiler/Language/Haskell/Syntax | sort
```
is a crude way to track progress
- [x] #21242 Separate module AST from GHC pass.
- [ ] #21592 `FastString`
- [ ] `Arity`
- [ ] `ConTag`
- [ ] Lint deps from reappearing, prior to the package being split off.

Refactor Anno type family and explicitly mark out annotation types in the AST.
https://gitlab.haskell.org/ghc/ghc/-/issues/20039
Updated: 2021-06-29T14:45:01Z, Author: Zubin

The `Anno` type family is currently used to attach exactprint meta data to `SrcSpan`s via the `XRec` wrapper type family:
```haskell
type instance XRec (GhcPass p) a = GenLocated (Anno a) a
```
There is currently a [plan to move this information directly into the syntax tree](https://gitlab.haskell.org/ghc/ghc/-/wikis/api-annotations#token-information-in-the-syntax-tree-plan-b). However, this will be a major refactoring and has an indeterminate ETA. Meanwhile,
these instances get in the way of other people trying to use TTG to define custom "passes" over the AST (like @fendor's WIP rework of !3866) and result in terrible type errors (https://paste.tomsmeding.com/PtDRKhYf).
There are a lot of instances of `Anno` defined for general purpose types, like `[..]`, `Maybe ...`, `(..,..)`, which might be used in many contexts in the AST, requiring different annotations for each. However, currently these are tied to a single context and will always be given a single type of exactprint annotations.
For example, the following instances of `Anno` are OK, since they are defined for a specific kind of AST construct:
```haskell
type instance Anno (RuleBndr (GhcPass p)) = SrcSpan
type instance Anno (RuleDecl (GhcPass p)) = SrcSpanAnnA
type instance Anno (DerivStrategy (GhcPass p)) = SrcSpan
```
However, some instances are for more generic types that might be used in different contexts:
```haskell
-- For CompleteMatchSig
type instance Anno [LocatedN RdrName] = SrcSpan
type instance Anno [LocatedN Name] = SrcSpan
type instance Anno [LocatedN Id] = SrcSpan
```
`[LocatedN Name]` is a very general purpose type that may occur in multiple places in the AST. We can easily imagine some kind of future
construct which would require a `[LocatedN Name]` field with a `SrcSpanAnnA` annotation instead. However, `Anno` implicitly ties it to the
current usage in `CompleteMatchSig`, which means that this type cannot currently be used anywhere else without significant refactoring.
Then there are even more horrific instances of `Anno`...
```haskell
type instance Anno [LocatedA ((StmtLR (GhcPass pl) (GhcPass pr) (LocatedA (HsExpr (GhcPass pr)))))] = SrcSpanAnnL
type instance Anno [LocatedA ((StmtLR (GhcPass pl) (GhcPass pr) (LocatedA (HsCmd (GhcPass pr)))))] = SrcSpanAnnL
```
One way to fix this could be to define a newtype wrapper that has a phantom type that determines the type of the annotation:
```haskell
newtype Annotated a x = Annotated { unAnnotate :: x }
type instance Anno (Annotated a x) = a
```
Then we can use `Annotated SrcSpan [LocatedN Name]` when we don't need additional annotations, `Annotated SrcSpanAnnA [LocatedN Name]` when we need `[LocatedN Name]` annotated with list annotations and so on.
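To make the wrapper idea concrete, here is a minimal, self-contained sketch. It assumes heavily simplified stand-ins for `SrcSpan`, `SrcSpanAnnA`, `GenLocated` and `XRec` (none of these definitions match GHC's real ones), and the helper `payload` is hypothetical. It only illustrates how the phantom parameter lets the same payload type carry different annotations in different contexts.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Stand-ins for GHC's real types (hypothetical, for illustration only).
data SrcSpan = SrcSpan Int Int deriving (Eq, Show)
data SrcSpanAnnA = SrcSpanAnnA SrcSpan [String] deriving (Eq, Show)
data GenLocated l e = L l e deriving (Eq, Show)

-- Open family mapping a syntax type to its annotation type.
type family Anno a

-- The wrapper from the proposal: the phantom 'ann' picks the annotation.
newtype Annotated ann x = Annotated { unAnnotate :: x } deriving (Eq, Show)
type instance Anno (Annotated ann x) = ann

-- Simplified XRec: locate a value using whatever Anno says.
type XRec a = GenLocated (Anno a) a

-- The same payload type, [String], with a plain span ...
plainNames :: XRec (Annotated SrcSpan [String])
plainNames = L (SrcSpan 1 5) (Annotated ["a", "b"])

-- ... and with a richer (stand-in) list annotation.
listNames :: XRec (Annotated SrcSpanAnnA [String])
listNames = L (SrcSpanAnnA (SrcSpan 1 5) ["comma"]) (Annotated ["a", "b"])

-- Hypothetical helper: the payload is recovered the same way in both cases.
payload :: GenLocated l (Annotated ann x) -> x
payload (L _ (Annotated x)) = x
```

With this shape, only one `Anno` instance is needed for all wrapped general-purpose types, and the choice of annotation moves to the use site.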
~~I also propose the following refactoring to `GenLocated`, since we always need a `SrcSpan` regardless of the annotation type:~~(Scratch this idea, it doesn't seem to be worth it)
```haskell
data GenLocated a l e = L a l e
type instance XRec (GhcPass p) a = GenLocated (Anno a) SrcSpan a
type LocatedA = GenLocated AnnListItem SrcSpan
type LocatedN = GenLocated NameAnn SrcSpan
type Located = GenLocated () SrcSpan
-- For elements without annotations:
type instance Anno (RuleBndr (GhcPass p)) = ()
-- Or possibly
data NoAnn = NoAnn
type Located = GenLocated NoAnn SrcSpan
type instance Anno (RuleBndr (GhcPass p)) = NoAnn
```
~~Then we can get rid of the `type SrcAnn ann = SrcSpanAnn' (EpAnn ann)` and `type SrcSpanAnnA = SrcAnn AnnListItem` etc. types.~~
/cc @alanz @int-index @fendor

Hannes Siebenhandl

https://gitlab.haskell.org/ghc/ghc/-/issues/19932
Reduce AST & parser dependencies
2023-08-12T14:02:18Z
Richard Eisenberg rae@richarde.dev

In !5719, the number of parser dependencies edged up a bit, to the dissatisfaction of @shayne-fletcher-da. See https://blog.shaynefletcher.org/2020/10/ghc-lib-parser-module-count.html for more info. This ticket is about how to get the number of modules transitively depended on by the parser down.
I actually just tried to do it, but it's not so easy. Here is what I learned:
* SOURCE imports matter, because a SOURCE import still has to be in the same package as its importing module. Thus, if the parser depends on some low-level module that SOURCE-imports a high-level one, we're in for trouble.
* A key problem is that the parser transitively depends on `GHC.Driver.Env`, which brings in `GHC.Driver.Hooks` and transitively odd things like `GHC.Cmm`. No no no.
* It's hard to figure out exactly where this link happens, absent a visualization tool (which I did not try to set up).
* I found at least one way in, via `GHC.Hs.Expr`, which SOURCE-imports `GHC.Tc.Types`, which imports `GHC.Driver.Env`. So a good next step would be to not do this.
* Of course, the parser has to depend on its AST, exported from `GHC.Hs.Expr`. But does it? It really only needs the `GhcPs` variant of the AST. So I propose breaking `GHC.Hs.Expr` into `GHC.Hs.Expr.Parser`, `GHC.Hs.Expr.Rename`, and `GHC.Hs.Expr.Tc`, each of which includes the definitions needed for its pass of the compiler. It seems likely we'd be able to get `GHC.Hs.Expr.Parser` not to depend on `GHC.Tc.Types`, and thus perhaps not on `GHC.Driver.Env`. I have not tried this, at all, because it would be fairly major surgery, and it would make looking up type instances for e.g. `XHsPar` harder. This seems like a promising way forward, however.
I'm moving on from this challenge now. This ticket is merely to serve as a small brain dump of what I accomplished. I pushed some (working) code to the `wip/lower-parser-deps` branch, which anyone is free to take over. That commit successfully drops the dependency on `GHC.Core.Lint`, and breaks up `GHC.Cmm.Expr` in an attempt to kill the dependency on `GHC.Cmm`. That last effort succeeded at breaking up `GHC.Cmm.Expr`, but did not actually lose any dependency. (It actually picked up two new dependencies, because I created two new modules.) Perhaps picking up this branch is helpful, or perhaps not. I leave it to the next person to decide.
## Test files that track dependencies, and thus progress:
- [`testsuite/tests/count-deps/CountDepsAst.stdout`](https://gitlab.haskell.org/ghc/ghc/-/blob/master/testsuite/tests/count-deps/CountDepsAst.stdout)
- [`testsuite/tests/count-deps/CountDepsParser.stdout`](https://gitlab.haskell.org/ghc/ghc/-/blob/master/testsuite/tests/count-deps/CountDepsParser.stdout)

https://gitlab.haskell.org/ghc/ghc/-/issues/18764
Strict TTG extension fields
2022-02-23T17:03:30Z
Vladislav Zavialov

Currently, the TTG extension fields are not strict:
```
data HsType pass
  = ...
  | HsTyVar (XTyVar pass) PromotionFlag (LIdP pass)
  | HsAppTy (XAppTy pass) (LHsType pass) (LHsType pass)
  | ...
```
But they very well could be:
```
data HsType pass
  = ...
  | HsTyVar !(XTyVar pass) PromotionFlag (LIdP pass)
  | HsAppTy !(XAppTy pass) (LHsType pass) (LHsType pass)
  | ...
```
This would allow one to "exclude" certain constructors by using `Void`:
```
type instance XTyVar GhcTc = Void
```
And then the pattern match exhaustiveness checker would see that `HsTyVar` is an impossible case, without an explicit call to `absurd`. We already do that for sum extension fields, but for some reason not for all of them.
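The following is a small sketch of the idea on a toy AST, not GHC's real one. The pass names `Parsed` and `Checked`, the family `XVar`, and the type `Ty` are all hypothetical. The intent is that, because the extension field is strict and its type reduces to `Void` at the `Checked` pass, the function below needs no `TyVar` equation; whether `-Wincomplete-patterns` stays silent may depend on the GHC version's pattern-match checker.

```haskell
{-# LANGUAGE TypeFamilies #-}
{-# OPTIONS_GHC -Wincomplete-patterns #-}

import Data.Void (Void)

-- Hypothetical two-pass setup; names are illustrative, not GHC's.
data Parsed
data Checked

type family XVar pass
type instance XVar Parsed  = ()
type instance XVar Checked = Void   -- TyVar must not occur after checking

data Ty pass
  = TyVar !(XVar pass) String       -- strict extension field
  | TyCon String
  | TyApp (Ty pass) (Ty pass)

-- No TyVar equation: a TyVar at the Checked pass would need a value of
-- type Void in a strict field, so no such (non-bottom) value exists.
countCons :: Ty Checked -> Int
countCons (TyCon _)   = 1
countCons (TyApp f x) = countCons f + countCons x
```

With a lazy field, `TyVar undefined "a"` would be a legitimate value in weak head normal form, which is why strictness matters for the exhaustiveness argument.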
It's easy to recover laziness by using a wrapper, at the cost of a pointer indirection:
```
data Box a = Box a
```
I propose to make all extension fields throughout the compiler strict. Thoughts?

Vladislav Zavialov

https://gitlab.haskell.org/ghc/ghc/-/issues/18758
Remove NoGhcTc, allow HsType GhcTc, HsDecl GhcTc
2023-04-11T11:15:20Z
Vladislav Zavialov

## Background
Currently, we carefully avoid `HsType GhcTc` or `HsDecl GhcTc`, by means of the `NoGhcTc` type family:
```
  | HsAppType (XAppTypeE p) -- After typechecking: the type argument
              (LHsExpr p)
              (LHsWcType (NoGhcTc p)) -- ^ Visible type application
```
The primary reason for this is that kind-checking and desugaring of types are intertwined. We mostly work with `TcType` instead of `HsType GhcTc` because it's more convenient in some places (e.g. in `unifyKind` and `unifyType`).
## Motivation A
A better architecture would be to have similar pipelines for terms and types:
* `HsExpr GhcPs -> HsExpr GhcRn -> HsExpr GhcTc`
* `HsType GhcPs -> HsType GhcRn -> HsType GhcTc`
This would allow us to talk about e.g. `HsDecl GhcTc`. For example, when discussing #12088, there was an idea of a refactoring that would separate `tcTyClDecl` and zonking. But then we'd like the type of `tcTyClDecl` to be:
```haskell
tcTyClDecl :: LTyClDecl GhcRn -> TcM (LTyClDecl GhcTc)
```
And that's not currently possible.
## Motivation B
This would facilitate fixing #15824, for instance, as we could use `HsType GhcTc` as the input to `GHC.Tc.Gen.Splice.reifyType`. This way, we would retain the `HsOpTy` and `HsAppTy` distinction.
## Partial Solution
In order to address Motivation B, we would need to properly embed coercions into `HsType GhcTc` and start using it throughout the type checker. However, that would be a very major, intrusive refactoring. Before we do that, there's a stopgap solution that could be used to address Motivation A.
Define the following `XXType` instance:
```
type instance XXType GhcTc = HsTypeTc
data HsTypeTc = HsTypeTc TcType SDoc
```
Then `HsType GhcTc` would only ever use `XHsType (HsTypeTc ty doc)`. The fields are as follows:
* `TcType` is the kind-checked, desugared type
* `SDoc` is the result of pretty printing `HsType GhcRn`, before parentheses and infix operators were discarded
This is sufficient to let us talk about `HsType GhcTc` and `HsDecl GhcTc`, and remove the `NoGhcTc` type family.
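As a rough illustration of the partial solution, the sketch below uses a toy `HsType` with stand-ins for `TcType` (a tiny data type) and `SDoc` (a plain `String`). The pass tags `Rn` and `Tc` and the functions `pprRn`, `desugar`, `tcHsType` and `displayTc` are hypothetical and only show how the extension constructor could carry both the desugared type and the pre-desugaring rendering.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Void (Void, absurd)

-- Stand-ins for GHC types (hypothetical simplifications).
data TcType = TyConApp String [TcType] deriving (Eq, Show)
type SDoc = String

data Rn
data Tc

type family XXType pass

-- A toy surface type syntax with an extension constructor.
data HsType pass
  = HsTyVar String
  | HsOpTy (HsType pass) String (HsType pass)
  | XHsType (XXType pass)

data HsTypeTc = HsTypeTc TcType SDoc

type instance XXType Rn = Void      -- no extension constructor when renamed
type instance XXType Tc = HsTypeTc  -- the only form used after checking

-- Render the renamed type, keeping the infix structure.
pprRn :: HsType Rn -> SDoc
pprRn (HsTyVar v)     = v
pprRn (HsOpTy l op r) = pprRn l ++ " " ++ op ++ " " ++ pprRn r
pprRn (XHsType v)     = absurd v

-- Toy "desugaring": infix operators become prefix applications.
desugar :: HsType Rn -> TcType
desugar (HsTyVar v)     = TyConApp v []
desugar (HsOpTy l op r) = TyConApp op [desugar l, desugar r]
desugar (XHsType v)     = absurd v

-- The checked type stores the desugared type and the original rendering.
tcHsType :: HsType Rn -> HsType Tc
tcHsType ty = XHsType (HsTypeTc (desugar ty) (pprRn ty))

-- What an error message would show for a checked type.
displayTc :: HsType Tc -> SDoc
displayTc (XHsType (HsTypeTc _ doc)) = doc
displayTc _                          = "<unexpected>"
```

Note that in this sketch the infix structure survives only in the rendered text, not as data.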
## Full Solution
The full solution would involve using `HsType GhcTc` throughout the type checker, rewriting zonking and unification to work over `HsType GhcTc`, and so on. It would address Motivation A, and also let us remove the notion of `TcType`: the type checker would work with `HsType GhcTc`, and `Type` would be only used in Core. That would be a nice improvement, as we could remove `TcTyVar` and `AnonArgFlag` (maybe something else?) from the syntax of Core.
## Completion
1. The partial solution is implemented.
I think we should start with the partial solution, so that's what this ticket is about. The full solution will require much more thought and design effort, so we can get back to it later.

Vladislav Zavialov

https://gitlab.haskell.org/ghc/ghc/-/issues/18155
Replace `CoPat` data type with `HsWrapper
2022-03-02T18:27:06Z
John Ericson

!2553 Fixed up the pattern syntax a bit with TTG, and turned `CoPat` into a datatype to sometimes be used in instances for `XXPat`. But that data type should be replaced with `HsWrapper Pat`. See https://gitlab.haskell.org/ghc/ghc/-/merge_requests/2553#note_253928 for details.
CC @rae @int-index

https://gitlab.haskell.org/ghc/ghc/-/issues/23447
Where should "tokens" live in the abstract syntax tree?
2023-12-21T21:18:04Z
Simon Peyton Jones

Language.Haskell.Syntax is a *compiler-independent* data type for the Haskell abstract
syntax tree. It is designed to be [extensible using Trees that Grow](https://gitlab.haskell.org/ghc/ghc/-/wikis/implementing-trees-that-grow).
The question that this ticket addresses is **where should we store information about the precise position of the keywords and punctuation of the program?**
Progress:
* !11716: move tokens for `HsLet` into the extension field, and `EpAnn` stuff into the `<xrec-stuff>` field
* !11756: move tokens into `GhcPs` extension fields
There has been some discussion in the past:
* The ["API annotations" wiki page](https://gitlab.haskell.org/ghc/ghc/-/wikis/api-annotations)
* #19623
* #22558
* MRs in flight: !9476 !9477
* [ghc-devs discussion thread (July 23)](https://mail.haskell.org/pipermail/ghc-devs/2023-July/021305.html)
## Tokens
We use the term **tokens** for the "keywords and punctuation".
We already have the type `HsToken` defined in `Language.Haskell.Syntax`,
defined as follows:
```
type LHsToken tok p = XRec p (HsToken tok)
data HsToken (tok :: Symbol) = HsTok
```
So `LHsToken "wombat" p` represents the keyword `wombat`, with the "wombat" in the type giving some helpful documentation. The main payload is the `XRec` part, which allows a client to record the location of the token.
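As a small illustration of what the type-level string buys, here is a self-contained sketch. `HsToken` is copied from above; the `Located` wrapper is a simplified stand-in for the `XRec` part, and `tokenText` and `describe` are hypothetical helpers, not part of `Language.Haskell.Syntax`.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

data HsToken (tok :: Symbol) = HsTok

-- Stand-in for XRec: pair a value with a (line, column) position.
data Located a = Located (Int, Int) a

-- Hypothetical helper: recover the token's text from its type.
tokenText :: forall tok. KnownSymbol tok => HsToken tok -> String
tokenText _ = symbolVal (Proxy :: Proxy tok)

-- The value carries no text; only the type and the location matter.
letTok :: Located (HsToken "let")
letTok = Located (3, 5) HsTok

-- Hypothetical helper: describe a located token.
describe :: KnownSymbol tok => Located (HsToken tok) -> String
describe (Located (l, c) t) = tokenText t ++ " at " ++ show l ++ ":" ++ show c
```

Here `describe letTok` evaluates to `"let at 3:5"`: the text comes from the type, the position from the wrapper.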
## Motivation
Why do we want to store those tokens in the syntax tree at all? Use cases:
1. Refactoring tools could parse the source program, modify a small part of it, and print it back into the source file. The formatting of unmodified parts should be preserved, so we need the locations of every token (that's called "exactprinting").
2. Haddock needs to associate documentation comments with AST nodes. Doing so in the parser is very difficult, so we just accumulate the comments in a list and insert them back into the tree in a separate pass. We need token location information to do this.
In other cases, those tokens are an annoyance:
1. `template-haskell`, as well as any other GHC API client that generates ASTs, doesn't have token locations and has to fill them with `noHsTok`.
2. The renamer, the type checker, and the desugarer have no use for those tokens. Passing them around is a distraction from the actual renaming/type-checking/desugaring logic.
## Possible Approaches
There are two general approaches:
* **Token Plan A**. Tokens are not part of the *abstract* syntax tree, and do not belong in Language.Haskell.Syntax at all. If you want to store that stuff, do it in an extension field.
* **Token Plan B**. It is often helpful to be able to reproduce *precisely* what the
programmer wrote (so-called "exact-print"). That means knowing precisely where the keywords and
punctuation were. Rather than duplicating this rendering/pretty-printing code separately for each tool, it would be nice to do it once, in Language.Haskell.Syntax.
One might argue that this makes our AST less abstract, so it’s actually a concrete syntax tree. But Language.Haskell.Syntax already retains some information uncharacteristic of a proper AST, such as parentheses (with `HsPar`), so adding token information is arguably appropriate.
Currently in GHC HEAD we have mainly Plan A, with a sprinkling of Plan B. For example:
```
data HsExpr p
  = ...
  | HsPar (XPar p)
          !(LHsToken "(" p)
          (LHsExpr p) -- ^ Parenthesised expr; see Note [Parens in HsSyn]
          !(LHsToken ")" p)
```
But we have no clear decision or plan. Hence this ticket.
## Details about Plan A
To put the token information in the extension fields, a client of Language.Haskell.Syntax
would do something like this. Here is the declaration of `HsExpr`:
```haskell
data HsExpr p
  = ...
  | HsLet (XLet p)
          (LHsLocalBinds p)
          (LHsExpr p)
```
The question is then how downstream API users should consume these annotations while still being able to extend the AST themselves. I think this can be accomplished by introducing a new pass transformer, `WithExact`:
```haskell
-- | A pass @p@ augmented with information necessary for exact-printing.
data WithExact p
```
We can then introduce the appropriate type family instances to capture tokens as necessary. For instance, `let` might look like:
```haskell
type instance XLet (WithExact p) = (XLet p, (LHsToken "let" p, LHsToken "in" p))
```
The various GHC passes would then be defined as:
```haskell
type GhcPs = WithExact (GhcPass 'Parsed)
type GhcRn = WithExact (GhcPass 'Renamed)
type GhcTc = WithExact (GhcPass 'Typechecked)
```
Now the extension field is always a pair, of the previous `XLet p` information, and a tuple of tokens. If there are no tokens for a constructor `K` one could say
```
type instance XK (WithExact p) = XK p
```
Alternatively one could use a data type with named fields:
```
type instance XLet (WithExact p) = ExactLet p
data ExactLet p = ExactLet { exactLetLet :: !(LHsToken "let" p)
                           , exactLetIn  :: !(LHsToken "in" p)
                           , exactLetX   :: XLet p
                           }
```
but that seems overkill: the tokens are already self-documenting.
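A self-contained toy version of the `WithExact` idea might look like the sketch below. It assumes a tiny expression type `Expr`, a base pass `Base`, and a located-token stand-in `LTok` (a position paired with the token), all of which are hypothetical; `letPos`, `stripExact` and `example` are illustrative names only.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.TypeLits (Symbol)

data HsToken (tok :: Symbol) = HsTok

-- Stand-in for a located token: (line, column) plus the token itself.
type LTok (tok :: Symbol) = ((Int, Int), HsToken tok)

-- A hypothetical base pass and the pass transformer from the text.
data Base
data WithExact p

type family XLet p
type instance XLet Base = ()
type instance XLet (WithExact p) = (XLet p, (LTok "let", LTok "in"))

-- Toy expression type with an extension field on Let.
data Expr p
  = Var String
  | Let (XLet p) (String, Expr p) (Expr p)

-- Exact-print clients can reach the tokens through the extension field.
letPos :: Expr (WithExact p) -> Maybe (Int, Int)
letPos (Let (_, ((pos, _), _)) _ _) = Just pos
letPos (Var _)                      = Nothing

-- Clients that do not care can drop the token layer entirely.
stripExact :: Expr (WithExact Base) -> Expr Base
stripExact (Var v)                    = Var v
stripExact (Let (x, _) (n, rhs) body) = Let x (n, stripExact rhs) (stripExact body)

-- let x = y in x, with `let` at 1:1 and `in` at 1:11.
example :: Expr (WithExact Base)
example = Let ((), (((1, 1), HsTok), ((1, 11), HsTok))) ("x", Var "y") (Var "x")
```

The pairing of the underlying `XLet p` with the tokens is the cost mentioned above: every consumer of the extension field at a `WithExact` pass has to project out the component it wants.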
## Details about Plan B
In Plan B we directly put the tokens in the tree, *not* in extension fields.
We can do so using two different styles:
* **Token Plan B1**: keep the tokens together in a tuple.
* **Token Plan BN**: spread the tokens across the data constructor in suggestive places.
For example, for `HsLet` here is what Plan B1 looks like:
```
 data HsExpr p
   = ...
   ...
   | HsLet (XLet p)
+          (LHsToken "let" p, LHsToken "in" p)
           (LHsLocalBinds p)
           (LHsExpr p)
```
And here is the same for Plan BN:
```
 data HsExpr p
   = ...
   ...
   | HsLet (XLet p)
+          (LHsToken "let" p)
           (LHsLocalBinds p)
+          (LHsToken "in" p)
           (LHsExpr p)
```
## Comparing plans
Plan A advantages:
* Clients can completely ignore all the exact-print stuff. With Plan B they have to handle those fields, if only to pass them on. With Plan B1 that is not too bad (one field), but it's pretty tiresome with Plan BN.
* Runtime: Plan A doesn't have to pay for exact-print information if it doesn't use it. Plan B allocates more: every data constructor gets more fields, and each pass needs to copy those fields into a new copy of the constructor. Plan B1 is better than Plan BN in this respect.
* Generated code: some clients (such as GHC) *generate* HsSyn, e.g. by desugaring source. For this generated code, the location of the tokens makes no sense. Plan A does not force programmers to invent fake tokens; Plan B does.
Plan B advantages:
* The Big Advantage is to be able to write a single, client-independent exact-print pretty-printer.
* The data type declaration for Plan BN looks quite perspicuous: the tokens appear in the data type interspersed with the non-token arguments, just as in the concrete syntax.
* When a GHC pass uses the extension field, it doesn't need to worry about pairing it up with the exact-print information.
## Missing information
The main benefit of Plan B is that we can make a single exact-print implementation,
in Language.Haskell.Syntax. But that means more than putting `LHsToken` in the
tree: it means that exact-print has to be able to get `SrcSpan`s out of `XRec`.
How does it do that? We need to see that design; otherwise we don't know if
we'll get the payoff.