GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues
2023-09-14T19:20:21Z

https://gitlab.haskell.org/ghc/ghc/-/issues/23935
Empty Haddock comments no longer occur in the AST as `HsDoc`
2023-09-14T19:20:21Z
amesgen

## Summary
Consider the following two type signatures.
```haskell
foo :: {- |-} A -> B
bar :: {- | -} A -> B
```
Comparing the AST (with `-haddock`) of `foo` and `bar`, note that `foo` does not contain a `HsDoc` (search for `WithHsDocIdentifiers`), but `bar` does:
<table>
<tr><th>
`foo`</th><th>
`bar`</th></tr>
<tr>
<td>
```haskell
(L
(SrcSpanAnn (EpAnn
(Anchor
{ <interactive>:1:1-20 }
(UnchangedAnchor))
(AnnListItem
[])
(EpaComments
[])) { <interactive>:1:1-20 })
(SigD
(NoExtField)
(TypeSig
(EpAnn
(Anchor
{ <interactive>:1:1-3 }
(UnchangedAnchor))
(AnnSig
(AddEpAnn AnnDcolon (EpaSpan { <interactive>:1:5-6 }))
[])
(EpaComments
[]))
[(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:1-3 })
(Unqual
{OccName: foo}))]
(HsWC
(NoExtField)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15-20 })
(HsSig
(NoExtField)
(HsOuterImplicit
(NoExtField))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15-20 })
(HsFunTy
(EpAnn
(Anchor
{ <interactive>:1:15 }
(UnchangedAnchor))
(NoEpAnns)
(EpaComments
[]))
(HsUnrestrictedArrow
(L
(TokenLoc
(EpaSpan { <interactive>:1:17-18 }))
(HsNormalTok)))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15 })
(HsTyVar
(EpAnn
(Anchor
{ <interactive>:1:15 }
(UnchangedAnchor))
[]
(EpaComments
[]))
(NotPromoted)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:15 })
(Unqual
{OccName: A}))))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:20 })
(HsTyVar
(EpAnn
(Anchor
{ <interactive>:1:20 }
(UnchangedAnchor))
[]
(EpaComments
[]))
(NotPromoted)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:20 })
(Unqual
{OccName: B}))))))))))))
```
</td>
<td>
```haskell
(L
(SrcSpanAnn (EpAnn
(Anchor
{ <interactive>:1:1-21 }
(UnchangedAnchor))
(AnnListItem
[])
(EpaComments
[])) { <interactive>:1:1-21 })
(SigD
(NoExtField)
(TypeSig
(EpAnn
(Anchor
{ <interactive>:1:1-3 }
(UnchangedAnchor))
(AnnSig
(AddEpAnn AnnDcolon (EpaSpan { <interactive>:1:5-6 }))
[])
(EpaComments
[]))
[(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:1-3 })
(Unqual
{OccName: bar}))]
(HsWC
(NoExtField)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16-21 })
(HsSig
(NoExtField)
(HsOuterImplicit
(NoExtField))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16-21 })
(HsFunTy
(EpAnn
(Anchor
{ <interactive>:1:16 }
(UnchangedAnchor))
(NoEpAnns)
(EpaComments
[]))
(HsUnrestrictedArrow
(L
(TokenLoc
(EpaSpan { <interactive>:1:18-19 }))
(HsNormalTok)))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
(HsDocTy
(EpAnnNotUsed)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
(HsTyVar
(EpAnn
(Anchor
{ <interactive>:1:16 }
(UnchangedAnchor))
[]
(EpaComments
[]))
(NotPromoted)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:16 })
(Unqual
{OccName: A}))))
(L
{ <interactive>:1:8-14 }
(WithHsDocIdentifiers
(NestedDocString
(HsDocStringNext)
(L
{ <interactive>:1:8-14 }
(HsDocStringChunk
" ")))
[]))))
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:21 })
(HsTyVar
(EpAnn
(Anchor
{ <interactive>:1:21 }
(UnchangedAnchor))
[]
(EpaComments
[]))
(NotPromoted)
(L
(SrcSpanAnn (EpAnnNotUsed) { <interactive>:1:21 })
(Unqual
{OccName: B}))))))))))))
```
</td>
</tr>
</table>
Is there a particular reason for this? In GHC 8.10, the AST contained Haddock comments in both cases.
Concrete effects of this behavior:
- It makes the job of formatters like Ormolu (see issues [1068](https://github.com/tweag/ormolu/pull/1068), [1065](https://github.com/tweag/ormolu/issues/1065), [726](https://github.com/tweag/ormolu/issues/726)) that automatically check for AST discrepancies harder than necessary, as e.g. a natural rewrite from
```haskell
foo ::
-- |
--
A ->
B
```
to
```haskell
foo ::
-- |
A ->
B
```
contains a Haddock comment in the AST in the first snippet, but not in the second.
- A nice Haddock trick by @tomjaguarpaw1 ([blog post](http://h2.jaguarpaw.co.uk/posts/improving-the-typed-process-documentation/), search for "Forced type signatures to wrap") [no longer works](https://github.com/tweag/ormolu/pull/1068#issuecomment-1707237587).
Ideally, the behavior would be changed back to what it was in 8.10; I could try to do that in case the current behavior is not intentional.
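The effect on a formatter's self-check can be sketched with a toy model. Everything below is an assumption made for illustration: `Arg`, `attachDoc`, `attachDoc810` and `sameModuloWhitespace` are hypothetical names, not GHC's or Ormolu's actual types or checks. The sketch only mirrors the behaviour reported above, where an empty doc payload yields no doc node at all.

```haskell
import Data.Char (isSpace)

-- Toy model only: a signature argument with an optional Haddock comment.
data Arg = Arg { argDoc :: Maybe String, argType :: String }
  deriving (Eq, Show)

-- Models the behaviour reported for GHC >= 9.0: a doc comment with an
-- empty payload is dropped, any other payload (even " ") is kept.
attachDoc :: String -> String -> Arg
attachDoc payload ty
  | null payload = Arg Nothing ty
  | otherwise    = Arg (Just payload) ty

-- Models the GHC 8.10 behaviour: the comment is always kept.
attachDoc810 :: String -> String -> Arg
attachDoc810 payload ty = Arg (Just payload) ty

-- A formatter-style sanity check (hypothetical): the tree should not
-- change when only whitespace inside the comment changes.
sameModuloWhitespace :: Arg -> Arg -> Bool
sameModuloWhitespace a b = normalise a == normalise b
  where
    normalise arg = arg { argDoc = fmap (filter (not . isSpace)) (argDoc arg) }
```

Under this model the check fails for `attachDoc "" "A"` against `attachDoc " " "A"`, because one side has no doc node, while the 8.10-style `attachDoc810` passes the same comparison.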
## Environment
* GHC version used: Any GHC since 9.0 (I think this change is due to !2377)

https://gitlab.haskell.org/ghc/ghc/-/issues/23916
Proposed refactoring: Make lambda and \case use the same constructor.
2023-10-02T08:53:27Z
Andrei Borzenkov

The `\case` and `\cases` AST nodes are defined by a single constructor, but lambda isn't:
```haskell
| HsLam (XLam p) (MatchGroup p (LHsExpr p))
| HsLamCase (XLamCase p) LamCaseVariant (MatchGroup p (LHsExpr p))
```
`XLam` actually is `NoExtField`, so we can merge these data constructors without any additional problems. The main change:
```diff
data LamCaseVariant
= LamCase -- ^ `\case`
| LamCases -- ^ `\cases`
+ | Lambda -- ^ `\`
data HsExpr p =
...
- | HsLam (XLam p) (MatchGroup p (LHsExpr p))
...
```
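As a rough illustration of what the merged representation buys, here is a toy AST with a single lambda constructor. This is a sketch, not GHC code: `LamVariant`, `Alt`, `Expr`, `lamKeyword` and `pretty` are hypothetical names, and patterns are plain strings.

```haskell
import Data.List (intercalate)

-- Toy AST for illustration; the names echo the proposal but are made up.
data LamVariant
  = Lambda    -- \
  | LamCase   -- \case
  | LamCases  -- \cases
  deriving (Eq, Show)

-- One alternative: patterns (plain strings here) and a body.
data Alt = Alt [String] Expr
  deriving Show

data Expr
  = Var String
  | Lam LamVariant [Alt]   -- a single constructor for all three forms
  deriving Show

lamKeyword :: LamVariant -> String
lamKeyword Lambda   = "\\"
lamKeyword LamCase  = "\\case"
lamKeyword LamCases = "\\cases"

-- Code that does not care about the variant handles all forms in one
-- clause; only the plain lambda with one alternative is printed specially.
pretty :: Expr -> String
pretty (Var v) = v
pretty (Lam Lambda [Alt ps body]) =
  lamKeyword Lambda ++ unwords ps ++ " -> " ++ pretty body
pretty (Lam variant alts) =
  lamKeyword variant ++ " { " ++ intercalate "; " (map alt alts) ++ " }"
  where
    alt (Alt ps body) = unwords ps ++ " -> " ++ pretty body
```

Passes that treat all three forms alike would then need one clause instead of two.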
And the same for `HsCmdLam`.

https://gitlab.haskell.org/ghc/ghc/-/issues/23893
Proposed refactoring: parameterize Match over Pat
2023-09-01T09:40:54Z
Vladislav Zavialov

The data type `Match` in `compiler/Language/Haskell/Syntax/Expr.hs` is defined as follows:
```haskell
data Match p body
= Match {
m_ext :: XCMatch p body,
m_ctxt :: HsMatchContext p,
-- See Note [m_ctxt in Match]
m_pats :: [LPat p], -- The patterns
m_grhss :: (GRHSs p body)
}
| XMatch !(XXMatch p body)
```
There's the `body` parameter that can be instantiated to `LHsExpr` or `LHsCmd`, while the patterns are hardcoded to use `LPat`. As it turned out in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11109#note_521920, we could also make use of parameterization over `pat`:
```diff
- m_pats :: [LPat p], -- The patterns
+ m_pats :: [pat], -- The patterns
```
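A minimal sketch of the parameterized shape, with the extension field and match context left out. The pattern types `Pat` and `ArgPat` and the helper `mapPats` are hypothetical, introduced only to show that a second pattern type can reuse the same `Match`.

```haskell
-- Toy version of Match: only the two fields relevant to the proposal.
data Match pat body = Match
  { m_pats  :: [pat]   -- the patterns, now a type parameter
  , m_grhss :: body
  } deriving (Eq, Show)

-- Two hypothetical pattern types a client might instantiate pat with.
data Pat = VarPat String | WildPat
  deriving (Eq, Show)

data ArgPat = VisPat Pat | InvisTyPat String
  deriving (Eq, Show)

-- With patterns as a parameter, switching pattern types is an ordinary map.
mapPats :: (p -> q) -> Match p body -> Match q body
mapPats f (Match ps body) = Match (map f ps) body
```

For example, `mapPats VisPat` turns a `Match Pat body` into a `Match ArgPat body` without touching the body.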
An attempt to make this change revealed that it has enough knock-on effects that I suggest we do it as a separate refactoring, so that !11109 can focus on actually adding support for `@`-binders.

https://gitlab.haskell.org/ghc/ghc/-/issues/23447
Where should "tokens" live in the abstract syntax tree?
2023-12-21T21:18:04Z
Simon Peyton Jones

Language.Haskell.Syntax is a *compiler-independent* data type for the Haskell abstract
syntax tree. It is designed to be [extensible using Trees that Grow](https://gitlab.haskell.org/ghc/ghc/-/wikis/implementing-trees-that-grow).
The question that this ticket addresses is **where should we store information about the precise position of the keywords and punctuation of the program?**
Progress:
* !11716: move tokens for `HsLet` into the extension field, and `EpAnn` stuff into the `<xrec-stuff>` field
* !11756: move tokens into `GhcPs` extension fields
There has been some discussion in the past:
* The ["API annotations" wiki page](https://gitlab.haskell.org/ghc/ghc/-/wikis/api-annotations)
* #19623
* #22558
* MRs in flight: !9476 !9477
* [ghc-devs discussion thread (July 23)](https://mail.haskell.org/pipermail/ghc-devs/2023-July/021305.html)
## Tokens
We use the term **tokens** for the "keywords and punctuation".
We already have the type `HsToken` defined in `Language.Haskell.Syntax`,
defined as follows:
```
type LHsToken tok p = XRec p (HsToken tok)
data HsToken (tok :: Symbol) = HsTok
```
So `LHsToken "wombat" p` represents the keyword `wombat`, with the "wombat" in the type giving some helpful documentation. The main payload is the `XRec` part which allows a client to record the location of the token.
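To make the role of the type-level string concrete, here is a small self-contained sketch. It assumes a simplified `Located` wrapper in place of `XRec`; `Located`, `LToken`, `tokenText`, `showLToken` and `letTok` are hypothetical names and do not appear in GHC. `KnownSymbol` and `symbolVal` are the standard `GHC.TypeLits` tools for reading a `Symbol` back at runtime.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Same shape as the definition quoted above: no runtime payload.
data HsToken (tok :: Symbol) = HsTok

-- Hypothetical stand-in for XRec: a value tagged with (line, column).
data Located a = L (Int, Int) a

type LToken tok = Located (HsToken tok)

-- Recover the token text from the type-level symbol.
tokenText :: forall tok. KnownSymbol tok => HsToken tok -> String
tokenText _ = symbolVal (Proxy :: Proxy tok)

-- Render a located token as "line:col text".
showLToken :: KnownSymbol tok => LToken tok -> String
showLToken (L (line, col) t) = show line ++ ":" ++ show col ++ " " ++ tokenText t

-- A `let` keyword recorded at line 3, column 5.
letTok :: LToken "let"
letTok = L (3, 5) HsTok
```

The token itself carries no data; the location lives entirely in the wrapper, and the spelling can be recovered from the type when needed.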
## Motivation
Why do we want to store those tokens in the syntax tree at all? Use cases:
1. Refactoring tools could parse the source program, modify a small part of it, and print it back into the source file. The formatting of unmodified parts should be preserved, so we need the locations of every token (that's called "exactprinting").
2. Haddock needs to associate documentation comments with AST nodes. Doing so in the parser is very difficult, so we just accumulate the comments in a list and insert them back into the tree in a separate pass. We need token location information to do this.
In other cases, those tokens are an annoyance:
1. `template-haskell`, as well as any other GHC API client that generates ASTs, doesn't have token locations and has to fill them with `noHsTok`.
2. The renamer, the type checker, and the desugarer have no use for those tokens. Passing them around is a distraction from the actual renaming/type-checking/desugaring logic.
## Possible Approaches
There are two general approaches:
* **Token Plan A**. Tokens are not part of the *abstract* syntax tree, and do not belong in Language.Haskell.Syntax at all. If you want to store that stuff, do it in an extension field.
* **Token Plan B**. It is often helpful to be able to reproduce *precisely* what the
programmer wrote (so called "exact-print"). That means knowing precisely where the keywords and
punctuation were. Rather than duplicating this rendering/pretty-printing code separately for each tool, it would be nice to do it once, in Language.Haskell.Syntax.
One might argue that this makes our AST less abstract, so it’s actually a concrete syntax tree. But Language.Haskell.Syntax already retains some information uncharacteristic of a proper AST, such as parentheses (with `HsPar`), so adding token information is arguably appropriate.
Currently in GHC HEAD we have mainly Plan A, with a sprinkling of Plan B. For example:
```
data HsExpr p
= ...
| HsPar (XPar p)
!(LHsToken "(" p)
(LHsExpr p) -- ^ Parenthesised expr; see Note [Parens in HsSyn]
!(LHsToken ")" p)
```
But we have no clear decision or plan. Hence this ticket.
## Details about Plan A
To put the token information in the extension fields, a client of Language.Haskell.Syntax
would do something like this. Here is the declaration of `HsExpr`:
```haskell
data HsExpr p
= ...
| HsLet (XLet p)
(LHsLocalBinds p)
(LHsExpr p)
```
The question is then how downstream API users should consume these annotations while still being able to extend the AST themselves. I think this can be accomplished by introducing a new pass transformer, `WithExact`:
```haskell
-- | A pass @p@ augmented with information necessary for exact-printing.
data WithExact p
```
We can then introduce the appropriate type family instances to capture tokens as necessary. For instance, `let` might look like:
```haskell
type instance XLet (WithExact p) = (XLet p, (LHsToken p "let", LHsToken p "in"))
```
The various GHC passes would then be defined as:
```haskell
type GhcPs = WithExact (GhcPass 'Parsed)
type GhcRn = WithExact (GhcPass 'Renamed)
type GhcTc = WithExact (GhcPass 'Typechecked)
```
Now the extension field is always a pair, of the previous `XLet p` information, and a tuple of tokens. If there are no tokens for a constructor `K` one could say
```
type instance XK (WithExact p) = XK p
```
Alternatively one could use a data type with named fields:
```
type instance XLet (WithExact p) = ExactLet p
data ExactLet p = ExactLet { exactLetLet :: !(LHsToken p "let")
, exactLetIn :: !(LHsToken p "in")
, exactLetX :: XLet p
}
```
but that seems overkill: the tokens are already self-documenting.
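The Plan A fragments above can be assembled into one small compilable sketch. It is a toy under stated assumptions: `Span`, `Tok`, the pass `Parsed`, the two-constructor `Expr`, and the functions `boundNames` and `letTokenSpans` are all hypothetical stand-ins, and `Tok` replaces `LHsToken` so the example needs no `XRec`.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}
import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Hypothetical stand-ins: a source position and a located token.
type Span = (Int, Int)
newtype Tok (s :: Symbol) = Tok Span

data Parsed          -- a toy pass without exact-print information
data WithExact p     -- the pass transformer described in the text

type family XLet p :: Type
type instance XLet Parsed = ()
type instance XLet (WithExact p) = (XLet p, (Tok "let", Tok "in"))

data Expr p
  = Var String
  | Let (XLet p) [(String, Expr p)] (Expr p)

-- A client that ignores tokens works unchanged for every pass.
boundNames :: Expr p -> [String]
boundNames (Var _) = []
boundNames (Let _ binds body) =
  map fst binds ++ concatMap (boundNames . snd) binds ++ boundNames body

-- An exact-printing client fixes a concrete pass and reads the tokens.
letTokenSpans :: Expr (WithExact Parsed) -> Maybe (Span, Span)
letTokenSpans (Let (_, (Tok l, Tok i)) _ _) = Just (l, i)
letTokenSpans (Var _)                       = Nothing
```

Note how `boundNames` never mentions the extension field, while `letTokenSpans` can pattern match on it only because the pass is known.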
## Details about Plan B
In Plan B we directly put the tokens in the tree, *not* in extension fields.
We can do so using two different styles:
* **Token Plan B1**: keep the tokens together in a tuple.
* **Token Plan BN**: spread the tokens across the data constructor in suggestive places.
For example, for `HsLet` here is what Plan B1 looks like:
```
data HsExpr p
= ...
...
| HsLet (XLet p)
+ (LHsToken p "let", LHsToken p "in")
(LHsLocalBinds p)
(LHsExpr p)
```
And here is the same for Plan BN:
```
data HsExpr p
= ...
...
| HsLet (XLet p)
+ (LHsToken p "let")
(LHsLocalBinds p)
+ (LHsToken p "in")
(LHsExpr p)
```
## Comparing plans
Plan A advantages:
* Clients can completely ignore all the exact-print stuff. With Plan B they have to handle those fields, if only to pass them on. With Plan B1 that is not too bad (one field), but it's pretty tiresome with Plan BN.
* Runtime: Plan A doesn't have to pay for exact-print information if it doesn't use it. Plan B allocates more: every data constructor gets more fields, and each pass needs to copy those fields into a new copy of the constructor. Plan B1 is better than Plan BN in this respect.
* Generated code: some clients (such as GHC) *generate* HsSyn, e.g. by desugaring source. For this generated code, the location of the tokens makes no sense. Plan A does not force programmers to invent fake tokens; Plan B does.
Plan B advantages:
* The Big Advantage is to be able to write a single, client-independent exact-print pretty-printer.
* The data type declaration for Plan BN looks quite perspicuous: the tokens appear in the data type interspersed with the non-token arguments, just as in the concrete syntax.
* When a GHC pass uses the extension field, it doesn't need to worry about pairing it up with the exact-print information.
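The "generated code" and "copying fields" points in the comparison can be illustrated side by side. This is a toy sketch: `Span`, `NoSpan`, `ExprA`, `ExprB` and the helper functions are hypothetical, and `NoSpan` plays the role that `noHsTok` plays in the text.

```haskell
-- Hypothetical position type; NoSpan marks a token with no source location.
data Span = Span Int Int | NoSpan
  deriving (Eq, Show)

-- Plan BN style: token positions are ordinary constructor fields.
data ExprB
  = VarB String
  | LetB Span [(String, ExprB)] Span ExprB
  deriving (Eq, Show)

-- Plan A style: token positions, if any, live in the extension field x.
data ExprA x
  = VarA String
  | LetA x [(String, ExprA x)] (ExprA x)
  deriving (Eq, Show)

-- Code that generates syntax has no source text to point at.
genLetB :: String -> ExprB -> ExprB -> ExprB
genLetB v rhs body = LetB NoSpan [(v, rhs)] NoSpan body   -- fake tokens required

genLetA :: String -> ExprA () -> ExprA () -> ExprA ()
genLetA v rhs body = LetA () [(v, rhs)] body              -- nothing to invent

-- A pass that rebuilds the tree has to carry the token fields along in Plan B.
renameB :: (String -> String) -> ExprB -> ExprB
renameB f (VarB v) = VarB (f v)
renameB f (LetB tLet binds tIn body) =
  LetB tLet [ (f v, renameB f e) | (v, e) <- binds ] tIn (renameB f body)
```

In the Plan A variant a client that instantiates `x` with `()` never sees a token, whereas every `LetB` producer and consumer mentions both positions.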
## Missing information
The main benefit of Plan B is that we can make a single exact-print implementation,
in Language.Haskell.Syntax. But that means more than putting `LHsToken` in the
tree: it means that exact-print has to be able to get `SrcSpan`s out of `XRec`.
How does it do that? We need to see that design; otherwise we don't know if
we'll get the payoff.

https://gitlab.haskell.org/ghc/ghc/-/issues/21834
ApplicativeStmt is only introduced in the renamer, so can move into a TTG constructor extension
2022-07-12T13:58:53Z
Alan Zimmerman

This is a proposed cleanup for TTG.

Rodrigo Mesquita

https://gitlab.haskell.org/ghc/ghc/-/issues/21628
TTG: On GHC.Data.FastString
2024-01-30T18:29:57Z
Rodrigo Mesquita

To keep following the TTG goal of a GHC-independent AST, we want to remove all imports of `GHC.*` from the `Language.Haskell.*` hierarchy. One of these imports is `FastString`.
~~After some discussion below we've decided to move `FastString` from the `GHC.Data` module hierarchy to the "independent" `Data` one.~~
~~This issue should be closed after `FastString` lives in this independent space, and all `Language.Haskell.*` imports of `FastString` then import `Data.FastString` rather than `GHC.Data.FastString`~~
Currently, `FastString` from `GHC.Data.FastString` is used multiple times in the client-independent AST living in `Language.Haskell.Syntax`.
The occurrences are:
```hs
data HsExpr id
= ...
| HsOverLabel (XOverLabel p) FastString
...
| HsQuasiQuote -- See Note [Quasi-quote overview] in GHC.Tc.Gen.Splice
(XQuasiQuote id)
(IdP id) -- Splice point
(IdP id) -- Quoter
SrcSpan -- The span of the enclosed string
FastString -- The enclosed string
...
data HsLit id
= ...
| HsString (XHsString x) {- SourceText -} FastString
data OverLitVal
= ...
| HsIsString !SourceText !FastString -- ^ String-looking literals
data HsTyLit
= ...
| HsStrTy SourceText FastString
newtype ModuleName = ModuleName FastString
```
I am wondering how I can get rid of `FastString` in Language.Haskell.Syntax, so by raising this issue I mean to discuss ways to solve this, and get started on implementing said solution.
Perhaps some strategy resembling `IdP`...
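One way to read "resembling `IdP`" is a type family that lets each client pick its own string representation. The sketch below is only that reading, not a decided design: `StringP`, `Simple`, `Compiler`, `Interned` and `litLength` are hypothetical names, and `Interned` is a stand-in for an interned string such as `FastString`.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

-- Hypothetical family: each client chooses how strings are represented.
type family StringP p :: Type

data HsLit p
  = HsChar Char
  | HsString (StringP p)

-- Client 1: plain String, no dependency on any compiler module.
data Simple
type instance StringP Simple = String

-- Client 2: a stand-in for an interned string (unique key plus text).
data Interned = Interned { internKey :: Int, internText :: String }

data Compiler
type instance StringP Compiler = Interned

-- Generic code takes the conversion it needs as an argument.
litLength :: (StringP p -> String) -> HsLit p -> Int
litLength _     (HsChar _)   = 1
litLength toStr (HsString s) = length (toStr s)
```

The syntax module would then mention only `StringP p`, and the interned representation would be supplied by the compiler's own instance.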
Thank you,
romes
CC: @Ericson2314

https://gitlab.haskell.org/ghc/ghc/-/issues/21592
TTG: the language-haskell package
2023-08-07T23:11:12Z
Rodrigo Mesquita

This ticket describes the plan and follows the implementation of separating the Haskell AST and parsing stage to packages `haskell-syntax` and `haskell-parser` to achieve modularity and code reuse across clients.
We propose the following goals to reach this package separation:
1. A module hierarchy `Language.Haskell.Syntax` which contains the base Haskell AST. Just the data types, very few functions, since there is virtually nothing you can do without knowing *any* of the type family instances.
2. A module hierarchy `Language.Haskell.Parser` that defines `HsExpr Parsed`, and has the properties enumerated below. Presumably this package buys into the full `Anno` infrastructure to decorate the parse tree.
3. Rejig GHC to use `HsExpr Parsed` rather than `HsExpr (GhcPass Parsed)`.
4. Move the first two from `ghc` to a `haskell-syntax` package.
5. Figure out what needs to be done to create a `haskell-parser` package, later.
----
Properties:
1. A (pre-processed) string can be parsed into `HsExpr Parsed`. Let's call this operation `parse`. (By "pre-processed", we mean that there is no CPP left in the input string.)
2. `HsExpr Parsed` can be pretty-printed to a string. Let's call this operation `print`.
3. `parse` and `print` are exact inverses, in both directions.
4. `HsExpr Parsed` is a suitable data structure to use as the input to a compilation pipeline. It is structured and recursive.
5. `HsExpr Parsed` depends on a minimum of infrastructure. In particular, no aspects of e.g. type-checking or code generation should be depended on by any module defining the `HsExpr` type, nor the `parse` or `print` functions.
6. `HsExpr` is extensible, fitting with the TTG framework for extension in other phases; the unextended `HsExpr` contains the essence of the Haskell AST, with the parts that are useful to many syntax processors.
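Property 3 (`parse` and `print` are exact inverses) can be demonstrated on a deliberately tiny model. This is an assumption-laden sketch: it works on a flat token stream rather than `HsExpr Parsed`, and `Tok`, `lexExact` and `render` are hypothetical names. The point is only that keeping the whitespace in front of each token is enough to make the round trip exact.

```haskell
import Data.Char (isSpace)

-- A token together with the whitespace that preceded it in the source.
data Tok = Tok { leading :: String, text :: String }
  deriving (Eq, Show)

-- Toy "parse": split into whitespace-delimited tokens, keeping all
-- whitespace. The second component is the trailing whitespace.
lexExact :: String -> ([Tok], String)
lexExact s =
  case span isSpace s of
    (ws, "")   -> ([], ws)
    (ws, rest) ->
      let (word, rest')    = break isSpace rest
          (toks, trailing) = lexExact rest'
      in  (Tok ws word : toks, trailing)

-- Toy "print": emit every token after its recorded whitespace.
render :: ([Tok], String) -> String
render (toks, trailing) = concatMap (\t -> leading t ++ text t) toks ++ trailing
```

Here `render (lexExact s) == s` holds for any input string, and `lexExact (render r) == r` holds for any `r` produced by `lexExact`; a real implementation would need the same guarantee for comments, layout and keyword spellings.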
---
For (1) we need to remove all dependencies of GHC from all L.H.S modules. This is well underway: (move HsModule) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228, (move MatchGroup Origin field) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8301, (TTG for splices) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7821, (removing other dependencies) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8308, (dealing with FastString) https://gitlab.haskell.org/ghc/ghc/-/issues/21628. We need to keep reviewing these and enforcing a criterion: the base AST should represent the complexity of the source code, but doesn't need to keep e.g. annotations; that's a Parser extension to the AST.
For (2) we should start moving the parser modules to `Language.Haskell.Parser`. I don't think we should do this yet; (1) must be complete first.
For (3) we need (2).
For (4), we can have it before (3), but need (1,2).
Let's keep tuning this description with the plan.
----
For history, here is the original ticket description:
This issue follows a previous discussion in a TTG related MR.
I want to further this discussion, in particular because of some design decisions concerning my MR https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228 (see below)
Quoting @rae in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4782#note_415958:
> I want to add a small new thought: What if the result of parsing weren't GhcPs, but Parsed. That is, we have a client-independent phase indicator Parsed :: Type that lives in the client-independent space. GhcPass would just encompass GhcRn and GhcTc. For me, the fact that you can parse into and pretty-print from an AST is its fundamental property, and a great part to start any design discussion.
> One challenge is that, to really have the parse/pretty-print property, you need information about whitespace, comments, keyword spellings, unicode syntax, and such. This information is mostly useless after parsing. Since our TTG structure offers no way to remove fields, putting this information in the base tree might make it less usable downstream. That might be why we use extension points for this data today.
> So maybe I advocate for a compromise position: put Parser as a general, client-independent pass. All the source annotation stuff would also live in the client-independent space. But design the AST so that there are extensions for Parser, around annotations it is unlikely for clients to need. It would be a judgment call around what information is likely useless to clients (for example, one could argue that the distinction between (x `f`) and f x is "useless"), but I think we are likely to make mostly good judgments around this.
I'm currently working on moving HsModule to L.H.S in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228,
and when dealing with GHC.Hs.ImpExp, I arrive at this:
```hs
-- | Imported or exported entity.
data IE pass
= IEVar (XIEVar pass) (LIEWrappedName pass (IdP pass))
...
| IEGroup (XIEGroup pass) Int (LHsDoc pass) -- ^ Doc section heading
| IEDoc (XIEDoc pass) (LHsDoc pass) -- ^ Some documentation
| IEDocNamed (XIEDocNamed pass) String -- ^ Reference to named doc
```
The first reason why I brought this discussion up again is because of `HsDoc`.
I was trying to understand whether `IEGroup` and `IEDoc` should be moved to the GHC specific part, or whether `HsDoc` should be moved to the client independent stage.
In light of the comment advocating the independent `Parsed` stage which is also independent from the independent base AST, I would say that `HsDoc` would be a good example of something to be put in the independent `Parsed` stage, but not in the base AST.
For now I'll move the constructors depending on `HsDoc` to the GHC specific part, and if we later on pick up the `Parsed` idea, we can move them to there.
I mixed in a bit of https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8228 talk here, but only to motivate further discussion.
For comments regarding this specific TTG Module x HsDoc discussion, use the linked MR.
Thank you,
~romes

Rodrigo Mesquita

https://gitlab.haskell.org/ghc/ghc/-/issues/21263
TTG follow-up: Template Haskell related
2022-06-01T21:49:48Z
Rodrigo Mesquita

These are TODOs regarding TTG to follow https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4782:
Quoting @simonpj:
I see that `UntypedSpliceFlavour` is used only in the client specific `GHC.Hs.Expr`; but it is defined in the client-independent `L.H.S.Expr`. Let's move it next to `PendingRnSplice`.
In L.H.S.Expr I see
```hs
| HsQuasiQuote -- See Note [Quasi-quote overview] in GHC.Tc.Gen.Splice
```
But that Note doesn't exist. Can we find it from history?
In L.H.S.Expr I see
```hs
data HsSplice id
= HsTypedSplice -- $$z or $$(f 4)
(XTypedSplice id)
SpliceDecoration -- Whether $$( ) variant found, for pretty printing
(IdP id) -- A unique name to identify this splice point
(LHsExpr id) -- See Note [Pending Splices]
```
The `(IdP id)` part is very GHC-specific. It should be in XTypedSplice.
Formerly in #16830
- [x] `HsSpliced` should be moved to `XSplice`. See the TODO there from @alanz.
From @Ericson2314
- [x] Inline `HsSplice` altogether!
Just as we now have `Hs{Untyped,Typed}Bracket`, untyped and typed splices, and quasiquotes, are all distinct syntax, and come from morally distinct extensions. They should have separate top-level AST nodes too, for that reason, and for consistency's sake!
To close this issue we should
- [x] Move `UntypedSpliceFlavour` to client specific `GHC.Hs.Expr` next to `PendingRnSplice`
- [x] Recover from history the `Note [Quasi-quote overview]` that no longer exists in `GHC.Tc.Gen.Splice`
- [x] Refactor `IdP id` GHC-specific part into `XTypedSplice` from the constructor of `HsSplice` in L.H.S.Expr

Rodrigo Mesquita

https://gitlab.haskell.org/ghc/ghc/-/issues/21262
TTG follow-up: Outputable orphans
2022-07-06T12:18:20Z
Rodrigo Mesquita

Follow-up cleanup after separating the AST from GHC-Pass (!4778), regarding the orphan `Outputable` instances in `compiler/GHC/Hs/Expr.hs`:
> It makes me wonder whether all the Outputable instances should be in GHC.Hs.Extension.GhcPass. I think they wouldn't be orphans there.
> After we add more TTG parameters to allow using non-GHC non-tie-the-knot-oh-wait-hs-boots-everywhere, we can generalize these instances so they don't assume GhcPass.
> Outputable itself assumes some things about names and other GHC which would probably be lifted.
> So I don't know what's best in the short and medium terms, but I hope in the long term we can have some nice GHC-agnostic bring-your-own-configuration-record pretty printing. This will also help with #17957 and getting rid of the unsafe global cfg stuff.
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324909

Rodrigo Mesquita

https://gitlab.haskell.org/ghc/ghc/-/issues/21242
Separate Module AST from GhcPass
2022-08-29T08:21:39Z
John Ericson

#18936 was closed out, but there was one glaring omission: `HsModule`.
In #17642 / !2423, `HsModule` was made monomorphic in `GhcPs`. But per the principle that the unextended AST should be "complete", I think that should be reverted, and a new `Language.Haskell.Syntax.Module` made.
What @int-index [wrote](https://gitlab.haskell.org/ghc/ghc/-/issues/17642#note_245191) in #17642 is also interesting (to me at least) and relevant here:
> Actually, I would rather see the opposite happen. The renamer could take `HsModule GhcPs` as input and produce `HsModule GhcRn` as output.
>
> It is fairly strange that we have
>
> ```
> type ParsedSource = Located (HsModule GhcPs)
> type RenamedSource = (HsGroup GhcRn, [LImportDecl GhcRn], Maybe [(LIE GhcRn, Avails)],
> Maybe LHsDocString)
> type TypecheckedSource = LHsBinds GhcTc
> ```
>
> instead of
>
> ```
> type ParsedSource = Located (HsModule GhcPs)
> type RenamedSource = Located (HsModule GhcRn)
> type TypecheckedSource = Located (HsModule GhcTc)
> ```
@RyanGlScott pointed out that issues like `[HsDecl p]` vs `HsGroup p` would need to be resolved, but this doesn't daunt me. We are dealing with somewhat similar issues in !4782 right now (though I think !4782 is harder!), and so I don't doubt we could fix things here too.
(Probably the answer is that `HsGroup` should just be included in `HsModule`. `HsGroup` is currently *not* included in any other GHC-agnostic AST node.)
Finally, note per https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324914 this will necessitate splitting `GHC.Hs.Doc` and `GHC.Hs.ImpExp` too.

9.6.1
Rodrigo Mesquita

https://gitlab.haskell.org/ghc/ghc/-/issues/20415
HsMatchContext GhcTc story
2024-01-31T13:59:31Z
Artyom Kuznetsov

`HsMatchContext` has a `FunRhs` constructor with an `mc_fun` field which is only used for pretty printing (and as it turned out, in HIE). Since removal of `NoGhcTc` from that field, we now have to deal with having `IdP` there which introduces several complications.
## Approach 0 (mpickering): Use `IdP (NoGhcTc p)` in `HsMatchContext` (!9624)
If we accept the existence of `NoGhcTc` then the simplest implementation is to replace the `mc_fun` field
in `HsMatchContext` with `IdP (NoGhcTc p)` so that the field contains a `RdrName` during parsing and a `Name` as context.
This is precisely the right approach to take because the "Context" is something which we define, and in the typechecker the context can't be an Id, precisely because we haven't constructed that Id yet when we need to provide the context for which we will construct the Id.
## Approach 1: a new type family for `mc_fun` instead of `IdP`
Instead of having `IdP` in `mc_fun` we have `CtxIdP` type family which is defined like this:
```haskell
type instance CtxIdP (GhcPass p) = CtxIdGhcP p
type family CtxIdGhcP p where
CtxIdGhcP 'Parsed = RdrName
CtxIdGhcP 'Renamed = Name
CtxIdGhcP 'Typechecked = Name
```
This approach was deemed unsatisfactory because:
* This is basically a local reimplementation of `NoGhcTc`. We lose the consistency of "`GhcPs` things have `RdrName`s in them, `GhcRn` things have `Name`s in them and `GhcTc` things have `Id`s in them".
* This results in a fairly invasive change because we change the inner structure of a data type here and also because of `OutputableBndr`. In order to pretty print `CtxIdP` we need to know that it's outputable. This results in a bunch of extra `OutputableBndr` constraints everywhere.
## Approach 2: `SDoc` in `mc_fun`
Since `mc_fun` is only used for pretty printing, and having to add a lot of new `Outputable` constraints everywhere is annoying, why not just put `SDoc` there? This would let us remove pass parametrization from `HsMatchContext` and `HsStmtContext` completely.
Well, turns out, there is another obstacle: HIE. We need `Name` for a `ToHie (HsMatchContext p)` instance. Me, @int-index and @simonpj talked about this and concluded that we don't know enough about HIE to determine whether it would break things to somehow avoid having that instance at all or deal with this some other way.
One thing that's known for sure is that it is used by Haddock one way or another, because using `Id` there resulted in some broken Haddock tests: https://gitlab.haskell.org/hithroc/ghc/-/jobs/764890#L4106
## Approach 3 (current approach): Keep `IdP` in `mc_fun` and work around having `Id` there for `GhcTc`
This turned out to be the least invasive approach, yet not without its drawbacks.
The main problem of this approach lies within `tcMonoBinds`. `tcMatchesFun` needs to put an `Id` in `mc_fun`, but in `tcMonoBinds` we don't get the `mono_id` until after we have `rhs_ty` which we get in `tcMatchesFun`. This was solved by using `fixM` and making a thunk of `rhs_ty` to put into `mono_id` and then `mono_id` is passed to `tcMatchesFun` which then fills in that thunk.
The drawback is that `fixM` code is fragile: we rely on the fact that `tcMatchesFun` doesn't look at `varType` of `mono_id`. This might result in some fun debugging experience for someone who accidentally changes some code and inspects that value. Dancing around black holes, as @rae calls it I believe (and I'm really fond of this metaphor).
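The knot-tying pattern and its fragility can be reproduced in miniature with `mfix` from `Control.Monad.Fix`, which for `IO` plays the role that `fixM` plays in the type checker monad. This is a toy sketch, not the code from !5579: `MonoId`, `tcMatches` and `tcMono` are hypothetical, and types are plain strings.

```haskell
import Control.Monad.Fix (mfix)

-- Toy identifier: a name plus its type, rendered as a string.
data MonoId = MonoId { idName :: String, idType :: String }

-- Stand-in for tcMatchesFun: it may use the name for its context, but it
-- must not force idType, which is still an unfilled thunk at this point.
tcMatches :: MonoId -> IO String
tcMatches mono = do
  let ctxt = "In the definition of " ++ idName mono
  -- Forcing (idType mono) here would fail at runtime instead of returning.
  length ctxt `seq` pure "Int -> Int"

-- Stand-in for tcMonoBinds: the identifier is built from the type that
-- tcMatches is about to compute, so the two are tied together lazily.
tcMono :: String -> IO MonoId
tcMono name = do
  (mono, _) <- mfix $ \ ~(_, rhsTy) -> do
    let mono = MonoId name rhsTy
    ty <- tcMatches mono
    pure (mono, ty)
  pure mono
```

After `tcMono` returns, `idType` is fully defined; the danger is confined to code that runs inside the `mfix` block, which is exactly the part that is easy to break by accident.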
Another thing of note is that having `fixM` here should let us get rid of `TcIdBndr_ExpType`. However, that turned out to be problematic, because `TcIdBndr_ExpType` is used in `zonkTidyTcLclEnvs` to determine whether to zonk or not. Without it, `zonkTidyTcLclEnvs` always zonks, which results in inspection of the value and an infinite loop.
Relevant merge request: !5579
## What is the best way forward?
All 3 approaches seem unsatisfactory. Seems like Approach 2 is the closest to being ideal: we don't want to introduce something like `CtxIdP` and we don't want `fixM`. Approach 2 achieves that if someone figures out how to deal with HIE.

https://gitlab.haskell.org/ghc/ghc/-/issues/20151
Clean up HsOverLit's ol_witness field
2021-11-18T01:30:28Z
Gergő Érdi

Currently, `HsOverLit` has an `ol_witness` field that is used very inconsistently:
* Before renaming, it's vestigial
* After renaming, but before typechecking, it contains the coercion operator's name, e.g. `fromIntegral`
* After typechecking, it contains the value witness, e.g. `fromIntegral 42`
This is very confusing and also it suggests a bogus degree of freedom: that we can sidestep the rebindable syntax resolution by providing our own `ol_witness`es on the input to renaming.
"Trees that grow" of course gives us a way out: move `ol_witness` from `HsOverLit` to `OverLitTc`, add `OverLitRn` with a new `ol_from_fun` field (which is the *name* of the coercion operator, not a full-blown expression), and adapt all existing code.

Gergő Érdi

https://gitlab.haskell.org/ghc/ghc/-/issues/20039
Refactor Anno type family and explicitly mark out annotation types in the AST.
2021-06-29T14:45:01Z
Zubin

The `Anno` type family is currently used to attach exactprint meta data to `SrcSpan`s via the `XRec` wrapper type family:
```haskell
type instance XRec (GhcPass p) a = GenLocated (Anno a) a
```
There is currently a [plan to move this information directly into the syntax tree](https://gitlab.haskell.org/ghc/ghc/-/wikis/api-annotations#token-information-in-the-syntax-tree-plan-b). However, this will be a major refactoring and has an indeterminate ETA. Meanwhile,
these instances get in the way of other people trying to use TTG to define custom "passes" over the AST (like @fendor's WIP rework of !3866) and result in terrible type errors (https://paste.tomsmeding.com/PtDRKhYf).
There are a lot of instances of `Anno` defined for general purpose types, like `[..]`, `Maybe ...`, `(..,..)`, which might be used in many contexts in the AST, requiring different annotations for each. However, currently these are tied to a single context and will always be given a single type of exactprint annotations.
For example, the following instances of `Anno` are OK, since they are defined for a specific kind of AST construct:
```haskell
type instance Anno (RuleBndr (GhcPass p)) = SrcSpan
type instance Anno (RuleDecl (GhcPass p)) = SrcSpanAnnA
type instance Anno (DerivStrategy (GhcPass p)) = SrcSpan
```
However, some instances are for more generic types that might be used in different contexts:
```haskell
-- For CompleteMatchSig
type instance Anno [LocatedN RdrName] = SrcSpan
type instance Anno [LocatedN Name] = SrcSpan
type instance Anno [LocatedN Id] = SrcSpan
```
`[LocatedN Name]` is a very general purpose type that may occur in multiple places in the AST. We can easily imagine some kind of future
construct which would require a `[LocatedN Name]` field with a `SrcSpanAnnA` annotation instead. However, `Anno` implicitly ties it to the
current usage in `CompleteMatchSig`, which means that this type cannot currently be used anywhere else without significant refactoring.
Then there are even more horrific instances of `Anno`...
```haskell
type instance Anno [LocatedA ((StmtLR (GhcPass pl) (GhcPass pr) (LocatedA (HsExpr (GhcPass pr)))))] = SrcSpanAnnL
type instance Anno [LocatedA ((StmtLR (GhcPass pl) (GhcPass pr) (LocatedA (HsCmd (GhcPass pr)))))] = SrcSpanAnnL
```
One way to fix this could be to define a newtype wrapper that has a phantom type that determines the type of the annotation:
```haskell
newtype Annotated a x = Annotated { unAnnotate :: x }
type instance Anno (Annotated a x) = a
```
Then we can use `Annotated SrcSpan [LocatedN Name]` when we don't need additional annotations, `Annotated SrcSpanAnnA [LocatedN Name]` when we need `[LocatedN Name]` annotated with list annotations and so on.
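To make the idea concrete, here is a small self-contained sketch of the phantom-typed wrapper. Everything except `Annotated`, `unAnnotate` and `Anno` is a simplified, hypothetical stand-in: the toy `SrcSpan`, `SrcSpanAnnA`, `GenLocated` and the pass-less `XRec` synonym are assumptions made for illustration and do not match GHC's real definitions.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

-- Toy stand-ins for GHC's location and annotation types (assumptions).
data SrcSpan = SrcSpan Int Int deriving (Eq, Show)
data SrcSpanAnnA = SrcSpanAnnA SrcSpan [String] deriving (Eq, Show)
data GenLocated l e = L l e deriving (Eq, Show)

type family Anno a

-- The wrapper from the proposal: the phantom parameter 'a' picks the annotation.
newtype Annotated a x = Annotated { unAnnotate :: x }
type instance Anno (Annotated a x) = a

-- Simplified XRec without the pass parameter.
type XRec a = GenLocated (Anno a) a

type Names = [String]

-- The same payload type with two different annotation types,
-- without needing two conflicting 'Anno [..]' instances.
plainNames :: XRec (Annotated SrcSpan Names)
plainNames = L (SrcSpan 1 10) (Annotated ["foo", "bar"])

listNames :: XRec (Annotated SrcSpanAnnA Names)
listNames = L (SrcSpanAnnA (SrcSpan 1 10) ["comma"]) (Annotated ["foo", "bar"])

-- Drop both the location/annotation and the wrapper.
payload :: GenLocated l (Annotated a x) -> x
payload (L _ w) = unAnnotate w

main :: IO ()
main = do
  print (payload plainNames)
  print (payload listNames)
```

The cost of this design is the extra `Annotated` / `unAnnotate` wrapping at every use site; the benefit is that the annotation type is chosen per field rather than per payload type.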
~~I also propose the following refactoring to `GenLocated`, since we always need a `SrcSpan` regardless of the annotation type:~~(Scratch this idea, it doesn't seem to be worth it)
```haskell
data GenLocated a l e = L a l e
type instance XRec (GhcPass p) a = GenLocated (Anno a) SrcSpan a
type LocatedA = GenLocated AnnListItem SrcSpan
type LocatedN = GenLocated NameAnn SrcSpan
type Located = GenLocated () SrcSpan
-- For elements without annotations:
type instance Anno (RuleBndr (GhcPass p)) = ()
-- Or possibly
data NoAnn = NoAnn
type Located = GenLocated NoAnn SrcSpan
type instance Anno (RuleBndr (GhcPass p)) = NoAnn
```
~~Then we can get rid of the `type SrcAnn ann = SrcSpanAnn' (EpAnn ann)` and `type SrcSpanAnnA = SrcAnn AnnListItem` etc. types.~~
/cc @alanz @int-index @fendor

Hannes Siebenhandl

https://gitlab.haskell.org/ghc/ghc/-/issues/19932
Reduce AST & parser dependencies
2023-08-12T14:02:18Z
Richard Eisenberg (rae@richarde.dev)

In !5719, the number of parser dependencies edged up a bit, to the dissatisfaction of @shayne-fletcher-da. See https://blog.shaynefletcher.org/2020/10/ghc-lib-parser-module-count.html for more info. This ticket is about how to get the number of modules transitively depended on by the parser down.
I actually just tried to do it, but it's not so easy. Here is what I learned:
* SOURCE imports matter, because a SOURCE import still has to be in the same package as its importing module. Thus, if the parser depends on some low-level module that SOURCE-imports a high-level one, we're in for trouble.
* A key problem is that the parser transitively depends on `GHC.Driver.Env`, which brings in `GHC.Driver.Hooks` and transitively odd things like `GHC.Cmm`. No no no.
* It's hard to figure out exactly where this link happens, absent a visualization tool (which I did not try to set up).
* I found at least one way in, via `GHC.Hs.Expr`, which SOURCE-imports `GHC.Tc.Types`, which imports `GHC.Driver.Env`. So a good next step would be to not do this.
* Of course, the parser has to depend on its AST, exported from `GHC.Hs.Expr`. But does it? It really only needs the `GhcPs` variant of the AST. So I propose breaking `GHC.Hs.Expr` into `GHC.Hs.Expr.Parser`, `GHC.Hs.Expr.Rename`, and `GHC.Hs.Expr.Tc`, each of which includes the definitions needed for its pass of the compiler. It seems likely we'd be able to get `GHC.Hs.Expr.Parser` not to depend on `GHC.Tc.Types`, and thus perhaps not on `GHC.Driver.Env`. I have not tried this, at all, because it would be fairly major surgery, and it would make looking up type instances for e.g. `XHsPar` harder. This seems like a promising way forward, however.
I'm moving on from this challenge now. This ticket is merely to serve as a small brain dump of what I accomplished. I pushed some (working) code to the `wip/lower-parser-deps` branch, which anyone is free to take over. That commit successfully drops the dependency on `GHC.Core.Lint`, and breaks up `GHC.Cmm.Expr` in an attempt to kill the dependency on `GHC.Cmm`. That last effort succeeded at breaking up `GHC.Cmm.Expr`, but did not actually lose any dependency. (It actually picked up two new dependencies, because I created two new modules.) Perhaps picking up this branch is helpful, or perhaps not. I leave it to the next person to decide.
## Test files that track dependencies, and thus progress:
- [`testsuite/tests/count-deps/CountDepsAst.stdout`](https://gitlab.haskell.org/ghc/ghc/-/blob/master/testsuite/tests/count-deps/CountDepsAst.stdout)
- [`testsuite/tests/count-deps/CountDepsParser.stdout`](https://gitlab.haskell.org/ghc/ghc/-/blob/master/testsuite/tests/count-deps/CountDepsParser.stdout)

https://gitlab.haskell.org/ghc/ghc/-/issues/19252
Clean-up after separating AST from GhcPass !4778
2022-03-19T14:14:51Z
John Ericson

The following discussions from !4778 should be addressed:
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324909): (+1 comment)
> This one has a helpful comment. It makes me wonder whether all the `Outputable` instances should be in `GHC.Hs.Extension.GhcPass`. I think they wouldn't be orphans there.
>
> But maybe that's better for another commit, as orphans really aren't too terrible.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324910): (+7 comments)
> Ew. I know why you want to do this. But it shouldn't be too hard to fix the problem first before doing this mega patch. For example, `HsBracketOut` should be implemented with extension points.
>
> I suppose you could convince me to let this patch in first and then remove these type families very soon thereafter, just because of the practical annoyance of keeping this patch alive.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324911): (+1 comment)
> Another small abomination, like `HsDoRn`.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324914): (+2 comments)
> I'm surprised this file has no further changes. Isn't this stuff part of the general Haskell AST?
>
> Ditto `Doc`.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324920): (+1 comment)
> There are no references to this Note. I'm not quite sure where they should be, but there should be references *somewhere*. Maybe at the top of every list of `import`s within `Language.Haskell.Syntax`?
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324922): (+1 comment)
> I suppose this should go away, too.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324926):
> This Note is both in the old file and this new one.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324927):
> This Note is duplicated.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324928):
> This Note is duplicated.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324929):
> This Note is duplicated.
- [x] @rae started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_324930):
> This Note is duplicated.
- [x] @Ericson2314 started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_325410): (+1 comment)
> @rae Thanks for the thorough review, and also thanks for pointing out `--color-moved`.
>
> I've done the easy fixes, but I am a bit unsure where the duplicated notes *ought* to go --- that's partly why they were duplicated in the first place. Notes having a habit of sometimes accumulating in one place or otherwise being non-local makes this feel less than straightforward to me.
>
> If anyone has an opinion on a few cases or a guiding principle, I'd be happy to follow it.
- [x] @alanz started a [discussion](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_325648): (+1 comment)
> As discussed on IRC, I suggest merging this so the rebase burden is gone, then deciding at leisure where the notes go in a follow up

John Ericson

https://gitlab.haskell.org/ghc/ghc/-/issues/19218
Reduce deps of AST, consider moving it into own package
2022-08-29T08:23:12Z
John Ericson

## Motivation
The end goal of TTG is to share extensible data types for Haskell's syntax between GHC and other projects. But if we are to do that, we should have those data types live in a separate package.
- This is more robust at preventing entanglement than e.g. "the parser dependencies test"
- Eventually things might stabilize enough that the base extensible AST doesn't have a breaking change every GHC release. That means less churn on PVP bounds for consumers of just the AST and not the rest of GHC.
- Hopefully, this can be the beginning of modularizing GHC more broadly :).
The [relevant TTG wiki page is here](https://gitlab.haskell.org/ghc/ghc/-/wikis/implementing-trees-that-grow)
## Proposal
The big question is of course how we get there. The key trick is to do the work of getting ready to separate the AST into its own package before actually doing it, because it's too much of a refactor to do all at once.
- #18936 This talks about how to remove a major part of the GHC-specific stuff from the module
- In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4778#note_323748 @SPJ talks about using a different module hierarchy to indicate to other programmers the intent for these modules to be separate from GHC. I think that's a great idea.
## Specific tasks
```bash
git grep -h '^import' compiler/Language/Haskell/Syntax | sort
```
is a crude way to track progress
- [x] #21242 Separate module AST from GHC pass.
- [ ] #21592 `FastString`
- [ ] `Arity`
- [ ] `ConTag`
- [ ] Lint deps from reappearing, prior to the package being split off.

https://gitlab.haskell.org/ghc/ghc/-/issues/18936
Separate AST from GhcPass
2022-05-24T19:50:22Z
John Ericson

One of the end-goals of TTG is a Haskell AST that can be used by GHC and other projects alike.
An impediment to this is that the TTG instances are in the same module as the data types which use the families. This means there is no way to even try to get a Haskell AST type without its GHC-phase-specific ornaments. It would be good to separate:
- [x] Separate TTG type families from `GhcPass` / `Pass`, currently both are in `GHC.Hs.Extension`.
- [x] Separate AST data types from GHC-specific TTG type family instances, currently both are in the other `GHC.Hs.*` modules.
In the short term, I think this is best accomplished by making more modules for just the AST types and families. Lots of helper functions are actually `GhcPass`-specific (perhaps more than they need to be), but I wouldn't bother trying to deal with that at first, as it leads to more complicated types / more type classes, which is controversial.
(This came up in the discussion of !1957, and also the mailing list on reducing the modules depended on by `ghc-lib-parser`)
CC @hsyl20

https://gitlab.haskell.org/ghc/ghc/-/issues/18802
Typecheck record update via desugaring
2022-05-26T07:45:30Z
Simon Peyton Jones

There are quite a few tickets relating to inadequacies of record update, including
* #18311
* #10856
* #2595
* #10808
* #3632: updating existentials if you do all of them at once
* #16501
* #21289
* #21158
Record update is a place where our general plan of typechecking the source syntax seems particularly hard to do. It would be much easier (and correct by construction) to desugar (but still in HsSyn) and typecheck that.
Fortunately we now have a way to do that: our [Re-engineer rebindable syntax](https://gitlab.haskell.org/ghc/ghc/-/issues/17582) ticket, #17582. A lot of attention is paid there to maintaining good error messages, which is the main challenge of this approach.
So this ticket is to propose: let's use the work on #17582 to solve our record-update tickets.
See also #21158 for why this will be a breaking change.

9.6.1

https://gitlab.haskell.org/ghc/ghc/-/issues/18764
Strict TTG extension fields
2022-02-23T17:03:30Z
Vladislav Zavialov

Currently, the TTG extension fields are not strict:
```
data HsType pass
= ...
| HsTyVar (XTyVar pass) PromotionFlag (LIdP pass)
| HsAppTy (XAppTy pass) (LHsType pass) (LHsType pass)
| ...
```
But they very well could be:
```
data HsType pass
= ...
| HsTyVar !(XTyVar pass) PromotionFlag (LIdP pass)
| HsAppTy !(XAppTy pass) (LHsType pass) (LHsType pass)
| ...
```
This would allow one to "exclude" certain constructors by using `Void`:
```
type instance XTyVar GhcTc = Void
```
And then the pattern match exhaustiveness checker would see that `HsTyVar` is an impossible case, without an explicit call to `absurd`. We already do that for sum extension fields, but for some reason not for all of them.
It's easy to recover laziness by using a wrapper, at the cost of a pointer indirection:
```
data Box a = Box a
```
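The effect of strict extension fields can be tried on a toy AST. The sketch below is not GHC code: the passes `Rn` and `Tc`, the families `XTyVar`, `XTyCon`, `XAppTy` and the type `Ty` are hypothetical miniatures of the real thing. It assumes a reasonably recent GHC, whose pattern-match checker is expected to treat a constructor with a strict `Void` field as impossible.

```haskell
{-# LANGUAGE TypeFamilies #-}
{-# OPTIONS_GHC -Wincomplete-patterns #-}
module Main where

import Data.Void (Void)

-- Hypothetical compiler passes.
data Rn
data Tc

type family XTyVar pass
type family XTyCon pass
type family XAppTy pass

type instance XTyVar Rn = ()
type instance XTyVar Tc = Void   -- excludes TyVar after "typechecking"
type instance XTyCon Rn = ()
type instance XTyCon Tc = ()
type instance XAppTy Rn = ()
type instance XAppTy Tc = ()

-- Toy AST with strict extension fields.
data Ty pass
  = TyVar !(XTyVar pass) String
  | TyCon !(XTyCon pass) String
  | AppTy !(XAppTy pass) (Ty pass) (Ty pass)

-- No TyVar equation and no call to 'absurd': with a strict Void field
-- the TyVar constructor cannot be built at pass Tc.
size :: Ty Tc -> Int
size (TyCon _ _)   = 1
size (AppTy _ f x) = size f + size x

-- Recovering laziness with a wrapper, at the cost of one indirection.
data Box a = Box a
data Wrapped = Wrapped !(Box Int)

-- The strict field forces only the Box constructor, not its contents.
boxIsLazy :: String
boxIsLazy = case Wrapped (Box undefined) of Wrapped _ -> "contents not forced"

main :: IO ()
main = do
  print (size (AppTy () (TyCon () "Maybe") (TyCon () "Int")))
  putStrLn boxIsLazy
```

Removing the bang on the `TyVar` field should bring back the incomplete-patterns warning for `size`, since a lazy `Void` field can still hold a bottom value.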
I propose to make all extension fields throughout the compiler strict. Thoughts?

Vladislav Zavialov

https://gitlab.haskell.org/ghc/ghc/-/issues/18758
Remove NoGhcTc, allow HsType GhcTc, HsDecl GhcTc
2023-04-11T11:15:20Z
Vladislav Zavialov

## Background
Currently, we carefully avoid `HsType GhcTc` or `HsDecl GhcTc`, by means of the `NoGhcTc` type family:
```
| HsAppType (XAppTypeE p) -- After typechecking: the type argument
(LHsExpr p)
(LHsWcType (NoGhcTc p)) -- ^ Visible type application
```
The primary reason for this is that kind-checking and desugaring of types are intertwined. We mostly work with `TcType` instead of `HsType GhcTc` because it's more convenient in some places (e.g. in `unifyKind` and `unifyType`).
## Motivation A
A better architecture would be to have similar pipelines for terms and types:
* `HsExpr GhcPs -> HsExpr GhcRn -> HsExpr GhcTc`
* `HsType GhcPs -> HsType GhcRn -> HsType GhcTc`
This would allow us to talk about e.g. `HsDecl GhcTc`. For example, when discussing #12088, there was an idea of a refactoring that would separate `tcTyClDecl` and zonking. But then we'd like the type of `tcTyClDecl` to be:
```haskell
tcTyClDecl :: LTyClDecl GhcRn -> TcM (LTyClDecl GhcTc)
```
And that's not currently possible.
## Motivation B
This would facilitate fixing #15824, for instance, as we could use `HsType GhcTc` as the input to `GHC.Tc.Gen.Splice.reifyType`. This way, we would retain the `HsOpTy` and `HsAppTy` distinction.
## Partial Solution
In order to address Motivation B, we would need to properly embed coercions into `HsType GhcTc` and start using it throughout the type checker. However, that would be a very major, intrusive refactoring. Before we do that, there's a stopgap solution that could be used to address Motivation A.
Define the following `XXType` instance:
```
type instance XXType GhcTc = HsTypeTc
data HsTypeTc = HsTypeTc TcType SDoc
```
Then `HsType GhcTc` would only ever use `XHsType (HsTypeTc ty doc)`. The fields are as follows:
* `TcType` is the kind-checked, desugared type
* `SDoc` is the result of pretty printing `HsType GhcRn`, before parentheses and infix operators were discarded
This is sufficient to let us talk about `HsType GhcTc` and `HsDecl GhcTc`, and remove the `NoGhcTc` type family.
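A rough model of the partial solution, on a toy AST, might look as follows. This is a sketch under heavy assumptions: `Rn`, `Tc`, the two-constructor `HsType`, the toy `TcType`, and the functions `desugar`, `pprRn`, `tcHsType` and `pprTc` are all invented for illustration, and `String` stands in for `SDoc`.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

import Data.Void (Void, absurd)

-- Hypothetical passes and a toy stand-in for GHC's TcType.
data Rn
data Tc
data TcType = TyConApp String [TcType] deriving (Eq, Show)

type family XXType pass

-- Toy source-level type syntax with an extension constructor.
data HsType pass
  = HsTyVar String
  | HsOpTy (HsType pass) String (HsType pass)
  | XHsType (XXType pass)

-- The stopgap payload: the desugared type plus its pre-rendered source form.
data HsTypeTc = HsTypeTc TcType String

type instance XXType Rn = Void
type instance XXType Tc = HsTypeTc

-- Pretty-print the renamed type, keeping the infix structure.
pprRn :: HsType Rn -> String
pprRn (HsTyVar v)     = v
pprRn (HsOpTy l op r) = pprRn l ++ " " ++ op ++ " " ++ pprRn r
pprRn (XHsType v)     = absurd v

-- "Kind-check and desugar": the infix structure is lost here.
desugar :: HsType Rn -> TcType
desugar (HsTyVar v)     = TyConApp v []
desugar (HsOpTy l op r) = TyConApp op [desugar l, desugar r]
desugar (XHsType v)     = absurd v

-- After typechecking, only the extension constructor is used.
tcHsType :: HsType Rn -> HsType Tc
tcHsType ty = XHsType (HsTypeTc (desugar ty) (pprRn ty))

pprTc :: HsType Tc -> String
pprTc (XHsType (HsTypeTc _ doc)) = doc
pprTc _ = error "not produced by tcHsType in this sketch"

main :: IO ()
main = putStrLn (pprTc (tcHsType (HsOpTy (HsTyVar "a") "+" (HsTyVar "b"))))
```

The point of carrying the rendered text alongside the desugared type is that error messages can still show the type as the user wrote it, even though the typechecked representation no longer distinguishes operators from ordinary applications.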
## Full Solution
The full solution would involve using `HsType GhcTc` throughout the type checker, rewriting zonking and unification to work over `HsType GhcTc`, and so on. It would address Motivation A, and also let us remove the notion of `TcType`: the type checker would work with `HsType GhcTc`, and `Type` would be only used in Core. That would be a nice improvement, as we could remove `TcTyVar` and `AnonArgFlag` (maybe something else?) from the syntax of Core.
## Completion
1. The partial solution is implemented.
I think we should start with the partial solution, so that's what this ticket is about. The full solution will require much more thought and design effort, so we can get back to it later.

Vladislav Zavialov