Grammar for types and data/newtype constructors
Problem Statement
In the parser, we have to deal with a rather inconvenient syntactic ambiguity:
data T1 a = MkT1 a
data T2 a = Ord a => MkT2 a
- In
T1
, what follows after the=
sign is a data constructor declaration. - In
T2
, what follows is a type (of kindConstraint
).
We don't know which of the two we are parsing until we encounter =>
, and we cannot check for =>
without unlimited lookahead.
Similarly, for infix data constructors:
data T3 = X Int Bool
data T4 = X Int Bool :! String
- In
T3
,X Int Bool
is a data constructor declaration. - In
T4
,X Int Bool
is a type.
We don't know which one we're parsing until we encounter :!
.
Current Solution
Currently, we first parse into an intermediate representation:
data TyEl = TyElOpr RdrName | TyElOpd (HsType GhcPs)
| TyElKindApp SrcSpan (LHsType GhcPs)
| TyElUnpackedness UnpackednessPragma
From a list of such TyEl
, we then build either a type or a data/newtype constructor declaration:
mergeOps :: [Located TyEl] -> P (LHsType GhcPs)
mergeDataCon :: [Located TyEl] -> P ( Located RdrName, HsConDeclDetails GhcPs)
Unfortunately, mergeOps
and mergeDataCon
are basically hand-written state-machines, difficult to understand, maintain, and extend. When they were introduced, it seemed like a reasonable option, but a few things have changed:
- Handling of prefix vs infix
!
and~
has moved to the lexer. - Parsing of Haddock comments has moved into its own pass.
- A new approach for ambiguity resolution was developed (see
DisambECP
).
So now it makes sense to reconsider the problem and think of something better than mergeOps
/ mergeDataCon
.
Proposed Solution
DisambECP
works well for expressions/commands/patterns, so we can imagine a similar class, DisambTD
, for types and data constructors. Then in Parser.y
, we can write a proper grammar for types:
infixtype :: { forall b. DisambTD b => PV (Located b) }
infixtype
: ftype { ... }
| ftype tyop infixtype { ... }
| unpackedness infixtype { ... }
ftype :: { forall b. DisambTD b => PV (Located b) }
ftype
: atype { ... }
| tyop { ... }
| ftype tyarg { ... }
| ftype PREFIX_AT tyarg { ... }
tyarg :: { LHsType GhcPs }
tyarg
: atype { ... }
| unpackedness atype { ... }
Then happy
will generate the state machine, and we only need to write the reductions (using DisambTD
).
Implementation
The proposed solution is implemented in !2253 (closed)
Future Work
While the proposed change would be an improvement on its own, I also have a greater goal in mind (which is to unify term/type grammars). This ticket represents the first step in the following plan:
- Disambiguate types and data constructors using
DisambTD
. - Change parsing of
HsOpTy
to be left-associative instead of right-associative (to match the associativity used forOpApp
in terms). - Unify
DisambECP
andDisambTD
into a mega-classDisambECPTD
. Maybe with a better name. The key here is to reuse existing combinators fromDisambECP
, in particularmkHsAppPV
,mkHsAppTypePV
, andmkHsOpAppPV
. - Implement a prototype that will reveal the required breaking changes and write a proposal.
For what it's worth, I already tried doing step 4 right away, and found out about at least two required breaking changes (related to *
and forall
). But I didn't complete that prototype due to the scale of the needed changes. Steps 1—3 should make it more approachable.