Grammar for types and data/newtype constructors (#18506)
Before this patch, we parsed types into a reversed sequence
of operators and operands. For example, (F x y + G a b * X)
would be parsed as [X, *, b, a, G, +, y, x, F]
,
using a simple grammar:
tyapps
: tyapp
| tyapps tyapp
tyapp
: atype
| PREFIX_AT atype
| tyop
| unpackedness
Then we used a hand-written state machine to assemble this
either into a type, using mergeOps
,
or into a constructor, using mergeDataCon
.
This is due to a syntactic ambiguity:
data T1 a = MkT1 a
data T2 a = Ord a => MkT2 a
In T1
, what follows after the =
sign is a data/newtype constructor
declaration. However, in T2
, what follows is a type (of kind
Constraint
). We don't know which of the two we are parsing until we
encounter =>
, and we cannot check for =>
without unlimited lookahead.
This poses a few issues when it comes to e.g. infix operators:
data I1 = Int :+ Bool :+ Char -- bad
data I2 = Int :+ Bool :+ Char => MkI2 -- fine
By this issue alone we are forced into parsing into an intermediate representation and doing a separate validation pass.
However, should that intermediate representation be as low-level as a flat sequence of operators and operands?
Before GHC Proposal #229, the answer was Yes, due to some particularly nasty corner cases:
data T = ! A :+ ! B -- used to be fine, hard to parse
data T = ! A :+ ! B => MkT -- bad
However, now the answer is No, as this corner case is gone:
data T = ! A :+ ! B -- bad
data T = ! A :+ ! B => MkT -- bad
This means we can write a proper grammar for types, overloading it in
the DisambECP
style, see Note [Ambiguous syntactic categories]
.
With this patch, we introduce a new class, DisambTD
. Just like
DisambECP
is used to disambiguate between expressions, commands, and patterns,
DisambTD
is used to disambiguate between types and data/newtype constructors.
This way, we get a proper, declarative grammar for constructors and types:
infixtype
: ftype
| ftype tyop infixtype
| unpackedness infixtype
ftype
: atype
| tyop
| ftype tyarg
| ftype PREFIX_AT tyarg
tyarg
: atype
| unpackedness atype
And having a grammar for types means we are a step closer to using a single grammar for types and expressions.
Closes #18506 (closed).