Grammar for types and data/newtype constructors

Before this patch, we parsed types into a reversed sequence
of operators and operands. For example, (F x y + G a b * X)
would be parsed as [X, *, b, a, G, +, y, x, F],
using a simple grammar:

    tyapps
      : tyapp
      | tyapps tyapp

    tyapp
      : atype
      | PREFIX_AT atype
      | tyop
      | unpackedness

Then we used a hand-written state machine to assemble this
either into a type,        using 'mergeOps',
    or into a constructor, using 'mergeDataCon'.

This is due to a syntactic ambiguity:

    data T1 a =          MkT1 a
    data T2 a = Ord a => MkT2 a

In T1, what follows the = sign is a data/newtype constructor
declaration. However, in T2, what follows is a type (of kind
Constraint). We don't know which of the two we are parsing until we
encounter =>, and we cannot check for => without unlimited lookahead.

This poses a few issues when it comes to e.g. infix operators:

    data I1 = Int :+ Bool :+ Char          -- bad
    data I2 = Int :+ Bool :+ Char => MkI2  -- fine

By this issue alone we are forced into parsing into an intermediate
representation and doing a separate validation pass.

However, should that intermediate representation be as low-level as a
flat sequence of operators and operands?

Before GHC Proposal #229, the answer was Yes, due to some particularly
nasty corner cases:

    data T = ! A :+ ! B          -- used to be fine, hard to parse
    data T = ! A :+ ! B => MkT   -- bad

However, now the answer is No, as this corner case is gone:

    data T = ! A :+ ! B          -- bad
    data T = ! A :+ ! B => MkT   -- bad

This means we can write a proper grammar for types, overloading it in
the DisambECP style, see Note [Ambiguous syntactic categories].

With this patch, we introduce a new class, DisambTD. Just like
DisambECP is used to disambiguate between expressions, commands, and patterns,
DisambTD  is used to disambiguate between types and data/newtype constructors.

This way, we get a proper, declarative grammar for constructors and
types:

    infixtype
      : ftype
      | ftype tyop infixtype
      | unpackedness infixtype

    ftype
      : atype
      | tyop
      | ftype tyarg
      | ftype PREFIX_AT tyarg

    tyarg
      : atype
      | unpackedness atype

And having a grammar for types means we are a step closer to using a
single grammar for types and expressions.
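
To illustrate the overloading idea described above, here is a minimal toy sketch, not the code from this patch. The class below is only a stand-in for DisambTD: its method names (tdHead, tdApp, tdOp), the simplified Ty and Con trees, and the helper headApp are all hypothetical and chosen for illustration. The point it shows is that a semantic action can build its result without knowing whether it is producing a type or a constructor; the surrounding context picks the instance later.

```haskell
import Data.Char (isUpper)

-- Toy syntax trees; deliberately much simpler than GHC's real AST.
data Ty
  = TyName String          -- a constructor or variable name, e.g. Ord, a
  | TyApp Ty Ty            -- prefix application, e.g. Ord a
  | TyOp Ty String Ty      -- infix application, e.g. Int :+ Bool
  deriving (Eq, Show)

data Con
  = PrefixCon String [Ty]  -- e.g. MkT1 a
  | InfixCon Ty String Ty  -- e.g. Int :+ Bool
  deriving (Eq, Show)

-- Hypothetical stand-in for DisambTD: the operations that the grammar's
-- semantic actions need, abstracted over the result type b.
class DisambTD b where
  tdHead :: Ty -> Either String b                  -- ftype : atype
  tdApp  :: b -> Ty -> Either String b             -- ftype : ftype tyarg
  tdOp   :: Ty -> String -> Ty -> Either String b  -- infix application

-- Interpreting the parse as a type always succeeds in this toy model.
instance DisambTD Ty where
  tdHead      = Right
  tdApp f x   = Right (TyApp f x)
  tdOp l op r = Right (TyOp l op r)

-- Interpreting the parse as a constructor validates as it goes.
instance DisambTD Con where
  tdHead (TyName n@(c:_)) | isUpper c = Right (PrefixCon n [])
  tdHead t = Left ("not a data constructor: " ++ show t)
  tdApp (PrefixCon n args) x = Right (PrefixCon n (args ++ [x]))
  tdApp con _ = Left ("cannot apply an infix constructor: " ++ show con)
  tdOp l op r = Right (InfixCon l op r)

-- What an action for "ftype tyarg" might build for `h x`,
-- without committing to Ty or Con.
headApp :: DisambTD b => String -> String -> Either String b
headApp h x = tdHead (TyName h) >>= \f -> tdApp f (TyName x)
```

In this sketch, headApp "MkT1" "a" at type Either String Con yields a PrefixCon, the same call shape at type Either String Ty yields a TyApp (as for Ord a before =>), and headApp "a" "b" at the Con type is rejected because a lowercase name cannot head a constructor.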

Changed files:
- compiler/GHC/Parser.y: 24 additions, 17 deletions
- compiler/GHC/Parser/PostProcess.hs: 166 additions, 339 deletions
- testsuite/tests/parser/should_fail/T12045d.stderr: 1 addition, 2 deletions
- testsuite/tests/parser/should_fail/strictnessDataCon_B.stderr: 1 addition, 2 deletions
- testsuite/tests/parser/should_fail/typeops_A.stderr: 2 additions, 1 deletion
- testsuite/tests/parser/should_fail/typeops_C.stderr: 1 addition, 1 deletion
- testsuite/tests/parser/should_fail/unpack_empty_type.stderr: 1 addition, 2 deletions
- testsuite/tests/parser/should_fail/unpack_inside_type.stderr: 5 additions, 1 deletion