Skip to content
Snippets Groups Projects
Commit 686e06c5 authored by Vladislav Zavialov's avatar Vladislav Zavialov Committed by Marge Bot
Browse files

Grammar for types and data/newtype constructors

Before this patch, we parsed types into a reversed sequence
of operators and operands. For example, (F x y + G a b * X)
would be parsed as [X, *, b, a, G, +, y, x, F],
using a simple grammar:

	tyapps
	  : tyapp
	  | tyapps tyapp

	tyapp
	  : atype
	  | PREFIX_AT atype
	  | tyop
	  | unpackedness

Then we used a hand-written state machine to assemble this
 either into a type,        using 'mergeOps',
     or into a constructor, using 'mergeDataCon'.

This is due to a syntactic ambiguity:

	data T1 a =          MkT1 a
	data T2 a = Ord a => MkT2 a

In T1, what follows after the = sign is a data/newtype constructor
declaration. However, in T2, what follows is a type (of kind
Constraint). We don't know which of the two we are parsing until we
encounter =>, and we cannot check for => without unlimited lookahead.

This poses a few issues when it comes to e.g. infix operators:

	data I1 = Int :+ Bool :+ Char          -- bad
	data I2 = Int :+ Bool :+ Char => MkI2  -- fine

By this issue alone we are forced into parsing into an intermediate
representation and doing a separate validation pass.

However, should that intermediate representation be as low-level as a
flat sequence of operators and operands?

Before GHC Proposal #229, the answer was Yes, due to some particularly
nasty corner cases:

	data T = ! A :+ ! B          -- used to be fine, hard to parse
	data T = ! A :+ ! B => MkT   -- bad

However, now the answer is No, as this corner case is gone:

	data T = ! A :+ ! B          -- bad
	data T = ! A :+ ! B => MkT   -- bad

This means we can write a proper grammar for types, overloading it in
the DisambECP style, see Note [Ambiguous syntactic categories].

With this patch, we introduce a new class, DisambTD. Just like
DisambECP is used to disambiguate between expressions, commands, and patterns,
DisambTD  is used to disambiguate between types and data/newtype constructors.

This way, we get a proper, declarative grammar for constructors and
types:

	infixtype
	  : ftype
	  | ftype tyop infixtype
	  | unpackedness infixtype

	ftype
	  : atype
	  | tyop
	  | ftype tyarg
	  | ftype PREFIX_AT tyarg

	tyarg
	  : atype
	  | unpackedness atype

And having a grammar for types means we are a step closer to using a
single grammar for types and expressions.
parent fbcb886d
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment