Skip to content
  • Vladislav Zavialov's avatar
    Grammar for types and data/newtype constructors · 686e06c5
    Vladislav Zavialov authored and Marge Bot's avatar Marge Bot committed
    Before this patch, we parsed types into a reversed sequence
    of operators and operands. For example, (F x y + G a b * X)
    would be parsed as [X, *, b, a, G, +, y, x, F],
    using a simple grammar:
    
    	tyapps
    	  : tyapp
    	  | tyapps tyapp
    
    	tyapp
    	  : atype
    	  | PREFIX_AT atype
    	  | tyop
    	  | unpackedness
    
    Then we used a hand-written state machine to assemble this
     either into a type,        using 'mergeOps',
         or into a constructor, using 'mergeDataCon'.
    
    This is due to a syntactic ambiguity:
    
    	data T1 a =          MkT1 a
    	data T2 a = Ord a => MkT2 a
    
    In T1, what follows after the = sign is a data/newtype constructor
    declaration. However, in T2, what follows is a type (of kind
    Constraint). We don't know which of the two we are parsing until we
    encounter =>, and we cannot check for => without unlimited lookahead.
    
    This poses a few issues when it comes to e.g. infix operators:
    
    	data I1 = Int :+ Bool :+ Char          -- bad
    	data I2 = Int :+ Bool :+ Char => MkI2  -- fine
    
    By this issue alone we are forced into parsing into an intermediate
    representation and doing a separate validation pass.
    
    However, should that intermediate representation be as low-level as a
    flat sequence of operators and operands?
    
    Before GHC Proposal #229, the answer was Yes, due to some particularly
    nasty corner cases:
    
    	data T = ! A :+ ! B          -- used to be fine, hard to parse
    	data T = ! A :+ ! B => MkT   -- bad
    
    However, now the answer is No, as this corner case is gone:
    
    	data T = ! A :+ ! B          -- bad
    	data T = ! A :+ ! B => MkT   -- bad
    
    This means we can write a proper grammar for types, overloading it in
    the DisambECP style, see Note [Ambiguous syntactic categories].
    
    With this patch, we introduce a new class, DisambTD. Just like
    DisambECP is used to disambiguate between expressions, commands, and patterns,
    DisambTD  is used to disambiguate between types and data/newtype constructors.
    
    This way, we get a proper, declarative grammar for constructors and
    types:
    
    	infixtype
    	  : ftype
    	  | ftype tyop infixtype
    	  | unpackedness infixtype
    
    	ftype
    	  : atype
    	  | tyop
    	  | ftype tyarg
    	  | ftype PREFIX_AT tyarg
    
    	tyarg
    	  : atype
    	  | unpackedness atype
    
    And having a grammar for types means we are a step closer to using a
    single grammar for types and expressions.
    686e06c5