Commit 686e06c5 authored by Vladislav Zavialov, committed by Marge Bot

Grammar for types and data/newtype constructors

Before this patch, we parsed types into a reversed sequence
of operators and operands. For example, (F x y + G a b * X)
would be parsed as [X, *, b, a, G, +, y, x, F],
using a simple grammar:

	tyapps
	  : tyapp
	  | tyapps tyapp

	tyapp
	  : atype
	  | PREFIX_AT atype
	  | tyop
	  | unpackedness
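A minimal Haskell sketch of that old representation follows. The TyEl type here is a simplified, hypothetical stand-in that models only operands and operators. The grammar actions in the diff also build TyElKindApp and TyElUnpackedness elements, and every element carries a location. The sketch shows why the list comes out reversed: the left-recursive rule 'tyapps : tyapps tyapp' conses each new element onto the front of what it has parsed so far.

```haskell
-- Simplified, hypothetical model of the old flat representation.
-- Only operands and operators are modelled; locations are omitted.
data TyEl
  = TyElOpd String -- an operand such as F, x or X
  | TyElOpr String -- a type operator such as + or *
  deriving (Eq, Show)

-- Mirrors the action of "tyapps : tyapps tyapp": the new element is
-- consed onto the front of what has been parsed so far.
consNewest :: [TyEl] -> TyEl -> [TyEl]
consNewest acc el = el : acc

-- (F x y + G a b * X) in source order.
sourceOrder :: [TyEl]
sourceOrder =
  [ TyElOpd "F", TyElOpd "x", TyElOpd "y", TyElOpr "+"
  , TyElOpd "G", TyElOpd "a", TyElOpd "b", TyElOpr "*"
  , TyElOpd "X" ]

-- Consuming the tokens left to right therefore yields the reversed list.
parsed :: [TyEl]
parsed = foldl consNewest [] sourceOrder

render :: TyEl -> String
render (TyElOpd s) = s
render (TyElOpr s) = s

main :: IO ()
main = putStrLn (unwords (map render parsed))
-- prints: X * b a G + y x F
```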

Then we used a hand-written state machine to assemble this
 either into a type,        using 'mergeOps',
     or into a constructor, using 'mergeDataCon'.
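For illustration, here is a toy analogue of 'mergeOps' in Haskell. It is not GHC's implementation. The names and the error text are hypothetical, and it ignores kind applications, unpackedness, source locations and fixities. It restores source order, turns each run of operands into an application chain, and joins chains with operators, nesting to the right.

```haskell
-- Hypothetical, heavily simplified analogue of mergeOps (not GHC's code).
data TyEl = TyElOpd String | TyElOpr String
  deriving (Eq, Show)

data Ty
  = TyVar String      -- a type variable or constructor name
  | TyApp Ty Ty       -- prefix application, f x
  | TyOp Ty String Ty -- infix operator application, l op r
  deriving (Eq, Show)

-- Takes the reversed list produced by the parser, restores source order,
-- then walks it: a run of operands becomes an application chain, and an
-- operator joins that chain with everything to its right.
mergeOps :: [TyEl] -> Either String Ty
mergeOps = go . reverse
  where
    go els = do
      let (opds, rest) = span isOpd els
      lhs <- applyAll [s | TyElOpd s <- opds]
      case rest of
        [] -> Right lhs
        TyElOpr op : rest' -> TyOp lhs op <$> go rest'
        TyElOpd _ : _ -> Left "impossible: span stops at an operator"

    -- An operator with no operand run on one of its sides is an error.
    applyAll [] = Left "Operator applied to too few arguments"
    applyAll (n : ns) = Right (foldl TyApp (TyVar n) (map TyVar ns))

    isOpd (TyElOpd _) = True
    isOpd (TyElOpr _) = False

-- [X, *, b, a, G, +, y, x, F], i.e. (F x y + G a b * X) reversed.
reversedInput :: [TyEl]
reversedInput =
  [ TyElOpd "X", TyElOpr "*", TyElOpd "b", TyElOpd "a", TyElOpd "G"
  , TyElOpr "+", TyElOpd "y", TyElOpd "x", TyElOpd "F" ]

main :: IO ()
main = print (mergeOps reversedInput)
```

In this toy every operator nests to the right, so the example yields F x y + (G a b * X) whatever the operators' fixities. In GHC, fixities are resolved after parsing, in the renamer.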

This is due to a syntactic ambiguity:

	data T1 a =          MkT1 a
	data T2 a = Ord a => MkT2 a

In T1, what follows the = sign is a data/newtype constructor
declaration. However, in T2, what follows is a type (of kind
Constraint). We don't know which of the two we are parsing until we
encounter =>, and we cannot check for => without unlimited lookahead.

This poses a few issues when it comes to e.g. infix operators:

	data I1 = Int :+ Bool :+ Char          -- bad
	data I2 = Int :+ Bool :+ Char => MkI2  -- fine

This issue alone forces us to parse into an intermediate
representation and do a separate validation pass.
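The two-pass approach can be sketched with a toy Haskell model in which the right-hand side of a data declaration is just a list of word tokens. The Rhs type, the ':'-prefix test for constructor operators and the error message are assumptions made for illustration. The first pass only looks for =>. The second pass validates the constructor part, which may contain at most one infix operator, while a context may contain any number.

```haskell
-- Hypothetical toy: the right-hand side of a data declaration as a flat
-- list of tokens, e.g. ["Ord", "a", "=>", "MkT2", "a"].
data Rhs
  = Con [String]             -- just a constructor declaration
  | CtxCon [String] [String] -- a context (a type), then a constructor
  deriving (Eq, Show)

-- Pass 1: whether the tokens start with a constructor or a context is only
-- known once we find (or fail to find) "=>", which may be arbitrarily far
-- to the right. So everything is collected first and split afterwards.
classify :: [String] -> Rhs
classify toks = case break (== "=>") toks of
  (ctx, "=>" : con) -> CtxCon ctx con
  _                 -> Con toks

-- In this toy, constructor operators are tokens starting with ':'.
isConOp :: String -> Bool
isConOp (':' : _ : _) = True
isConOp _             = False

-- Pass 2: validate each part according to what it turned out to be.
-- The context is an ordinary type, so any number of operators is fine,
-- but an infix constructor declaration may contain only one.
validate :: Rhs -> Either String Rhs
validate rhs = case rhs of
  Con con      -> checkCon con
  CtxCon _ con -> checkCon con
  where
    checkCon con
      | length (filter isConOp con) > 1 =
          Left ("invalid constructor declaration: " ++ unwords con)
      | otherwise = Right rhs

main :: IO ()
main = mapM_ (print . validate . classify . words)
  [ "Int :+ Bool :+ Char"         -- like I1: rejected
  , "Int :+ Bool :+ Char => MkI2" -- like I2: accepted
  ]
```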

However, should that intermediate representation be as low-level as a
flat sequence of operators and operands?

Before GHC Proposal #229, the answer was Yes, due to some particularly
nasty corner cases:

	data T = ! A :+ ! B          -- used to be fine, hard to parse
	data T = ! A :+ ! B => MkT   -- bad

However, now the answer is No, as this corner case is gone:

	data T = ! A :+ ! B          -- bad
	data T = ! A :+ ! B => MkT   -- bad

This means we can write a proper grammar for types, overloading it in
the DisambECP style, see Note [Ambiguous syntactic categories].

With this patch, we introduce a new class, DisambTD. Just like
DisambECP is used to disambiguate between expressions, commands, and patterns,
DisambTD  is used to disambiguate between types and data/newtype constructors.
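To make the idea concrete, here is a toy Haskell sketch of such a class. It is hypothetical. The method names echo mkHsAppTyHeadPV, mkHsAppTyPV, mkHsAppKindTyPV and mkHsOpTyPV from the diff, but Ty, ConDecl, the PV synonym (plain Either String) and the error messages are simplified stand-ins, not GHC's definitions. A grammar action is written once against the class, and the caller chooses whether it denotes a type or a constructor declaration.

```haskell
-- Hypothetical toy version of the DisambTD idea; not GHC's definitions.

-- Toy type syntax.
data Ty
  = TyCon String
  | TyApp Ty Ty
  | TyKindApp Ty Ty
  | TyOp Ty String Ty
  deriving (Eq, Show)

-- Toy data constructor declaration.
data ConDecl
  = PrefixCon String [Ty]
  | InfixCon Ty String Ty
  deriving (Eq, Show)

-- Stand-in for GHC's PV monad: Left carries a parse error.
type PV = Either String

-- One set of grammar actions, interpreted either as a type or as a
-- constructor declaration depending on how b is instantiated.
class DisambTD b where
  mkAppTyHead :: Ty -> PV b                 -- head of an application
  mkAppTy     :: b -> Ty -> PV b            -- f x
  mkAppKindTy :: b -> Ty -> PV b            -- f @k
  mkOpTy      :: Ty -> String -> Ty -> PV b -- l op r

instance DisambTD Ty where
  mkAppTyHead     = Right
  mkAppTy f x     = Right (TyApp f x)
  mkAppKindTy f k = Right (TyKindApp f k)
  mkOpTy l op r   = Right (TyOp l op r)

instance DisambTD ConDecl where
  mkAppTyHead (TyCon c) = Right (PrefixCon c [])
  mkAppTyHead t         = Left ("Not a data constructor: " ++ show t)

  mkAppTy (PrefixCon c args) x = Right (PrefixCon c (args ++ [x]))
  mkAppTy InfixCon{} _         = Left "Cannot apply an infix constructor"

  mkAppKindTy _ _ =
    Left "Unexpected kind application in a data/newtype declaration"

  -- Operators nest to the right, so a second operator shows up in r.
  mkOpTy _ _ TyOp{}       = Left "Too many operators in a constructor declaration"
  mkOpTy l op@(':' : _) r = Right (InfixCon l op r)
  mkOpTy _ op _           = Left ("Not a data constructor operator: " ++ op)

-- A single polymorphic description of "MkT Int Bool".
mkTIntBool :: DisambTD b => PV b
mkTIntBool = do
  f <- mkAppTyHead (TyCon "MkT")
  g <- mkAppTy f (TyCon "Int")
  mkAppTy g (TyCon "Bool")

main :: IO ()
main = do
  print (mkTIntBool :: PV Ty)      -- read as a type
  print (mkTIntBool :: PV ConDecl) -- read as a constructor declaration
```

Instantiated at Ty, mkTIntBool is the type application MkT Int Bool; instantiated at ConDecl, it is a prefix constructor with two fields. In the diff, 'btype' runs 'infixtype' at a type in the same way, while 'constr_stuff' runs it at the constructor builder.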

This way, we get a proper, declarative grammar for constructors and
types:

	infixtype
	  : ftype
	  | ftype tyop infixtype
	  | unpackedness infixtype

	ftype
	  : atype
	  | tyop
	  | ftype tyarg
	  | ftype PREFIX_AT tyarg

	tyarg
	  : atype
	  | unpackedness atype
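A hand-written recursive-descent transcription of these three rules may help show how they fit together. This Haskell sketch is not GHC's Happy-generated parser. It is specialised to types (the DisambTD overloading is left out), atype is reduced to a single name, and Tok, Ty and the TyUnpack node are hypothetical. The left-recursive ftype rule becomes a loop over trailing arguments, and a bare operator in ftype position yields an 'Operator applied to too few arguments' error like those in the test output.

```haskell
-- Hypothetical recursive-descent transcription of the infixtype / ftype /
-- tyarg rules, specialised to types. Tok and Ty are toy types.
data Tok
  = TName String -- a type constructor or variable
  | TOp String   -- a type operator (tyop)
  | TAt          -- PREFIX_AT, as in f @k
  | TUnpack      -- an UNPACK pragma (unpackedness)
  deriving (Eq, Show)

data Ty
  = TyCon String
  | TyApp Ty Ty       -- f x
  | TyKindApp Ty Ty   -- f @k
  | TyOp Ty String Ty -- l op r
  | TyUnpack Ty       -- UNPACK t, recorded without further checks
  deriving (Eq, Show)

-- A parser consumes a prefix of the tokens and returns the rest.
type P a = [Tok] -> Either String (a, [Tok])

-- infixtype : ftype | ftype tyop infixtype | unpackedness infixtype
infixtype :: P Ty
infixtype (TUnpack : ts) = do
  (t, rest) <- infixtype ts
  pure (TyUnpack t, rest)
infixtype ts = do
  (lhs, rest) <- ftype ts
  case rest of
    TOp op : rest' -> do
      (rhs, rest'') <- infixtype rest'
      pure (TyOp lhs op rhs, rest'')
    _ -> pure (lhs, rest)

-- ftype : atype | tyop | ftype tyarg | ftype PREFIX_AT tyarg
-- A bare operator is an error; the left recursion becomes the loop 'args'.
ftype :: P Ty
ftype (TOp op : _) = Left ("Operator applied to too few arguments: " ++ op)
ftype ts = do
  (hd, rest) <- atype ts
  args hd rest
  where
    args f (TAt : ts') = do
      (k, rest) <- tyarg ts'
      args (TyKindApp f k) rest
    args f ts'@(t : _)
      | startsTyarg t = do
          (x, rest) <- tyarg ts'
          args (TyApp f x) rest
    args f ts' = pure (f, ts')

startsTyarg :: Tok -> Bool
startsTyarg (TName _) = True
startsTyarg TUnpack   = True
startsTyarg _         = False

-- tyarg : atype | unpackedness atype
tyarg :: P Ty
tyarg (TUnpack : ts) = do
  (t, rest) <- atype ts
  pure (TyUnpack t, rest)
tyarg ts = atype ts

-- atype, reduced here to a single name.
atype :: P Ty
atype (TName n : ts) = Right (TyCon n, ts)
atype (t : _)        = Left ("parse error on input " ++ show t)
atype []             = Left "parse error at end of input"

parseType :: [Tok] -> Either String Ty
parseType ts = do
  (ty, rest) <- infixtype ts
  case rest of
    []      -> pure ty
    (t : _) -> Left ("parse error on input " ++ show t)

-- F x y + G a b * X
exampleToks :: [Tok]
exampleToks =
  [ TName "F", TName "x", TName "y", TOp "+"
  , TName "G", TName "a", TName "b", TOp "*", TName "X" ]

main :: IO ()
main = print (parseType exampleToks)
```

Because infixtype is right-recursive, the example parses as F x y + (G a b * X).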

And having a grammar for types means we are a step closer to using a
single grammar for types and expressions.
parent fbcb886d
@@ -1966,22 +1966,28 @@ type :: { LHsType GhcPs }
 mult :: { LHsType GhcPs }
         : btype                         { $1 }
 btype :: { LHsType GhcPs }
-        : tyapps                        {% mergeOps (unLoc $1) }
-tyapps :: { Located [Located TyEl] } -- NB: This list is reversed
-        : tyapp                         { sL1 $1 [$1] }
-        | tyapps tyapp                  { sLL $1 $> $ $2 : unLoc $1 }
-tyapp :: { Located TyEl }
-        : atype                         { sL1 $1 $ TyElOpd (unLoc $1) }
-        -- See Note [Whitespace-sensitive operator parsing] in GHC.Parser.Lexer
-        | PREFIX_AT atype               { sLL $1 $> $ (TyElKindApp (comb2 $1 $2) $2) }
-        | tyop                          { mapLoc TyElOpr $1 }
-        | unpackedness                  { sL1 $1 $ TyElUnpackedness (unLoc $1) }
+        : infixtype                     {% runPV $1 }
+infixtype :: { forall b. DisambTD b => PV (Located b) }
+        : ftype                         { $1 }
+        | ftype tyop infixtype          { $1 >>= \ $1 ->
+                                          $3 >>= \ $3 ->
+                                          mkHsOpTyPV $1 $2 $3 }
+        | unpackedness infixtype        { $2 >>= \ $2 ->
+                                          mkUnpackednessPV $1 $2 }
+ftype :: { forall b. DisambTD b => PV (Located b) }
+        : atype                         { mkHsAppTyHeadPV $1 }
+        | tyop                          { failOpFewArgs $1 }
+        | ftype tyarg                   { $1 >>= \ $1 ->
+                                          mkHsAppTyPV $1 $2 }
+        | ftype PREFIX_AT tyarg         { $1 >>= \ $1 ->
+                                          mkHsAppKindTyPV $1 (getLoc $2) $3 }
+tyarg :: { LHsType GhcPs }
+        : atype                         { $1 }
+        | unpackedness atype            {% addUnpackednessP $1 $2 }
 tyop :: { Located RdrName }
         : qtyconop                      { $1 }
@@ -2222,8 +2228,9 @@ forall :: { Located ([AddAnn], Maybe [LHsTyVarBndr Specificity GhcPs]) }
         | {- empty -}                   { noLoc ([], Nothing) }
 constr_stuff :: { Located (Located RdrName, HsConDeclDetails GhcPs) }
-        : tyapps                        {% do { c <- mergeDataCon (unLoc $1)
-                                              ; return $ sL1 $1 c } }
+        : infixtype                     {% fmap (mapLoc (\b -> (dataConBuilderCon b,
+                                                                dataConBuilderDetails b)))
+                                               (runPV $1) }
 fielddecls :: { [LConDeclField GhcPs] }
         : {- empty -}                   { [] }
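The new 'constr_stuff' action runs the polymorphic parser at the constructor builder and then projects out the constructor name and its details. A toy Haskell model of that projection follows. The DataConBuilder type, ConDetails and both projection functions here are simplified stand-ins written for this sketch, not GHC's definitions, and Either String replaces the parser monad.

```haskell
-- Hypothetical toy of the projection done by constr_stuff; not GHC's code.
newtype Ty = TyCon String
  deriving (Eq, Show)

-- Toy builder produced by running the parser at the constructor instance.
data DataConBuilder
  = PrefixDataConBuilder String [Ty] -- e.g. MkT Int Bool
  | InfixDataConBuilder Ty String Ty -- e.g. Int :+ Bool
  deriving (Eq, Show)

data ConDetails
  = PrefixDetails [Ty]
  | InfixDetails Ty Ty
  deriving (Eq, Show)

dataConBuilderCon :: DataConBuilder -> String
dataConBuilderCon (PrefixDataConBuilder c _)  = c
dataConBuilderCon (InfixDataConBuilder _ c _) = c

dataConBuilderDetails :: DataConBuilder -> ConDetails
dataConBuilderDetails (PrefixDataConBuilder _ args) = PrefixDetails args
dataConBuilderDetails (InfixDataConBuilder l _ r)   = InfixDetails l r

-- Counterpart of the grammar action, without source locations:
--   fmap (\b -> (dataConBuilderCon b, dataConBuilderDetails b)) (runPV $1)
constrStuff :: Either String DataConBuilder -> Either String (String, ConDetails)
constrStuff = fmap (\b -> (dataConBuilderCon b, dataConBuilderDetails b))

main :: IO ()
main = do
  print (constrStuff (Right (PrefixDataConBuilder "MkT" [TyCon "Int", TyCon "Bool"])))
  print (constrStuff (Right (InfixDataConBuilder (TyCon "Int") ":+" (TyCon "Bool"))))
```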
T12045d.hs:11:16: error:
Unexpected kind application in a data/newtype declaration:
MkD @Nat Bool
Unexpected kind application in a data/newtype declaration: MkD @Nat
strictnessDataCon_B.hs:1:27: error:
{-# UNPACK #-} cannot appear inside a type.
strictnessDataCon_B.hs:1:42: error: parse error on input ‘}’
typeops_A.hs:1:12: error: Operator applied to too few arguments: +
typeops_A.hs:2:1: error:
parse error (possibly incorrect indentation or mismatched brackets)
typeops_C.hs:1:12: error: Operator applied to too few arguments: +
typeops_C.hs:1:14: error: Operator applied to too few arguments: +
unpack_empty_type.hs:3:19: error:
{-# UNPACK #-} must be applied to a type.
unpack_empty_type.hs:3:34: error: parse error on input ‘}’
unpack_inside_type.hs:3:25: error:
{-# UNPACK #-} cannot appear inside a type.
• Unexpected UNPACK annotation: {-# UNPACK #-}Int
UNPACK annotation cannot appear nested inside a type
• In the first argument of ‘Maybe’, namely ‘({-# UNPACK #-}Int)’
In the type ‘Maybe ({-# UNPACK #-}Int)’
In the definition of data constructor ‘T’