|
|
# Proposal: [FixityResolution](fixity-resolution)
|
|
|
|
|
|
<table><tr><th> Ticket </th>
|
|
|
<th>[\#30](https://gitlab.haskell.org//haskell/prime/issues/30)</th></tr>
|
|
|
<tr><th> Dependencies </th>
|
|
|
<th></th></tr>
|
|
|
<tr><th> Related </th>
|
|
|
<th>[NegativeSyntax](negative-syntax)</th></tr></table>
|
|
|
|
|
|
## Compiler support
|
|
|
|
|
|
<table><tr><th> GHC </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr>
|
|
|
<tr><th> nhc98 </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr>
|
|
|
<tr><th> Hugs </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr>
|
|
|
<tr><th> UHC </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr>
|
|
|
<tr><th> JHC </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr>
|
|
|
<tr><th> LHC </th>
|
|
|
<th> full (no flag)
|
|
|
</th></tr></table>
|
|
|
|
|
|
## Summary
|
|
|
|
|
|
|
|
|
Change the way fixity resolution is specified in the Report. Remove fixity resolution from the context-free grammar, and specify it separately. This does change the syntax slightly by eliminating some impossible-to-parse-correctly cases, but in practice all existing compilers already implement the proposed syntax, not the current syntax.
|
|
|
|
|
|
## Description
|
|
|
|
|
|
|
|
|
This is a proposal that doesn't affect anything except the presentation of the report, because all Haskell implementations currently do it this way anyway. There are some legal Haskell programs, according to the report, that aren't accepted by current implementations (see below). I propose we make the language specification match the implementations.
|
|
|
|
|
|
### The Problem
|
|
|
|
|
|
|
|
|
The Haskell 98 context-free syntax includes fixity resolution as part of the grammar, with a fixed number of fixities (\[1..9\]), essentially using macro expansion to define the grammar. Apart from being really ugly, and not corresponding to any known implementation (long ago GHC used a yacc parser formulated like this, but not any more), this gives rise to some constructions that are very hard to parse correctly. For example, the expression:
|
|
|
|
|
|
|
|
|
```wiki
|
|
|
\x -> x == x == True
|
|
|
```
|
... | ... | @@ -51,11 +10,14 @@ The Haskell 98 context-free syntax includes fixity resolution as part of the gra |
|
|
should be legal, and parse as `(\x -> x == x) == True`. The reason is that == is nonfix (assuming the == in scope is from the Prelude), and this combined with the rule that says "lambda expressions extend as far to the right as possible", forces the lambda expression to end before the second `==`.
|
|
|
|
|
|
|
|
|
|
|
|
No known Haskell compiler actually parses this correctly. Admittedly it's a type error, but that's not the point - the grammar should be parsable.
|
|
|
|
|
|
|
|
|
|
|
|
Furthermore:
|
|
|
|
|
|
|
|
|
```wiki
|
|
|
do a == b == c
|
|
|
```
|
... | ... | @@ -64,302 +26,19 @@ Furthermore: |
|
|
This is legal syntax, and parses as `do { a == b } == c`, again because `==` is nonfix, and the second `==` closes the layout context by virtue of the layout rule's parse error condition.
|
|
|
|
|
|
|
|
|
|
|
|
As far as I know, no Haskell compiler gets this right.
|
|
|
|
|
|
### The Solution
|
|
|
|
|
|
|
|
|
These constructions rely on the parser being aware of operator fixities during parsing, which is an unreasonable restriction - it's often easier to follow imports after parsing a file, rather than during it.
|
|
|
|
|
|
|
|
|
Hence, I propose that we remove operator fixity resolution from the context-free grammar, and specify it separately. The context-free grammar would parse applications of infix operators as a flat list to be resolved separately. The examples above would parse but fail during fixity resolution.
|
|
|
|
|
|
|
|
|
This would also let us expand the current paultry set of 9 fixity levels to an arbitrary limit, if we so wished (but that introduces backwards compatibility issues).
|
|
|
|
|
|
|
|
|
Sample code for resolving fixities is here: [resolve.hs](/attachment/wiki/FixityResolution/resolve.hs)[](/raw-attachment/wiki/FixityResolution/resolve.hs). I believe this implements Haskell 98 fixity resolution, without prefix negation (prefix negation could be added, but we might not need to; see [NegativeSyntax](negative-syntax)). The core of the parser is 12 lines of code, with a few lines of datatype declarations and pretty-printing. Note that there is no upper limit on precedence levels, but there is a lower limit of zero. This code could serve as the basis for specifying fixity resolution in the Haskell' report.
|
|
|
|
|
|
## References
|
|
|
|
|
|
- [resolve.hs](/attachment/wiki/FixityResolution/resolve.hs)[](/raw-attachment/wiki/FixityResolution/resolve.hs)
|
|
|
|
|
|
## Report Delta
|
|
|
|
|
|
### Section 3 (Expressions)
|
|
|
|
|
|
**remove**
|
|
|
|
|
|
>
|
|
|
> In the syntax that follows, there are some families of nonterminals
|
|
|
> indexed by precedence levels (written as a superscript). Similarly,
|
|
|
> the nonterminals op, varop, and conop may have a double index: a
|
|
|
> letter l, r, or n for left-, right- or non-associativity and a
|
|
|
> precedence level. A precedence-level variable i ranges from 0 to 9; an
|
|
|
> associativity variable a varies over {l, r, n}. For example
|
|
|
|
|
|
```wiki
|
|
|
aexp -> ( expi+1 qop(a,i) )
|
|
|
```
|
|
|
|
|
|
>
|
|
|
> actually stands for 30 productions, with 10 substitutions for i and 3
|
|
|
> for a.
|
|
|
|
|
|
**replace**
|
|
|
|
|
|
```wiki
|
|
|
exp -> exp0 :: [context =>] type (expression type signature)
|
|
|
| exp0
|
|
|
|
|
|
expi -> expi+1 [qop(n,i) expi+1]
|
|
|
| lexpi
|
|
|
| rexpi
|
|
|
lexpi -> (lexpi | expi+1) qop(l,i) expi+1
|
|
|
lexp6 -> - exp7
|
|
|
rexpi -> expi+1 qop(r,i) (rexpi | expi+1)
|
|
|
```
|
|
|
|
|
|
**with**
|
|
|
|
|
|
```wiki
|
|
|
exp -> infixexp :: [context =>] type (expression type signature)
|
|
|
| infixexp
|
|
|
|
|
|
infixexp -> exp10 qop infixexp
|
|
|
| - infixexp
|
|
|
| exp10
|
|
|
```
|
|
|
|
|
|
**replace**
|
|
|
|
|
|
```wiki
|
|
|
| ( expi+1 qop(a,i) ) (left section)
|
|
|
| ( lexpi qop(l,i) ) (left section)
|
|
|
| ( qop(a,i)<-> expi+1 ) (right section)
|
|
|
| ( qop(r,i)<-> rexpi ) (right section)
|
|
|
```
|
|
|
|
|
|
**with**
|
|
|
|
|
|
```wiki
|
|
|
| ( infixexp qop ) (left section)
|
|
|
| ( qop<-> infixexp ) (right section)
|
|
|
```
|
|
|
|
|
|
**replace**
|
|
|
|
|
|
>
|
|
|
> Expressions involving infix operators are disambiguated by the
|
|
|
> operator's fixity (see Section 4.4.2). Consecutive unparenthesized
|
|
|
> operators with the same precedence must both be either left or right
|
|
|
> associative to avoid a syntax error. Given an unparenthesized
|
|
|
> expression "x qop(a,i) y qop(b,j) z", parentheses must be added
|
|
|
> around either "x qop(a,i) y" or "y qop(b,j) z" when i=j unless a=b=l
|
|
|
> or a=b=r.
|
|
|
|
|
|
**with
|
|
|
**
|
|
|
|
|
|
>
|
|
|
> Expressions involving infix operators are disambiguated by the
|
|
|
> operator's fixity (see Section 4.4.2). Consecutive unparenthesized
|
|
|
> operators with the same precedence must both be either left or right
|
|
|
> associative to avoid a syntax error. Given an unparenthesized
|
|
|
> expression "x qop(a,i) y qop(b,j) z" (where "qop(a,i)" means an
|
|
|
> operator with associativity "a" and precedence "i"), parentheses
|
|
|
> must be added around either "x qop(a,i) y" or "y qop(b,j) z" when
|
|
|
> i=j unless a=b=l or a=b=r.
|
|
|
|
|
|
>
|
|
|
> An example algorithm for resolving expressions involving infix
|
|
|
> operators is given in Section 9.6.
|
|
|
|
|
|
**remove**
|
|
|
|
|
|
>
|
|
|
> A note about parsing. Expressions that involve the interaction of
|
|
|
> fixities with the let/lambda meta-rule may be hard to parse. For
|
|
|
> example, the expression
|
|
|
|
|
|
```wiki
|
|
|
let x = True in x == x == True
|
|
|
```
|
|
|
|
|
|
>
|
|
|
> cannot possibly mean
|
|
|
|
|
|
```wiki
|
|
|
let x = True in (x == x == True)
|
|
|
```
|
|
|
|
|
|
>
|
|
|
> because (==) is a non-associative operator; so the expression must
|
|
|
> parse thus:
|
|
|
|
|
|
> >
|
|
|
> > (let x = True in (x == x)) == True
|
|
|
|
|
|
>
|
|
|
> However, implementations may well use a post-parsing pass to deal
|
|
|
> with fixities, so they may well incorrectly deliver the former
|
|
|
> parse. Programmers are advised to avoid constructs whose parsing
|
|
|
> involves an interaction of (lack of) associativity with the
|
|
|
> let/lambda meta-rule.
|
|
|
|
|
|
**replace**
|
|
|
|
|
|
>
|
|
|
> For the sake of clarity, the rest of this section shows the syntax of
|
|
|
> expressions without their precedences.
|
|
|
|
|
|
**with**
|
|
|
|
|
|
>
|
|
|
> For the sake of clarity, the rest of this section will assume that
|
|
|
> expressions involving infix operators have been resolved according
|
|
|
> to the fixities of the operators.
|
|
|
|
|
|
### Section 3.5 (Sections)
|
|
|
|
|
|
|
|
|
replace the grammar, as above. The text is still correct.
|
|
|
|
|
|
### Section 9 (Syntax reference)
|
|
|
|
|
|
|
|
|
Make the corresponding changes as for Section 3.
|
|
|
|
|
|
### Section 9.6, Resolving expressions involving infix operators
|
|
|
|
|
|
**New sub-section**
|
|
|
|
|
|
|
|
|
The following is an example implementation of fixity resolution for
|
|
|
Haskell expressions. The function \@resolve@ takes a list consisting
|
|
|
of alternating expressions and operators; i.e. an instance of the
|
|
|
\@infixexp@ non-terminal in the context-free grammar, and returns
|
|
|
either \@Just e@ where \@e@ is the resolved expression, or \@Nothing@ if
|
|
|
the input does not represent a valid expression. In a compiler, of
|
|
|
course, it would be better to return more information about the
|
|
|
operators involved for the purposes of producing a useful error message,
|
|
|
but the \@Maybe@ type will suffice to illustrate the algorithm here.
|
|
|
|
|
|
```wiki
|
|
|
data Op = Op String Prec Fixity deriving Eq
|
|
|
data Fixity = Leftfix | Rightfix | Nonfix deriving Eq
|
|
|
type Prec = Int
|
|
|
|
|
|
data Exp = Var Var | OpApp Exp Op Exp | Neg Exp deriving Eq
|
|
|
type Var = String
|
|
|
|
|
|
data Tok = TExp Exp | TOp Op | TNeg
|
|
|
|
|
|
resolve :: [Tok] -> Maybe Exp
|
|
|
resolve tokens = fmap fst $ parseNeg (Op "" (-1) Nonfix) tokens
|
|
|
where
|
|
|
parseNeg :: Op -> [Tok] -> Maybe (Exp,[Tok])
|
|
|
parseNeg op1 (TExp e1 : rest) = parse op1 e1 rest
|
|
|
parseNeg op1 (TNeg : rest)
|
|
|
= do guard (prec1 < 6)
|
|
|
(r, rest') <- parseNeg (Op "-" 6 Leftfix) rest
|
|
|
parse op1 (Neg r) rest'
|
|
|
where
|
|
|
Op _ prec1 fix1 = op1
|
|
|
|
|
|
parse :: Op -> Exp -> [Tok] -> Maybe (Exp, [Tok])
|
|
|
parse _ e1 [] = Just (e1, [])
|
|
|
parse op1 e1 (TOp op2 : rest)
|
|
|
-- case (1): check for illegal expressions
|
|
|
| prec1 == prec2 && (fix1 /= fix2 || fix1 == Nonfix)
|
|
|
= Nothing
|
|
|
|
|
|
-- case (2): op1 and op2 should associate to the left
|
|
|
| prec1 > prec2 || (prec1 == prec2 && fix1 == Leftfix)
|
|
|
= Just (e1, TOp op2 : rest)
|
|
|
|
|
|
-- case (3): op1 and op2 should associate to the right
|
|
|
| otherwise
|
|
|
= do (r,rest') <- parseNeg op2 rest
|
|
|
parse op1 (OpApp e1 op2 r) rest'
|
|
|
where
|
|
|
Op _ prec1 fix1 = op1
|
|
|
Op _ prec2 fix2 = op2
|
|
|
```
|
|
|
|
|
|
|
|
|
The algorithm works as follows. At each stage we have a call
|
|
|
|
|
|
```wiki
|
|
|
parse op1 E1 (op2 : tokens)
|
|
|
```
|
|
|
|
|
|
|
|
|
which means that we are looking at an expression like
|
|
|
|
|
|
```wiki
|
|
|
E0 `op1` E1 `op2` ... (*1)
|
|
|
```
|
|
|
|
|
|
|
|
|
(the caller holds E0). The job of \@parse@ is to build the expression
|
|
|
to the right of \@op1@, returning the expression and any remaining
|
|
|
input.
|
|
|
|
|
|
|
|
|
There are three cases to consider:
|
|
|
|
|
|
>
|
|
|
> (1) if \@op1@ and \@op2@ have the same precedence, but they do not
|
|
|
> have the same associativity, or they are declared to be nonfix, then
|
|
|
> the expression is illegal.
|
|
|
|
|
|
>
|
|
|
> (2) If \@op1@ has a higher precedence than \@op2@, or \@op1@ and \@op2@
|
|
|
> should left-associate, then we know that the expression to the right
|
|
|
> of \@op1@ is \@E1@, so we return this to the caller.
|
|
|
|
|
|
>
|
|
|
> (3) Otherwise, we know we want to build an expression of the form
|
|
|
> \@E1 `op2` R@. To find R, we call \@parseNeg op2 tokens@ to compute
|
|
|
> the expression to the right of \@op2@, namely \@R@ (more about
|
|
|
> \@parseNeg@ below, but essentially if \@tokens@ is of the form @(E2 :
|
|
|
> rest)@, then this is equivalent to \@parse op2 E2 rest@). Now, we
|
|
|
> have
|
|
|
|
|
|
> >
|
|
|
> > E0 `op1` (E1 `op2` R) `op3` …
|
|
|
|
|
|
>
|
|
|
> where \@op3@ is the next operator in the input. This is an instance of
|
|
|
> (\*1) above, so to continue we call parse, with the new E1 == (E1
|
|
|
> `op2` R)
|
|
|
|
|
|
|
|
|
To initialise the algorithm, we set \@op1@ to be an imaginary operator
|
|
|
with precedence lower than anyything else. Hence \@parse@ will consume
|
|
|
the whole input, and return the resulting expression.
|
|
|
|
|
|
Hence, I propose that we remove operator fixity resolution from the context-free grammar, and specify it separately. The context-free grammar would parse applications of infix operators as a flat list to be resolved separately. The examples above would parse but fail during fixity resolution.
|
|
|
|
|
|
The handling of the prefix negation operator, @-@, complicates matters
|
|
|
only slightly. Recall that prefix negation has the same fixity as
|
|
|
infix negation: left-associative with precedence 6. The operator to
|
|
|
the left of @-@, if there is one, must have precedence lower than 6
|
|
|
for the expression to be legal. The negation operator itself may
|
|
|
left-associate with operators of the same fixity (e.g. @+@). So for
|
|
|
example @-a + b@ is legal and resolves as @(-a) + b), but \@a + -b@ is
|
|
|
illegal.
|
|
|
|
|
|
|
|
|
The function \@parseNeg@ handles prefix negation. If we encounter a
|
|
|
negation operator, and it is legal in this position (the operator to
|
|
|
the left has precedence lower than 6), then we proceed in a similar
|
|
|
way to case (3) above: compute the argument to '-' by recursively
|
|
|
calling \@parseNeg@, and then continue by calling \@parse@.
|
|
|
This would also let us expand the current paultry set of 9 fixity levels to an arbitrary limit, if we so wished (but that introduces backwards compatibility issues).
|
|
|
|
|
|
|
|
|
Note that this algorithm is insensitive to the range and resolution of precedences.
|
|
|
There is no reason in principle that Haskell should be limited to
|
|
|
integral precedences in the range 1 to 10; a larger range, or
|
|
|
fractional values, would present no additional implementation
|
|
|
difficulties. |