... | ... | @@ -40,3 +40,49 @@ There are plenty more examples. A good feature of this approach is that the err |
|
|
|
|
|
|
|
|
The main point is this. If you are changing the parser, feel free to make it accept more programs than it does at the moment, provided you also add a later test that rejects the bad programs. Typically you need this flexibility if some new thing you want to add makes the pars ambiguous, and you need more context to disambiguate. Delicate hacking of the LR grammar is to be discouraged. It's very hard to maintain and debug.
|
|
|
|
|
|
## Avoiding right-recursion
|
|
|
|
|
|
|
|
|
Be sure to read [ this section](https://www.haskell.org/happy/doc/html/sec-sequences.html) of the Happy manual for tips on avoiding right recursion. In GHC, the preferred method is using a left-recursive `OrdList`, as below:
|
|
|
|
|
|
```wiki
|
|
|
foolist :: { OrdList Foo }
|
|
|
: foolist ',' foo { $1 `appOL` unitOL $3 }
|
|
|
| foolist ',' { $1 } -- optional, but recommended
|
|
|
| foo { unitOL $1 }
|
|
|
```
|
|
|
|
|
|
`OrdList` operationally works the same way as building a list in reverse (as in the Happy manual), but it makes it less likely you'll forget to call `reverse` when you need to get the `final` list out.
|
|
|
|
|
|
|
|
|
One interesting, non-obvious fact, is that if you \*do\* use a right-recursive parser, the "extra semi-colons" production should NOT be pluralized:
|
|
|
|
|
|
```wiki
|
|
|
foolist :: { [Foo] }
|
|
|
: foo ',' foolist { $1 : $3 }
|
|
|
| foo ',' { [$1] } -- NOT foolist
|
|
|
| foo { [$1] }
|
|
|
```
|
|
|
|
|
|
## Indentation
|
|
|
|
|
|
|
|
|
Probably the most complicated interaction between the lexer and parser is with regards to *whitespace-sensitive layout.* The most important thing to know is that the lexer understands layout, and will output virtual open/close curlies (productions `vocurly` and `vccurly`) as well as semicolons, which can then be used as part of productions in `Parser.y`. So for example, if you are writing a rule that will make use of indentation, you should accept both virtual and literal curlies:
|
|
|
|
|
|
```wiki
|
|
|
body :: { ([AddAnn]
|
|
|
,([LImportDecl RdrName], [LHsDecl RdrName])) }
|
|
|
: '{' top '}' { (moc $1:mcc $3:(fst $2)
|
|
|
, snd $2) }
|
|
|
| vocurly top close { (fst $2, snd $2)
|
|
|
```
|
|
|
|
|
|
|
|
|
Notice the use of `close` rather than `vccurly`: `close` is a production that accepts both `vccurly` and a Happy `error`; that is, if we encounter an error in parsing, we try exiting an indentation context and trying again. This ensures, for example, that the top-level context can be closed even if no virtual curly was output.
|
|
|
|
|
|
|
|
|
The top-level of a Haskell file does not automatically have a layout context; when there is no `module` keyword, a context is implicitly pushed using `missing_module_keyword`.
|
|
|
|
|
|
|
|
|
When writing grammars that accept semicolon-separated sequences, be sure to include a rule allowing for trailing semicolons (see the previous section), otherwise, you will reject layout. |