Draft: Better parsing error

This is a draft PR for issue #16955. I'm opening this to track work and get initial feedback on the design.

I'm using the error mechanism from happy to improve the parse error messages.

As an example, the following code:

foo x y z = bar [(x, y, z),
            (y, x, z,
            (x, x, y),
            (y, y, y)
  
data Chicken = CotCotCot

bar a b c d = print ((a, b, c, d) + "hello"

fact n = product [1..n]

bar = let
  a = 5
  in

is given the following error message when parsed with current GHC:

[1 of 1] Compiling Main             ( ParseErrorTest.hs, ParseErrorTest.o )

ParseErrorTest.hs:6:1: error:
    parse error (possibly incorrect indentation or mismatched brackets)
  |
6 | data Chicken = CotCotCot
  | ^

With this pull request, the error messages are as follows:

[1 of 1] Compiling Main             ( ParseErrorTest.hs, ParseErrorTest.o )

ParseErrorTest.hs:1:17: error:
    Parse error with context: Parsing error: please close this list.
  |
1 | foo x y z = bar [(x, y, z),
  |                 ^

ParseErrorTest.hs:2:13: error:
    Parse error with context: Parsing error: please close this tuple brace.
  |
2 |             (y, x, z,
  |             ^

ParseErrorTest.hs:8:21: error:
    Parse error with context: Parsing error: please close this brace.
  |
8 | bar a b c d = print ((a, b, c, d) + "hello"
  |                     ^

ParseErrorTest.hs:14:3: error:
    Parse error with context: Parsing error: let/in clause without body. Please add a body.
   |
14 |   in
   |   ^^

You can see here that the parse errors are more informative and point to a useful location. Also, all the parse errors are reported, instead of only the first one.
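To illustrate the difference between stopping at the first error and reporting every unclosed bracket at the position where it was opened, here is a minimal standalone sketch. It is not the happy-based mechanism used in this MR: it is a plain stack-based scan, and the names `Unclosed`, `positions` and `unclosed` are hypothetical. It also ignores string literals and comments, which a real lexer would handle.

```haskell
-- | 1-based source position (line, column).
type Pos = (Int, Int)

-- | Hypothetical diagnostic: an opening bracket that is never closed.
data Unclosed = Unclosed Pos Char
  deriving (Eq, Show)

-- | Attach a position to every character of the input.
positions :: String -> [(Pos, Char)]
positions src =
  [ ((ln, col), c)
  | (ln, line) <- zip [1 ..] (lines src)
  , (col, c) <- zip [1 ..] line
  ]

-- | Report every opening bracket left unclosed, in source order.
-- Simplification: string literals and comments are not skipped.
unclosed :: String -> [Unclosed]
unclosed = go [] . positions
  where
    -- At end of input, whatever is still on the stack was never closed.
    go stack [] = reverse [Unclosed p c | (p, c) <- stack]
    go stack ((p, c) : rest)
      | c `elem` "([{" = go ((p, c) : stack) rest
      | c `elem` ")]}" = go (pop c stack) rest
      | otherwise = go stack rest

    -- Drop the innermost opener when it matches the closer;
    -- a stray or mismatched closer is ignored in this sketch.
    pop c ((_, o) : stack) | matches o c = stack
    pop _ stack = stack

    matches '(' ')' = True
    matches '[' ']' = True
    matches '{' '}' = True
    matches _ _ = False
```

Running `unclosed` on the first declaration of the example above yields one diagnostic for the `[` at 1:17 and one for the `(` at 2:13, the same locations as in the new error messages.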

This is a work-in-progress MR. In particular:

  • The code is not formatted. I'm still discovering the GHC codebase as well as happy, so it will take me a few attempts to get the formatting right.
  • I'd like to handle more cases, especially the parse error when the in is missing from a let ... in.

Design discussion

Right now, parsing errors are fatal: GHC ends the parsing process as soon as there is any parsing error.

However, we can see that the parser is able to recover and produce a valid AST in most cases. It may be possible to continue the compilation process (as a first approximation, by replacing addError with addWarning). This way, parsing errors and type errors could be reported side by side.
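As a rough illustration of that idea, here is a toy recovering parser, unrelated to GHC's actual parser or message types. It parses a list literal of non-negative integers, repairs a missing closing bracket, and returns both the recovered value and the diagnostics. `Severity`, `Diagnostic`, `parseIntList` and `canContinue` are hypothetical names; the severity argument plays the role of the addError / addWarning switch.

```haskell
import Data.Char (isDigit, isSpace)

-- | Hypothetical severity levels; GHC's own message types are richer.
data Severity = SevWarning | SevError
  deriving (Eq, Show)

data Diagnostic = Diagnostic Severity String
  deriving (Eq, Show)

-- | Parse a list literal of non-negative integers such as "[1,2,3]".
-- A missing closing bracket is repaired and reported with the given
-- severity, so the caller always gets the elements read so far.
parseIntList :: Severity -> String -> ([Int], [Diagnostic])
parseIntList sev input =
  case dropWhile isSpace input of
    '[' : rest -> items rest
    _ -> ([], [Diagnostic SevError "expected '['"])
  where
    -- Read one element, or fall through to the closing bracket.
    items s =
      case span isDigit (dropWhile isSpace s) of
        ([], rest) -> close rest
        (digits, rest) ->
          let (xs, ds) = afterItem rest
          in (read digits : xs, ds)

    -- After an element, expect either a comma or the end of the list.
    afterItem s =
      case dropWhile isSpace s of
        ',' : rest -> items rest
        rest -> close rest

    -- Recovery point: pretend the bracket was there and record why.
    close s =
      case dropWhile isSpace s of
        ']' : _ -> ([], [])
        _ -> ([], [Diagnostic sev "missing ']', list closed implicitly"])

-- | Later compiler stages run only when no diagnostic is an error.
canContinue :: [Diagnostic] -> Bool
canContinue ds = and [s /= SevError | Diagnostic s _ <- ds]
```

With `SevWarning`, parsing `"[1,2,3"` gives the elements 1, 2 and 3 plus one warning, and `canContinue` holds; with `SevError` the same value is recovered but compilation would stop there.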

This may lead to many type errors if the parsing error results in something totally incoherent, but it may also result in less distraction while writing code. For example, right now, when I introduce a let in my code, my editor switches to "parse error" and all the other type errors and warnings disappear (see #16955, where I detail this).

We can even imagine that, thanks to -fdefer-type-errors, parsing errors could be deferred to runtime. I see some possible benefits in this, such as being able to run a test suite in the middle of a refactoring.

Most parsing errors lead to an incomplete AST, which can be "fixed" with typed holes, for example:

let
   x = foo
   y = bar
in

(missing body after in) may be fixed to:

let
   x = foo
   y = bar
in _

(with -fdefer-typed-holes, this error will only appear at runtime).

However, some parsing errors can lead to perfectly fine code, for example:

l = [1,2,3

Actually, the current code will generate the list [1,2,3]. We can keep it as it is, or add a runtime failure, such as:

l = [1,2,3] ++ error "Parse error: missing parenthesis"
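One consequence of this encoding is worth spelling out: because lists are lazy, the failure only fires when something forces the list past its third element. The sketch below shows this with the definition above; the helper `raises` is a hypothetical name introduced for the demonstration.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- The list that would be generated for the unterminated literal.
l :: [Int]
l = [1, 2, 3] ++ error "Parse error: missing parenthesis"

-- | Force an Int to weak head normal form and report whether doing so
-- raised an 'error' call.
raises :: Int -> IO Bool
raises x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)
```

Here `raises (sum (take 3 l))` returns False, since the three written elements are available without touching the tail, while `raises (length l)` returns True, since walking the whole spine reaches the `error` call. Code that only consumes a prefix of the list would therefore never see the parse error at runtime.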

I'll explore this design space in follow-up commits.