Heuristic and helpful message for layout/indentation errors
Motivation
Parse errors caused by incorrect layout are notoriously opaque. Currently, if a parse error happens on a virtual token inserted by the layout rules, we merely guess that it is indentation-related; otherwise we don't mention indentation at all.
Often incorrect layout will cause the parser to descend down an incorrect sequence of productions, so it reports the error at a point after the layout token (#12483, #19097), and GHC does not even offer indentation as a possible cause.
Proposal
When there is a parse error, in certain circumstances we somehow inform the user of the virtual layout tokens that we saw recently.
Supposing the code was something like
do
  let x = case () of
    pat -> ()
  ...
and we somehow told the user that, actually, case of ended before pat, they would immediately recognize that this is not what they meant to write. They might not know how to write the correct thing (and I'm not sure we can offer a concrete fix in the general case), but at least they will know that it's probably not related to the BlockArguments suggestion, nor the spelling of the ->.
As for what the "certain circumstances" are -- the parse error could be arbitrarily far away from the responsible layout token, and I do not have a complete or satisfying answer for this. However, srcParseErr already comes with a couple of heuristics along the lines of "was there a suspicious thing in the last N chars?", and we could continue in that style with a heuristic that could work well enough: show the layout information if there has been a virtual closing brace or virtual semicolon in the last N tokens (I don't know of a layout error that's caused by a virtual opening brace).
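To make the heuristic concrete, here is a minimal sketch. The Tok type and the name recentLayoutSuspect are made up for illustration and are not GHC's actual lexer types or API; the only thing it models is "was there a virtual closing brace or virtual semicolon among the last N tokens?".

```haskell
-- Hypothetical token type for illustration; GHC's real lexer tokens differ.
data Tok
  = Real String   -- a token the user actually wrote
  | VOpen         -- virtual '{' inserted by the layout algorithm
  | VClose        -- virtual '}' inserted by the layout algorithm
  | VSemi         -- virtual ';' inserted by the layout algorithm
  deriving (Eq, Show)

-- | True if a virtual closing brace or virtual semicolon occurs among the
-- last @n@ tokens. The list is ordered most recent token first.
-- Virtual opening braces are deliberately ignored.
recentLayoutSuspect :: Int -> [Tok] -> Bool
recentLayoutSuspect n = any suspicious . take n
  where
    suspicious VClose = True
    suspicious VSemi  = True
    suspicious _      = False
```

With the most recent tokens first, something like recentLayoutSuspect 3 [Real "->", Real "pat", VClose, VOpen, Real "of"] would trigger the hint, while a window of 2 would not. How large N should be is exactly the open question.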
As for the "somehow inform" -- an obvious option here seems to be to print some of the most recent tokens we've seen, inserting and highlighting (color?) the virtual ones.
How many tokens do we print? Again, the real responsible layout token could be arbitrarily far back, but not so far that there has been an entire top-level declaration in between. If we parsed all the way to module M where{ ... A;B;C then we assume that B is correctly separated from A, and thus a parse error in C cannot be related to A (it could, however, be related to B being incorrectly separated from C). Similarly, if there are explicit braces/semicolons we don't have to look past those.
Still, that may be arbitrarily many tokens and thus arbitrarily much memory, so we may want to introduce an arbitrary cutoff at N tokens.
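A rough sketch of such a bounded history follows. The Tok type and the function remember are hypothetical names, not anything in GHC. It only models two of the rules above: the cutoff at N tokens, and forgetting everything before an explicit brace or semicolon. It does not attempt to detect top-level declaration boundaries.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Hypothetical token type for illustration.
data Tok
  = Real String     -- an ordinary token
  | Explicit Char   -- '{', '}' or ';' written by the user
  | Virtual Char    -- '{', '}' or ';' inserted by the layout rules
  deriving (Eq, Show)

-- | Add a token to the history (oldest first), keeping at most @limit@
-- tokens. An explicit brace or semicolon discards the older history,
-- since we don't have to look past it for a layout problem.
remember :: Int -> Seq Tok -> Tok -> Seq Tok
remember limit history tok = trim (relevant |> tok)
  where
    relevant = case tok of
      Explicit _ -> Seq.empty
      _          -> history
    -- Drop from the front until at most @limit@ tokens remain;
    -- Seq.drop with a non-positive count keeps the whole sequence.
    trim s = Seq.drop (Seq.length s - limit) s
```

The lexer would then thread this through as it goes, along the lines of foldl (remember n) Seq.empty tokens, so the memory used for the hint is bounded by N regardless of file size.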
It is noteworthy that the "source location" at which virtual tokens are inserted is immediately preceding whatever token comes after, meaning typical code would turn into something like
...
;x = case () of
{pat -> ()
};y = y
;...
which is amazingly bad. One possible solution here is to reindent the tokens based on real and virtual braces/semicolons (which will additionally highlight their structure), ignoring user-provided whitespace, thus:
{ ...
; x = case () of
    { pat -> ()
    }
; y = y
; ...
}
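For what it's worth, the reindenting itself is not much code. Below is a minimal sketch under simplifying assumptions: the Tok type and reindent are hypothetical names, real and virtual braces/semicolons are not distinguished (so no highlighting), and every brace or semicolon simply starts a new line indented by its nesting depth.

```haskell
-- Hypothetical token type: braces and semicolons may be real or virtual,
-- which this sketch does not distinguish.
data Tok = Word String | Open | Semi | Close
  deriving (Eq, Show)

-- | Render a token stream with one line per brace or semicolon, indented
-- by nesting depth, ignoring whatever whitespace the user wrote.
reindent :: [Tok] -> String
reindent = unlines . map render . reverse . snd . foldl step (0, [])
  where
    -- State: current nesting depth, and the lines built so far
    -- (newest first). Each line is its depth plus its words, reversed.
    step (d, ls) Open  = (d + 1, (d, ["{"]) : ls)
    step (d, ls) Semi  = (d, (d - 1, [";"]) : ls)
    step (d, ls) Close = (d - 1, (d - 1, ["}"]) : ls)
    step (d, ls) (Word w) = case ls of
      (i, ws) : rest -> (d, (i, w : ws) : rest)
      []             -> (d, [(d, [w])])
    render (i, ws) = replicate (4 * max 0 i) ' ' ++ unwords (reverse ws)
```

Feeding it the token stream of the example prints the case alternatives one level deeper than the surrounding declarations, with each semicolon aligned under the brace that opened its block.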
People unfamiliar with Haskell might hate this. A less complicated solution is to move the virtual tokens past all preceding whitespace:
...;
x = case () of{
pat -> ()};
y = y;
...
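A minimal sketch of that second option, assuming the source has already been split into pieces and that each run of whitespace is a single piece. Piece, hoistVirtual and render are hypothetical names for illustration, not existing GHC functions.

```haskell
-- Hypothetical representation of the source as it would be printed.
data Piece
  = Code String    -- real source text
  | Space String   -- a maximal run of whitespace, including newlines
  | Virt String    -- a virtual token inserted by the layout rules
  deriving (Eq, Show)

-- | Move virtual tokens that directly follow a run of whitespace to just
-- before that whitespace, so they attach to the end of the preceding code.
hoistVirtual :: [Piece] -> [Piece]
hoistVirtual (Space s : rest) =
  let (virts, rest') = span isVirt rest
  in virts ++ Space s : hoistVirtual rest'
  where
    isVirt (Virt _) = True
    isVirt _        = False
hoistVirtual (p : rest) = p : hoistVirtual rest
hoistVirtual []         = []

-- | Turn the pieces back into text.
render :: [Piece] -> String
render = concatMap text
  where
    text (Code s)  = s
    text (Space s) = s
    text (Virt s)  = s
```

This keeps the user's own whitespace untouched, which is probably less surprising than reindenting, at the cost of not making the block structure any more visible.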