PrettyErrors is a wiki page summarising the state of play.
clang has very nice-looking error messages.
<pre><span style="font-weight: bold">pretty.c:6:7: <span style="color: pink">warning:</span> incompatible pointer to integer conversion passing 'char [14]' to parameter of type 'int' [-Wint-conversion]</span>
  foo("Hello, world!");
      <span style="color: green">^~~~~~~~~~~~~~~</span>
<span style="font-weight: bold">pretty.c:1:14: note: passing argument to parameter 'i' here</span>
void foo(int i) {
             <span style="color: green">^</span>
1 warning generated.</pre>
ghc's error messages are not so good.
<pre>ugly.hs:7:18:
    Couldn't match expected type ‘()’ with actual type ‘[Char]’
    In the first argument of ‘f’, namely ‘"Hello, world!"’
    In the second argument of ‘($)’, namely ‘f "Hello, world!"’
    In the expression: print $ f "Hello, world!"</pre>
In my opinion, there are three independent improvements that could be made to GHC error messages and warnings: color, context and whitespace. Currently they're blobs of text.
Consider all three applied to error messages:
<pre><strong>ugly.hs:7:18: <span style="color: red">error:</span> Argument to 'f' is type '[Char]' but expected 'Int'</strong>
main = print $ f "Hello, world!"
                 <span style="color: green">^~~~~~~~~~~~~~~</span>
<strong>ugly.hs:3:1: note: type of 'f' is given here:</strong>
f :: () -> IO ()
     <span style="color: green">^~</span></pre>
or
<pre>ugly.hs: note: type of 'f' is inferred:
f :: forall m. Monad m => () -> m ()
                          ^~</pre>
In my opinion, context and whitespace are more important than color. Even without color, compare this error message to the one shown above:
<pre>ugly.hs:7:18: error: Argument to 'f' is type '[Char]' but expected 'Int'
main = print $ f "Hello, world!"
                 ^~~~~~~~~~~~~~~
ugly.hs:3:1: note: type of 'f' is given here:
f :: () -> IO ()
     ^~</pre>
In my opinion this is much easier to visually process than GHC's current messages.
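The context-and-whitespace layout proposed here is cheap to produce once a source span is known. The following is a minimal sketch, not GHC code: the `Diag` record and `renderDiag` are hypothetical names, and columns are assumed to be 1-based as in the examples above.

```haskell
-- Hypothetical diagnostic record; none of these names exist in GHC.
data Diag = Diag
  { diagFile   :: FilePath
  , diagLine   :: Int     -- 1-based line number
  , diagCol    :: Int     -- 1-based column where the span starts
  , diagWidth  :: Int     -- width of the span in characters (at least 1)
  , diagKind   :: String  -- "error", "warning", "note", ...
  , diagMsg    :: String
  , diagSource :: String  -- text of the offending source line
  }

-- Render a header line, the source line, and a "^~~~" marker under the span.
renderDiag :: Diag -> String
renderDiag d = unlines
  [ diagFile d ++ ":" ++ show (diagLine d) ++ ":" ++ show (diagCol d)
      ++ ": " ++ diagKind d ++ ": " ++ diagMsg d
  , diagSource d
  , replicate (diagCol d - 1) ' ' ++ "^" ++ replicate (diagWidth d - 1) '~'
  ]
```

Color would be a separate concern layered on top of this, which fits the claim that the three improvements are independent.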
I'm afraid I don't quite see what you're getting at. The original post says, "[This modified version] is much easier to visually process than GHC's current messages." My question is: Why, precisely? I don't mean to be defensive or dismissive; I am trying to generate grounds for a meaningful conversation. For example, here are a few things that you might be getting at:
Having different colors/font weights (i.e. boldness) makes the error messages more visually interesting and therefore easier to pay attention to and read.
Having blank lines in the middle of single error messages makes them less imposing.
Using a position marker in a line below some code is easier to follow than an ever-growing context.
In the example, the type of f is given explicitly, so the context in which the error was made is more apparent.
Short of re-engineering the entire way that GHC handles error messages, it would certainly be hard to produce exactly the output that you are requesting. But, it may be possible to address bullet points like my suggested ones above piecemeal and nudge ourselves in the direction of better errors.
It's also worth pointing out that each of the bullet points above has reasons "against", such as:
Not every terminal supports these extra modes. In particular, GHC has already had some trouble getting "smart" quotes working in all possible environments (or, indeed, figuring out when to fall back onto dumb quotes).
An automated processor of error messages (that is, an IDE built around GHC) could easily get confused around the blank lines. In fact, I believe I've run into this exact problem when running clang from emacs -- the extra "context-setting" output gets interpreted as fresh warnings.
It's unclear to me, personally, if having the position marker on a separate line is necessarily better than the current output.
The user of a DSL in Haskell is generally unaware of the full, general type of a function they are using. Perhaps including the full type in the error message would make it scarier, not friendlier.
In any case, I'm curious to hear more about the specific things GHC can do to improve. I think we all want "better" error messages, but we need to agree on a definition of "better" first. And, the changes should probably be incremental, unless we have an eager volunteer to examine the whole error-message generation mechanism holistically. There is quite a bit of code dedicated to error messages, so this is not a task to be taken on lightly!
Edsko and I were thinking about this a bit in light of the recent discussion on Reddit. He had what I thought was a rather nice idea:
Putting aside the specific proposal made in this ticket, it seems like generally what we need is a more semantically-rich representation for our error messages. This need not be a giant AST encoding every possible error that might arise. Our current approach of encoding messages in SDoc works fairly well. What it lacks is the ability to denote meaningful references to various parts of the program (e.g. types, expressions, constraints).
A moderately painless(?) way to fix this would be to index Doc (and SDoc) on a type which could then be embedded in the document. To put it concretely,
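A minimal sketch of what such an indexed document type might look like. The constructor set is a heavily simplified assumption (GHC's real `Doc` has more constructors and layout information), and `embedded` is a hypothetical helper added only to show that consumers can get at the atoms directly.

```haskell
-- Simplified stand-in for a pretty-printer document indexed on a type `a`.
data Doc a
  = Empty
  | Text String
  | Embed a                  -- a value of the index type carried in the document
  | Beside (Doc a) (Doc a)   -- horizontal composition
  | Above (Doc a) (Doc a)    -- vertical composition
  deriving Show

-- Collect every embedded value, in document order.
embedded :: Doc a -> [a]
embedded Empty        = []
embedded (Text _)     = []
embedded (Embed x)    = [x]
embedded (Beside l r) = embedded l ++ embedded r
embedded (Above t b)  = embedded t ++ embedded b
```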
The Embed constructor could then be used to embed various compiler-phase specific atoms into the document. For instance, the type-checker might emit errors in the form of SDoc TcDoc where,
Consumers of error messages could then use these annotations as they like. Most of the existing consumers would likely expose a function which would take a function to project the phase-specific data back to a plain SDoc. For instance,
showSDoc' :: DynFlags -> (a -> SDoc Void) -> SDoc a -> String
and we could avoid breaking existing users of showSDoc by defining it as,
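As a rough sketch of how these pieces could fit together, under several assumptions: `Doc` is a simplified stand-in for `SDoc`, the `DynFlags` argument is omitted, `TcDoc` holds plain strings rather than GHC's real names and types, and `showDoc'`/`showDoc` are hypothetical analogues of `showSDoc'`/`showSDoc`.

```haskell
import Data.Void (Void, absurd)

-- Simplified stand-in for an indexed document (not GHC's real SDoc).
data Doc a
  = Text String
  | Embed a
  | Beside (Doc a) (Doc a)

-- Hypothetical type-checker atoms; Strings stand in for GHC's Name and Type.
data TcDoc
  = TcName String
  | TcType String

-- Project a type-checker atom back to a plain document.
pprTcDoc :: TcDoc -> Doc Void
pprTcDoc (TcName n) = Text ("'" ++ n ++ "'")
pprTcDoc (TcType t) = Text ("'" ++ t ++ "'")

-- Analogue of showSDoc': the caller says how embedded atoms are displayed.
showDoc' :: (a -> Doc Void) -> Doc a -> String
showDoc' _    (Text s)     = s
showDoc' proj (Embed x)    = showDoc (proj x)
showDoc' proj (Beside l r) = showDoc' proj l ++ showDoc' proj r

-- Analogue of the existing showSDoc: a Doc Void cannot contain an Embed,
-- so the projection is never called and `absurd` suffices.
showDoc :: Doc Void -> String
showDoc = showDoc' absurd
```

Existing callers that only ever build plain documents would keep using the `Void`-indexed version unchanged, while a tool could pass its own projection or inspect the atoms directly.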
Other uses (say, tooling using the GHC API) might choose to instead use a richer presentation of the data embedded in the document. These users will still be limited by the fact that the error representation is still ultimately a pretty-printer document, but at least now we aren't forcing them to parse a formatted error message to extract these key details. Moreover, we might be able to expose more context in this embedded data than we show in the current messages.
One of the nice properties of this approach is that it allows a somewhat gradual transition. Adding the infrastructure to enable this sort of embedding requires only minor changes to existing code (e.g. adding the index to SDoc). Moreover, I have a sneaking suspicion that it would allow us to clean up the current story around Names in Outputable.
It's come to my attention that my ticket:8809#comment:79617 may have shut down the conversation here. That was the opposite of my intent! I'd love to figure out how to break down the problem of difficult-to-work-with error messages into its pieces so that we can debate them (and hopefully implement improvements) sensibly.
I should also be clear on one particular point: the biggest barrier to getting this done is the love from someone(s) to see it all through. This would be a valuable service, indeed.
It would be nice if we could refactor GHC so that error messages are kept in some sort of structured format with all information that might be relevant. Then, when printed we could have flags to specify how to render the errors (e.g., "machine form", which would be good for external tools, such as IDEs; or "human form", which could have the nice formatting in the example).
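One way to read this suggestion, as a minimal sketch: the error is kept as data and a rendering mode is chosen at print time. All names here (`TypeMismatch`, `RenderMode`, `renderError`) and the tab-separated machine format are assumptions for illustration; GHC defines none of them.

```haskell
import Data.List (intercalate)

-- Hypothetical structured error carrying everything a renderer might need.
data TypeMismatch = TypeMismatch
  { tmFile     :: FilePath
  , tmLine     :: Int
  , tmCol      :: Int
  , tmExpected :: String
  , tmActual   :: String
  }

-- Which audience the output is for.
data RenderMode = MachineForm | HumanForm

renderError :: RenderMode -> TypeMismatch -> String
-- Machine form: one tab-separated record per error, easy for tools to split.
renderError MachineForm e =
  intercalate "\t"
    [ tmFile e, show (tmLine e), show (tmCol e)
    , "type-mismatch", tmExpected e, tmActual e ]
-- Human form: a conventional "file:line:col: error: ..." message.
renderError HumanForm e =
  tmFile e ++ ":" ++ show (tmLine e) ++ ":" ++ show (tmCol e)
    ++ ": error: expected type '" ++ tmExpected e
    ++ "' but got '" ++ tmActual e ++ "'"
```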
It would be nice if we could refactor GHC so that error messages are kept in some sort of structured format with all information that might be relevant. Then, when printed we could have flags to specify how to render the errors (e.g., "machine form", which would be good for external tools, such as IDEs; or "human form", which could have the nice formatting in the example).
Indeed this would be nice; however, carrying all of the information necessary for an error comes at a cost. I think Simon PJ articulates this fairly well in this comment on the Reddit post mentioned by goldfire (reproduced here for archival's sake):
Building error messages from strings (or in GHC's case SDocs) is pretty lame because you can write them but not analyse them. The "obvious" alternative is to use a huge algebraic data type with one constructor for each error message that GHC can produce. Then you generate the constructor in one place, and render it into a string somewhere else, perhaps in more than one way. I am not optimistic about this, because it puts a big central road-block in the way of generating error messages, and splits the work into two different places (the renderer and the generator). That's an advantage in some ways, but there are so darn MANY different error messages that it feels far too centralised and brittle to me.
Idris does something in the middle. As I understand David Christiansen, they have an abstract type a bit like SDoc, but it is much richer than GHC's SDoc. They can certainly do colour (and SDocs should too). And you can attach auxiliary info to the SDoc so that when rendered in a web browser you get popup hints. This would all be very feasible in GHC, if someone did the work.
Another big issue is having enough information to hand when you are generating the message in the first place. Attaching provenance information to type constraints is a huge help (as the Elm article suggests) which GHC does, but not well enough. For example Lennart Augustsson gave a talk at the Haskell Implementors' Workshop last year with some simple suggestions that work really well in his Mu compiler. Jurriaan Hage and his colleagues at Utrecht have a lot of experience of this kind of thing with Helium. GHC is better placed to do this now than it has ever been before, because the type inference engine is increasingly based on solving constraints. Almost all type errors are generated in a single module, TcErrors, if you are interested to look there.
I'm keen to make sure that running GHC in batch mode sending output to a text file or dumb terminal gives something useful. I don't want to require a snazzy IDE or emacs mode. But I'd love to be able to exploit one if it was available.
The proposal I laid out in ticket:8809#comment:101672 was an attempt to find a way to implement the alternative that Simon describes above while minimizing the impact of the change.
Edsko and I were thinking about this a bit in light of the recent discussion on Reddit. He had what I thought was a rather nice idea: ...
I think the idea of embedding richer info into SDoc is a good one. In particular, I like the idea that this enables a gradual transition. For example, we could have some large ADT defined in TcErrors that represents all of the errors that the module produces (but not other modules). Then some of the downside of the big-ADT approach that Simon is worried about is reduced. And then we could do another module... and so on.
However, I think indexing SDoc is going to lead to trouble. We won't be able to have lists of errors that originated in disparate parts of the compiler. And we won't be able to embed multiple types of information in the same error message. Instead, what if we just use dynamic typing here? (gasp!) By this, I mean something like
data Doc = forall a. Typeable a => Embed a | Empty | ...
When pulling out embedded bits, we just use dynamic checks to get the types. Although this seems somewhat un-Haskellish, I think it's forced by the very-dynamic nature of an error message. During parsing, a consumer can discover what type of embedded information should be at a certain spot, and then do the dynamic check. This seems like just the sort of thing that dynamic typing excels at.
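A minimal sketch of this dynamically typed variant, using `cast` from `Data.Typeable` for the dynamic check. The constructors other than `Embed`, the `NameAtom`/`TypeAtom` types and the `atoms` helper are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Maybe (maybeToList)
import Data.Typeable (Typeable, cast)

-- An unindexed document that can embed a value of any Typeable type.
data Doc
  = forall a. Typeable a => Embed a
  | Text String
  | Beside Doc Doc

-- Two hypothetical kinds of embedded information.
newtype NameAtom = NameAtom String deriving (Eq, Show)
newtype TypeAtom = TypeAtom String deriving (Eq, Show)

-- Extract all embedded values of the requested type, in document order.
-- Atoms of any other type fail the dynamic check and are skipped.
atoms :: Typeable b => Doc -> [b]
atoms (Embed x)    = maybeToList (cast x)
atoms (Text _)     = []
atoms (Beside l r) = atoms l ++ atoms r
```

Here a single document mixes `NameAtom` and `TypeAtom` values freely, which is what the indexed version cannot do without a wrapper type; the price is that a consumer asking for the wrong type silently gets nothing back.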
goldfire, indeed, an ADT per compiler phase is what I was thinking (and I have the beginnings of a branch looking at TcErrors in particular). In my case, though, I was thinking of at least starting by merely annotating a few semantically important elements of the message (e.g. Names, Types, TyVars, etc.). This would enable, for instance, IDEs to link to the definition span of a symbol, print an expanded representation of a type, etc.
That being said, there is no reason why one couldn't go further with this same approach and encode the entire error as a value in some ADT. This certainly offers further advantages, although also implies a bit more work (which is why I'm starting with the atoms listed above).
As far as the indexing issue goes, I was thinking we would give Doc a Monad instance. This would allow a number of quite convenient patterns. For instance, suppose you have msgs :: Doc TcErrDoc containing some errors you'd like to print. If you also have pprTcErrDoc :: TcErrDoc -> Doc Void, you could trivially flatten the document with msgs >>= pprTcErrDoc.
Further, if you want to combine a Doc TcErrDoc with a Doc ParserErrDoc, you'd simply lift them both into an ADT data GhcErrDoc = TcErrDoc TcErrDoc | ParserErrDoc ParserErrDoc with Applicative. Alternatively, if you'd rather keep the universe of error types open, you could opt to lift them into a universally quantified newtype, roughly like you suggest.
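Both patterns can be sketched on a toy document type. This is an illustration under assumptions, not the patch mentioned in this thread: `Doc` has only three constructors and ignores layout entirely (so it says nothing about the Hughes invariants), the atom types hold plain strings, the lifting is done with `fmap`, and `flatten`, `combine`, `render` and the `ppr*` functions are hypothetical names.

```haskell
import Data.Void (Void, absurd)

-- Toy indexed document; `Pure` marks an embedded atom.
data Doc a = Text String | Pure a | Beside (Doc a) (Doc a)

instance Functor Doc where
  fmap _ (Text s)     = Text s
  fmap f (Pure x)     = Pure (f x)
  fmap f (Beside l r) = Beside (fmap f l) (fmap f r)

instance Applicative Doc where
  pure = Pure
  df <*> dx = df >>= \f -> fmap f dx

-- Bind substitutes a whole document for every embedded atom.
instance Monad Doc where
  Text s     >>= _ = Text s
  Pure x     >>= k = k x
  Beside l r >>= k = Beside (l >>= k) (r >>= k)

-- Hypothetical phase-specific atoms and the sum type that unites them.
newtype TcErrDoc     = TcName String
newtype ParserErrDoc = Token String
data GhcErrDoc = TcErrDoc TcErrDoc | ParserErrDoc ParserErrDoc

pprTcErrDoc :: TcErrDoc -> Doc Void
pprTcErrDoc (TcName n) = Text ("'" ++ n ++ "'")

pprParserErrDoc :: ParserErrDoc -> Doc Void
pprParserErrDoc (Token t) = Text ("<" ++ t ++ ">")

pprGhcErrDoc :: GhcErrDoc -> Doc Void
pprGhcErrDoc (TcErrDoc d)     = pprTcErrDoc d
pprGhcErrDoc (ParserErrDoc d) = pprParserErrDoc d

-- Flatten a type-checker document to a plain one, as in msgs >>= pprTcErrDoc.
flatten :: Doc TcErrDoc -> Doc Void
flatten msgs = msgs >>= pprTcErrDoc

-- Combine documents from two phases by lifting both into GhcErrDoc.
combine :: Doc TcErrDoc -> Doc ParserErrDoc -> Doc GhcErrDoc
combine tc ps = Beside (fmap TcErrDoc tc) (fmap ParserErrDoc ps)

-- Render a document that no longer contains any atoms.
render :: Doc Void -> String
render (Text s)     = s
render (Pure v)     = absurd v
render (Beside l r) = render l ++ render r
```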
I should note that adding an index and Monad instance to Doc isn't entirely trivial. I believe it is possible (and have a patch with much of the work) but I haven't yet proven to myself that it will preserve the invariants that the Hughes pretty-printer expects.
There are a few implementations of annotated pretty-printers of various flavors on Hackage, but they either provide only Functor (e.g. pretty, annotated-wl-pprint), or are of the Wadler-Leijen variety (e.g. wl-pprint-extras).
My current approach treats the "Pure" constructor like text, adding a PureBeside a Bool (Doc a) constructor to Doc. This, however, makes it impossible (I believe) to correctly implement some combinators which expect to know the width of the string (e.g. sep).
I believe it might be easier to add Monad in a Leijen-style printer, where the width is only necessary on rendering. However, I'm afraid I'm not familiar enough with pretty-printers to know the trade-offs involved here. What are the reasons against doing this?