Unicode output in GHC
Unicode output is somewhat broken in GHC as a whole. We should fix it properly.
Most output is generated by the Pretty module. Pretty has two ways to output:
- `printLeftRender`, which is used when the rendering mode is `LeftMode`. This method uses the `BufWrite` module to speed up output. For `FastString`s, the output will be in UTF-8; for strings and other characters, the output takes the low 8 bits of each character.
- `printDoc`, when used in modes other than `LeftMode` (e.g. for things like error messages and `-ddump`), calls `hPutStr` for strings, which uses the prevailing encoding on stdout. However, it calls `hPutFS` for `FastString`s, which always emits UTF-8.
- In GHCi, there is an additional layer due to Haskeline, which pipes all the output through its own decoder (or tries to; I think there are cases not covered).
This is all a bit of a mess.
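To make the mismatch concrete, here is a minimal sketch (not GHC's actual code) comparing the "low 8 bits" treatment described for plain strings with a UTF-8 encoding of the same text. The names `low8` and `utf8Bytes` are hypothetical and exist only for this illustration; the encoder assumes valid Unicode code points and does no error handling.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical model of the "low 8 bits of each character" path:
-- fromIntegral silently truncates the code point to one byte.
low8 :: String -> [Word8]
low8 = map (fromIntegral . ord)

-- Hypothetical minimal UTF-8 encoder, standing in for what is
-- emitted for FastStrings.
utf8Bytes :: String -> [Word8]
utf8Bytes = concatMap enc
  where
    enc c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 .|. hi 6, cont 0]
      | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        n      = ord c
        hi s   = fromIntegral (n `shiftR` s)
        cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  print (low8 "λx")       -- [187,120]: U+03BB is truncated to 0xBB
  print (utf8Bytes "λx")  -- [206,187,120]: two bytes for λ, one for x
```

The same character therefore reaches the output as different bytes depending on which path produced it, and neither path consults the encoding of the handle.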
We should be using the Unicode layer in the IO library for all encoding/decoding now. I suggest that:
- We leave `printLeftRender` alone. It is used for printing things like the `.s` file, and never outputs any Unicode characters because everything is Z-encoded.
- `printDoc`, instead of `hPutFS`, should use `hPutStr . decodeFS`.
- We get rid of the Haskeline decoding layer.
However, this will introduce a regression on Windows, because the Haskeline encoding layer currently does code-page encoding. Judah has mentioned looking at doing code-page encoding in the GHC IO library, so let's see what happens there.
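The `printDoc` suggestion above can be sketched roughly as follows. This is an illustration under assumptions, not GHC code: a `FastString` payload is modelled as a plain list of UTF-8 bytes, and `decodeBytes` is a hypothetical stand-in for `decodeFS` that assumes well-formed input and performs no validation. The point is that once the bytes are decoded to a `String`, `hPutStr` lets the handle's own encoding decide what is written.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import System.IO

-- Hypothetical stand-in for decodeFS: UTF-8 bytes to String.
-- Assumes well-formed UTF-8; malformed input is not detected.
decodeBytes :: [Word8] -> String
decodeBytes [] = []
decodeBytes (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeBytes bs
  | b < 0xE0  = multi 1 (b .&. 0x1F)
  | b < 0xF0  = multi 2 (b .&. 0x0F)
  | otherwise = multi 3 (b .&. 0x07)
  where
    -- Combine the lead byte's payload with n continuation bytes.
    multi n lead =
      let (cs, rest) = splitAt n bs
          step acc c = acc `shiftL` 6 .|. fromIntegral (c .&. 0x3F)
          cp         = foldl step (fromIntegral lead) cs :: Int
      in chr cp : decodeBytes rest

-- Hypothetical output path: decode first, then let the handle encode.
putBytesVia :: Handle -> [Word8] -> IO ()
putBytesVia h = hPutStr h . decodeBytes

main :: IO ()
main = do
  enc <- hGetEncoding stdout  -- Nothing means a binary-mode handle
  hPutStrLn stderr ("stdout encoding: " ++ maybe "binary" show enc)
  hSetEncoding stdout utf8    -- chosen here only so the demo is predictable
  putBytesVia stdout [0xCE, 0xBB, 0x0A]
```

With this shape, strings and `FastString`s go through the same encoder, so the output is consistent with whatever encoding the handle has been given.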
Once this is done, we can do #2507 (closed) (quotation characters in error messages).
Trac metadata
| Trac field | Value |
|---|---|
| Version | 6.11 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | high |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |