Unicode output in GHC
Unicode output is somewhat broken in GHC as a whole. We should fix it properly.
Most output is generated by the Pretty module. Pretty has two ways to output:
- `printLeftRender`, which is used when the rendering mode is `LeftMode`. This method uses the `BufWrite` module to speed up output. For `FastString`s, the output will be in UTF-8; for strings and other characters the output takes the low 8 bits of each character.
- `printDoc`, when used in modes other than `LeftMode` (e.g. for things like error messages and `-ddump`), calls `hPutStr` for strings, which uses the prevailing encoding on stdout. However, it calls `hPutFS` for `FastString`s, which always emits UTF-8.

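To make the mismatch concrete, here is a minimal sketch (not GHC's actual code) of the two behaviours described above: truncating a character to its low 8 bits versus encoding it as UTF-8. The names `lowByte` and `utf8Encode` are hypothetical and exist only for this illustration.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Hypothetical model of the "low 8 bits" behaviour: every bit of the
-- code point above the lowest byte is discarded.
lowByte :: Char -> Word8
lowByte c = fromIntegral (ord c .&. 0xFF)

-- | Plain UTF-8 encoding of a single code point (1 to 4 bytes).
-- No validation is done (surrogates are encoded like any other value).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading-byte payload: the bits above the given shift.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying 6 bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

For a left single quotation mark (U+2018), `lowByte` yields the single byte `0x18`, which is a control character, while `utf8Encode` yields `[0xE2, 0x80, 0x98]`. The same document can therefore contain both kinds of bytes depending on whether a character came from a string or a `FastString`.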
In GHCi, there is an additional layer due to Haskeline, which pipes all the output through its own decoder (or tries to; I think there are cases not covered).
This is all a bit of a mess.
We should be using the Unicode layer in the IO library for all encoding/decoding now. I suggest that:
- We leave `printLeftRender` alone. It is used for printing things like the `.s` file, and never outputs any Unicode characters because everything is Z-encoded.
- `printDoc`, instead of `hPutFS`, should use `hPutStr . decodeFS`.
- We get rid of the Haskeline decoding layer.

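The `hPutStr . decodeFS` idea can be sketched as follows, assuming a `FastString` is modelled as a list of UTF-8 bytes. `Utf8Bytes`, `decodeUtf8` and `putDecoded` are hypothetical stand-ins written for this illustration, not GHC's API; the decoder is lenient (it does not reject overlong forms) and maps malformed input to U+FFFD.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (Handle, hPutStr)

-- | Hypothetical stand-in for the UTF-8 payload of a FastString.
type Utf8Bytes = [Word8]

-- | Decode UTF-8 bytes into a String. Malformed or truncated sequences
-- are replaced by U+FFFD; overlong encodings are not rejected.
decodeUtf8 :: Utf8Bytes -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume k continuation bytes after a lead byte with the given payload.
    multi k lead =
      let (conts, rest) = splitAt k bs
          isCont x = x .&. 0xC0 == 0x80
          step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
          cp = foldl step (fromIntegral lead) conts :: Int
      in if length conts == k && all isCont conts && cp <= 0x10FFFF
           then chr cp : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs

-- | Decode first, then let the Handle apply whatever encoding it is set to.
putDecoded :: Handle -> Utf8Bytes -> IO ()
putDecoded h = hPutStr h . decodeUtf8
```

The point of this shape is that the byte-level encoding decision moves entirely into the `Handle`: `putDecoded stdout bytes` would emit whatever the prevailing encoding of stdout dictates, the same as strings printed with `hPutStr`.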
However, this will introduce a regression on Windows, because the Haskeline encoding layer currently does code-page encoding. Judah has mentioned looking at doing code-page encoding in the GHC IO library, so let's see what happens there.
Once this is done, we can do #2507 (closed) (quotation characters in error messages).
Trac metadata
Trac field | Value |
---|---|
Version | 6.11 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | high |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |