Unicode output in GHC

Unicode output is somewhat broken in GHC as a whole. We should fix it properly.

Most output is generated by the Pretty module, which has two output paths:

- printLeftRender, used when the rendering mode is LeftMode. This method uses the BufWrite module to speed up output. FastStrings are output as UTF-8; for Strings and other characters, the output takes only the low 8 bits of each character.

- printDoc, used in modes other than LeftMode (e.g. for error messages and -ddump output), calls hPutStr for Strings, which uses the prevailing encoding on stdout. However, it calls hPutFS for FastStrings, which always emits UTF-8.
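
A minimal, self-contained sketch of why the same character can come out differently depending on the path. The helpers lowByteEncode and utf8Encode are hypothetical and are not GHC functions: the first imitates keeping the low 8 bits of each character, as described for Strings under printLeftRender, and the second produces the UTF-8 bytes that FastStrings get.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of each code point,
-- imitating the String/Char behaviour described for printLeftRender.
lowByteEncode :: String -> [Word8]
lowByteEncode = map (fromIntegral . (.&. 0xFF) . ord)

-- Hypothetical helper: standard UTF-8 encoding, i.e. the form in which
-- FastStrings are emitted.
utf8Encode :: String -> [Word8]
utf8Encode = concatMap (encodeChar . ord)
  where
    encodeChar c
      | c < 0x80    = [fromIntegral c]                          -- 1 byte
      | c < 0x800   = [0xC0 .|. hi 6, cont 0]                   -- 2 bytes
      | c < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]          -- 3 bytes
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0] -- 4 bytes
      where
        -- Leading bits of the code point for the lead byte.
        hi n   = fromIntegral (c `shiftR` n)
        -- A continuation byte: 10xxxxxx carrying 6 bits of the code point.
        cont n = 0x80 .|. fromIntegral ((c `shiftR` n) .&. 0x3F)

main :: IO ()
main = do
  print (lowByteEncode "λ")  -- prints [187]
  print (utf8Encode "λ")     -- prints [206,187]
```

For ASCII text the two functions agree; for any non-ASCII character they differ. For 'λ' (U+03BB) the low-8-bit path emits the single byte 0xBB, while UTF-8 emits 0xCE 0xBB.
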

In GHCi, there is an additional layer due to Haskeline, which pipes all the output through its own decoder (or tries to; I think there are cases it does not cover).

This is all a bit of a mess.
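
To illustrate why a separate decoding layer on top of this output is fragile, the sketch below reads the UTF-8 bytes of 'λ' back with the wrong encoding. Latin-1 is an arbitrary choice for illustration; this does not assert which encoding Haskeline's decoder actually assumes, and latin1Decode is a hypothetical helper.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode bytes as Latin-1 (ISO-8859-1), in which
-- every byte maps to the code point with the same numeric value.
latin1Decode :: [Word8] -> String
latin1Decode = map (chr . fromIntegral)

main :: IO ()
main = do
  -- The UTF-8 encoding of "λ" (U+03BB) is the two bytes 0xCE 0xBB.
  let utf8Lambda = [0xCE, 0xBB] :: [Word8]
      misread    = latin1Decode utf8Lambda
  -- One character went in, two come out: U+00CE and U+00BB.
  print misread           -- prints "\206\187"
  print (length misread)  -- prints 2
```

Conversely, a decoder that assumes UTF-8 cannot make sense of the lone 0xBB byte that the low-8-bit path produces for 'λ', because 0xBB on its own is a continuation byte, not a valid UTF-8 sequence. A single decoder cannot be right for output that mixes both.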

We should be using the Unicode layer in the IO library for all encoding/decoding now. I suggest that:

- we leave printLeftRender alone. It is used for printing things like the .s file, and never outputs any non-ASCII characters, because everything is Z-encoded.
- printDoc should use hPutStr . decodeFS instead of hPutFS.
- we get rid of the Haskeline decoding layer.
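
The proposed printDoc change can be sketched as follows, under clearly stated assumptions: FastString is modelled here as a plain list of UTF-8 bytes, decodeFS is a hypothetical stand-in named after the function in the proposal (not GHC's implementation), and hPutFSViaStr is a hypothetical name for the new output function. The idea is that a FastString is decoded to a String first, so that hPutStr, and therefore the Handle's own encoding, decides which bytes are written, just as printDoc already does for Strings.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (Handle, hPutStr, hSetEncoding, stdout, utf8)

-- Hypothetical stand-in for GHC's FastString: just its UTF-8 bytes.
newtype FastString = FastString [Word8]

-- Hypothetical stand-in for the proposal's decodeFS: decode UTF-8 bytes
-- into a String. Assumes well-formed input and performs no validation.
decodeFS :: FastString -> String
decodeFS (FastString bytes) = go bytes
  where
    go [] = []
    go (b : rest)
      | b < 0x80  = chr (fromIntegral b) : go rest  -- 1-byte (ASCII)
      | b < 0xE0  = multi 1 (b .&. 0x1F) rest       -- 2-byte sequence
      | b < 0xF0  = multi 2 (b .&. 0x0F) rest       -- 3-byte sequence
      | otherwise = multi 3 (b .&. 0x07) rest       -- 4-byte sequence
    -- Combine the lead byte's payload with n continuation bytes,
    -- each contributing its low 6 bits.
    multi n lead rest =
      let (conts, rest') = splitAt n rest
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                     (fromIntegral lead) conts
      in chr cp : go rest'

-- The proposed path for FastStrings: decode first, then let hPutStr
-- (and so the Handle's encoding) choose the output bytes.
hPutFSViaStr :: Handle -> FastString -> IO ()
hPutFSViaStr h = hPutStr h . decodeFS

main :: IO ()
main = do
  -- Pinned to UTF-8 only so the demo prints reliably; the point of the
  -- proposal is that whatever encoding the Handle has is respected.
  hSetEncoding stdout utf8
  -- UTF-8 bytes of "λ\n".
  hPutFSViaStr stdout (FastString [0xCE, 0xBB, 0x0A])
```

The decoder is hand-rolled only to keep the sketch free of dependencies; the proposal's intent is that all encoding and decoding go through the IO library's Unicode layer rather than through ad hoc code like this.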

However, this will introduce a regression on Windows, because the Haskeline encoding layer currently does code-page encoding. Judah has mentioned looking at doing code-page encoding in the GHC IO library, so let's see what happens there.

Once this is done, we can do #2507 (quotation characters in error messages).

Trac metadata

Trac field	Value
Version	6.11
Type	Bug
TypeOfFailure	OtherFailure
Priority	high
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
