
Unicode output in GHC

Unicode output is somewhat broken in GHC as a whole. We should fix it properly.

Most output is generated by the Pretty module. Pretty has two ways to output:

  • printLeftRender, which is used when the rendering mode is LeftMode.
    This method uses the BufWrite module to speed up output. For FastStrings,
    the output is UTF-8; for Strings and individual characters, the output
    keeps only the low 8 bits of each character.

  • printDoc, used in modes other than LeftMode (e.g. for error messages and
    -ddump output), calls hPutStr for Strings, which uses the prevailing
    encoding on stdout. For FastStrings, however, it calls hPutFS, which
    always emits UTF-8.

  • In GHCi, there is an additional layer due to Haskeline, which pipes all
    output through its own decoder (or tries to; I think there are cases it
    does not cover).
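To make the mismatch concrete, here is a small sketch, using only base, that computes the bytes each of the two paths described above would produce for the same text. It does not call GHC's Pretty or BufWrite modules; lowBits and utf8Bytes are hypothetical helper names that merely model the "low 8 bits" behaviour and the UTF-8 behaviour.

```haskell
import Data.Bits ((.&.))
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (utf8)

-- Hypothetical model of the String/Char path described for printLeftRender:
-- keep only the low 8 bits of each code point.
lowBits :: String -> [Word8]
lowBits = map (fromIntegral . (.&. 0xFF) . ord)

-- Hypothetical model of the FastString path: proper UTF-8, produced here by
-- base's own UTF-8 TextEncoding rather than a hand-written encoder.
utf8Bytes :: String -> IO [Word8]
utf8Bytes s = GHC.withCStringLen utf8 s $ \(p, n) -> peekArray n (castPtr p)

main :: IO ()
main = do
  let s = "\955x"          -- U+03BB GREEK SMALL LETTER LAMDA, then 'x'
  print (lowBits s)        -- [187,120]: the high bits of U+03BB are dropped
  utf8Bytes s >>= print    -- [206,187,120]: a valid UTF-8 encoding
```

The lone 0xBB byte from the low-8-bits path is not valid UTF-8 by itself, so output that mixes the two paths cannot be decoded consistently with any single encoding.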

This is all a bit of a mess.

We should be using the Unicode layer in the IO library for all encoding/decoding now. I suggest that:

  • we leave printLeftRender alone. It is used for printing things like the
    .s file, and never outputs any Unicode characters because everything is
    Z-encoded.

  • printDoc should use hPutStr . decodeFS instead of hPutFS for FastStrings,
    so that all of its output goes through the handle's encoding.

  • we get rid of the Haskeline decoding layer.
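To illustrate why Z-encoded output stays within ASCII, here is a deliberately simplified sketch of the Z-encoding rules for the z/Z escape letters and for non-ASCII characters. It approximates GHC's scheme and is not GHC's actual encoder: zEncode is a hypothetical name, and the real mappings for ASCII symbols (such as ZL for an opening parenthesis) are omitted, with those characters passed through unchanged.

```haskell
import Data.Char (isAscii, isDigit, ord)
import Numeric (showHex)

-- Hypothetical, simplified Z-encoder. Only the escape letters and non-ASCII
-- characters are handled; the real scheme also rewrites ASCII symbols.
zEncode :: String -> String
zEncode = concatMap encodeChar
  where
    encodeChar 'z' = "zz"                    -- escape the escape letters
    encodeChar 'Z' = "ZZ"
    encodeChar c
      | not (isAscii c) = unicodeEscape c    -- e.g. U+03BB becomes "z3bbU"
      | otherwise       = [c]                -- other ASCII passes through here
    -- Non-ASCII: 'z', the hex code point, then 'U'. A leading '0' is added
    -- when the hex digits would otherwise start with a letter.
    unicodeEscape c = case showHex (ord c) "" of
      hex@(d : _) | isDigit d -> 'z' : hex ++ "U"
      hex                     -> 'z' : '0' : hex ++ "U"

main :: IO ()
main = do
  putStrLn (zEncode "f\955")     -- fz3bbU
  putStrLn (zEncode "caf\233")   -- cafz0e9U
```

Whatever the input, every character this function emits is ASCII, which is the property that lets printLeftRender ignore encodings entirely.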
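The proposed printDoc change can be sketched as follows, assuming a FastString can be modelled as a list of UTF-8 bytes. FS, decodeFS and hPutFS here are hypothetical stand-ins, not GHC's real definitions. Decoding goes through the IO library's own UTF-8 codec (GHC.Foreign.peekCStringLen with utf8), and hPutStr then applies whatever encoding the handle has. Because this decoder runs in IO, the composition is written as decodeFS fs >>= hPutStr h rather than hPutStr . decodeFS.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO

-- Hypothetical stand-in for GHC's FastString: its payload as UTF-8 bytes.
newtype FS = FS [Word8]

-- Hypothetical decodeFS: turn the UTF-8 payload back into a String using
-- the IO library's UTF-8 codec instead of a hand-written decoder.
decodeFS :: FS -> IO String
decodeFS (FS bytes) =
  withArrayLen bytes $ \n p -> GHC.peekCStringLen utf8 (castPtr p, n)

-- Sketch of the proposed replacement for hPutFS: decode first, then let
-- hPutStr encode with the handle's current TextEncoding.
hPutFS :: Handle -> FS -> IO ()
hPutFS h fs = decodeFS fs >>= hPutStr h

main :: IO ()
main = do
  hSetEncoding stdout utf8
  hPutFS stdout (FS [206, 187, 10])   -- "\955\n" encoded as UTF-8
```

With this shape, changing the handle's encoding (for example with hSetEncoding) affects Strings and FastStrings alike, which is the consistency the proposal is after.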

However, this will introduce a regression on Windows, because the Haskeline encoding layer currently does code-page encoding. Judah has mentioned looking at doing code-page encoding in the GHC IO library, so let's see what happens there.

Once this is done, we can do #2507 (quotation characters in error messages).

Trac metadata
  Trac field              Value
  Version                 6.11
  Type                    Bug
  TypeOfFailure           OtherFailure
  Priority                high
  Resolution              Unresolved
  Component               Compiler
  Test case
  Differential revisions
  BlockedBy
  Related
  Blocking
  CC
  Operating system
  Architecture