Printing non-ASCII characters to console on Windows
As part of an initiative to get stack working properly on Windows for users with international names (https://github.com/commercialhaskell/stack/issues/3988), and while trying to find a fix for ghc-pkg (#15021), I discovered a strange behavior that has been known for a while and affects other languages, not only Haskell.
First, here is the default behavior of this program on Windows with a non-Cyrillic locale:
```haskell
main :: IO ()
main = putStrLn "Алексей Кулешевич"
```

```
PS C:\phab\windows-console> stack exec -- console
console.EXE: <stdout>: commitBuffer: invalid argument (invalid character)
```
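The crash comes from the text encoding attached to `stdout`: by default it is taken from the locale, which on Windows is derived from the active code page, and a non-Cyrillic code page has no byte for these letters. A minimal diagnostic sketch, assuming a recent GHC `base`, prints the encodings in play; `describeEncoding` is a hypothetical helper, and on an English-locale console one would expect a code page encoding such as CP437 rather than UTF-8.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (TextEncoding, hGetEncoding, stdout)

-- Hypothetical helper: render a handle's encoding by name.
-- Nothing means the handle is in binary mode and does no text encoding.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding = maybe "binary (no encoding)" show

main :: IO ()
main = do
  loc <- getLocaleEncoding        -- encoding new handles start with
  out <- hGetEncoding stdout      -- encoding stdout is actually using
  -- Only ASCII is printed here, so this cannot hit the commitBuffer error.
  putStrLn ("locale encoding: " ++ show loc)
  putStrLn ("stdout encoding: " ++ describeEncoding out)
```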
Now consider this program:
```haskell
import System.IO (hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  hSetEncoding stdout utf8
  putStrLn "Алексей Кулешевич"
```
Compiling and running it on Windows 7 with an English locale results in:
```
PS C:\phab\windows-console> stack exec -- console
Алексей Кулешевич
PS C:\phab\windows-console> chcp 65001
Active code page: 65001
PS C:\phab\windows-console> stack exec -- console
Алексей Кулешевич
лешевич
�ич
```
No knowledge of Russian is needed to see that, once the code page is set to 65001, extra characters that don't belong there are printed to the console. This appears to be a bug in how Windows handles Unicode characters: the result is exactly the same in cmd as in PowerShell, and the same problem has been reported for other languages such as Perl and Java.
It is worth noting that this also directly affects `ghc` whenever the `GHC_CHARENC` environment variable is set to `UTF-8`.
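To illustrate how an environment-variable switch of this kind can work, here is a hypothetical sketch: when the variable equals `UTF-8`, the standard handles are forced to UTF-8, so what appears on screen then depends on the console code page, which is exactly where the glitch above shows up. This is not GHC's actual code; only the variable name comes from this report, and `wantsUtf8` and `configureFromEnv` are illustrative names.

```haskell
import Control.Monad (when)
import System.Environment (lookupEnv)
import System.IO (hSetEncoding, stderr, stdout, utf8)

-- Hypothetical rule: only the exact value "UTF-8" opts in.
wantsUtf8 :: Maybe String -> Bool
wantsUtf8 = (== Just "UTF-8")

-- Hypothetical helper: switch the standard handles to UTF-8 on request,
-- otherwise keep the locale-derived default encodings untouched.
configureFromEnv :: IO ()
configureFromEnv = do
  val <- lookupEnv "GHC_CHARENC"
  when (wantsUtf8 val) $ do
    -- The handles now emit UTF-8 bytes; how the console renders them
    -- depends on its code page.
    hSetEncoding stdout utf8
    hSetEncoding stderr utf8

main :: IO ()
main = do
  configureFromEnv
  putStrLn "Алексей Кулешевич"
```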
Besides the bug described above, it is unfortunate that we must rely on both the code page and the handle encoding being set correctly just to see semi-correct output without the program crashing outright.
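For comparison, GHC's `mkTextEncoding` accepts a `//TRANSLIT` suffix that makes an encoder substitute unencodable characters (typically with `?`) instead of failing. The sketch below, with hypothetical helpers `translitName` and `setTranslit`, switches a handle to the transliterating variant of whatever encoding it already has. This only avoids the crash; it does not make Cyrillic text readable, which is why it is a workaround rather than a fix.

```haskell
import Data.List (isSuffixOf)
import System.IO

-- Hypothetical helper: append GHC's "//TRANSLIT" suffix unless the name
-- already carries one of the coding-failure suffixes.
translitName :: String -> String
translitName name
  | any (`isSuffixOf` name) ["//TRANSLIT", "//IGNORE", "//ROUNDTRIP"] = name
  | otherwise = name ++ "//TRANSLIT"

-- Hypothetical helper: keep the handle's current encoding but stop it
-- from throwing on characters it cannot represent.
setTranslit :: Handle -> IO ()
setTranslit h = do
  mEnc <- hGetEncoding h
  case mEnc of
    Nothing -> pure ()  -- binary handle: no text encoding to adjust
    Just enc -> do
      -- show on a TextEncoding yields its name, e.g. "UTF-8".
      enc' <- mkTextEncoding (translitName (show enc))
      hSetEncoding h enc'

main :: IO ()
main = do
  setTranslit stdout
  putStrLn "Алексей Кулешевич"
```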
The fix proposed here is to use the `WriteConsoleW` API call instead of writing to a handle, but only when the handle is actually a console and not a pipe. This lets us print Unicode characters correctly without changing, or relying on, the current code page setting. Here is sample output from my recent experiments:
```
PS C:\phab\windows-console> chcp
Active code page: 437
PS C:\phab\windows-console> stack exec -- console
Алексей Кулешевич
```
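A rough sketch of the approach, assuming plain FFI bindings to the Win32 functions `GetStdHandle`, `GetConsoleMode` and `WriteConsoleW` (the binding types and the names `putStrLnConsole`, `isConsole` and `stdOutputHandle` are my own, not the code promised below): `GetConsoleMode` fails for pipes and files, so it doubles as the "is this a console?" test, and text is passed as UTF-16, so the code page never comes into play. On non-Windows platforms it simply falls back to `putStrLn`.

```haskell
{-# LANGUAGE CPP #-}
module Main (main, putStrLnConsole) where

#if defined(mingw32_HOST_OS)
import Data.Word (Word32)
import Foreign.C.String (withCWStringLen)
import Foreign.C.Types (CInt (..), CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)
import System.IO (hFlush, stdout)

#if defined(i386_HOST_ARCH)
#define WINDOWS_CCONV stdcall
#else
#define WINDOWS_CCONV ccall
#endif

type HANDLE = Ptr ()
type DWORD = Word32
type BOOL = CInt

-- STD_OUTPUT_HANDLE is (DWORD)-11.
stdOutputHandle :: DWORD
stdOutputHandle = 0xFFFFFFF5

foreign import WINDOWS_CCONV unsafe "windows.h GetStdHandle"
  c_GetStdHandle :: DWORD -> IO HANDLE

foreign import WINDOWS_CCONV unsafe "windows.h GetConsoleMode"
  c_GetConsoleMode :: HANDLE -> Ptr DWORD -> IO BOOL

foreign import WINDOWS_CCONV unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> Ptr CWchar -> DWORD -> Ptr DWORD -> Ptr () -> IO BOOL

-- GetConsoleMode succeeds only for console handles, not pipes or files.
isConsole :: HANDLE -> IO Bool
isConsole h = alloca $ \modePtr -> (/= 0) <$> c_GetConsoleMode h modePtr

putStrLnConsole :: String -> IO ()
putStrLnConsole s = do
  h <- c_GetStdHandle stdOutputHandle
  console <- isConsole h
  if console
    then do
      -- Flush text still buffered in the Haskell handle so output stays ordered.
      hFlush stdout
      -- On Windows withCWStringLen yields UTF-16 and a length in wchar_t
      -- units, which is what WriteConsoleW expects.
      withCWStringLen (s ++ "\n") $ \(buf, len) ->
        alloca $ \writtenPtr -> do
          -- A production version should check the result and retry partial writes.
          _ <- c_WriteConsoleW h buf (fromIntegral len) writtenPtr nullPtr
          pure ()
    else putStrLn s  -- redirected to a pipe or file: use the normal handle path
#else
-- Not Windows: the ordinary handle path is used unchanged.
putStrLnConsole :: String -> IO ()
putStrLnConsole = putStrLn
#endif

main :: IO ()
main = putStrLnConsole "Алексей Кулешевич"
```

Keeping the handle path for pipes matters because a redirected program should still produce bytes in the handle's encoding, which `WriteConsoleW` cannot do.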
I'll add some code examples of the proposed solution in the coming days.
Trac metadata
Trac field | Value |
---|---|
Version | 8.2.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | Unknown/Multiple |
Architecture | |