Printing non-ASCII characters to console on Windows
As part of an initiative to get stack working properly on Windows for users with international names (https://github.com/commercialhaskell/stack/issues/3988), and while trying to find a fix for ghc-pkg (#15021, closed), I discovered a weird behavior that has been known for a while and affects other languages too, not only Haskell.
First of all, here is the default behavior on Windows with a locale that isn't Cyrillic for this program:
    main :: IO ()
    main = putStrLn "Алексей Кулешевич"
    PS C:\phab\windows-console> stack exec -- console
    console.EXE: <stdout>: commitBuffer: invalid argument (invalid character)
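To see which encoding the runtime has picked for a failing setup like this, a small diagnostic can help. The following is only a sketch: `describeEncodings` is a hypothetical helper name, and it deliberately reports ASCII-only text so that it cannot hit the same error. It relies on `localeEncoding` and `hGetEncoding` from `System.IO`.

```haskell
import System.IO (hGetEncoding, localeEncoding, stdout)

-- Hypothetical helper: describe the encodings GHC will use,
-- without printing any non-ASCII text itself.
describeEncodings :: IO [String]
describeEncodings = do
  out <- hGetEncoding stdout -- Nothing means the handle is in binary mode
  return
    [ "locale encoding: " ++ show localeEncoding
    , "stdout encoding: " ++ maybe "binary" show out
    ]
```

Running `describeEncodings >>= mapM_ putStrLn` from `main` prints both values. On Windows the reported encoding is typically tied to the active code page, which would explain why a Cyrillic string cannot be encoded under a non-Cyrillic one.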
Now consider this program:
    import System.IO

    main :: IO ()
    main = do
      hSetEncoding stdout utf8
      putStrLn "Алексей Кулешевич"
Compiling and running it on Windows 7 with an English locale results in:
    PS C:\phab\windows-console> stack exec -- console
    ╨É╨╗╨╡╨║╤ü╨╡╨╣ ╨Ü╤â╨╗╨╡╤ê╨╡╨▓╨╕╤ç
    PS C:\phab\windows-console> chcp 65001
    Active code page: 65001
    PS C:\phab\windows-console> stack exec -- console
    Алексей Кулешевич
    лешевич
    �ич
No knowledge of Russian is necessary to see that, after the code page is set to 65001, characters are printed to the console that don't belong there. This seems to be a bug in Windows' handling of Unicode characters, since the result is exactly the same in cmd as in PowerShell, and it has been reported with other languages such as Perl and Java.
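As an aside, the garbled first line of that output (before `chcp 65001`) is consistent with UTF-8 bytes being displayed through code page 437. The sketch below reproduces it in pure Haskell under that assumption; the code page of the first run is not shown in the session above, so 437 is a guess based on the later `chcp` output. The names `utf8Bytes`, `cp437High` and `asCp437` are illustrative only.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one character as its UTF-8 byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F) -- continuation byte: 10xxxxxx

-- Upper half (0x80..0xFF) of code page 437, sixteen characters per row.
cp437High :: String
cp437High = concat
  [ "ÇüéâäàåçêëèïîìÄÅ"
  , "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ"
  , "áíóúñÑªº¿⌐¬½¼¡«»"
  , "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐"
  , "└┴┬├─┼╞╟╚╔╩╦╠═╬╧"
  , "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀"
  , "αßΓπΣσµτΦΘΩδ∞φε∩"
  , "≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\x00A0"
  ]

-- Show what a code page 437 console would display for the UTF-8 bytes of a string.
asCp437 :: String -> String
asCp437 = map toGlyph . concatMap utf8Bytes
  where
    toGlyph b
      | b < 0x80  = chr b -- treated as plain ASCII (control codes ignored here)
      | otherwise = cp437High !! (b - 0x80)
```

In GHCi, `asCp437 "Алексей Кулешевич" == "╨É╨╗╨╡╨║╤ü╨╡╨╣ ╨Ü╤â╨╗╨╡╤ê╨╡╨▓╨╕╤ç"` evaluates to `True`: each Cyrillic letter becomes two bytes (`0xD0` or `0xD1` followed by a continuation byte), and each byte is drawn as a separate box-drawing or accented glyph.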
Worth noting that this also directly affects
GHC_CHARENC environment variable is set to
Besides the bug described above, it is sad that we need to rely on both the code page and the handle encoding being set correctly in order to see even semi-correct output without a total program crash.
The fix being proposed here is to use the WriteConsoleW API call instead of writing to a handle, but only when the handle is actually a console and not a pipe. This allows us to print Unicode characters correctly without changing or relying on the current code page setting. Here is sample output from my recent experiments:
    PS C:\phab\windows-console> chcp
    Active code page: 437
    PS C:\phab\windows-console> stack exec -- console
    Алексей Кулешевич
I'll add some code examples of the proposed solution in the upcoming days.
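In the meantime, the idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `putStrConsole` is a hypothetical name, console detection is assumed to be done with `GetConsoleMode` (which fails for pipes and files), errors and partial writes are ignored, and only stdout is handled. On non-Windows platforms the sketch simply falls back to `hPutStr`.

```haskell
{-# LANGUAGE CPP #-}
import System.IO (hFlush, hPutStr, stdout)

#if defined(mingw32_HOST_OS)
import Data.Word (Word32)
import Foreign.C.String (withCWStringLen)
import Foreign.C.Types (CInt (..), CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)

-- Win32 API functions use stdcall on 32-bit Windows and ccall on 64-bit.
#if defined(i386_HOST_ARCH)
#define WINAPI_CC stdcall
#else
#define WINAPI_CC ccall
#endif

type HANDLE = Ptr ()

foreign import WINAPI_CC unsafe "windows.h GetStdHandle"
  c_GetStdHandle :: Word32 -> IO HANDLE

foreign import WINAPI_CC unsafe "windows.h GetConsoleMode"
  c_GetConsoleMode :: HANDLE -> Ptr Word32 -> IO CInt

foreign import WINAPI_CC safe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> Ptr CWchar -> Word32 -> Ptr Word32 -> Ptr () -> IO CInt

-- STD_OUTPUT_HANDLE is (DWORD)-11.
stdOutputHandle :: Word32
stdOutputHandle = 0xFFFFFFF5

-- Write via WriteConsoleW when stdout is a console, otherwise via the handle.
putStrConsole :: String -> IO ()
putStrConsole s = do
  h <- c_GetStdHandle stdOutputHandle
  -- GetConsoleMode returns zero when the handle is not a console.
  isConsole <- alloca $ \modePtr -> (/= 0) <$> c_GetConsoleMode h modePtr
  if isConsole
    then do
      hFlush stdout -- keep ordering with earlier buffered output
      withCWStringLen s $ \(ptr, len) ->
        alloca $ \writtenPtr -> do
          -- Result and written count are ignored in this sketch.
          _ <- c_WriteConsoleW h ptr (fromIntegral len) writtenPtr nullPtr
          return ()
    else hPutStr stdout s
#else
-- Fallback for platforms without the Windows console API.
putStrConsole :: String -> IO ()
putStrConsole s = hPutStr stdout s >> hFlush stdout
#endif
```

Because `withCWStringLen` produces UTF-16 on Windows and `WriteConsoleW` takes UTF-16 directly, neither the active code page nor the handle encoding is involved on the console path. Redirected output still goes through the handle and its encoding.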