Unicode support in the CLI is very weird on Windows
getLine fails to read Unicode characters even when the codepage and the stdin encoding are both set to UTF-8, and putStrLn fails to write Unicode characters when the codepage is not set, even though it could.
Steps to reproduce
Type the following in a GHCi session:
and then enter a character like "µ" or "ẞ". If you have entered a single character, the result will be this:
Alternatively, write and compile a .hs file with this content:
main :: IO ()
main = getLine >>= putStrLn
When running this without first running chcp 65001 or enabling the "UTF-8 by default" setting, entering Unicode characters will result in an error (note: I assume this behaviour persists in compiled code, but so far I have only reproduced it in GHCi):
*** Exception: <stdout>: hPutChar: invalid argument (invalid character)
or print a ? character instead (note: I do not assume this is the default behaviour, since I had set my codepage to an unfamiliar one for testing purposes).
When the codepage is set, entering unsupported characters leads to them being replaced with a space:
HelloµHello
Hello Hello
Adding hSetEncoding stdin utf8 to any of the above examples does not change the behaviour.
Running hGetEncoding stdin in GHCi when the codepage is 65001 gives:
but the incorrect behaviour occurs nonetheless.
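To make the encoding checks above easier to repeat outside GHCi, here is a minimal diagnostic sketch. It is not part of the original reproduction: the helper names describeEncoding and report are hypothetical, and checking stdout and stderr in addition to stdin is my own addition. It prints the encoding attached to each standard handle before and after calling hSetEncoding stdin utf8, so the reported encoding can be compared with what getLine actually does.

```haskell
import System.IO
  ( Handle, TextEncoding, hGetEncoding, hSetEncoding
  , stderr, stdin, stdout, utf8 )

-- Hypothetical helper: render the result of hGetEncoding.
-- Nothing means the handle is in binary mode.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding Nothing    = "binary mode (no encoding)"
describeEncoding (Just enc) = show enc

-- Hypothetical helper: print the encoding currently attached to one handle.
report :: String -> Handle -> IO ()
report name h = do
  enc <- hGetEncoding h
  putStrLn (name ++ ": " ++ describeEncoding enc)

main :: IO ()
main = do
  let handles = [("stdin", stdin), ("stdout", stdout), ("stderr", stderr)]
  putStrLn "Before hSetEncoding:"
  mapM_ (uncurry report) handles
  -- Same call as in the report; only stdin is changed here.
  hSetEncoding stdin utf8
  putStrLn "After hSetEncoding stdin utf8:"
  mapM_ (uncurry report) handles
```

Running this once with the default codepage and once after chcp 65001 should show whether the handle encodings differ between the two setups, independently of whether input is then read correctly.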
µ should return
For the compiled code example:
The string should be returned unchanged.
- GHC version used: 8.10.1
- Operating System: Windows 10
- System Architecture: 64-Bit
This only happens on Windows. Everything works fine on Linux and on WSL.