'<stdin>: hGetLine: invalid argument' with Unicode input on Windows
Summary
On Windows, reading Unicode input from stdin fails with the error '<stdin>: hGetLine: invalid argument'.
Because the issue is intermittent and depends on the locale, I have narrowed the reproduction as much as possible (see below). Even so, I'm not sure it will reproduce 100% of the time on your side.
Steps to reproduce
Change "Current language for non-Unicode programs:" to "Chinese (Simplified, China)" and leave "Beta: UTF-8" unchecked. (This might not be necessary, but it narrows the problem.)
Create a new project (named testunicode).
-- Main.hs
module Main where
import System.IO
main :: IO ()
main = do
  i_enc <- hGetEncoding stdin
  o_enc <- hGetEncoding stdout
  e_enc <- hGetEncoding stderr
  putStrLn $ show i_enc
  putStrLn $ show o_enc
  putStrLn $ show e_enc
  line <- getLine
  putStrLn line
Ensure there is only one conhost.exe in Task Manager. (This might not be necessary, but it narrows the problem.)
Open cmd.exe and set PATH for GHC and cabal. (DON'T launch it from a .cmd script; that might make the issue impossible to reproduce!)
Ensure the console font is "新宋体". (If you select another font, there is no error, but a different problem occurs; see below.)
C:\work-pl\haskell\testunicode>chcp
Active code page: 936
C:\work-pl\haskell\testunicode>chcp 65001
Active code page: 65001 (This will clear the console)
C:\work-pl\haskell\testunicode>cabal build
... [2 of 2] Linking ... \\testunicode.exe
C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
testunicode: <stdin>: hGetLine: invalid argument (cannot decode byte sequence starting from 208)
If it does not reproduce (this might be because your code page defaults to UTF-8), do the following:
C:\work-pl\haskell\testunicode>chcp 936
Active code page: 936
C:\work-pl\haskell\testunicode>chcp 65001
Active code page: 65001
C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
testunicode: <stdin>: hGetLine: invalid argument (cannot decode byte sequence starting from 208)
This should reproduce the error. (DON'T close the console window; see below.)
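For reference, the byte 208 named in the error message is the first byte of the UTF-8 encoding of 'Ф' (U+0424), so the decoder appears to be failing on the very first character typed. The following is a minimal sketch that computes the encoding by hand; the helper name utf8Bytes is hypothetical and is not part of base or of the test project.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- | Encode one character as UTF-8 and return the byte values.
-- Hand-written for illustration only; it does not reject surrogates.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n
                  ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  -- '\x424' is 'Ф', the first character typed in the transcript.
  print (utf8Bytes '\x424')   -- prints [208,164]
  -- Byte values for the whole input line "Фывфыв".
  print (concatMap utf8Bytes "\x424\x44B\x432\x444\x44B\x432")
```

Since 208 followed by 164 is a valid two-byte sequence, one possible reading (an assumption, not something confirmed here) is that the decoder did not receive the second byte intact.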
This looks like a Windows bug, because sometimes there is no problem, for example when you launch the console from a .cmd script and the console's active code page defaults to UTF-8. However, I don't believe it is a Windows bug, because other programming languages, e.g. Racket, handle this correctly.
It even works perfectly in --io-manager=native mode.
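To help tell apart a console problem from a decoding problem in the runtime, one option is to look at the raw bytes that arrive on stdin. The sketch below is only a diagnostic idea, not a fix: it switches stdin to binary mode with hSetBinaryMode (from System.IO), in which each Char read corresponds to one byte, and prints the byte values of one line. The helper name toByteValues is hypothetical.

```haskell
module Main where

import Data.Char (ord)
import System.IO (hSetBinaryMode, stdin)

-- | Convert a string read in binary mode to its byte values.
-- In binary mode every Char is in the range 0..255.
toByteValues :: String -> [Int]
toByteValues = map ord

main :: IO ()
main = do
  -- Disable text decoding and newline translation on stdin.
  hSetBinaryMode stdin True
  line <- getLine
  -- On Windows the list may end with 13 (carriage return).
  print (toByteValues line)
```

If the console delivers the input as intact UTF-8, the printed list for "Фывфыв" should begin with 208, 164. A list that is shorter than expected or contains other values would suggest that the bytes are already damaged before decoding.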
For example, edit testunicode.cabal and add ghc-options: -rtsopts.
C:\work-pl\haskell\testunicode>cabal clean
C:\work-pl\haskell\testunicode>cabal build
... [2 of 2] Linking ... \\testunicode.exe
C:\work-pl\haskell\testunicode>cabal run testunicode -- +RTS --io-manager=native
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
Фывфыв
It works perfectly.
However, --io-manager=native does not work properly in the REPL.
C:\work-pl\haskell\testunicode>cabal repl testunicode --repl-options="+RTS --io-manager=native"
ghci> main
Just UTF-8
Just UTF-8
Just UTF-8
<----- STUCK HERE!!!
Note that this issue affects input only; output is fine. For example, if Main.hs has no getLine and just calls putStrLn "Фывфыв", it works perfectly.
If you select the font "Lucida Console" instead of "新宋体", no error occurs, but nothing is echoed.
C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
<----- NOTHING!
However, Lucida Console can definitely render "Фывфыв"; see https://www.myfonts.com/collections/lucida-console-font-monotype-imaging
Thanks.
Related issues: #10542 (closed) #18307 (closed)
Expected behavior
Handle Unicode input correctly.
Environment
- GHC version used: 9.8.1