
'<stdin>: hGetLine: invalid argument' with Unicode input on Windows

Summary

On Windows, reading Unicode (non-ASCII) input from the console fails with an error.

Because this issue is intermittent and depends on the locale, I have narrowed the reproduction down as much as possible (see below). Even so, I'm not sure it will reproduce 100% of the time on your side.

Steps to reproduce

Change "Current language for non-Unicode programs:" to "Chinese (Simplified, China)" and leave "Beta: UTF-8" unchecked. (This might not be necessary, but it narrows the problem down.)

Create a new project (named testunicode).

-- Main.hs
module Main where
import System.IO

main :: IO ()
main = do
  i_enc <- hGetEncoding stdin
  o_enc <- hGetEncoding stdout
  e_enc <- hGetEncoding stderr
  putStrLn $ show i_enc
  putStrLn $ show o_enc
  putStrLn $ show e_enc
  line <- getLine
  putStrLn line
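Not part of the original report: to see which bytes the console actually delivers before GHC's decoder runs, a hypothetical diagnostic variant of Main.hs can switch stdin to binary mode, which disables text decoding and newline translation, and print every byte in hex. The helper name `hexBytes` is an assumption for illustration, and this has not been run against the failing setup.

```haskell
module Main where

import Data.Char (ord)
import System.IO
import Text.Printf (printf)

-- Render each Char as a two-digit hex byte.
-- In binary mode every Char read from a handle is a raw byte (0..255).
hexBytes :: String -> String
hexBytes = unwords . map (printf "%02x" . ord)

main :: IO ()
main = do
  -- Binary mode turns off the UTF-8 decoder (and CRLF translation) on stdin,
  -- so hGetLine returns exactly the bytes the console handed over.
  hSetBinaryMode stdin True
  raw <- hGetLine stdin
  putStrLn (hexBytes raw)
```

If the console delivers valid UTF-8, typing "Фывфыв" should print a line starting with "d0 a4", likely followed by a trailing "0d" (the carriage return, since binary mode does not strip it). Anything else would point at the bytes coming from the console rather than at the decoder.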

Make sure only one conhost.exe is running in Task Manager. (This might not be necessary, but it narrows the problem down.)

Open cmd.exe and set PATH for GHC and cabal by hand. (DON'T launch it from a .cmd script; that may prevent the problem from reproducing!)

Make sure the console font is "新宋体" (NSimSun). If you select a different font, no error is reported, but a different problem occurs (see below).

C:\work-pl\haskell\testunicode>chcp
Active code page: 936

C:\work-pl\haskell\testunicode>chcp 65001
Active code page: 65001 (This will clear the console)

C:\work-pl\haskell\testunicode>cabal build
... [2 of 2] Linking ... \\testunicode.exe

C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
testunicode: <stdin>: hGetLine: invalid argument (cannot decode byte sequence starting from 208)
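For reference, the byte 208 in the error is 0xD0. A minimal sketch (not from the original report) using a small hand-written encoder, `utf8Encode`, a hypothetical helper built only on base, shows that 0xD0 is the UTF-8 lead byte of "Ф" (U+0424) and "в" (U+0432). A 0xD0 lead byte must be followed by one continuation byte in 0x80..0xBF, so the error suggests the decoder received 208 without a valid continuation byte; the sketch does not explain why that happens.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Char with the standard 1- to 4-byte UTF-8 scheme.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- High bits that go into the lead byte.
    lead k = fromIntegral (n `shiftR` k)
    -- A continuation byte: 10xxxxxx carrying 6 payload bits.
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

main :: IO ()
main = do
  print (utf8Encode 'Ф')                -- [208,164]
  print (concatMap utf8Encode "Фывфыв")
```

The second line prints [208,164,209,139,208,178,209,132,209,139,208,178], i.e. the bytes a UTF-8 console should deliver for the test input.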

If it does not reproduce (possibly because your console code page already defaults to UTF-8), do the following:

C:\work-pl\haskell\testunicode>chcp 936
Active code page: 936

C:\work-pl\haskell\testunicode>chcp 65001
Active code page: 65001

C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
testunicode: <stdin>: hGetLine: invalid argument (cannot decode byte sequence starting from 208)

This should reproduce the error. (DON'T close the console window; see below.)

It might look like a Windows bug, since sometimes there is no problem, for example when the console is launched from a .cmd script and its active code page defaults to UTF-8. However, I don't believe it is a Windows bug, because other languages, e.g. Racket, read the same input correctly.

It even works perfectly with the native I/O manager (--io-manager=native).

For example, edit testunicode.cabal, add ghc-options: -rtsopts (so RTS options can be passed on the command line), and rebuild:

C:\work-pl\haskell\testunicode>cabal clean
C:\work-pl\haskell\testunicode>cabal build
... [2 of 2] Linking ... \\testunicode.exe
C:\work-pl\haskell\testunicode>cabal run testunicode -- +RTS --io-manager=native
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
Фывфыв

It works perfectly.

However, --io-manager=native does not work properly in the REPL:

C:\work-pl\haskell\testunicode>cabal repl testunicode --repl-options="+RTS --io-manager=native"
ghci> main
Just UTF-8
Just UTF-8
Just UTF-8
<----- STUCK HERE!!!

Note that this issue affects only input; output is fine. For example, if Main.hs has no getLine and just calls putStrLn "Фывфыв", it works perfectly.

If you select the font "Lucida Console" instead of "新宋体" (NSimSun), no error occurs, but nothing is printed back:

C:\work-pl\haskell\testunicode>cabal run
Just UTF-8
Just UTF-8
Just UTF-8
Фывфыв
<----- NOTHING!

However, Lucida Console can definitely render "Фывфыв"; see https://www.myfonts.com/collections/lucida-console-font-monotype-imaging

Thanks.

Related issues: #10542 (closed) #18307 (closed)

Expected behavior

Handle Unicode input correctly.

Environment

  • GHC version used: 9.8.1
Edited by Siyuan Chen