
Incorrect encoding for "error" strings on Windows

Summary

Cyrillic characters passed to error get mangled, whereas Cyrillic characters printed via hPutStrLn stderr are displayed correctly.

Steps to reproduce

The following code

module Main where
import System.IO
import GHC.IO.Encoding
main :: IO ()
main = do
    putStr "stdout encoding: " >> hGetEncoding stdout >>= print
    putStr "stderr encoding: " >> hGetEncoding stderr >>= print
    putStr "getForeignEncoding: " >> getForeignEncoding >>= print
    putStr "getLocaleEncoding: "  >> getLocaleEncoding  >>= print
    putStr "getFileSystemEncoding: "  >> getFileSystemEncoding  >>= print
    putStrLn "latin"
    putStrLn "кириллица 1"
    hPutStrLn stderr "кириллица 2"
    error "кириллица 3"

is built with GHC 9.0.1 via Stack, using resolver nightly-2021-09-09. Running it with stack exec -- nameOfexecutable (optionally with +RTS --io-manager=native) yields

stdout encoding: Just UTF-8
stderr encoding: Just UTF-8
getForeignEncoding: UTF-8
getLocaleEncoding: UTF-8
getFileSystemEncoding: UTF-8
latin
кириллица 1
кириллица 2
codepage-exe.EXE: кириллица 3
CallStack (from HasCallStack):
  error, called at app\Main.hs:17:5 in main:Main
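The report attributes the mangled error line to UTF-8 bytes being decoded as CP1251. The minimal sketch below reproduces that transformation so the mojibake can be recognized. For example, к (UTF-8 bytes D0 BA) should come out as Рє. The helper name asCp1251Mojibake is hypothetical. The sketch also assumes a CP1251 codec is available: it is built in on Windows and provided by iconv on most other systems.

```haskell
module Main where

import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding, utf8)

-- | Hypothetical helper: encode a string as UTF-8 bytes, then decode those
-- same bytes as CP1251. This mimics a UTF-8 message being shown by a
-- console that assumes code page 1251.
asCp1251Mojibake :: String -> IO String
asCp1251Mojibake s = do
  cp1251 <- mkTextEncoding "CP1251"
  -- withCStringLen marshals s to a byte buffer using utf8;
  -- peekCStringLen reads that buffer back using cp1251.
  GHC.withCStringLen utf8 s (GHC.peekCStringLen cp1251)

main :: IO ()
main = asCp1251Mojibake "кириллица 3" >>= putStrLn
```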

Expected behavior

All Cyrillic characters are displayed correctly.

Environment

  • GHC version used: 9.0.1
  • Terminal: Windows Terminal
  • Shell: PowerShell 7.1.4, with $OutputEncoding set to:

Preamble          : 
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 65001

The Git Bash shell exhibits the same behavior.

Optional:

  • Operating System: Windows 10
  • System Architecture: x86_64

Judging by the form of the mojibake above, it is UTF-8-encoded Cyrillic text interpreted as CP1251. When I add mkTextEncoding "CP1251" >>= setForeignEncoding before the call to error, everything is displayed correctly.
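For reference, here is a sketch of that workaround applied to the reproducer. It assumes the console interprets output bytes as code page 1251, so it is locale-specific. It also assumes that the default reporter for uncaught exceptions marshals the message to a C string using the foreign encoding, which would explain why this setting matters. The name useCp1251ForeignEncoding is hypothetical. Note that setForeignEncoding is global, so it also changes every other conversion that relies on the foreign encoding.

```haskell
module Main where

import GHC.IO.Encoding (getForeignEncoding, setForeignEncoding)
import System.IO (mkTextEncoding)

-- | Hypothetical helper implementing the workaround from the report:
-- make the foreign encoding CP1251, assumed to match the console's code page.
-- This is process-wide and affects all foreign-encoding conversions.
useCp1251ForeignEncoding :: IO ()
useCp1251ForeignEncoding = mkTextEncoding "CP1251" >>= setForeignEncoding

main :: IO ()
main = do
  useCp1251ForeignEncoding
  -- Print the active foreign encoding to confirm the switch took effect.
  getForeignEncoding >>= print
  error "кириллица 3"
```

The downside, and the reason for the question that follows, is that "CP1251" is hard-coded here rather than detected.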

Is there a setting that would make the program pick the correct encoding for error output (the foreign encoding?) regardless of the OS and/or Windows flavor? Here, the assumption that the foreign encoding is UTF-8 does not hold.
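One possible workaround, sketched here as an assumption rather than a fix, avoids detecting the encoding altogether. It installs a custom uncaught-exception handler with setUncaughtExceptionHandler from GHC.Conc, which writes the message through the stderr Handle. That Handle already reports UTF-8 and printed "кириллица 2" correctly above. The names reportViaStderr and formatUncaught are hypothetical. As far as I know, the runtime's top-level handler still exits with a failure status after the custom handler returns.

```haskell
module Main where

import Control.Exception (SomeException, displayException)
import GHC.Conc (setUncaughtExceptionHandler)
import System.Environment (getProgName)
import System.IO (hFlush, hPutStrLn, stderr, stdout)

-- | Hypothetical formatter mimicking the "prog: message" shape of the
-- default report.
formatUncaught :: String -> SomeException -> String
formatUncaught prog e = prog ++ ": " ++ displayException e

-- | Hypothetical handler: report uncaught exceptions through the stderr
-- Handle (and thus its encoding) instead of the default reporter.
reportViaStderr :: SomeException -> IO ()
reportViaStderr e = do
  hFlush stdout -- keep ordering with earlier stdout output
  prog <- getProgName
  hPutStrLn stderr (formatUncaught prog e)

main :: IO ()
main = do
  setUncaughtExceptionHandler reportViaStderr
  error "кириллица 3"
```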

Cheers!
