Improve error messages from GHC.IO.Encoding.Failure
Summary
Error messages from GHC.IO.Encoding.Failure are notoriously unhelpful. They are triggered by encoding mismatches and depend on the environment, so a program can work perfectly well in development and then suddenly fail with "invalid argument" in production. The message gives no hint about which byte or character was unexpected, nor whether the failure happened while decoding or encoding, which makes it frustrating and hard to debug.
Steps to reproduce
module Main where

import GHC.IO.Encoding

main :: IO ()
main = decoding

decoding :: IO ()
decoding = do
  -- force a non-Latin-1 locale encoding
  setLocaleEncoding utf8
  -- read a binary (non-UTF-8) file
  xs <- readFile "Foo"
  -- force evaluation of the lazy String
  print (length xs)
  -- throws hGetContents: invalid argument (invalid byte sequence)

encoding :: IO ()
encoding = do
  -- force a non-Unicode locale encoding
  setLocaleEncoding latin1
  -- print an unescaped character that Latin-1 cannot represent
  putStrLn "Я"
  -- throws commitBuffer: invalid argument (invalid character)
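To check that the exception itself carries nothing about the offending byte, one can catch it and print its fields. This is a minimal, base-only sketch, not part of the original report: it writes a scratch file `Foo` (overwritten on each run) containing byte 207 (0xCF) followed by a newline, decodes it as UTF-8, and prints the `ioe_location` and `ioe_description` fields of the `IOException` record from `GHC.IO.Exception`, plus whether `ioe_type` is `InvalidArgument`. The helper name `probeDecoding` is hypothetical. On GHC 9.2 the description should be only the generic text quoted in the repro above.

```haskell
module Main where

import Control.Exception (evaluate, try)
import GHC.IO.Encoding (setLocaleEncoding, utf8)
import GHC.IO.Exception (IOErrorType (InvalidArgument), IOException (..))
import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- Hypothetical scratch file, overwritten on every run.
scratchFile :: FilePath
scratchFile = "Foo"

-- Hypothetical helper: writes a file that is not valid UTF-8,
-- then tries to decode it under a UTF-8 locale encoding.
probeDecoding :: IO (Either IOException Int)
probeDecoding = do
  -- Binary mode writes each Char below 256 as one byte, so the file
  -- holds 0xCF (207) followed by '\n', an invalid UTF-8 sequence.
  withBinaryFile scratchFile WriteMode (\h -> hPutStr h "\207\n")
  setLocaleEncoding utf8
  -- readFile is lazy; evaluate . length forces decoding inside 'try'.
  try (readFile scratchFile >>= evaluate . length)

main :: IO ()
main = do
  result <- probeDecoding
  case result of
    Left err -> do
      putStrLn ("location:         " ++ ioe_location err)
      putStrLn ("description:      " ++ ioe_description err)
      putStrLn ("invalid argument: " ++ show (ioe_type err == InvalidArgument))
    Right n -> putStrLn ("decoded " ++ show n ++ " characters")
```

Only the error type, the location and a fixed description are available to the caller; neither the byte value nor its position is recorded.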
Expected behavior
I have a patch that changes these two error messages, respectively, to:
-- hGetContents: invalid argument (cannot decode byte sequence starting from 207)
-- commitBuffer: invalid argument (cannot encode character '\1071')
Admittedly, this is not perfect: ideally we would dump more context and name the encodings involved. Still, it is better than the status quo.
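The proposed wording can be produced by rendering the offending byte with `show` on `Word8` and the offending character with `show` on `Char`, which escapes non-ASCII characters. The sketch below is only an illustration under that assumption, not the patch itself; `describeDecodingFailure` and `describeEncodingFailure` are hypothetical names, and in GHC such formatting would presumably live near `recoverDecode` and `recoverEncode` in GHC.IO.Encoding.Failure.

```haskell
module Main where

import Data.Word (Word8)

-- Hypothetical helper: message for a byte the decoder could not consume.
describeDecodingFailure :: Word8 -> String
describeDecodingFailure b =
  "cannot decode byte sequence starting from " ++ show b

-- Hypothetical helper: message for a character the encoder could not emit.
-- 'show' escapes non-ASCII characters, so 'Я' is rendered as '\1071'.
describeEncodingFailure :: Char -> String
describeEncodingFailure c = "cannot encode character " ++ show c

main :: IO ()
main = do
  putStrLn (describeDecodingFailure 207)   -- prints: cannot decode byte sequence starting from 207
  putStrLn (describeEncodingFailure '\1071') -- prints: cannot encode character '\1071'
```

Using `show` keeps the message ASCII-only, so it can itself be printed safely through a handle whose encoding is the one that just failed.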
Environment
- GHC version used: 9.2