Improve error messages from GHC.IO.Encoding.Failure

Summary

Error messages from GHC.IO.Encoding.Failure are notoriously unhelpful. They are triggered by encoding mismatches and depend on the environment, so a program can work perfectly well in development and then suddenly complain about "invalid argument" in production. The message gives no hint about which byte or character was unexpected, nor whether the failure happened while decoding or encoding, which makes it frustrating and difficult to debug.

Steps to reproduce

module Main where 

import GHC.IO.Encoding

main :: IO ()
main = decoding  -- replace with 'encoding' to reproduce the second failure

decoding :: IO ()
decoding = do
  -- enforce non-Latin1 locale
  setLocaleEncoding utf8
  -- read a file whose contents are not valid UTF-8 (e.g. a binary)
  xs <- readFile "Foo"
  -- force evaluation
  print (length xs)
  -- throws hGetContents: invalid argument (invalid byte sequence)

encoding :: IO ()
encoding = do
  -- enforce non-Unicode locale
  setLocaleEncoding latin1
  -- print unescaped Unicode char
  putStrLn "Я"
  -- throws commitBuffer: invalid argument (invalid character)
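Until the message itself improves, one way to find the offending byte in the decoding case is to re-read the file as raw bytes and scan for the first sequence that cannot be UTF-8. The sketch below is only an illustration: firstInvalidUtf8 is a hypothetical helper, not part of base, and its check is deliberately simplified (it does not reject overlong forms or surrogates). It assumes the same file name "Foo" as the reproduction above.

```haskell
module Main where

import qualified Data.ByteString as BS
import Data.Word (Word8)

-- | Hypothetical helper: offset and value of the first byte that starts
-- a sequence which is not structurally valid UTF-8.
-- Simplified: overlong encodings and surrogates are not detected.
firstInvalidUtf8 :: BS.ByteString -> Maybe (Int, Word8)
firstInvalidUtf8 = go 0
  where
    go :: Int -> BS.ByteString -> Maybe (Int, Word8)
    go off bs = case BS.uncons bs of
      Nothing -> Nothing
      Just (b, rest)
        | b < 0x80               -> go (off + 1) rest  -- ASCII
        | b >= 0xC2 && b <= 0xDF -> continue 1         -- 2-byte sequence
        | b >= 0xE0 && b <= 0xEF -> continue 2         -- 3-byte sequence
        | b >= 0xF0 && b <= 0xF4 -> continue 3         -- 4-byte sequence
        | otherwise              -> Just (off, b)      -- invalid lead byte
        where
          -- Require n continuation bytes (0x80..0xBF) after the lead byte.
          continue n
            | BS.length rest >= n && BS.all isCont (BS.take n rest) =
                go (off + 1 + n) (BS.drop n rest)
            | otherwise = Just (off, b)

    isCont :: Word8 -> Bool
    isCont c = c >= 0x80 && c <= 0xBF

main :: IO ()
main = do
  bytes <- BS.readFile "Foo"
  case firstInvalidUtf8 bytes of
    Nothing       -> putStrLn "no invalid sequence found"
    Just (off, b) ->
      putStrLn ("suspicious byte " ++ show b ++ " at offset " ++ show off)
```

Reading through Data.ByteString bypasses the locale encoding entirely, so this scan does not itself trigger the failure being diagnosed.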

Expected behavior

I have a patch that changes the error messages to, respectively:

-- hGetContents: invalid argument (cannot decode byte sequence starting from 207)
-- commitBuffer: invalid argument (cannot encode character '\1071')

Admittedly, this is not perfect: ideally we would dump more context and name the encodings involved, but it is better than the status quo.
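For the encoding case, the character shown in the proposed message can also be found by hand before writing. The following is a minimal sketch, not the patch itself: firstNonLatin1 and describeUnencodable are hypothetical names, and the check relies on the documented behaviour that latin1 cannot represent characters above '\255'.

```haskell
module Main where

import Data.List (find)

-- | Hypothetical helper: first character that Latin-1 cannot represent,
-- i.e. the first one with a code point above 255.
firstNonLatin1 :: String -> Maybe Char
firstNonLatin1 = find (> '\255')

-- | Hypothetical helper: describe that character in the same style as
-- the proposed message, using 'show' to escape it.
describeUnencodable :: String -> Maybe String
describeUnencodable s = fmt <$> firstNonLatin1 s
  where
    fmt c = "cannot encode character " ++ show c

main :: IO ()
main =
  -- "\1071" is the Cyrillic letter from the reproduction, written escaped
  -- so that printing the result is safe under any locale encoding.
  print (describeUnencodable "abc\1071")
```

Because show escapes non-ASCII characters, the description itself can be printed even when the locale encoding is latin1.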

Environment

  • GHC version used: 9.2