
Add a setting to change how Unicode encoding errors are handled

I propose that we augment ghc-6.12.1's support for Unicode Handles by adding the following functions to System.IO:

hSetOnEncodingError :: Handle -> OnEncodingError -> IO ()
hGetOnEncodingError :: Handle -> IO OnEncodingError

as well as the enumeration OnEncodingError with three constructors:

  • ThrowEncodingError: Throw an exception at the first encoding or decoding error.
  • SkipEncodingError: Skip all invalid bytes or characters.
  • TranslitEncodingError: Replace undecodable bytes with U+FFFD and unencodable characters with '?'.
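To make the shape of the proposed API concrete, here is a minimal sketch that mirrors the enumeration and the getter/setter pair against a mock handle. `MockHandle` and `newMockHandle` are hypothetical stand-ins (a real `Handle` would store the setting alongside its `TextEncoding`), and the default of `ThrowEncodingError` is an assumption; the proposal does not state a default.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Local mirror of the proposed enumeration, for illustration only.
data OnEncodingError
  = ThrowEncodingError     -- ^ throw at the first encoding/decoding error
  | SkipEncodingError      -- ^ drop invalid bytes or characters
  | TranslitEncodingError  -- ^ U+FFFD when decoding, '?' when encoding
  deriving (Eq, Show, Enum, Bounded)

-- | Hypothetical stand-in for a Handle: only the mutable error setting.
newtype MockHandle = MockHandle (IORef OnEncodingError)

-- | Assumed default (not specified by the proposal): fail loudly.
newMockHandle :: IO MockHandle
newMockHandle = MockHandle <$> newIORef ThrowEncodingError

-- | Change the per-handle setting; it applies to reads and writes alike.
hSetOnEncodingError :: MockHandle -> OnEncodingError -> IO ()
hSetOnEncodingError (MockHandle ref) = writeIORef ref

-- | Query the current per-handle setting.
hGetOnEncodingError :: MockHandle -> IO OnEncodingError
hGetOnEncodingError (MockHandle ref) = readIORef ref
```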

I have implemented this functionality in the attached patch. Haddock docs are here: http://code.haskell.org/~judah/new-io-docs/System-IO.html#23

The choice of error handler is orthogonal to the choice of encoder. Additionally, the same setting is used for both read and write modes. For portability, the handlers are written in pure Haskell rather than using GNU iconv's //TRANSLIT feature.
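Because the handler is orthogonal to the codec, it can be written once as a pure combinator over any per-unit encode or decode step. The sketch below assumes that structure; `withPolicy` and the ASCII codec are hypothetical illustrations rather than the patch's code, and failures are reported as `Left index` instead of an IO exception so the example stays pure. One policy value drives both directions; only the replacement (U+FFFD or '?') depends on the direction.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

data OnEncodingError
  = ThrowEncodingError
  | SkipEncodingError
  | TranslitEncodingError
  deriving (Eq, Show)

-- | Run a per-unit codec step under an error policy. The policy knows
-- nothing about the codec except that a step may fail (Nothing).
-- Left i reports the index of the first invalid input unit.
withPolicy :: OnEncodingError -> [b] -> (a -> Maybe [b]) -> [a] -> Either Int [b]
withPolicy policy replacement step = go 0
  where
    go _ [] = Right []
    go i (x : xs) = case step x of
      Just out -> (out ++) <$> go (i + 1) xs
      Nothing -> case policy of
        ThrowEncodingError    -> Left i
        SkipEncodingError     -> go (i + 1) xs
        TranslitEncodingError -> (replacement ++) <$> go (i + 1) xs

-- | Hypothetical ASCII codec: one input unit maps to one output unit.
encodeAsciiChar :: Char -> Maybe [Word8]
encodeAsciiChar c
  | ord c < 0x80 = Just [fromIntegral (ord c)]
  | otherwise    = Nothing

decodeAsciiByte :: Word8 -> Maybe String
decodeAsciiByte w
  | w < 0x80  = Just [chr (fromIntegral w)]
  | otherwise = Nothing

-- | Encoding: unencodable characters become '?' under Translit.
encodeAscii :: OnEncodingError -> String -> Either Int [Word8]
encodeAscii p = withPolicy p [fromIntegral (ord '?')] encodeAsciiChar

-- | Decoding: undecodable bytes become U+FFFD under Translit.
decodeAscii :: OnEncodingError -> [Word8] -> Either Int String
decodeAscii p = withPolicy p "\xFFFD" decodeAsciiByte
```

For example, `encodeAscii TranslitEncodingError "ab\233c"` yields `Right [97, 98, 63, 99]`, while `decodeAscii SkipEncodingError [104, 255, 105]` yields `Right "hi"`.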

Note that the text package, for example, provides more sophisticated error-handling options. However, I think the above choices are useful enough without making the API too complicated.
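For comparison, the text package lets the caller pass an arbitrary handler function instead of choosing from a fixed set. This sketch uses `decodeUtf8With` with handlers from `Data.Text.Encoding.Error`; on decoding, `lenientDecode` and `ignore` behave roughly like the proposed `TranslitEncodingError` and `SkipEncodingError`, and `replace` shows the extra flexibility of picking any replacement. It assumes the text and bytestring packages (both shipped with GHC) are available.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (ignore, lenientDecode, replace)

-- "h", an invalid byte (0xFF never occurs in UTF-8), then "i".
badInput :: B.ByteString
badInput = B.pack [0x68, 0xFF, 0x69]

-- Roughly TranslitEncodingError on decode: substitute U+FFFD.
lenient :: T.Text
lenient = decodeUtf8With lenientDecode badInput

-- Roughly SkipEncodingError: drop the invalid byte.
skipped :: T.Text
skipped = decodeUtf8With ignore badInput

-- Beyond the three fixed choices: any replacement character.
custom :: T.Text
custom = decodeUtf8With (replace '?') badInput
```

The counterpart of `ThrowEncodingError` is `strictDecode`, which throws a `UnicodeException` when the resulting `Text` is evaluated.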

Discussion deadline: September 9

Trac metadata
Trac field              Value
Version                 6.10.4
Type                    Bug
TypeOfFailure           OtherFailure
Priority                normal
Resolution              Unresolved
Component               libraries/base
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture