Text decoding doesn't use recover on EOF
GHC 7.2.1 provides a way for a TextEncoding to recover from decoding errors. However, that functionality does not work for an incomplete byte sequence at the end of a file: in that case an error is thrown regardless of the recovery function. This makes it difficult to ensure that a program won't throw an exception on bad input.
Reproduction steps:
ghc --make GetChar.hs
ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar
where GetChar.hs is the following module:
{-# LANGUAGE RecordWildCards #-}
module Main where

import System.IO
import GHC.IO.Encoding
import GHC.IO.Encoding.Failure

main = do
    mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
    getChar >>= print

mkRecoveringLocaleEncoding :: String -> IO TextEncoding
mkRecoveringLocaleEncoding name = do
    enc <- mkTextEncoding name
    return $ case enc of
        TextEncoding {..} -> TextEncoding {
                mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                                     mkTextDecoder,
                mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                                     mkTextEncoder,..
            }
  where
    setRecover r x = x { recover = r }
Result:
GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
In the course of investigating the issue, I found the following comment near the definition of GHC.IO.Handle.streamEncode:
-- FIXME: we should use recover to deal with EOF, rather than always throwing an
-- IOException (ioe_invalidCharacter).
So I guess this ticket records my vote to fix that problem.
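Until recover is consulted at EOF, one possible stopgap is to catch the exception around the read itself. The following is only a sketch under assumptions: hGetCharLenient is a hypothetical helper (not part of base), and it assumes the decoding failure surfaces as an IOException whose error type is InvalidArgument, as the "invalid argument" message in the result above suggests.

```haskell
import Control.Exception (catch, throwIO)
import GHC.IO.Exception (IOErrorType (InvalidArgument))
import System.IO (Handle, hGetChar)
import System.IO.Error (ioeGetErrorType)

-- | Hypothetical helper (not part of base): read one character, but
-- substitute U+FFFD when the handle reports a decoding failure.
-- Assumption: the failure is an IOException of type InvalidArgument.
hGetCharLenient :: Handle -> IO Char
hGetCharLenient h = hGetChar h `catch` handler
  where
    handler e
      | ioeGetErrorType e == InvalidArgument = return '\xFFFD'
      | otherwise                            = throwIO e  -- e.g. a genuine EOF
```

This is a workaround in the caller, not a fix. It cannot tell a decoding failure apart from other InvalidArgument errors on the handle, and the undecodable bytes may still be sitting in the handle's buffer, so a caller should probably stop reading after receiving the substitute character rather than loop until EOF.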
Trac metadata
| Trac field | Value |
|---|---|
| Version | 7.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |