GHC issue #5436: text decoding doesn't use recover on EOF

ghc-7.2.1 provides a way for a TextEncoding to recover from decoding errors through the recover function of its decoder. However, that mechanism does not apply to an incomplete byte sequence at the end of a file: in that case, decoding throws an exception regardless of the recovery function. This is a problem because it makes it difficult to ensure that a program won't throw an exception on bad input.

Reproduction steps:

ghc --make GetChar.hs
ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar

where GetChar.hs is the following module:

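{-# LANGUAGE RecordWildCards #-}
module Main where

import System.IO
import GHC.IO.Encoding
import GHC.IO.Encoding.Failure

-- Read one character from stdin using a UTF-8 encoding whose recover
-- functions transliterate invalid input instead of throwing.
main :: IO ()
main = do
    mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
    getChar >>= print

-- Wrap the named encoding so that both its decoder and its encoder use
-- TransliterateCodingFailure as their recovery strategy.
mkRecoveringLocaleEncoding :: String -> IO TextEncoding
mkRecoveringLocaleEncoding name = do
    enc <- mkTextEncoding name
    return $ case enc of
        TextEncoding {..} -> TextEncoding {
                mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                                    mkTextDecoder,
                mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                                    mkTextEncoder,..
            }
  where
    -- Replace the recover field of a decoder or encoder (a BufferCodec).
    setRecover r x = x { recover = r }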

Result:

GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
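The piped input is the single byte 200 (0xC8, binary 11001000). Its high bits 110 mark it as the lead byte of a two-byte UTF-8 sequence, so the stream ends in the middle of a character, which is exactly the end-of-file case described above. The small helper below makes that classification concrete. utf8SeqLength is a hypothetical name used only for illustration, and it looks only at the prefix bits, ignoring stricter UTF-8 rules such as the invalid lead bytes 0xC0, 0xC1 and 0xF5 to 0xF7.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | Number of bytes a UTF-8 sequence starting with this byte should
-- contain, judged only by its high-bit prefix. Returns Nothing for bytes
-- that cannot start a sequence (e.g. continuation bytes 10xxxxxx).
-- Hypothetical helper; not part of GHC.
utf8SeqLength :: Word8 -> Maybe Int
utf8SeqLength b
  | b .&. 0x80 == 0x00 = Just 1 -- 0xxxxxxx: ASCII
  | b .&. 0xE0 == 0xC0 = Just 2 -- 110xxxxx
  | b .&. 0xF0 == 0xE0 = Just 3 -- 1110xxxx
  | b .&. 0xF8 == 0xF0 = Just 4 -- 11110xxx
  | otherwise = Nothing

main :: IO ()
main = print (utf8SeqLength 200) -- the byte piped into GetChar
```

Since the lead byte promises two bytes but only one arrives before EOF, the decoder never sees a complete sequence to hand to its normal recovery path.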

In the course of investigating the issue, I found the following comment near the definition of GHC.IO.Handle.streamEncode:

-- FIXME: we should use recover to deal with EOF, rather than always throwing an
-- IOException (ioe_invalidCharacter).

So I guess this ticket records my vote to fix that problem.
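As a rough illustration of what the FIXME asks for, here is a toy UTF-8 decoder over a plain list of bytes in which an incomplete sequence at end of input goes to the same recovery action as any other bad input, rather than unconditionally raising an error. Everything here (the Recover type, transliterate, strict, decodeUtf8) is hypothetical and is not GHC's BufferCodec API; it also skips overlong and surrogate checks. It is a sketch of the intended behaviour, not a proposed patch.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical recovery action: given bytes that could not be decoded,
-- return replacement text or an error. A simplified stand-in for a
-- codec's recover field, not its real type.
type Recover = [Word8] -> Either String String

-- | Emit U+FFFD for each bad sequence (in the spirit of TransliterateCodingFailure).
transliterate :: Recover
transliterate _ = Right "\xFFFD"

-- | Fail on any bad sequence (in the spirit of ErrorOnCodingFailure).
strict :: Recover
strict bs = Left ("invalid byte sequence " ++ show bs)

-- | Expected sequence length for a lead byte; Nothing if the byte
-- cannot start a sequence (continuation bytes, 0xC0, 0xC1, 0xF5 and up).
leadLength :: Word8 -> Maybe Int
leadLength b
  | b < 0x80 = Just 1
  | b >= 0xC2 && b < 0xE0 = Just 2
  | b >= 0xE0 && b < 0xF0 = Just 3
  | b >= 0xF0 && b < 0xF5 = Just 4
  | otherwise = Nothing

isCont :: Word8 -> Bool
isCont c = c .&. 0xC0 == 0x80

-- | Code point of a complete n-byte sequence (no overlong/surrogate checks).
combine :: Int -> [Word8] -> Int
combine _ [] = 0
combine n (lead : conts) =
  foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
        (fromIntegral lead .&. mask)
        conts
  where
    mask = case n of
      1 -> 0x7F
      2 -> 0x1F
      3 -> 0x0F
      _ -> 0x07

-- | Decode UTF-8, passing bad bytes to the recovery action. A sequence
-- truncated by end of input is also passed to the recovery action.
decodeUtf8 :: Recover -> [Word8] -> Either String String
decodeUtf8 _ [] = Right ""
decodeUtf8 recov bytes@(b : rest) =
  case leadLength b of
    Nothing -> recoverOne [b] rest
    Just n
      | not (all isCont (drop 1 seqBytes)) -> recoverOne [b] rest
      | length seqBytes < n -> recoverOne seqBytes [] -- truncated by EOF
      | cp > 0x10FFFF -> recoverOne [b] rest
      | otherwise -> (chr cp :) <$> decodeUtf8 recov remaining
      where
        (seqBytes, remaining) = splitAt n bytes
        cp = combine n seqBytes
  where
    recoverOne bad more = (++) <$> recov bad <*> decodeUtf8 recov more

main :: IO ()
main = do
  print (decodeUtf8 transliterate [0xC8]) -- the ticket's input
  print (decodeUtf8 strict [0xC8])
```

With a transliterating policy the ticket's input [200] yields a replacement character, and only the strict policy produces an error, which is the distinction the reporter would like the real decoder to respect at EOF.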

Trac metadata
Version: 7.2.1
Type: Bug
TypeOfFailure: OtherFailure
Priority: normal
Resolution: Unresolved
Component: Compiler