GHC issue #15553

GHC.IO.Encoding not flushing partially converted input

Conversion by GHC.IO.Encoding produces incomplete output for some encodings because it does not flush partially converted input at the end of the string.

iconv(3) provides an API for this flushing. Its manual page says:

In each series of calls to iconv(), the last should be one with inbuf or *inbuf equal to NULL, in order to flush out any partially converted input.

GHC.IO.Encoding does not perform this flush, which can leave the conversion result incomplete. I found two cases in which it actually produces incomplete output, but there may be more.

Case 1: EUC-JISX0213

For example, the following code is expected to output the two bytes 0xa4 0xb1, but it outputs nothing.

import System.IO
main = do
  enc <- mkTextEncoding "EUC-JISX0213"
  withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3051"

The problem is caused by the following mapping between Unicode and EUC-JISX0213.

Unicode          EUC-JISX0213
U+3051 U+309A    0xa4 0xfa
U+3051           0xa4 0xb1

After seeing the code point U+3051, the converter cannot determine which of the two byte sequences to output until it sees either the next character or the end of the string. But GHC.IO.Encoding does not call the flushing API described above, so the converter never learns that the string has ended.
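
The lookahead problem can be illustrated with a small pure model. This is only a sketch under stated assumptions: it is neither GHC's nor iconv's implementation, the names `stepEuc`, `flushEuc` and `convertEuc` are hypothetical, and the mapping table is a toy that knows only the two entries listed above plus ASCII.

```haskell
import Data.Word (Word8)

-- Hypothetical converter state: a base character that is held back until we
-- know whether a combining mark follows it.
type Pending = Maybe Char

-- Toy single-character table (assumption: only U+3051 and ASCII are known;
-- everything else becomes '?').
encodeSingle :: Char -> [Word8]
encodeSingle c
  | c == '\x3051' = [0xa4, 0xb1]
  | c < '\x80'    = [fromIntegral (fromEnum c)]
  | otherwise     = [0x3f]

-- Feed one character: returns the bytes that can be emitted now and the new state.
stepEuc :: Pending -> Char -> ([Word8], Pending)
stepEuc (Just '\x3051') '\x309A' = ([0xa4, 0xfa], Nothing)  -- combined pair
stepEuc pending c
  | c == '\x3051' = (flushEuc pending, Just c)              -- must wait for lookahead
  | otherwise     = (flushEuc pending ++ encodeSingle c, Nothing)

-- End-of-input step, playing the role of the final iconv() call with a NULL inbuf.
flushEuc :: Pending -> [Word8]
flushEuc Nothing  = []
flushEuc (Just c) = encodeSingle c

-- Convert a whole string; the Bool selects whether the final flush is performed.
convertEuc :: Bool -> String -> [Word8]
convertEuc doFlush = go Nothing
  where
    go p []       = if doFlush then flushEuc p else []
    go p (c : cs) = let (out, p') = stepEuc p c in out ++ go p' cs
```

In this model, `convertEuc False "\x3051"` yields an empty list, which mirrors the behaviour reported here, while `convertEuc True "\x3051"` yields the expected two bytes 0xa4 0xb1.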

Case 2: ISO-2022-JP

Similarly, the following code is expected to output the byte sequence 0x1b 0x24 0x42 0x24 0x22 0x1b 0x28 0x42, but the last three bytes 0x1b 0x28 0x42 are not produced.

import System.IO
main = do
  enc <- mkTextEncoding "ISO-2022-JP"
  withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3042"

ISO-2022-JP is a stateful encoding, and RFC 1468 requires that the state be reset to the initial state at the end of the string. The missing three bytes 0x1b 0x28 0x42 are the escape sequence that performs this reset. But again, GHC.IO.Encoding does not call the flushing API described above, so the converter cannot recognize the end of the string and cannot reset the state.
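
The state reset can be sketched the same way. Again, this is a hedged toy model rather than the real converter: `Charset`, `stepJis`, `flushJis` and `convertJis` are hypothetical names, and the table covers only U+3042 and ASCII.

```haskell
import Data.Word (Word8)

-- The two character sets this toy model can switch between.
data Charset = Ascii | JisX0208
  deriving (Eq, Show)

-- Escape sequences that select each character set.
escapeTo :: Charset -> [Word8]
escapeTo Ascii    = [0x1b, 0x28, 0x42]  -- ESC ( B
escapeTo JisX0208 = [0x1b, 0x24, 0x42]  -- ESC $ B

-- Toy table (assumption: only U+3042 and ASCII are known; the rest becomes '?').
classify :: Char -> (Charset, [Word8])
classify c
  | c == '\x3042' = (JisX0208, [0x24, 0x22])
  | c < '\x80'    = (Ascii, [fromIntegral (fromEnum c)])
  | otherwise     = (Ascii, [0x3f])

-- Feed one character, emitting an escape sequence first if the set changes.
stepJis :: Charset -> Char -> ([Word8], Charset)
stepJis cur c = (switch ++ bytes, want)
  where
    (want, bytes) = classify c
    switch        = if want == cur then [] else escapeTo want

-- End-of-input step: return to the initial (ASCII) state if necessary.
flushJis :: Charset -> [Word8]
flushJis Ascii = []
flushJis _     = escapeTo Ascii

-- Convert a whole string; the Bool selects whether the final flush is performed.
convertJis :: Bool -> String -> [Word8]
convertJis doFlush = go Ascii
  where
    go s []       = if doFlush then flushJis s else []
    go s (c : cs) = let (out, s') = stepJis s c in out ++ go s' cs
```

Here `convertJis False "\x3042"` stops after 0x1b 0x24 0x42 0x24 0x22, as in the reported output, and `convertJis True "\x3042"` appends the trailing 0x1b 0x28 0x42.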

Trac metadata
Trac field               Value
Version                  8.4.3
Type                     Bug
TypeOfFailure            OtherFailure
Priority                 normal
Resolution               Unresolved
Component                Core Libraries
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture