GHC.IO.Encoding not flushing partially converted input
Conversion by GHC.IO.Encoding produces incomplete output for some encodings because it does not flush partially converted input at the end of the string.
iconv(3) provides an API for this flush:
In each series of calls to iconv(), the last should be one with inbuf or *inbuf equal to NULL, in order to flush out any partially converted input.
But GHC.IO.Encoding does not perform this flush properly, which can leave the conversion result incomplete.
I found two cases in which it actually produces incomplete output, but there may be more.
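To make the flush call concrete, here is a minimal sketch that calls iconv(3) directly through the FFI, once without and once with the final call that passes NULL for inbuf. It is only an illustration and is not how GHC.IO.Encoding is implemented. The helper name `convert`, the 64-byte output buffer, and the assumption that the C symbols are named `iconv_open`, `iconv` and `iconv_close` (as in glibc; other platforms may need to link a separate libiconv) are mine.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Monad (when)
import Foreign
import Foreign.C

-- Opaque conversion descriptor (iconv_t in C).
type IConvT = Ptr ()

foreign import ccall unsafe "iconv_open"
  c_iconv_open :: CString -> CString -> IO IConvT
foreign import ccall unsafe "iconv"
  c_iconv :: IConvT -> Ptr (Ptr CChar) -> Ptr CSize
          -> Ptr (Ptr CChar) -> Ptr CSize -> IO CSize
foreign import ccall unsafe "iconv_close"
  c_iconv_close :: IConvT -> IO CInt

-- | Hypothetical helper: convert UTF-8 bytes to the target encoding,
-- optionally issuing the final flush call (inbuf == NULL).
convert :: String -> Bool -> [Word8] -> IO [Word8]
convert toEnc doFlush input =
  withCString toEnc $ \to ->
  withCString "UTF-8" $ \from -> do
    -- iconv_open returns (iconv_t)-1 on failure.
    cd <- throwErrnoIf (== nullPtr `plusPtr` (-1)) "iconv_open"
            (c_iconv_open to from)
    let outCap = 64 -- assumed to be large enough for this demo
    out <- withArrayLen input $ \inLen inBuf ->
      allocaBytes outCap $ \outBuf ->
      with (castPtr inBuf) $ \inPtr ->
      with (fromIntegral inLen) $ \inLeft ->
      with outBuf $ \outPtr ->
      with (fromIntegral outCap) $ \outLeft -> do
        -- Ordinary conversion call; (size_t)-1 signals an error.
        _ <- throwErrnoIf (== maxBound) "iconv"
               (c_iconv cd inPtr inLeft outPtr outLeft)
        -- Final call with inbuf == NULL flushes pending output.
        when doFlush $ do
          _ <- throwErrnoIf (== maxBound) "iconv (flush)"
                 (c_iconv cd nullPtr nullPtr outPtr outLeft)
          return ()
        remaining <- peek outLeft
        peekArray (outCap - fromIntegral remaining) (castPtr outBuf)
    _ <- c_iconv_close cd
    return out

main :: IO ()
main = do
  let input = [0xe3, 0x81, 0x82] -- UTF-8 encoding of U+3042
  withoutFlush <- convert "ISO-2022-JP" False input
  withFlush <- convert "ISO-2022-JP" True input
  print withoutFlush
  print withFlush
```

If the iconv implementation behaves as described in this report, the second list should contain extra trailing bytes that the first one lacks.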
Case 1: EUC-JISX0213
For example, the following code is expected to output the two bytes 0xa4 0xb1, but it outputs nothing.
enc <- mkTextEncoding "EUC-JISX0213"
withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3051"
The problem happens because of the following mapping between Unicode and EUC-JISX0213.
Unicode | EUC-JISX0213 |
---|---|
U+3051 U+309A | 0xa4 0xfa |
U+3051 | 0xa4 0xb1 |
After seeing the code point U+3051, the converter cannot determine which of the two byte sequences to output until it sees the next character or the end of the string. But GHC.IO.Encoding does not call the flushing API mentioned above, so the converter never learns that the string has ended.
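One way to observe the bytes actually written is to read the file back in binary mode and print them in hex. The sketch below wraps the snippet above in a complete program using only `base`. The helper names `encodeViaHandle` and `hexByte` are made up for illustration, and the program assumes the system iconv knows the encoding name "EUC-JISX0213".

```haskell
import Numeric (showHex)
import System.IO

-- | Render a byte value as 0x.. with two hex digits.
hexByte :: Int -> String
hexByte b = "0x" ++ (if b < 16 then "0" else "") ++ showHex b ""

-- | Hypothetical helper: write a string to a file through a Handle with
-- the named encoding, then return the raw bytes that ended up in the file.
encodeViaHandle :: String -> String -> FilePath -> IO [Int]
encodeViaHandle encName str path = do
  enc <- mkTextEncoding encName
  withFile path WriteMode $ \h -> hSetEncoding h enc >> hPutStr h str
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h
    let bytes = map fromEnum s
    -- Force the lazy read before the handle is closed.
    length bytes `seq` return bytes

main :: IO ()
main = do
  bytes <- encodeViaHandle "EUC-JISX0213" "\x3051" "test.txt"
  putStrLn ("bytes written: " ++ unwords (map hexByte bytes))
```

On an affected GHC the list of bytes should be empty, per the behaviour described in this report, where 0xa4 0xb1 is expected.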
Case 2: ISO-2022-JP
Similarly, the following code is expected to output the byte sequence 0x1b 0x24 0x42 0x24 0x22 0x1b 0x28 0x42, but the last three bytes 0x1b 0x28 0x42 are not produced.
enc <- mkTextEncoding "ISO-2022-JP"
withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3042"
ISO-2022-JP is a stateful encoding, and RFC 1468 requires that the state be reset to the initial state (ASCII) at the end of the string. The missing three bytes 0x1b 0x28 0x42 are the escape sequence for that purpose. But again, GHC.IO.Encoding does not call the flushing API mentioned above, so the converter cannot recognize the end of the string and cannot reset the state.
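The state requirement can be checked mechanically. The following is a small sketch of a scanner that tracks which character set an ISO-2022-JP byte stream is in, using the four escape sequences defined in RFC 1468; the type `Charset` and the function `finalCharset` are names invented here for illustration. A well-formed stream should end in `Ascii`.

```haskell
import Data.Word (Word8)

-- | The four character sets selectable in ISO-2022-JP (RFC 1468).
data Charset = Ascii | JisRoman | Jis1978 | Jis1983
  deriving (Eq, Show)

-- | Scan a byte stream and report the character set in effect at its end.
finalCharset :: [Word8] -> Charset
finalCharset = go Ascii
  where
    go _  (0x1b : 0x28 : 0x42 : rest) = go Ascii rest    -- ESC ( B
    go _  (0x1b : 0x28 : 0x4a : rest) = go JisRoman rest -- ESC ( J
    go _  (0x1b : 0x24 : 0x40 : rest) = go Jis1978 rest  -- ESC $ @
    go _  (0x1b : 0x24 : 0x42 : rest) = go Jis1983 rest  -- ESC $ B
    go st (_ : rest)                  = go st rest       -- ordinary byte
    go st []                          = st

main :: IO ()
main = do
  -- Bytes reported as actually written (reset sequence missing).
  print (finalCharset [0x1b, 0x24, 0x42, 0x24, 0x22])
  -- Expected bytes, ending with ESC ( B.
  print (finalCharset [0x1b, 0x24, 0x42, 0x24, 0x22, 0x1b, 0x28, 0x42])
```

This prints `Jis1983` for the truncated output and `Ascii` for the expected one, which is why the truncated output is not valid under RFC 1468.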
Trac metadata
Trac field | Value |
---|---|
Version | 8.4.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Core Libraries |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |