GHC.IO.Encoding not flushing partially converted input
GHC.IO.Encoding produces incomplete output for some encodings because it does not flush partially converted input at the end of the string.
iconv(3) provides an API for this flushing. Its manual page says:
In each series of calls to iconv(), the last should be one with inbuf or *inbuf equal to NULL, in order to flush out any partially converted input.
GHC.IO.Encoding does not perform this flushing properly, which can leave the conversion result incomplete.
I found two cases in which it actually produces incomplete output, but there may be more.
Case 1: EUC-JISX0213
For example, the following code is expected to output the two bytes 0xa4 0xb1, but it outputs nothing.
enc <- mkTextEncoding "EUC-JISX0213"
withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3051"
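To see what actually lands in the file, the write can be followed by a raw read of the same file. The following is a minimal sketch, not part of the original report: the helper names `encodedBytes`, `showBytes`, and `reproduceCase1` are hypothetical, and it assumes a platform where GHC uses iconv and iconv knows the EUC-JISX0213 encoding.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Numeric (showHex)
import System.IO

-- Write a string to a file using the named encoding, then read the raw
-- bytes back so they can be compared with the expected output.
-- (Hypothetical helper, for illustration only.)
encodedBytes :: FilePath -> String -> String -> IO [Word8]
encodedBytes path encName str = do
  enc <- mkTextEncoding encName
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h str
  -- withFile has closed the handle by now, so the file can be reopened.
  B.unpack <$> B.readFile path

-- Render bytes in the "0xa4 0xb1" style used in this report.
showBytes :: [Word8] -> String
showBytes = unwords . map render
  where
    render b = "0x" ++ pad (showHex b "")
    pad s    = replicate (2 - length s) '0' ++ s

-- Case 1 from this report: print whatever bytes were written for U+3051.
reproduceCase1 :: IO ()
reproduceCase1 = do
  bytes <- encodedBytes "test.txt" "EUC-JISX0213" "\x3051"
  putStrLn ("bytes written: " ++ showBytes bytes)
```

Loading this in GHCi and running `reproduceCase1` prints the bytes that reached the file. The same helper can be pointed at the ISO-2022-JP case by changing the encoding name and the input string.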
The problem happens because of the following mappings between Unicode and EUC-JISX0213.
|U+3051||0xa4 0xb1|
|U+3051 U+309A||0xa4 0xfa|
After seeing the code point U+3051, the converter cannot determine which of the two byte sequences to output until it sees the next character or the end of the string. But GHC.IO.Encoding does not call the flushing API mentioned above, so the converter never learns that the string has ended.
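The effect can be illustrated with a toy model of such a converter. This is only a sketch under simplifying assumptions: it is not how iconv or GHC.IO.Encoding is implemented, the names `feed`, `flush`, and `convert` are hypothetical, and it only knows U+3051, U+309A, and ASCII.

```haskell
import Data.Word (Word8)

-- Toy converter state: True while U+3051 has been read but not yet emitted.
type Held = Bool

-- Feed one character; returns the new state and the bytes that can be
-- emitted right now.
feed :: Held -> Char -> (Held, [Word8])
feed True '\x309A' = (False, [0xa4, 0xfa])   -- U+3051 U+309A: combined sequence
feed True c =                                -- anything else settles U+3051 alone
  let (held, out) = feed False c in (held, [0xa4, 0xb1] ++ out)
feed False '\x3051' = (True, [])             -- ambiguous so far: emit nothing yet
feed False c
  | c < '\x80' = (False, [fromIntegral (fromEnum c)])  -- ASCII passes through
  | otherwise  = (False, [0x3f])                       -- toy fallback: '?'

-- End-of-input signal: emit whatever is still being held back.
flush :: Held -> [Word8]
flush True  = [0xa4, 0xb1]
flush False = []

-- Run the converter over a string, with or without the final flush.
convert :: Bool -> String -> [Word8]
convert doFlush = go False
  where
    go held []       = if doFlush then flush held else []
    go held (c : cs) = let (held', out) = feed held c in out ++ go held' cs
```

In this model `convert False "\x3051"` yields no bytes at all, which mirrors the behaviour reported here, while `convert True "\x3051"` yields 0xa4 0xb1 because the final flush releases the held character.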
Case 2: ISO-2022-JP
Similarly, the following code is expected to output a byte sequence that begins with 0x1b 0x24 0x42 and ends with 0x1b 0x28 0x42, but the last three bytes, 0x1b 0x28 0x42, are not produced.
enc <- mkTextEncoding "ISO-2022-JP"
withFile "test.txt" WriteMode $ \h -> hSetEncoding h enc >> hPutStr h "\x3042"
ISO-2022-JP is a stateful encoding, and RFC 1468 requires that the state be reset to the initial state at the end of the string. The missing three bytes 0x1b 0x28 0x42 are the escape sequence that does this. But again, GHC.IO.Encoding does not call the flushing API mentioned above, so the converter cannot recognize the end of the string and cannot reset the state.
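Whether a given output ends in the initial state can be checked by following its escape sequences. The sketch below is an illustration only: `Charset`, `finalCharset`, and `endsInInitialState` are hypothetical names, and only the four escape sequences defined by RFC 1468 are handled.

```haskell
import Data.Word (Word8)

-- Character sets that ISO-2022-JP (RFC 1468) can switch between.
data Charset = Ascii | JisRoman | JisX0208
  deriving (Eq, Show)

-- Follow the escape sequences in an encoded byte stream and return the
-- character set that is active after the last byte. Decoding starts in ASCII.
finalCharset :: [Word8] -> Charset
finalCharset = go Ascii
  where
    go _  (0x1b : 0x28 : 0x42 : rest) = go Ascii rest     -- ESC ( B
    go _  (0x1b : 0x28 : 0x4a : rest) = go JisRoman rest  -- ESC ( J
    go _  (0x1b : 0x24 : 0x40 : rest) = go JisX0208 rest  -- ESC $ @
    go _  (0x1b : 0x24 : 0x42 : rest) = go JisX0208 rest  -- ESC $ B
    go st (_ : rest)                  = go st rest        -- ordinary data byte
    go st []                          = st

-- True when the stream ends back in ASCII, the initial state.
endsInInitialState :: [Word8] -> Bool
endsInInitialState bytes = finalCharset bytes == Ascii
```

For instance, a stream that stops after 0x1b 0x24 0x42 and two data bytes (0x24 0x22 is used here on the assumption that it is the JIS X 0208 code for U+3042) is still in `JisX0208`, so `endsInInitialState` returns `False`. Appending 0x1b 0x28 0x42 makes it return `True`.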