Commit ed329074 authored by Herbert Valerio Riedel

Fix thinko in `decodeStringUtf8`

This caused some two-byte UTF-8 encodings (e.g. U+0142) to be
decoded as U+FFFD unintentionally, because the code point was
assembled in `Word8` arithmetic and the shifted high bits overflowed.
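
To make the overflow concrete, here is a minimal sketch that runs the two-byte sequence for U+0142 (`0xC5 0x82`) through both forms of the arithmetic from the diff. The names `narrow` and `wide` are hypothetical and exist only for this illustration; they are not part of the patched module.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Pre-fix arithmetic (hypothetical name): everything stays in Word8,
-- so any bits shifted past bit 7 are silently dropped.
narrow :: Word8 -> Word8 -> Word8
narrow c0 c1 = ((c0 .&. 0x1F) `shiftL` 6) .|. (c1 .&. 0x3F)

-- Post-fix arithmetic (hypothetical name): each masked byte is widened
-- to Int before shifting, so all 11 payload bits survive.
wide :: Word8 -> Word8 -> Int
wide c0 c1 = (fromIntegral (c0 .&. 0x1F) `shiftL` 6) .|. fromIntegral (c1 .&. 0x3F)

main :: IO ()
main = do
  print (narrow 0xC5 0x82)                 -- 66  (0x42, below 0x80)
  print (wide 0xC5 0x82)                   -- 322 (0x142)
  print (chr (wide 0xC5 0x82) == '\x142')  -- True
```

Since the truncated value `0x42` is below `0x80`, the `d >= 0x80` guard treats the sequence as invalid and emits the replacement character.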

With this fix, the property

    [ c | c <- [minBound..maxBound]
        , c < '\xD800' || c >= '\xE000' -- surrogate pair codes
        , (decodeStringUtf8 . encodeStringUtf8) [c] /= [c]
        ] == ['\xfffe','\xffff']

holds. It's not clear to me why U+FFFE and U+FFFF ought to be singled
out. Needs more investigation.
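
A narrower version of this property, limited to the two-byte range U+0080..U+07FF, can be checked in isolation. The sketch below does not use the real `encodeStringUtf8`/`decodeStringUtf8`; `encode2` and `decode2` are hypothetical stand-ins that only mirror the two-byte branch shown in the diff.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical stand-in for the encoder, valid only for U+0080..U+07FF.
encode2 :: Char -> (Word8, Word8)
encode2 c = ( 0xC0 .|. fromIntegral (n `shiftR` 6)
            , 0x80 .|. fromIntegral (n .&. 0x3F) )
  where n = ord c

-- Hypothetical stand-in for the fixed two-byte decoder branch.
decode2 :: (Word8, Word8) -> Char
decode2 (c0, c1)
  | c1 .&. 0xC0 == 0x80, d >= 0x80 = chr d
  | otherwise                      = '\xFFFD'
  where
    d :: Int
    d = (fromIntegral (c0 .&. 0x1F) `shiftL` 6) .|. fromIntegral (c1 .&. 0x3F)

-- Code points in the two-byte range that fail to round-trip.
failures :: [Char]
failures = [ c | c <- ['\x80' .. '\x7FF'], decode2 (encode2 c) /= c ]

main :: IO ()
main = print (null failures)
```

This covers only the branch touched by the fix; it says nothing about the U+FFFE and U+FFFF cases, which fall in the three-byte range.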

TODO: testsuite coverage
parent a87fcd10
@@ -31,10 +31,10 @@ decodeStringUtf8 = go
     twoBytes :: Word8 -> [Word8] -> String
     twoBytes c0 (c1:cs')
       | c1 .&. 0xC0 == 0x80
-      = let d = ((c0 .&. 0x1F) `shiftL` 6)
-             .|. (c1 .&. 0x3F)
+      = let d = (fromIntegral (c0 .&. 0x1F) `shiftL` 6)
+             .|. fromIntegral (c1 .&. 0x3F)
         in if d >= 0x80
-           then chr (fromIntegral d) : go cs'
+           then chr d : go cs'
            else replacementChar : go cs'
     twoBytes _ cs' = replacementChar : go cs'