Commit e78841b5 authored by rwbarton, committed by Ben Gamari

Update encoding001 to test the full range of non-surrogate code points

GHC has used surrogate code points for roundtripping since 7.4.
See Note [Roundtripping].

Also, improve the wording of that Note slightly.

Test Plan: validate still passes

Reviewers: austin, hvr, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1087
parent 76e2341a
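
The escaping scheme described in Note [Roundtripping] can be sketched as follows. This is a minimal illustration, assuming only what the Note states (undecodable bytes are mapped to lone surrogates in the range 0xDC00 to 0xDCFF). The names `escapeByte` and `unescapeChar` are hypothetical and are not GHC's actual functions, which may restrict the set of bytes they escape.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (not GHC's API): escape an undecodable byte as a
-- lone surrogate code point in 0xDC00..0xDCFF, the range named in the Note.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte if the character lies
-- in the escape range, otherwise report that it is ordinary text.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | 0xDC00 <= n && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c
```

Because well-formed Unicode input never contains a lone surrogate, no decoded text can collide with an escaped byte, which is why the test can now cover every non-surrogate code point.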
@@ -74,21 +74,22 @@ data CodingFailureMode
 -- unicode input that includes lone surrogate codepoints is invalid by
 -- definition.
 --
+--
 -- When we used private-use characters there was a technical problem when it
 -- came to encoding back to bytes using iconv. The iconv code will not fail when
 -- it tries to encode a private-use character (as it would if trying to encode
 -- a surrogate), which means that we wouldn't get a chance to replace it
 -- with the byte we originally escaped.
 --
 -- To work around this, when filling the buffer to be encoded (in
 -- writeBlocks/withEncodedCString/newEncodedCString), we replaced the
--- private-use characters with lone surrogates again! Likewise, when
+-- private-use characters with lone surrogates again! Likewise, when
 -- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we had
 -- to do the inverse process.
 --
--- The user of String would never see these lone surrogates, but it
+-- The user of String would never see these lone surrogates, but it
--- ensures that iconv will throw an error when encountering them. We
+-- ensured that iconv will throw an error when encountering them. We
 -- used lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
 codingFailureModeSuffix :: CodingFailureMode -> String
 codingFailureModeSuffix ErrorOnCodingFailure = ""
......
@@ -29,14 +29,7 @@ main = do
 chr (fromIntegral (x `shiftR` 8) .&. 0xff),
 chr (fromIntegral x .&. 0xff) ]
 hPutStr h (concatMap expand32 [ 0, 32 .. 0xD7ff ])
--- We avoid the private-use characters at 0xEF00..0xEFFF
--- that reserved for GHC's PEP383 roundtripping implementation.
---
--- The reason is that currently normal text containing those
--- characters will be mangled, even if we aren't using an encoding
--- created using //ROUNDTRIP.
-hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0xEEFF ])
-hPutStr h (concatMap expand32 [ 0xF000, 0xF000+32 .. 0x10FFFF ])
+hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0x10FFFF ])
 hClose h
 -- convert the UTF-32BE file into each other encoding
......
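
As a rough, standalone illustration of what the updated test writes, the sketch below rebuilds the sampled code point list and a big-endian UTF-32 expansion. The last two list elements of `expand32` and the two range expressions come from the diff. The first two list elements, the `Word32` argument type, and the names `sampledCodePoints` and `isSurrogate` are assumptions made for this example.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr)
import Data.Word (Word32)

-- Assumed reconstruction: emit the four big-endian bytes of a code point,
-- one Char per byte. Only the last two elements appear in the diff.
expand32 :: Word32 -> [Char]
expand32 x =
  [ chr (fromIntegral (x `shiftR` 24) .&. 0xff)
  , chr (fromIntegral (x `shiftR` 16) .&. 0xff)
  , chr (fromIntegral (x `shiftR` 8) .&. 0xff)
  , chr (fromIntegral x .&. 0xff) ]

-- Every 32nd code point below and above the surrogate block, matching the
-- two ranges the test writes after this commit.
sampledCodePoints :: [Word32]
sampledCodePoints = [0, 32 .. 0xD7FF] ++ [0xE000, 0xE000 + 32 .. 0x10FFFF]

-- Hypothetical helper: surrogates occupy 0xD800..0xDFFF.
isSurrogate :: Word32 -> Bool
isSurrogate x = 0xD800 <= x && x <= 0xDFFF
```

With the private-use gap at 0xEF00..0xEFFF no longer excluded, the only code points skipped between the two ranges are the surrogates themselves.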