Cabal/Distribution/Utils/Generic.hs · 6e1871a54dcc4f2686256f6b24518b2fa0cb0d2e · Glasgow Haskell Compiler / Packages / Cabal

Modify replacement properties of `encodeStringUtf8`/`decodeStringUtf8` · 4aedd00c

Herbert Valerio Riedel authored Dec 03, 2017

This changes `decodeStringUtf8` to no longer replace U+FFFE and U+FFFF
with U+FFFD, while `encodeStringUtf8` now replaces surrogates
(i.e. code points U+D800 through U+DFFF, which cannot be encoded in
well-formed UTF-8) with U+FFFD.

Consequently, `decodeStringUtf8 . encodeStringUtf8` now round-trips
every Unicode scalar value
(i.e. [U+0000..U+D7FF] ∪ [U+E000..U+10FFFF]).
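
The round-trip property described above can be illustrated with a
minimal, self-contained sketch. This is not Cabal's implementation:
`isScalar`, `encodeSketch`, `decodeSketch` and `roundTrips` are
hypothetical names, and the decoder's error recovery (emit U+FFFD, then
resume after the offending lead byte) is an assumption made for the
example.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | True for Unicode scalar values, i.e. every code point outside
-- the surrogate range U+D800..U+DFFF.
isScalar :: Char -> Bool
isScalar c = c < '\xD800' || c > '\xDFFF'

-- | Hypothetical encoder: surrogates become U+FFFD before encoding.
encodeSketch :: String -> [Word8]
encodeSketch = concatMap (enc . ord . sanitize)
  where
    sanitize c = if isScalar c then c else '\xFFFD'
    byte = fromIntegral :: Int -> Word8
    cont n = byte (0x80 .|. (n .&. 0x3F))   -- continuation byte 10xxxxxx
    enc n
      | n < 0x80    = [byte n]
      | n < 0x800   = [byte (0xC0 .|. (n `shiftR` 6)), cont n]
      | n < 0x10000 = [ byte (0xE0 .|. (n `shiftR` 12))
                      , cont (n `shiftR` 6), cont n ]
      | otherwise   = [ byte (0xF0 .|. (n `shiftR` 18))
                      , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]

-- | Hypothetical decoder: malformed input yields U+FFFD, but
-- U+FFFE and U+FFFF are passed through unchanged.
decodeSketch :: [Word8] -> String
decodeSketch [] = []
decodeSketch (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeSketch bs
  | b >= 0xC2 && b <= 0xDF = multi 1 0x80 (b .&. 0x1F)
  | b >= 0xE0 && b <= 0xEF = multi 2 0x800 (b .&. 0x0F)
  | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (b .&. 0x07)
  | otherwise              = '\xFFFD' : decodeSketch bs
  where
    isCont c = c .&. 0xC0 == 0x80
    -- n continuation bytes expected; lo is the smallest code point
    -- that legitimately needs this length (rejects overlong forms).
    multi n lo lead =
      let (cs, rest) = splitAt n bs
          ok = length cs == n && all isCont cs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                     (fromIntegral lead) cs :: Int
      in if ok && cp >= lo && cp <= 0x10FFFF
              && not (cp >= 0xD800 && cp <= 0xDFFF)
           then chr cp : decodeSketch rest
           else '\xFFFD' : decodeSketch bs

-- | Checks the round-trip property over every scalar value.
roundTrips :: Bool
roundTrips =
  all (\c -> decodeSketch (encodeSketch [c]) == [c])
      (filter isScalar [minBound .. maxBound])
```

In this sketch a lone surrogate is the only input that does not
round-trip: it is encoded as the three bytes of U+FFFD and therefore
decodes to U+FFFD.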

This should finally address #4644.
