UTF-8 decoding doesn't work correctly on GHC 9.2.1+AArch64 NCG
Summary
With GHC 9.2.1, GHC.IO.Encoding.utf8 fails to decode non-ASCII text.
It happens on
- arm64-darwin / AArch64 NCG / ghc-9.2 branch
- aarch64-linux / AArch64 NCG / GHC 9.2.1
but does not happen on
- x86_64
- arm64-darwin / LLVM backend / GHC 8.10.7
- arm64-darwin / base library built with LLVM backend (hadrian/build --flavour=default+llvm) / ghc-9.2 branch
- arm64-darwin / AArch64 NCG / master branch
So I suspect the cause is a bug in the AArch64 NCG that is already fixed on master but whose fix has not been backported to ghc-9.2.
Steps to reproduce
The following program
import GHC.Foreign
import GHC.IO.Encoding
import qualified Foreign.C.String as F
main = withCStringLen utf8 "\x1F424" $ \csl -> do
  s <- F.peekCAStringLen csl
  print s
  s <- peekCStringLen utf8 csl
  print s
prints
"\240\159\144\164"
string.hs: recoverDecode: invalid argument (invalid byte sequence)
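Note that the first line of output is already the correct UTF-8 encoding of U+1F424, so encoding appears to work and only decoding fails. As a sanity check, the expected bytes can be computed by hand. The following is a minimal sketch; encodeUtf8Char is a hypothetical helper written for this report, not part of base, and it does not reject surrogates.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hand-written UTF-8 encoder for a single code point.
-- Hypothetical helper for illustration only; no surrogate checks.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading-byte payload: the bits above position s.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with the six bits starting at position s.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

main :: IO ()
main = print (encodeUtf8Char '\x1F424')  -- prints [240,159,144,164]
```

These are the same four bytes that F.peekCAStringLen shows above.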
Expected behavior
It should output
"\240\159\144\164"
"\128036"
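To narrow down which sequences are affected, the reproducer could be extended to round-trip one sample per UTF-8 sequence length. This is only a sketch: roundTrip, samples and describe are names introduced here, and the sample code points are arbitrary choices. I have not listed results for the affected platforms; on an unaffected one every line should report ok.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified GHC.Foreign as GF
import GHC.IO.Encoding (utf8)

-- Encode a String as UTF-8 and decode it again through GHC.Foreign.
-- Any decoding exception is caught and returned as Left.
roundTrip :: String -> IO (Either SomeException String)
roundTrip str = try $ GF.withCStringLen utf8 str $ \csl -> do
  decoded <- GF.peekCStringLen utf8 csl
  -- Force the whole result so that any error surfaces inside `try`.
  _ <- evaluate (length decoded)
  pure decoded

-- One arbitrary sample per UTF-8 sequence length.
samples :: [(String, String)]
samples =
  [ ("1-byte (U+0041)",  "A")
  , ("2-byte (U+00E9)",  "\xE9")
  , ("3-byte (U+3042)",  "\x3042")
  , ("4-byte (U+1F424)", "\x1F424")
  ]

-- Summarise a round-trip result against the expected string.
describe :: String -> Either SomeException String -> String
describe expected (Right s)
  | s == expected = "ok"
  | otherwise     = "mismatch: " ++ show s
describe _ (Left e) = "failed: " ++ show e

main :: IO ()
main = mapM_ check samples
  where
    check (label, str) = do
      r <- roundTrip str
      putStrLn (label ++ ": " ++ describe str r)
```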
Environment
- GHC version used: 9.2.1 release / ghc-9.2 branch (ac4496ce)
- Operating System: macOS
- System Architecture: AArch64