Corrupted codegen when passing ByteArray# to FFI
Summary
GHC uses an undocumented pointer adjustment when passing ByteArray#
into the FFI. Of this, @treeowl writes
This is FFI magic of the highest order.
Unfortunately, it's also magic that has gone awry. In the presence of sound uses of unsafeCoerce#
(and sufficient inlining), the FFI codegen fails to introduce this pointer adjustment. The requirement of sufficient inlining makes it difficult for users of GHC to predict when GHC will produce corrupt binaries. It also makes the UnliftedArray
machinery in primitive
unsound for users who want to store ByteArray
s in UnliftedArray
and then pass those ByteArray
s to foreign functions.
Steps to reproduce
It is necessary to compile with optimizations to generate a corrupt binary. There are two source files, unsound.hs
and foreign.c
, to be compiled with:
gcc -c foreign.c -o foreign.o
ghc -O2 -Wall foreign.o unsound.hs -o unsound
The haskell file looks big, but it's really simple. Most of it is just wrappers around MutableByteArray#
and MutableArrayArray#
that offer are heavily monomorphized version of the interface that the primitive
library exposes. For unsound.hs
, we have:
{-# language GADTSyntax #-}
{-# language KindSignatures #-}
{-# language UnliftedFFITypes #-}
{-# language ForeignFunctionInterface #-}
{-# language MagicHash #-}
{-# language UnboxedTuples #-}
import GHC.Exts
import GHC.Word
import GHC.IO
import Data.Kind (Type)
foreign import ccall unsafe "head_bytearray"
c_head_bytearray :: MutableByteArray# RealWorld -> IO Word8
main :: IO ()
main = do
mbs <- newByteArrays 1
mb0 <- luckySingleton
writeByteArrays mbs 0 mb0
print =<< flip readByteArray 0 =<< readByteArraysSuccess mbs 0
print =<< flip readByteArray 0 =<< readByteArraysSuccess mbs 0
print =<< headByteArray =<< readByteArraysSuccess mbs 0
-- This last line should produce the same result as the previous
-- three lines, but it does not.
print =<< headByteArray =<< readByteArraysFailure mbs 0
-- Take the first element of a byte array. It is not necessary to
-- use the FFI to do this, but we do so in order to demonstrate
-- how the FFI codegen can produce incorrect code.
headByteArray :: MutableByteArray -> IO Word8
headByteArray (MutableByteArray m#) = c_head_bytearray m#
-- An array of bytes
data MutableByteArray :: Type where
MutableByteArray :: MutableByteArray# RealWorld -> MutableByteArray
-- A mutable array of mutable byte arrays
data MutableByteArrays :: Type where
MutableByteArrays :: MutableArrayArray# RealWorld -> MutableByteArrays
-- Create a new mutable byte array of length 1 with the sole byte
-- set to the 42.
luckySingleton :: IO MutableByteArray
luckySingleton = IO $ \s0 -> case newByteArray# 1# s0 of
(# s1, marr# #) -> case writeWord8Array# marr# 0# 42## s1 of
s2 -> (# s2, MutableByteArray marr# #)
readByteArray :: MutableByteArray -> Int -> IO Word8
readByteArray (MutableByteArray b#) (I# i#) = IO $ \s0 ->
case readWord8Array# b# i# s0 of
(# s1, w #) -> (# s1, W8# w #)
-- Allocate a new array of mutable byte arrays. All elements are
-- uninitialized. Attempting to read them will cause a crash.
newByteArrays :: Int -> IO MutableByteArrays
newByteArrays (I# len#) = IO $ \s0 -> case newArrayArray# len# s0 of
(# s1, a# #) -> (# s1, MutableByteArrays a# #)
readByteArraysSuccess :: MutableByteArrays -> Int -> IO MutableByteArray
readByteArraysSuccess (MutableByteArrays maa#) (I# i#)
= IO $ \s -> case readMutableByteArrayArray# maa# i# s of
(# s', aa# #) -> (# s', MutableByteArray aa# #)
-- Should be semantically identical to readByteArraysSuccess. It usually
-- is, but when the returned element is used by a foreign function, GHC
-- generates incorrect code.
readByteArraysFailure :: MutableByteArrays -> Int -> IO MutableByteArray
readByteArraysFailure (MutableByteArrays maa#) (I# i#)
= IO $ \s -> case readArrayArrayArray# maa# i# s of
(# s', aa# #) -> (# s', MutableByteArray (unsafeCoerce# aa#) #)
-- Write a mutable byte array to the array of mutable byte arrays
-- at the given index.
writeByteArrays :: MutableByteArrays -> Int -> MutableByteArray -> IO ()
writeByteArrays (MutableByteArrays maa#) (I# i#) (MutableByteArray a) = IO $ \s0 ->
case writeMutableByteArrayArray# maa# i# a s0 of
s1 -> (# s1, () #)
And then the C file (foreign.c
) is just:
#include <stdint.h>
uint8_t head_bytearray (uint8_t *arr) {
return arr[0];
}
Expected behavior
The code should print the number 42 four times. Instead, it print the number 42 three times (correct behavior) and then prints 48.
Analysis
I'm not familiar with the part of GHC that does codegen for FFI calls. My hunch is that the unsafeCoerce#
from ArrayArray#
to MutableByteArray# RealWorld
somehow tricks the codegen into generating foreign-function code for a value of the wrong type.
Impact
This is devastating for the primitive
library, which uses unsafeCoerce#
pervasively in the methods that make up the interface to UnliftedArray
and MutableUnliftedArray
.
Environment
- GHC version used: 8.6.4, 8.6.5, and 8.8 alpha. I've not tested with older GHCs.
Optional:
- Operating System: Ubuntu
- System Architecture: x86_64