Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,866
    • Issues 4,866
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 455
    • Merge requests 455
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #16650
Closed
Open
Created May 11, 2019 by Andrew Martin@andrewthadDeveloper

Corrupted codegen when passing ByteArray# to FFI

Summary

GHC uses an undocumented pointer adjustment when passing ByteArray# into the FFI. Of this, @treeowl writes

This is FFI magic of the highest order.

Unfortunately, it's also magic that has gone awry. In the presence of sound uses of unsafeCoerce# (and sufficient inlining), the FFI codegen fails to introduce this pointer adjustment. The requirement of sufficient inlining makes it difficult for users of GHC to predict when GHC will produce corrupt binaries. It also makes the UnliftedArray machinery in primitive unsound for users who want to store ByteArrays in UnliftedArray and then pass those ByteArrays to foreign functions.

Steps to reproduce

It is necessary to compile with optimizations to generate a corrupt binary. There are two source files, unsound.hs and foreign.c, to be compiled with:

gcc -c foreign.c -o foreign.o
ghc -O2 -Wall foreign.o unsound.hs -o unsound

The haskell file looks big, but it's really simple. Most of it is just wrappers around MutableByteArray# and MutableArrayArray# that offer are heavily monomorphized version of the interface that the primitive library exposes. For unsound.hs, we have:

{-# language GADTSyntax #-}
{-# language KindSignatures #-}
{-# language UnliftedFFITypes #-}
{-# language ForeignFunctionInterface #-}
{-# language MagicHash #-}
{-# language UnboxedTuples #-}

import GHC.Exts
import GHC.Word
import GHC.IO
import Data.Kind (Type)

foreign import ccall unsafe "head_bytearray"
  c_head_bytearray :: MutableByteArray# RealWorld -> IO Word8

main :: IO ()
main = do
  mbs <- newByteArrays 1
  mb0 <- luckySingleton
  writeByteArrays mbs 0 mb0
  print =<< flip readByteArray 0 =<< readByteArraysSuccess mbs 0
  print =<< flip readByteArray 0 =<< readByteArraysSuccess mbs 0
  print =<< headByteArray =<< readByteArraysSuccess mbs 0
  -- This last line should produce the same result as the previous
  -- three lines, but it does not.
  print =<< headByteArray =<< readByteArraysFailure mbs 0

-- Take the first element of a byte array. It is not necessary to
-- use the FFI to do this, but we do so in order to demonstrate
-- how the FFI codegen can produce incorrect code.
headByteArray :: MutableByteArray -> IO Word8
headByteArray (MutableByteArray m#) = c_head_bytearray m#

-- An array of bytes
data MutableByteArray :: Type where
  MutableByteArray :: MutableByteArray# RealWorld -> MutableByteArray

-- A mutable array of mutable byte arrays
data MutableByteArrays :: Type where
  MutableByteArrays :: MutableArrayArray# RealWorld -> MutableByteArrays

-- Create a new mutable byte array of length 1 with the sole byte
-- set to the 42.
luckySingleton :: IO MutableByteArray
luckySingleton = IO $ \s0 -> case newByteArray# 1# s0 of
  (# s1, marr# #) -> case writeWord8Array# marr# 0# 42## s1 of
    s2 -> (# s2, MutableByteArray marr# #)

readByteArray :: MutableByteArray -> Int -> IO Word8
readByteArray (MutableByteArray b#) (I# i#) = IO $ \s0 ->
  case readWord8Array# b# i# s0 of
    (# s1, w #) -> (# s1, W8# w #)

-- Allocate a new array of mutable byte arrays. All elements are
-- uninitialized. Attempting to read them will cause a crash.
newByteArrays :: Int -> IO MutableByteArrays
newByteArrays (I# len#) = IO $ \s0 -> case newArrayArray# len# s0 of
  (# s1, a# #) -> (# s1, MutableByteArrays a# #)

readByteArraysSuccess :: MutableByteArrays -> Int -> IO MutableByteArray
readByteArraysSuccess (MutableByteArrays maa#) (I# i#)
  = IO $ \s -> case readMutableByteArrayArray# maa# i# s of
      (# s', aa# #) -> (# s', MutableByteArray aa# #)

-- Should be semantically identical to readByteArraysSuccess. It usually
-- is, but when the returned element is used by a foreign function, GHC
-- generates incorrect code.
readByteArraysFailure :: MutableByteArrays -> Int -> IO MutableByteArray
readByteArraysFailure (MutableByteArrays maa#) (I# i#)
  = IO $ \s -> case readArrayArrayArray# maa# i# s of
      (# s', aa# #) -> (# s', MutableByteArray (unsafeCoerce# aa#) #)

-- Write a mutable byte array to the array of mutable byte arrays
-- at the given index.
writeByteArrays :: MutableByteArrays -> Int -> MutableByteArray -> IO ()
writeByteArrays (MutableByteArrays maa#) (I# i#) (MutableByteArray a) = IO $ \s0 ->
  case writeMutableByteArrayArray# maa# i# a s0 of
    s1 -> (# s1, () #)

And then the C file (foreign.c) is just:

#include <stdint.h>

uint8_t head_bytearray (uint8_t *arr) {
  return arr[0];
}

Expected behavior

The code should print the number 42 four times. Instead, it print the number 42 three times (correct behavior) and then prints 48.

Analysis

I'm not familiar with the part of GHC that does codegen for FFI calls. My hunch is that the unsafeCoerce# from ArrayArray# to MutableByteArray# RealWorld somehow tricks the codegen into generating foreign-function code for a value of the wrong type.

Impact

This is devastating for the primitive library, which uses unsafeCoerce# pervasively in the methods that make up the interface to UnliftedArray and MutableUnliftedArray.

Environment

  • GHC version used: 8.6.4, 8.6.5, and 8.8 alpha. I've not tested with older GHCs.

Optional:

  • Operating System: Ubuntu
  • System Architecture: x86_64
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking