Awkward design around pinned ByteArray# and compaction
According to the documentation for GHC.Compact:
Pinned
ByteArray#objects cannot be compacted. This is for a good reason: the memory is pinned so that it can be referenced by address (the address might be stored in a C data structure, for example), so we can't make a copy of it to store in theCompact.
We also provide a primop isByteArrayPinned# to allow users to opportunistically exploit existing pinned-ness of ByteArray#s to avoid copying when using the FFI or Storable.
However, these features disagree about when a ByteArray# is pinned:
- Compaction considers a
ByteArray#to be pinned only if it was explicitly pinned at creation-time withnewPinnedByteArray#ornewAlignedPinnedByteArray#. - In addition to explicitly pinned
ByteArray#s,isByteArrayPinned#considersByteArray#s in large-object heap blocks or in compact regions to be pinned.
This mismatch can easily lead to memory-unsafety. Here's a small example program:
import qualified Data.ByteString.Short as SBS
import qualified Data.ByteString.Char8 as BS
import GHC.Compact (compact, getCompact)
import Control.Monad (forM_)
import System.Mem (performMajorGC)
main :: IO ()
main = do
x <- compact (SBS.fromShort (SBS.replicate 8000 48))
let printCompact v = BS.putStrLn (BS.take 20 (getCompact v))
printCompact x
performMajorGC
forM_ [8000..9000] $ \i -> compact (SBS.fromShort (SBS.replicate i 49))
printCompact x
This program seems innocent. But under the hood, SBS.replicate 8000 48 creates a large pinned-but-not-explicitly-pinned ByteArray#, and SBS.fromShort (since bytestring-0.11.1.0) sees that it is pinned and opportunistically creates a non-GC reference into it instead of making a copy. Then the underlying buffer is copied into the compact region x without issue since it was not explicitly pinned, but the non-GC reference in the ByteString cannot be updated and becomes a dangling pointer. (The subsequent code tries to demonstrate this by making two calls to printCompact x produce different output, though of course this isn't truly guaranteed to work.)
Of course, GHC.Compact is not a "Safe Haskell" module and ByteStrings are typically not compactible. But it's still a little unsatisfactory that this performs unsafe memory accesses instead of raising a CompactionFailed exception. Here are a few options:
- Prevent compaction of these implicitly pinned
ByteArray#s. - Do nothing, and just document this infelicity.
- Weaken
isByteArrayPinned#to only consider explicitly pinnedByteArray#s. - Provide a primop that makes an implictly pinned
ByteArray#explicitly pinned, without copying its contents. - Distinguish between
ByteArray#references (which may happen to refer to pinned objects) andPinnedByteArray#references, and only refuse to compact the latter. (This means the distinction must exist at runtime. Perhaps both can be references toARR_WORDSheap objects, but with different pointer tags?)