Awkward design around pinned ByteArray# and compaction

According to the documentation for GHC.Compact:

Pinned ByteArray# objects cannot be compacted. This is for a good reason: the memory is pinned so that it can be referenced by address (the address might be stored in a C data structure, for example), so we can't make a copy of it to store in the Compact.

We also provide a primop isByteArrayPinned# to allow users to opportunistically exploit existing pinned-ness of ByteArray#s to avoid copying when using the FFI or Storable.

However, these features disagree about when a ByteArray# is pinned:

  • Compaction considers a ByteArray# to be pinned only if it was explicitly pinned at creation-time with newPinnedByteArray# or newAlignedPinnedByteArray#.
  • In addition to explicitly pinned ByteArray#s, isByteArrayPinned# considers ByteArray#s in large-object heap blocks or in compact regions to be pinned.
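The isByteArrayPinned# side of this disagreement can be observed directly. The following is a minimal sketch, not taken from GHC or bytestring: the names Alloc and pinnedAfter are hypothetical helpers introduced only for illustration, and the sizes 16 and 8000 are arbitrary choices meant to land below and above the large-object threshold on a typical build.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

-- Hypothetical alias for the shared shape of newByteArray# and
-- newPinnedByteArray#, specialised to RealWorld.
type Alloc =
  Int# -> State# RealWorld -> (# State# RealWorld, MutableByteArray# RealWorld #)

-- Allocate n bytes with the given allocator, freeze the array, and ask
-- isByteArrayPinned# what it thinks of the result.
pinnedAfter :: Alloc -> Int -> IO Bool
pinnedAfter alloc (I# n) = IO $ \s0 ->
  case alloc n s0 of
    (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
      (# s2, ba #) -> (# s2, isTrue# (isByteArrayPinned# ba) #)

main :: IO ()
main = do
  small    <- pinnedAfter newByteArray# 16         -- ordinary small allocation
  large    <- pinnedAfter newByteArray# 8000       -- assumed to be a large object
  explicit <- pinnedAfter newPinnedByteArray# 16   -- explicitly pinned
  putStrLn ("small unpinned:   " ++ show small)
  putStrLn ("large unpinned:   " ++ show large)
  putStrLn ("explicitly pinned: " ++ show explicit)
```

If the description above holds, the large array should be reported as pinned even though it was allocated with newByteArray#, while the small one should not.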

This mismatch can easily lead to memory-unsafety. Here's a small example program:

import qualified Data.ByteString.Short as SBS
import qualified Data.ByteString.Char8 as BS
import GHC.Compact (compact, getCompact)
import Control.Monad (forM_)
import System.Mem (performMajorGC)

main :: IO ()
main = do
  x <- compact (SBS.fromShort (SBS.replicate 8000 48))
  let printCompact v = BS.putStrLn (BS.take 20 (getCompact v))
  printCompact x

  performMajorGC
  forM_ [8000..9000] $ \i -> compact (SBS.fromShort (SBS.replicate i 49))

  printCompact x

This program seems innocent. But under the hood, SBS.replicate 8000 48 creates a large pinned-but-not-explicitly-pinned ByteArray#, and SBS.fromShort (since bytestring-0.11.1.0) sees that it is pinned and opportunistically creates a non-GC reference into it instead of making a copy. Then the underlying buffer is copied into the compact region x without issue since it was not explicitly pinned, but the non-GC reference in the ByteString cannot be updated and becomes a dangling pointer. (The subsequent code tries to demonstrate this by making two calls to printCompact x produce different output, though of course this isn't truly guaranteed to work.)
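The compaction side can be probed in the same spirit, without involving bytestring at all. This is only a sketch under assumptions: BA, Alloc, allocBA and compacts are hypothetical names introduced here, and the result for the large array depends on the GHC version in use, so no particular output is claimed for it.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Control.Exception (CompactionFailed, try)
import GHC.Compact (Compact, compact)
import GHC.Exts
import GHC.IO (IO(..))

-- Hypothetical boxed wrapper so that a ByteArray# can be handed to compact.
data BA = BA ByteArray#

-- Hypothetical alias for the shared shape of the two allocation primops.
type Alloc =
  Int# -> State# RealWorld -> (# State# RealWorld, MutableByteArray# RealWorld #)

-- Allocate n bytes with the given allocator and freeze the result.
allocBA :: Alloc -> Int -> IO BA
allocBA alloc (I# n) = IO $ \s0 ->
  case alloc n s0 of
    (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
      (# s2, ba #) -> (# s2, BA ba #)

-- True if compact accepted the value, False if it raised CompactionFailed.
compacts :: BA -> IO Bool
compacts ba =
  either (const False) (const True)
    <$> (try (compact ba) :: IO (Either CompactionFailed (Compact BA)))

main :: IO ()
main = do
  small    <- allocBA newByteArray# 16 >>= compacts
  large    <- allocBA newByteArray# 8000 >>= compacts
  explicit <- allocBA newPinnedByteArray# 16 >>= compacts
  putStrLn ("small unpinned compacts:    " ++ show small)
  putStrLn ("large unpinned compacts:    " ++ show large)
  putStrLn ("explicitly pinned compacts: " ++ show explicit)
```

Going by the behaviour described above, the explicitly pinned array should be rejected, while the large array would be copied silently even though isByteArrayPinned# reports it as pinned.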

Of course, GHC.Compact is not a "Safe Haskell" module and ByteStrings are typically not compactible. But it's still a little unsatisfactory that this performs unsafe memory accesses instead of raising a CompactionFailed exception. Here are a few options:

  1. Prevent compaction of these implicitly pinned ByteArray#s.
  2. Do nothing, and just document this infelicity.
  3. Weaken isByteArrayPinned# to only consider explicitly pinned ByteArray#s.
  4. Provide a primop that makes an implicitly pinned ByteArray# explicitly pinned, without copying its contents.
  5. Distinguish between ByteArray# references (which may happen to refer to pinned objects) and PinnedByteArray# references, and only refuse to compact the latter. (This means the distinction must exist at runtime. Perhaps both can be references to ARR_WORDS heap objects, but with different pointer tags?)
Edited by Andreas Klebinger