Awkward design around pinned ByteArray# and compaction
According to the documentation for GHC.Compact
:
Pinned
ByteArray#
objects cannot be compacted. This is for a good reason: the memory is pinned so that it can be referenced by address (the address might be stored in a C data structure, for example), so we can't make a copy of it to store in theCompact
.
We also provide a primop isByteArrayPinned#
to allow users to opportunistically exploit existing pinned-ness of ByteArray#
s to avoid copying when using the FFI or Storable
.
However, these features disagree about when a ByteArray#
is pinned:
- Compaction considers a
ByteArray#
to be pinned only if it was explicitly pinned at creation-time withnewPinnedByteArray#
ornewAlignedPinnedByteArray#
. - In addition to explicitly pinned
ByteArray#
s,isByteArrayPinned#
considersByteArray#
s in large-object heap blocks or in compact regions to be pinned.
This mismatch can easily lead to memory-unsafety. Here's a small example program:
import qualified Data.ByteString.Short as SBS
import qualified Data.ByteString.Char8 as BS
import GHC.Compact (compact, getCompact)
import Control.Monad (forM_)
import System.Mem (performMajorGC)
main :: IO ()
main = do
x <- compact (SBS.fromShort (SBS.replicate 8000 48))
let printCompact v = BS.putStrLn (BS.take 20 (getCompact v))
printCompact x
performMajorGC
forM_ [8000..9000] $ \i -> compact (SBS.fromShort (SBS.replicate i 49))
printCompact x
This program seems innocent. But under the hood, SBS.replicate 8000 48
creates a large pinned-but-not-explicitly-pinned ByteArray#
, and SBS.fromShort
(since bytestring-0.11.1.0
) sees that it is pinned and opportunistically creates a non-GC reference into it instead of making a copy. Then the underlying buffer is copied into the compact region x
without issue since it was not explicitly pinned, but the non-GC reference in the ByteString
cannot be updated and becomes a dangling pointer. (The subsequent code tries to demonstrate this by making two calls to printCompact x
produce different output, though of course this isn't truly guaranteed to work.)
Of course, GHC.Compact
is not a "Safe Haskell" module and ByteString
s are typically not compactible. But it's still a little unsatisfactory that this performs unsafe memory accesses instead of raising a CompactionFailed
exception. Here are a few options:
- Prevent compaction of these implicitly pinned
ByteArray#
s. - Do nothing, and just document this infelicity.
- Weaken
isByteArrayPinned#
to only consider explicitly pinnedByteArray#
s. - Provide a primop that makes an implictly pinned
ByteArray#
explicitly pinned, without copying its contents. - Distinguish between
ByteArray#
references (which may happen to refer to pinned objects) andPinnedByteArray#
references, and only refuse to compact the latter. (This means the distinction must exist at runtime. Perhaps both can be references toARR_WORDS
heap objects, but with different pointer tags?)