Skip to content

Putting large objects in compact regions causes a segfault under the right conditions

Summary

When putting a large object in a compact region, and then several smaller ones afterwards, the program crashes with a segfault because the GC tries to follow an invalid pointer.

Steps to reproduce

The following Haskell program reliably causes a segfault for me:

import Data.Traversable (for)
import Data.Primitive.ByteArray
import GHC.Compact
import System.Mem (performGC)

main :: IO ()
main = do
  -- Make a fresh compact region. So far, it only contains a single almost empty small block
  c <- compact ()
  print =<< compactSize c -- prints 32768, as the default block size is 32K
  -- Now add a big object to the compact region. The size is chosen so that the compact region
  -- allocator will allocate two megablocks while actually leaving space in the first megablock.
  let
    totalSizeInWords = 129016
    arraySizeInWords = totalSizeInWords - 2 {- header words -}
  big <- newByteArray (arraySizeInWords * 8)
  bigFrozen <- unsafeFreezeByteArray big

  -- Add the big object to the compact region.
  -- It is larger than the existing block, and also larger than the big object threshold.
  -- The `ByteArray` wrapper is likely put still into the first block,
  -- but the `ByteArray#` is put in a new block.
  -- One can fit 129024 words into a megablock. Our object is 129016 words in size,
  -- but the allocator reserves a bit too much space, so it needs to allocate 129025 words,
  -- and therefore two megablocks.
  _ <- compactAdd c bigFrozen
  print =<< compactSize c -- prints 2113536, because it allocated a megablock group of 2 megablocks

  -- Now we have to fill the first block, which is currently the nursery,
  -- and eventually, one of the Ints will be added to the second block,
  -- causing the segfault.
  placeholders <- for [0 :: Int ..10000] $ \i -> do
    r <- getCompact <$> compactAdd c i
    print i
    performGC
    pure r

  -- Keep the placeholders alive so the GC actually has a pointer to follow into the invalid
  -- parts of the compact region.
  print $ length placeholders

I dug a bit into the RTS source code, but I don't yet fully understand what is going on.

The segfault above is the result of an invariant about megablock groups that is violated by allocateForCompact. In the example program, when we start adding the Ints in a loop, it will first allocate them from the nursery, which at that point is still the first block of the compact region. I think that those allocations happen in the ALLOCATE macro (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/Compact.cmm#L25).

Then, once the nursery is full, allocateForCompact is called (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/Compact.cmm#L29).

There, it first does the same check on the nursery as the ALLOCATE macro (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L479), which obviously fails. Then it proceeds, and it will find that the Int is not a large object, therefore skipping the special logic there (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L490).

However, then it will find that the nursery is full (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L501) and pick the next block that is not full as new nursery. In the example above, there is a new block: the one that was allocated for the large array. It consists of 2 megablocks, but it's free pointer still points within the first megablock, and there's enough room, so it picks that block. But once that block has been chosen as a nursery, it will start allocating closures even in the second megablock of that group. If I understand the comment in BlockAlloc.c (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/BlockAlloc.c#L120) correctly, this violates the invariants of megablock groups, as closures that are allocated there don't have a valid bdescr of the block they've been placed into.

The remaining question is: why is there still space left in the first megablock, when the allocator actually allocated two megablocks? I think this is because the size computation for large objects at https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L491. It reserves room for an StgCompactNFData while only space for a StgCompactNFDataBlock is needed.

I tried to fix this by changing it to not picking megablock groups as nursery, and that indeed fixed the issue above, but then we got other segfaults that I wasn't able to reliably reproduce yet, as they occurred as part of a large application.

It could be that the loop that tries to fit the allocation in subsequent blocks (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L517) is at fault, as it would still put more than one object into a megablock allocation, while https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/BlockAlloc.c#L117 seems to imply that only a single object should be put in those.

Additionally, the bdescr initialization in compactAllocateBlockInternal (https://gitlab.haskell.org/ghc/ghc/blob/ghc-8.6.3-release/rts/sm/CNF.c#L247) seems to write into the data area of the allocation when the allocated area is larger than one megablock.

Expected behavior

I expect the program to print the sizes of the compact regions and then all numbers from 0 to 10000, and not segfault after printing number 2046.

Environment

  • GHC version used: 8.6.3

Optional:

  • Operating System: Ubuntu 18.04
  • System Architecture: x86_64
Edited by Fabian Thorand
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information