Skip to content

Long GC pauses while collecting compact regions

Summary

In one of our projects, we noticed very long GC pauses. We have been able to pin that down to cases where long-lived caches have become garbage. We keep these caches in the form of compact regions, and they can grow quite large (the largest are just around 128GB). While we realized that the release of compact regions is linear in the region's block list length, the timings we saw looked rather quadratic e.g.

Cache size [b] GC pause [s]
2_533_449_728 0.123
2_962_022_400 0.169
3_768_795_136 0.214
5_052_579_840 0.286
9_247_870_976 1.016
14_269_587_456 3.470

At this rate, we expect a pause of approx. 400s for 128 GB, and that's what happens. The problem seems to be that free_mblock_list can grow in proportion to the region's block list (please see the perf samples). This behavior can be reliably reproduced, so it's probably not just us. To do so I used the following program:

#!/usr/bin/env stack
{- stack
  --resolver lts-17.11
  script
  --optimize
  --package base
  --package ghc-compact
  --package primitive
  --ghc-options -threaded
  --ghc-options "-with-rtsopts=-N -I0 -qg -S"
-}

{-# LANGUAGE NumericUnderscores #-}

import Control.Monad (replicateM)
import Data.IORef
import Data.Primitive.ByteArray
import GHC.Compact
import System.Environment (getArgs)
import System.Mem (performGC)

main :: IO ()
main = do
  a <- getArgs
  let x = read $ head a
  compactStuffRef <- newIORef =<< allocCompactStuff x

  stuffSize <- compactSize =<< readIORef compactStuffRef
  putStrLn $ "Compact size [b]: " ++ show stuffSize

  putStrLn "*************************"
  putStrLn "Press any key to start GC"
  putStrLn "************************* "
  _ <- getChar

  writeIORef compactStuffRef undefined
  performGC

  putStrLn "*************************"
  putStrLn "After GC"
  putStrLn "************************* "
  return ()

allocCompactStuff n = do
  region <- compact' =<< chunk
  go 0 region
  where
    chunk = replicateM 512 $ allocBytes 3276 -- alloc just below LARGE_OBJECT_THRESHOLD

    go m region
      | m == n = pure region
      | otherwise = go (m + 1) =<< addChunk region

    -- adds a lot of fragmentation to the free_mblock_list
    addChunk region = compactAdd region . getCompact . head =<< replicateM 64 (compact' =<< chunk)

allocBytes :: Int -> IO ByteArray
allocBytes x = unsafeFreezeByteArray =<< newByteArray x

compact' :: a -> IO (Compact a)
compact' = compactSized 1_032_192 False

The goal I had in mind here is to "grow" the free_mblock_list proportionally to the length of some compact region (governed by the argument passed to the program) - it seems that this can be reliably achieved by allocating many smaller compacts and keeping only some of them. This more or less agrees with what we have seen in the actual program. If you measure GC times after releasing the "big" compact, it looks like this:

x Compact size [b] GC pause [s]
1_000 2_099_015_680 0.222
2_000 4_196_167_680 0.465
4_000 8_390_471_680 1.067
8_000 16_779_079_680 3.065
10_000 20_973_383_680 4.078
16_000 33_550_295_680 9.059

If you attach perf after "Press any key ...", the following picture emerges:

perf-bench.svg

which seems to favor the explanation of quadratic behavior emerging from loops in free_mega_group.

Steps to reproduce

Please run the attached program

Expected behavior

Keep GC pause linear in the size of compact region (perhaps a data structure better than linked-list would do)

Environment

  • GHC version used: 8.10.4, also reproduced on 8.8.4

Optional:

  • Operating System: Linux
  • System Architecture: x86_64
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information