Long GC pauses while collecting compact regions
Summary
In one of our projects, we noticed very long GC pauses. We have been able to pin that down to cases where long-lived caches have become garbage. We keep these caches in the form of compact regions, and they can grow quite large (the largest are just around 128GB). While we realized that the release of compact regions is linear in the region's block list length, the timings we saw looked rather quadratic e.g.
Cache size [b] | GC pause [s] |
---|---|
2_533_449_728 | 0.123 |
2_962_022_400 | 0.169 |
3_768_795_136 | 0.214 |
5_052_579_840 | 0.286 |
9_247_870_976 | 1.016 |
14_269_587_456 | 3.470 |
At this rate, we expect a pause of approx. 400s for 128 GB, and that's what happens. The problem seems to be that free_mblock_list
can grow in proportion to the region's block list (please see the perf samples). This behavior can be reliably reproduced, so it's probably not just us. To do so I used the following program:
#!/usr/bin/env stack
{- stack
--resolver lts-17.11
script
--optimize
--package base
--package ghc-compact
--package primitive
--ghc-options -threaded
--ghc-options "-with-rtsopts=-N -I0 -qg -S"
-}
{-# LANGUAGE NumericUnderscores #-}
import Control.Monad (replicateM)
import Data.IORef
import Data.Primitive.ByteArray
import GHC.Compact
import System.Environment (getArgs)
import System.Mem (performGC)
main :: IO ()
main = do
a <- getArgs
let x = read $ head a
compactStuffRef <- newIORef =<< allocCompactStuff x
stuffSize <- compactSize =<< readIORef compactStuffRef
putStrLn $ "Compact size [b]: " ++ show stuffSize
putStrLn "*************************"
putStrLn "Press any key to start GC"
putStrLn "************************* "
_ <- getChar
writeIORef compactStuffRef undefined
performGC
putStrLn "*************************"
putStrLn "After GC"
putStrLn "************************* "
return ()
allocCompactStuff n = do
region <- compact' =<< chunk
go 0 region
where
chunk = replicateM 512 $ allocBytes 3276 -- alloc just below LARGE_OBJECT_THRESHOLD
go m region
| m == n = pure region
| otherwise = go (m + 1) =<< addChunk region
-- adds a lot of fragmentation to the free_mblock_list
addChunk region = compactAdd region . getCompact . head =<< replicateM 64 (compact' =<< chunk)
allocBytes :: Int -> IO ByteArray
allocBytes x = unsafeFreezeByteArray =<< newByteArray x
compact' :: a -> IO (Compact a)
compact' = compactSized 1_032_192 False
The goal I had in mind here is to "grow" the free_mblock_list
proportionally to the length of some compact region (governed by the argument passed to the program) - it seems that this can be reliably achieved by allocating many smaller compacts and keeping only some of them. This more or less agrees with what we have seen in the actual program. If you measure GC times after releasing the "big" compact, it looks like this:
x | Compact size [b] | GC pause [s] |
---|---|---|
1_000 | 2_099_015_680 | 0.222 |
2_000 | 4_196_167_680 | 0.465 |
4_000 | 8_390_471_680 | 1.067 |
8_000 | 16_779_079_680 | 3.065 |
10_000 | 20_973_383_680 | 4.078 |
16_000 | 33_550_295_680 | 9.059 |
If you attach perf
after "Press any key ...", the following picture emerges:
which seems to favor the explanation of quadratic behavior emerging from loops in free_mega_group
.
Steps to reproduce
Please run the attached program
Expected behavior
Keep GC pause linear in the size of compact region (perhaps a data structure better than linked-list would do)
Environment
- GHC version used: 8.10.4, also reproduced on 8.8.4
Optional:
- Operating System: Linux
- System Architecture: x86_64