
RTS -N2 creates crazy amounts of slop

This simple, everyday program (gist for easy download) creates insane amounts of slop when run with +RTS -N2, but not with +RTS -N1:

import qualified Data.Vector as V

data WordTuple = WordTuple {-# UNPACK #-} !Int {-# UNPACK #-} !Int

main :: IO ()
main = do
  putStrLn "Measuring 10M WordTuples"
  V.fromList
    [ WordTuple (fromIntegral (i+1)) (fromIntegral (i+2))
    | i <- [1..10000000::Int]
    ]
    `seq`
    return ()

Output with GHC 8.6.5:

$ ghc --make -O -threaded -fforce-recomp SlopProblem.hs && command time ./SlopProblem +RTS -N2 -s
...
     336,696,448 bytes maximum residency (14 sample(s))
     592,190,336 bytes maximum slop
             321 MB total memory in use (0 MB lost due to fragmentation)
... 1706624maxresident)k

Note how the maximum slop is about 1.8x the maximum residency and also exceeds the total memory in use.

With -N1 the problem disappears, and the maximum RES reported by time drops to about 600 MB:

     336,551,896 bytes maximum residency (11 sample(s))
       1,716,264 bytes maximum slop
             320 MB total memory in use (0 MB lost due to fragmentation)
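
For anyone who wants to compare these numbers across -N settings without parsing the -s output, here is a minimal sketch that reads the same counters from inside the program via GHC.Stats. It assumes the program is run with +RTS -T so that statistics collection is enabled; slopRatio is a hypothetical helper introduced only for this illustration, and the figures will differ between GHC versions and machines.

```haskell
import qualified Data.Vector as V
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

data WordTuple = WordTuple {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Hypothetical helper: maximum slop as a multiple of maximum residency.
slopRatio :: Word64 -> Word64 -> Double
slopRatio slop live = fromIntegral slop / fromIntegral live

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; rerun with +RTS -T"
    else do
      -- Same allocation as the program above: force the vector to WHNF.
      V.fromList [WordTuple (i + 1) (i + 2) | i <- [1 .. 10000000 :: Int]]
        `seq` return ()
      stats <- getRTSStats
      putStrLn ("max_live_bytes       = " ++ show (max_live_bytes stats))
      putStrLn ("max_slop_bytes       = " ++ show (max_slop_bytes stats))
      putStrLn ("max_mem_in_use_bytes = " ++ show (max_mem_in_use_bytes stats))
      putStrLn ("slop / residency     = "
                ++ show (slopRatio (max_slop_bytes stats) (max_live_bytes stats)))
```

Running this once with +RTS -T -N1 and once with +RTS -T -N2 should make the difference in max_slop_bytes directly comparable.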

Problems I see so far:

  • -N2 or more makes huge slop.
  • Slop isn't accounted for in total memory in use or maximum residency, which is surprising.
  • There is brutal overhead: about 1600 MB of maxresident RAM is used for 160 MB of user data (10M * 2 * 8-byte Int).
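
As a sanity check on the residency figure, here is a rough back-of-the-envelope estimate written out as code. It assumes a 64-bit GHC with 8-byte words, one header word per WordTuple constructor, and one pointer per element in the boxed Data.Vector; it ignores the array header and the intermediate list. The element count is the 10000000 from the list comprehension in the program.

```haskell
-- Rough heap estimate for the vector of WordTuples.
-- Assumptions: 64-bit GHC, 8-byte words, one header word per constructor,
-- one pointer per element in the boxed vector.

wordBytes :: Int
wordBytes = 8

elements :: Int
elements = 10000000

-- Raw user data: two unpacked Ints per element.
payloadBytes :: Int
payloadBytes = elements * 2 * wordBytes

-- Each WordTuple heap object: header word plus two unpacked fields.
heapObjectBytes :: Int
heapObjectBytes = elements * (1 + 2) * wordBytes

-- The boxed vector stores one pointer per element.
pointerBytes :: Int
pointerBytes = elements * wordBytes

expectedLiveBytes :: Int
expectedLiveBytes = heapObjectBytes + pointerBytes

main :: IO ()
main = do
  putStrLn ("payload bytes:       " ++ show payloadBytes)      -- 160000000
  putStrLn ("heap object bytes:   " ++ show heapObjectBytes)   -- 240000000
  putStrLn ("vector pointers:     " ++ show pointerBytes)      -- 80000000
  putStrLn ("expected live bytes: " ++ show expectedLiveBytes) -- 320000000
```

Under these assumptions the estimate of 320,000,000 bytes is close to the reported maximum residency of about 336 MB, so the residency itself looks plausible; it is the slop and the maxresident figure that stand out.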


Edited by Niklas Hambüchen