RTS -N2 creates crazy amounts of slop
This simple, everyday program (gist for easy download) creates insane amounts of slop when run with +RTS -N2, but not with +RTS -N1:
import qualified Data.Vector as V

data WordTuple = WordTuple {-# UNPACK #-} !Int {-# UNPACK #-} !Int

main :: IO ()
main = do
  putStrLn "Measuring 10M WordTuples"
  V.fromList
    [ WordTuple (fromIntegral (i+1)) (fromIntegral (i+2))
    | i <- [1..10000000::Int]
    ]
    `seq`
      return ()
Output with GHC 8.6.5:
$ ghc --make -O -threaded -fforce-recomp SlopProblem.hs && command time ./SlopProblem +RTS -N2 -s
...
336,696,448 bytes maximum residency (14 sample(s))
592,190,336 bytes maximum slop
321 MB total memory in use (0 MB lost due to fragmentation)
... 1706624maxresident)k
Note how the maximum slop (592 MB) is almost twice the maximum residency (336 MB) and the total memory in use (321 MB).
The problem disappears with -N1, and the maxresident value reported by time drops to about 600 MB:
336,551,896 bytes maximum residency (11 sample(s))
1,716,264 bytes maximum slop
320 MB total memory in use (0 MB lost due to fragmentation)
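The counters printed by -s can also be read from inside the program through GHC.Stats in base, which makes it easier to compare -N1 and -N2 runs in a script. The following is a minimal sketch, assuming the binary is started with +RTS -T so that statistics are collected; slopRatio is a helper name made up here for illustration, not something from the original program.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)
import Text.Printf (printf)

-- Hypothetical helper: ratio of maximum slop to maximum residency.
slopRatio :: Integral a => a -> a -> Double
slopRatio slop live = fromIntegral slop / fromIntegral live

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; rerun with +RTS -T"
    else do
      -- Force a collection so the counters reflect the current heap.
      performGC
      s <- getRTSStats
      printf "max_live_bytes       %d\n" (max_live_bytes s)
      printf "max_slop_bytes       %d\n" (max_slop_bytes s)
      printf "max_mem_in_use_bytes %d\n" (max_mem_in_use_bytes s)
      printf "copied_bytes         %d\n" (copied_bytes s)
      -- Only print the ratio once at least one GC has recorded live data.
      if max_live_bytes s > 0
        then printf "slop / residency     %.2f\n"
               (slopRatio (max_slop_bytes s) (max_live_bytes s))
        else putStrLn "no live data recorded yet"
```

In the test program this code would go after the vector has been forced, so that the counters cover the allocation being measured.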
Problems I see so far:
- -N2 or more makes huge slop.
- Slop isn't accounted for in total memory in use or maximum residency, which is surprising.
- There is brutal overhead: about 1600 MB of maxresident RAM is used for 160 MB of user data (10M * 2 * 8-byte Int).
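As a rough cross-check of these numbers, the expected heap footprint of the vector can be estimated by hand for the 10,000,000 elements that the list comprehension in the program produces. The sketch below assumes a 64-bit GHC with 8-byte words, where each WordTuple with two unpacked Int fields occupies three words (one header word plus two payload words) and a boxed Data.Vector stores one pointer word per element. The names wordSize, payloadBytes and boxedBytes are made up for illustration.

```haskell
-- Back-of-the-envelope heap estimate for the vector in the test program.
-- Assumptions (not measured): 64-bit GHC, 8-byte words, boxed Data.Vector.

wordSize :: Int
wordSize = 8

-- Raw user data: two 8-byte Ints per element.
payloadBytes :: Int -> Int
payloadBytes n = n * 2 * wordSize

-- Heap footprint in a boxed vector: per element one header word,
-- two unpacked Int fields, and one pointer slot in the vector's array.
boxedBytes :: Int -> Int
boxedBytes n = n * (1 + 2 + 1) * wordSize

main :: IO ()
main = do
  let n = 10000000 -- element count taken from the list comprehension
  putStrLn ("payload bytes: " ++ show (payloadBytes n))
  putStrLn ("boxed bytes:   " ++ show (boxedBytes n))
```

Under these assumptions the boxed estimate is 320,000,000 bytes, which is close to the reported maximum residency of about 336 MB. The residency figure itself therefore looks plausible, and the open question is the gap between it and maxresident.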
Subtasks
- Figure out what causes the slop - done in #18849 (comment 308938)
- Figure out unexplained huge maxresident difference from #18849 (comment 308921)
- Figure out unexplained bytes copied during GC difference from #18849 (comment 308921)
Edited by Niklas Hambüchen