RTS -N2 creates crazy amounts of slop
This simple, everyday program (gist for easy download) creates insane amounts of slop when run with +RTS -N2, but not with +RTS -N1:
import qualified Data.Vector as V

data WordTuple = WordTuple {-# UNPACK #-} !Int {-# UNPACK #-} !Int

main :: IO ()
main = do
  putStrLn "Measuring 10M WordTuples"
  -- Force the boxed vector of 10,000,000 elements to WHNF.
  V.fromList
    [ WordTuple (fromIntegral (i+1)) (fromIntegral (i+2))
    | i <- [1..10000000::Int]
    ]
    `seq`
    return ()
Output with GHC 8.6.5:
$ ghc --make -O -threaded -fforce-recomp SlopProblem.hs && command time ./SlopProblem +RTS -N2 -s
...
336,696,448 bytes maximum residency (14 sample(s))
592,190,336 bytes maximum slop
321 MB total memory in use (0 MB lost due to fragmentation)
... 1706624maxresident)k
Note how the maximum slop is nearly 2x higher than both maximum residency and total memory in use.
The problem disappears with -N1, which also drops the RES reported by time to 600 MB:
336,551,896 bytes maximum residency (11 sample(s))
1,716,264 bytes maximum slop
320 MB total memory in use (0 MB lost due to fragmentation)
Problems I see so far:
- -N2 or more makes huge slop.
- Slop isn't accounted for in total memory in use or maximum residency, which is surprising.
- There is brutal overhead: about 1600 MB maxresident RAM is used for 160 MB of user data (10M * 2 * 8-byte Int).
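To put the overhead figure in context, here is a rough sketch of what the reproducer should need on the heap. It assumes a 64-bit GHC (8-byte words), one header word per WordTuple, and one pointer per element in the boxed Data.Vector; the small fixed array header is ignored. The names below (wordSize, payloadBytes, and so on) are made up for this illustration.

```haskell
-- Back-of-the-envelope heap footprint for the reproducer.
-- Assumption: 64-bit GHC, so one machine word is 8 bytes.
wordSize :: Int
wordSize = 8

-- Number of elements produced by [1..10000000::Int] in the reproducer.
elements :: Int
elements = 10000000

-- Raw payload: two Int fields per element.
payloadBytes :: Int
payloadBytes = elements * 2 * wordSize

-- One WordTuple heap object: header word + two unpacked Int fields.
wordTupleBytes :: Int
wordTupleBytes = 3 * wordSize

-- A boxed Data.Vector additionally stores one pointer per element.
boxedVectorBytes :: Int
boxedVectorBytes = elements * (wordTupleBytes + wordSize)

main :: IO ()
main = do
  putStrLn ("payload bytes:      " ++ show payloadBytes)
  putStrLn ("boxed vector bytes: " ++ show boxedVectorBytes)
```

Under these assumptions the boxed representation comes to 320,000,000 bytes, which is in the same ballpark as the reported maximum residency of about 336 MB. The roughly 1.7 GB maxresident under -N2 is therefore far beyond what the live data alone would explain.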
Subtasks
- Figure out what causes the slop - done in #18849 (comment 308938)
- Figure out the unexplained huge maxresident difference from #18849 (comment 308921)
- Figure out the unexplained bytes copied during GC difference from #18849 (comment 308921)
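For anyone chasing the remaining subtasks, the counters that -s prints can also be read from inside the program via GHC.Stats in base, which makes it easier to script a comparison between -N1 and -N2. The sketch below is only a probe, not a fix. The workload is a placeholder to be swapped for the real allocation, slopRatio is a helper invented for this example, and the field-to-output mapping (max_live_bytes for maximum residency, max_slop_bytes for maximum slop, max_mem_in_use_bytes for total memory in use, copied_bytes for bytes copied during GC) is my reading of the RTS and should be double-checked against the GHC version in use.

```haskell
import Control.Exception (evaluate)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- | Ratio of peak slop to peak live bytes (0 if nothing was ever live).
-- Hypothetical helper, not part of GHC.Stats.
slopRatio :: Word64 -> Word64 -> Double
slopRatio slop live
  | live == 0 = 0
  | otherwise = fromIntegral slop / fromIntegral live

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are off; rerun with +RTS -T"
    else do
      -- Placeholder workload: replace with the allocation under test.
      _ <- evaluate (sum [1 .. 1000000 :: Int])
      -- Residency figures are sampled at major GCs, so force one.
      performMajorGC
      s <- getRTSStats
      putStrLn ("max_live_bytes:       " ++ show (max_live_bytes s))
      putStrLn ("max_slop_bytes:       " ++ show (max_slop_bytes s))
      putStrLn ("max_mem_in_use_bytes: " ++ show (max_mem_in_use_bytes s))
      putStrLn ("copied_bytes:         " ++ show (copied_bytes s))
      putStrLn ("slop / live ratio:    "
                ++ show (slopRatio (max_slop_bytes s) (max_live_bytes s)))
```

Build it with -threaded and run it once with +RTS -T -N1 and once with +RTS -T -N2. Depending on the GHC version, linking with -rtsopts may be needed for the RTS to accept these flags.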