GC memory requirements are too pessimistic with the copying collector when using a lot of compact/large/pinned objects
Summary
I think the logic in rts/sm/GC.c that decides how much memory we need at the end of a GC is too pessimistic.
It assumes that all live data is copyable and therefore needs a 2x overhead. But even when using the copying collector, some things on the heap never get moved, e.g. compact, large and pinned blocks.
Current behaviour
Currently we calculate the memory requirement by scaling the live block size by a factor that is influenced by the RTS's -F and -Fd flags and by the garbage collector type. If we are using the copying collector the baseline is 2; for the non-moving or compacting collector it is 1.2.
This means that if, say, we have a heap that consists of 1GB of normal data and 4GB of compact/large objects, then we'd need at least (1 + 4) * (2 + F) GB of memory. Yet the 2x overhead should only apply to the data that can be copied, so really we should be able to get away with (1 * 2 + 4 * 1.2) * F.
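As a rough worked example, take both expressions exactly as written above and assume F = 2 (a value picked purely for illustration). Note that the two expressions combine F differently (added to the baseline in the first, multiplied in the second), so the figures are only indicative of the size of the gap:

```latex
\begin{align*}
\text{current:}\quad & (1 + 4) \times (2 + F) = 5 \times 4 = 20~\text{GB} \\
\text{split, as written:}\quad & (1 \times 2 + 4 \times 1.2) \times F = 6.8 \times 2 = 13.6~\text{GB}
\end{align*}
```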
Most Haskell applications won't suffer from this, as the percentage of the heap taken up by large/compact objects is probably quite small. But if you are using compact objects heavily, it's quite likely that they make up a large part of your heap.
Suggested behaviour
Irrespective of the collector in use, split the heap into moving and non-moving parts and apply the appropriate factor to each, i.e. 2x for the moving part and 1.2x for the non-moving part (though maybe this could go lower? I'm not sure why we are using this precise number). This wouldn't change anything for the non-moving GC, as the moving part of the heap would always be empty.
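The split could be sketched roughly as below. This is not the code in rts/sm/GC.c: the function names, the macro names and the use of doubles in GB are all hypothetical, and it assumes the flag-derived factor f is added to the baseline, as in the (1 + 4) * (2 + F) expression above.

```c
#include <assert.h>

/* Baselines quoted in this issue; the macro names are hypothetical. */
#define MOVING_BASELINE    2.0  /* copied data also needs a to-space */
#define NONMOVING_BASELINE 1.2  /* compact/large/pinned data stays in place */

/* Behaviour described above for the copying collector: one baseline is
 * applied to the whole live heap, whether or not the data can move. */
double need_uniform(double live_moving, double live_unmoved, double f)
{
    assert(live_moving >= 0 && live_unmoved >= 0 && f >= 0);
    return (live_moving + live_unmoved) * (MOVING_BASELINE + f);
}

/* Suggested behaviour: each part of the heap gets its own baseline.
 * ASSUMPTION: f is added to each baseline rather than multiplied. */
double need_split(double live_moving, double live_unmoved, double f)
{
    assert(live_moving >= 0 && live_unmoved >= 0 && f >= 0);
    return live_moving  * (MOVING_BASELINE + f)
         + live_unmoved * (NONMOVING_BASELINE + f);
}
```

Under these assumptions, need_uniform(1, 4, 2) comes to 20 GB, while need_split(1, 4, 2) comes to about 16.8 GB. When nothing on the heap is unmoved, the two functions agree.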