More segment sizes for the non-moving allocator
Summary
When using the non-moving collector, allocation into the non-moving generation happens through a set of free list allocators. There is an allocator providing blocks of each power-of-2 size up to the size limit for non-large objects.
This means that if you wish to allocate an object whose size isn't exactly a power of 2, the next biggest allocator has to be used, leading to some wasted space. The worst case occurs when we want to allocate an object that is just bigger than a power of 2: in that case we almost double the space usage of the object. On the flip side, this also gives an upper bound on the amount of waste.
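To make the rounding concrete, here is a minimal sketch of the arithmetic described above (just the power-of-2 round-up and the resulting waste, not the actual RTS allocator code):

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL)

-- | Round a request (in bytes) up to the next power of 2.
nextPowerOf2 :: Int -> Int
nextPowerOf2 n
  | n <= 1    = 1
  | otherwise = 1 `shiftL` (finiteBitSize n - countLeadingZeros (n - 1))

-- | Bytes wasted when an object of the given size lands in a power-of-2 block.
waste :: Int -> Int
waste n = nextPowerOf2 n - n

-- ghci> waste 24   -- 8 bytes wasted (24 rounds up to 32)
-- ghci> waste 33   -- 31 bytes wasted (33 rounds up to 64), the near-doubling worst case
```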
In practice, I've seen that this can impact common datatypes used in the Haskell ecosystem pretty badly. For instance:

- `(:)`, `(,)`, HashMap's `Leaf` constructor, etc. all take 3 words (24 bytes) to represent. So they need to claim a 32 byte block, leading to 8 bytes of waste for each value on the heap. This might not seem like much, but given that these values are likely to be ubiquitous, it can really add up.
- HashMaps use a branching factor that is a power of 2 (16 or 32 items, depending on the version). This is represented by an `Array#` with a power-of-2 number of elements. But as an `Array#` also needs a length and an info pointer, this pushes us once again just past a power of 2, leading to lots of waste (a rough sketch of the arithmetic follows below).
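As a rough illustration of the sizes involved, assuming a 64-bit machine with 8-byte words and the simplified layouts described above (real header layouts vary a little between array types):

```haskell
wordSize :: Int
wordSize = 8

-- A (:) cell (or (,), or HashMap's Leaf): info pointer + 2 fields = 3 words.
consCellBytes :: Int
consCellBytes = 3 * wordSize            -- 24 bytes, rounded up to a 32 byte block

-- A 16-element array as described above: info pointer + length + 16 elements = 18 words.
array16Bytes :: Int
array16Bytes = (1 + 1 + 16) * wordSize  -- 144 bytes, rounded up to a 256 byte block
```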
Cases like this can lead to an approximate overhead of 25%. By adding some more allocator sizes between the existing ones (or making the allocator sizes configurable), I think we could alleviate this.
Data
For one particular application where the total waste was about 25%, I found that the following object sizes contributed the most to the waste:
| object size (bytes) | percentage of waste (%) |
|---|---|
| 24 | 28.85 |
| 40 | 16.36 |
| 144 | 10.64 |
| 216 | 9.95 |
| 136 | 8.82 |
| 192 | 6.40 |
| 48 | 4.88 |
| 80 | 3.25 |
| 72 | 1.57 |
| 56 | 1.41 |
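Note that the table reports each size's share of the application's total waste, which also depends on how many such objects are live. The per-object waste can be recomputed from the size alone, for example with something like this (a small self-contained helper, not part of any measurement tooling):

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL)

-- | Waste in bytes, and as a percentage of the block, for an object of the
-- given size placed in a power-of-2 block.
perObjectWaste :: Int -> (Int, Double)
perObjectWaste n = (w, 100 * fromIntegral w / fromIntegral block)
  where
    block = 1 `shiftL` (finiteBitSize n - countLeadingZeros (n - 1))
    w     = block - n

-- ghci> perObjectWaste 24    -- (8, 25.0)
-- ghci> perObjectWaste 144   -- (112, 43.75)
-- ghci> perObjectWaste 216   -- (40, 15.625)
```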
Proposed solution
It looks like most of the overhead is coming from relatively small objects. In particular, in the above data, the top 10 contribute 92% of the waste and are all quite small objects.
In light of this I think it would be helpful to add some extra small allocators. I'm not quite sure how that should work.
One option is to just add an allocator for every word size up to a threshold, say 34 (32 + 2) words. This would mean that every `Array#` created by HashMaps would fit perfectly into an allocator. This threshold could be made configurable, so users could choose whether to optimise for a smaller number of allocators or for less waste.
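A possible shape for the selection logic, purely as a sketch (the names, the 34-word threshold, and the fallback to the existing power-of-2 classes are illustrative assumptions, not GHC's actual allocator code):

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize)

-- | Which free-list allocator serves a request of the given size in words.
data SizeClass
  = Exact Int      -- ^ one allocator per word size below the threshold
  | PowerOf2 Int   -- ^ an existing allocator serving blocks of 2^n words
  deriving (Show, Eq)

-- | Hypothetical cut-off below which every word size gets its own allocator;
-- 34 = 32 + 2, so a 32-element array plus info pointer and length fits exactly.
wordThreshold :: Int
wordThreshold = 34

sizeClassFor :: Int -> SizeClass
sizeClassFor nWords
  | nWords <= wordThreshold = Exact nWords
  | otherwise               = PowerOf2 ceilLog2
  where
    ceilLog2 = finiteBitSize nWords - countLeadingZeros (nWords - 1)

-- ghci> sizeClassFor 3    -- Exact 3:  a (:) cell fits with no waste
-- ghci> sizeClassFor 18   -- Exact 18: a 16-element array fits exactly
-- ghci> sizeClassFor 100  -- PowerOf2 7: falls back to a 128-word block
```

Whether that threshold is a fixed constant or user-configurable is exactly the trade-off mentioned above.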
Currently I'm considering implementing something like this. What do folks think?
Alternatives
Another solution to issues like this is to introduce some sort of compaction, but I think that would be a lot more work to implement than just adding some more allocator sizes.