WIP: Make bdescr more dense
Currently the block descriptor (`bdescr`) struct is sized to take a single cache line. This is nice as it eliminates any possibility of false sharing between GC threads. On the other hand, there is a lot of redundancy in this representation:

- The `start` field is completely redundant and can be computed from the block descriptor address
- The `gen` and `gen_no` fields are redundant: `bd->gen == &generations[bd->gen_no]` (#15414)
- The `free` field is a pointer but can only range over a single megablock

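
To illustrate the first point, here is a minimal sketch of how a block's start address can be recovered from its descriptor's address. It assumes 1 MiB megablocks, 4 KiB blocks, and 64-byte descriptors stored at the front of each megablock; the constant values and the helper names `descr_of`, `start_of`, and `check_round_trip` are illustrative assumptions, not the RTS's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (illustrative, not taken from the RTS headers). */
#define MBLOCK_SHIFT 20   /* 1 MiB megablocks */
#define BLOCK_SHIFT  12   /* 4 KiB blocks */
#define BDESCR_SHIFT 6    /* 64-byte descriptors */
#define MBLOCK_MASK  (((uintptr_t)1 << MBLOCK_SHIFT) - 1)
#define BLOCK_MASK   (((uintptr_t)1 << BLOCK_SHIFT) - 1)

/* Address of the descriptor for the block containing address p. */
static uintptr_t descr_of(uintptr_t p)
{
    uintptr_t mblock    = p & ~MBLOCK_MASK;               /* megablock base */
    uintptr_t block_off = p & MBLOCK_MASK & ~BLOCK_MASK;  /* block offset in megablock */
    return mblock | (block_off >> (BLOCK_SHIFT - BDESCR_SHIFT));
}

/* Inverse mapping: start address of the block described by descriptor bd. */
static uintptr_t start_of(uintptr_t bd)
{
    uintptr_t mblock    = bd & ~MBLOCK_MASK;  /* megablock base */
    uintptr_t descr_off = bd & MBLOCK_MASK;   /* descriptor offset in megablock */
    return mblock | (descr_off << (BLOCK_SHIFT - BDESCR_SHIFT));
}

/* The descriptor of any address maps back to the start of its block. */
static void check_round_trip(uintptr_t p)
{
    assert(start_of(descr_of(p)) == (p & ~BLOCK_MASK));
}
```

Under these assumptions the stored `start` pointer costs eight bytes per descriptor, while recomputing it costs a mask, a shift, and an OR.
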
Given how often we look at block descriptors and how brutal GHC's runtime tends to be on caches, I hypothesized that we would be better off with a denser (but slightly more computationally intensive) representation. I test this hypothesis here. In short, I:

- eliminate the `start` field entirely
- drop the `gen` field in favor of the much smaller `gen_no`
- turn the `free` field into a 32-bit offset
- rearrange the struct to pack nicely into half a cache line

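
The changes above can be sketched as follows. This is a hypothetical layout, not the struct from this patch: the type `dense_bdescr`, its field names and widths, the `generations` stand-in, and the helpers `get_free`, `set_free`, and `get_gen` are all assumptions made for illustration, as is the 1 MiB megablock size.

```c
#include <assert.h>
#include <stdint.h>

#define MBLOCK_SIZE ((uintptr_t)1 << 20)  /* assumed 1 MiB megablock */

/* Stand-in for the RTS generation table (hypothetical). */
struct generation { unsigned no; };
static struct generation generations[2];

/* Hypothetical dense descriptor: no start pointer, no gen pointer. */
struct dense_bdescr {
    struct dense_bdescr *link;  /* next block in a chain */
    struct dense_bdescr *back;  /* previous block in a chain */
    uint32_t free_off;          /* first free byte, as an offset from the megablock base */
    uint32_t blocks;            /* number of blocks in the group */
    uint16_t gen_no;            /* index into generations[] */
    uint16_t dest_no;
    uint16_t flags;
    uint16_t node;
};

/* Two descriptors should fit in one 64-byte cache line. */
static_assert(sizeof(struct dense_bdescr) <= 32, "descriptor exceeds half a cache line");

/* Base address of the megablock that holds the descriptor. */
static uintptr_t mblock_base(const struct dense_bdescr *bd)
{
    return (uintptr_t)bd & ~(MBLOCK_SIZE - 1);
}

/* Reconstruct the free pointer from the stored offset. */
static void *get_free(const struct dense_bdescr *bd)
{
    return (void *)(mblock_base(bd) + bd->free_off);
}

/* Store a free pointer as an offset; p must lie in the descriptor's megablock. */
static void set_free(struct dense_bdescr *bd, void *p)
{
    bd->free_off = (uint32_t)((uintptr_t)p - mblock_base(bd));
}

/* The gen pointer is recomputed from gen_no rather than stored. */
static struct generation *get_gen(const struct dense_bdescr *bd)
{
    return &generations[bd->gen_no];
}
```

The trade-off is the one described above: every read of `free` or `gen` now costs a little arithmetic, in exchange for fitting two descriptors in each cache line.
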
I have done very little benchmarking of this, but initial indications suggest that it improves overall runtime by a couple of percent in single-threaded `nofib` tests.