WIP: Make bdescr more dense
Currently the block descriptor (bdescr) struct is sized to take a single cache line. This is nice as it eliminates any possibility of false sharing between GC threads. On the other hand, there is a lot of redundancy in this representation:
- The `start` field is completely redundant and can be computed from the block descriptor address
- The `gen` and `gen_no` fields are redundant: `bd->gen == &generations[bd->gen_no]` (#15414)
- The `free` field is a pointer but can only range over a single megablock
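To illustrate the first point, here is a minimal sketch of how a block's start address could be recovered from its descriptor's address alone. The constants are assumptions for illustration (4 KiB blocks, 1 MiB megablocks, 64-byte descriptors, with the descriptor for block `i` stored at index `i` from the megablock base), and the helper names `block_start_of` and `bdescr_of` are hypothetical, not the RTS's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (illustrative, not taken from this patch). */
#define BLOCK_SHIFT   12   /* 4 KiB blocks */
#define MBLOCK_SHIFT  20   /* 1 MiB megablocks */
#define BDESCR_SHIFT  6    /* 64-byte descriptors */
#define MBLOCK_MASK   ((((uintptr_t)1) << MBLOCK_SHIFT) - 1)

/* Hypothetical helper: map any address inside a block to the address of
 * that block's descriptor within the same megablock. */
static inline uintptr_t bdescr_of(uintptr_t p)
{
    uintptr_t mblock = p & ~MBLOCK_MASK;                   /* megablock base */
    uintptr_t index  = (p & MBLOCK_MASK) >> BLOCK_SHIFT;   /* block index */
    return mblock + (index << BDESCR_SHIFT);
}

/* Hypothetical helper: the inverse mapping. This is the value a stored
 * `start` field would otherwise hold. */
static inline uintptr_t block_start_of(uintptr_t bd_addr)
{
    uintptr_t mblock = bd_addr & ~MBLOCK_MASK;                   /* megablock base */
    uintptr_t index  = (bd_addr & MBLOCK_MASK) >> BDESCR_SHIFT;  /* descriptor index */
    return mblock + (index << BLOCK_SHIFT);
}
```

The cost of dropping the field under these assumptions is a mask, a shift, and an add on each access, which is the "slightly more computationally intensive" side of the trade-off.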
Given how often we look at block descriptors and how brutal GHC's runtime tends to be on caches, I hypothesized that we would be better off with a denser (but slightly more computationally intensive) representation. I test this hypothesis here. In short, I:
- eliminate the `start` field entirely
- drop the `gen` field in favor of the much smaller `gen_no`
- turn the `free` field into a 32-bit offset
- rearrange the struct to pack nicely into half a cache line
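The changes listed above can be sketched roughly as follows. This is not the layout from the patch: the struct `dense_bdescr`, its field order, the stand-in `generation` type, and the choice to measure `free_off` in bytes from the base of the megablock holding the descriptor are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MBLOCK_SHIFT  20   /* assumed 1 MiB megablocks */
#define MBLOCK_SIZE   (((uintptr_t)1) << MBLOCK_SHIFT)
#define MBLOCK_MASK   (MBLOCK_SIZE - 1)

/* Stand-in for the RTS generation type (hypothetical, heavily simplified). */
typedef struct { uint32_t no; } generation;
static generation generations[2] = { { 0 }, { 1 } };

/* Hypothetical dense descriptor: no `start`, no `gen`, 32-bit `free_off`. */
typedef struct dense_bdescr {
    struct dense_bdescr *link;      /* next descriptor in a chain */
    union {
        struct dense_bdescr *back;  /* previous descriptor in a chain */
        uintptr_t *scan;            /* scan pointer during GC */
    } u;
    uint32_t free_off;  /* assumed: byte offset from the megablock base */
    uint32_t blocks;    /* number of blocks in the group */
    uint16_t gen_no;    /* index into generations[] */
    uint16_t dest_no;
    uint16_t node;
    uint16_t flags;
} dense_bdescr;

/* Fits in half of a 64-byte cache line (exactly 32 bytes with 8-byte pointers). */
_Static_assert(sizeof(dense_bdescr) <= 32, "descriptor exceeds half a cache line");

/* Rebuild the full `free` pointer value from the 32-bit offset. */
static inline uintptr_t dense_free(const dense_bdescr *bd)
{
    return ((uintptr_t)bd & ~MBLOCK_MASK) + bd->free_off;
}

/* Store a `free` address as an offset; it must lie within the megablock. */
static inline void dense_set_free(dense_bdescr *bd, uintptr_t p)
{
    uintptr_t mblock = (uintptr_t)bd & ~MBLOCK_MASK;
    assert(p >= mblock && p - mblock <= MBLOCK_SIZE);
    bd->free_off = (uint32_t)(p - mblock);
}

/* Recover the generation from `gen_no` instead of a stored pointer. */
static inline generation *dense_gen(const dense_bdescr *bd)
{
    return &generations[bd->gen_no];
}
```

In this sketch every read of `free` or `gen` pays for a little extra arithmetic or an array index, in exchange for two descriptors sharing each 64-byte line.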
I have done very little benchmarking of this, but initial indications suggest that it improves overall runtime by a couple of percent in single-threaded nofib tests.