WIP: Make bdescr more dense

Ben Gamari requested to merge bgamari/ghc:wip/drop-bdescr-start into master

Currently the block descriptor (bdescr) struct is sized to take a single cache line. This is nice as it eliminates any possibility of false sharing between GC threads. On the other hand, there is a lot of redundancy in this representation:

  • The start field is completely redundant and can be computed from the block descriptor address
  • The gen and gen_no fields are redundant: bd->gen == &generations[bd->gen_no] (#15414)
  • The free field is a pointer but can only range over a single megablock
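
To illustrate why the start field carries no extra information, here is a minimal sketch of the address arithmetic. It assumes 4 KiB blocks, 1 MiB megablocks, and one 64-byte descriptor per block stored at the front of the megablock; the constants and the helper names `bdescr_of` and `bdescr_start` are illustrative and not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (illustrative, not taken from the patch). */
#define BLOCK_SHIFT   12   /* 4 KiB blocks */
#define MBLOCK_SHIFT  20   /* 1 MiB megablocks */
#define BDESCR_SHIFT  6    /* 64-byte block descriptors */

#define BLOCK_SIZE    ((uintptr_t)1 << BLOCK_SHIFT)
#define MBLOCK_SIZE   ((uintptr_t)1 << MBLOCK_SHIFT)
#define MBLOCK_MASK   (MBLOCK_SIZE - 1)

/* Map any address inside a block to the address of that block's
 * descriptor, assuming descriptor i lives at megablock base + i * 64. */
static uintptr_t bdescr_of(uintptr_t p)
{
    uintptr_t mblock      = p & ~MBLOCK_MASK;
    uintptr_t block_index = (p & MBLOCK_MASK) >> BLOCK_SHIFT;
    return mblock + (block_index << BDESCR_SHIFT);
}

/* Inverse direction: recover the block's start address from the
 * descriptor's own address, so no stored start field is needed. */
static uintptr_t bdescr_start(uintptr_t bd)
{
    uintptr_t mblock      = bd & ~MBLOCK_MASK;
    uintptr_t block_index = (bd & MBLOCK_MASK) >> BDESCR_SHIFT;
    return mblock + (block_index << BLOCK_SHIFT);
}
```

Under these assumptions the recovery costs one mask, one shift, and one add, which is the "slightly more computationally intensive" side of the trade-off.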

Given how often we look at block descriptors and how brutal GHC's runtime tends to be on caches, I hypothesized that we would be better off with a denser (but slightly more computationally intensive) representation. I test this hypothesis here. In short, I:

  • eliminate the start field entirely
  • drop the gen field in favor of the much smaller gen_no
  • turn the free field into a 32-bit offset
  • rearrange the struct to pack nicely into half a cacheline
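
The changes listed above could look roughly like the following sketch. The struct name `dense_bdescr`, its field names and order, the stand-in `generation` type, and the choice to store the free offset in bytes relative to the block start are all assumptions made for illustration; the layout in the patch itself may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RTS generation table. */
typedef struct { unsigned no; } generation;
static generation generations[2] = { { 0 }, { 1 } };

/* Hypothetical dense descriptor: no start pointer, no gen pointer,
 * and a 32-bit offset in place of the free pointer. */
typedef struct dense_bdescr {
    struct dense_bdescr *link;  /* next descriptor in a chain */
    void     *scan;             /* per-use scratch pointer */
    uint32_t  free_off;         /* first free byte, as an offset from the block start */
    uint32_t  blocks;           /* number of blocks in the group */
    uint16_t  gen_no;           /* index into generations[] */
    uint16_t  dest_no;
    uint16_t  node;
    uint16_t  flags;
} dense_bdescr;

/* Two descriptors should fit in one 64-byte cache line. */
static_assert(sizeof(dense_bdescr) <= 32, "descriptor exceeds half a cache line");

/* Rebuild the free pointer from the block start and the stored offset. */
static inline void *dense_free_ptr(const dense_bdescr *bd, void *start)
{
    return (char *)start + bd->free_off;
}

/* Store a free pointer as an offset from the block start. */
static inline void dense_set_free(dense_bdescr *bd, void *start, void *free_ptr)
{
    bd->free_off = (uint32_t)((char *)free_ptr - (char *)start);
}

/* The generation is looked up by index instead of being stored. */
static inline generation *dense_gen(const dense_bdescr *bd)
{
    return &generations[bd->gen_no];
}
```

On a typical 64-bit target this layout comes to exactly 32 bytes. Note that once two descriptors share a cache line, the false-sharing guarantee of the one-line-per-descriptor layout no longer holds.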

I have done very little benchmarking of this, but initial indications suggest that it improves overall runtime by a couple of percent in single-threaded nofib tests.
