rts: remove spinlocks and yields from parallel GC
Here we:
- eliminate the `schedYield` calls from the gc worker loop (`any_work`)
- eliminate the `schedYield` calls from `waitForGCThreads`
- eliminate `gc_spin` and `mut_spin` spinlocks used for GC entry/exit synchronisation
They are replaced with mutexes and condition variables. Entry/exit synchronisation is quite straightforward; the gc worker loop is a little more subtle. A note elaborating on the latter is included.
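For readers unfamiliar with the pattern, here is a minimal sketch of entry/exit synchronisation built from one mutex and two condition variables instead of spinning and yielding. It is not the code in this MR: every name (`gc_sync`, `leader_start_gc`, `worker_done`, `run_demo`, and so on) is hypothetical, it uses plain POSIX threads rather than the RTS's own wrappers, and it does not attempt to model the more subtle worker loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* All names below are hypothetical; they are not the RTS's identifiers. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  start_cv;   /* broadcast when a collection begins */
    pthread_cond_t  done_cv;    /* signalled when the last worker finishes */
    unsigned        generation; /* bumped once per collection */
    int             running;    /* workers that have not yet finished */
} gc_sync;

void gc_sync_init(gc_sync *s) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->start_cv, NULL);
    pthread_cond_init(&s->done_cv, NULL);
    s->generation = 0;
    s->running = 0;
}

void gc_sync_destroy(gc_sync *s) {
    pthread_cond_destroy(&s->done_cv);
    pthread_cond_destroy(&s->start_cv);
    pthread_mutex_destroy(&s->lock);
}

/* Worker entry: sleep until a collection newer than last_seen has begun. */
unsigned worker_wait_for_gc(gc_sync *s, unsigned last_seen) {
    pthread_mutex_lock(&s->lock);
    while (s->generation == last_seen)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->start_cv, &s->lock);
    unsigned current = s->generation;
    pthread_mutex_unlock(&s->lock);
    return current;
}

/* Worker exit: the last worker to finish wakes the leader. */
void worker_done(gc_sync *s) {
    pthread_mutex_lock(&s->lock);
    if (--s->running == 0)
        pthread_cond_signal(&s->done_cv);
    pthread_mutex_unlock(&s->lock);
}

/* Leader entry: publish the worker count and wake every waiting worker. */
void leader_start_gc(gc_sync *s, int n_workers) {
    pthread_mutex_lock(&s->lock);
    s->running = n_workers;
    s->generation++;
    pthread_cond_broadcast(&s->start_cv);
    pthread_mutex_unlock(&s->lock);
}

/* Leader exit: sleep, rather than spin, until all workers have finished. */
void leader_wait_for_workers(gc_sync *s) {
    pthread_mutex_lock(&s->lock);
    while (s->running > 0)
        pthread_cond_wait(&s->done_cv, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

typedef struct { gc_sync *sync; int *units_done; } worker_arg;

static void *worker_main(void *p) {
    worker_arg *a = (worker_arg *)p;
    worker_wait_for_gc(a->sync, 0);
    pthread_mutex_lock(&a->sync->lock);
    (*a->units_done)++;                     /* stand-in for real GC work */
    pthread_mutex_unlock(&a->sync->lock);
    worker_done(a->sync);
    return NULL;
}

/* Runs one simulated collection; returns how many workers did their work. */
int run_demo(int n_workers) {
    assert(n_workers > 0 && n_workers <= MAX_WORKERS);
    gc_sync s;
    int units_done = 0;
    worker_arg arg = { &s, &units_done };
    pthread_t tids[MAX_WORKERS];

    gc_sync_init(&s);
    for (int i = 0; i < n_workers; i++)
        pthread_create(&tids[i], NULL, worker_main, &arg);
    leader_start_gc(&s, n_workers);
    leader_wait_for_workers(&s);
    for (int i = 0; i < n_workers; i++)
        pthread_join(tids[i], NULL);
    gc_sync_destroy(&s);
    return units_done;
}
```

In this sketch the generation counter means a worker that arrives after the broadcast does not miss the wakeup, and both waits sit in `while` loops so spurious wakeups are harmless. Calling `run_demo(4)` runs one simulated collection with four workers.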
Draft PRs are:
In particular, the comments of !4619 (closed) have several charts showing significant performance improvements for `ghc -j`. I will provide similar data here in the description for this MR once I have it.
Results from nofib are here: https://gitlab.haskell.org/-/snippets/1898
- bytes allocated: +0.04%
- mutator time: +1.93%
- GC cpu time: -61.35%
- GC wall time: +13.36%
- wall time: -2.90%
- perf instructions: -4.9%
- perf cycles: -31.49%
I expect this to improve:
I've not written release notes entries. I'd appreciate help here. Likewise, I think migration notes should recommend users reassess their RTS options.