Instead of a separate context-switch flag, set HpLim to zero
This reduces the latency between a context-switch being triggered and the thread returning to the scheduler, which in turn should reduce the cost of the GC barrier when there are many cores. We still retain the old context_switch flag which is checked at the end of each block of allocation. The idea is that setting HpLim may fail if the the target thread is modifying HpLim at the same time; the context_switch flag is a fallback. It also allows us to "context switch soon" without forcing an immediate switch, which can be costly.
Showing with 62 additions and 50 deletions