setNumCapabilities can cause threads to get stuck in gcWorkerThread
I have a patch with some instrumentation (Phab:D4339) that shows that threads sometimes do not leave gcWorkerThread until the following GC.
I suspect it's caused by idle_caps being mutated in scheduleDoGC after the call to requestSync. A thread enters yieldCapability, sees that it is not idle, and so enters gcWorkerThread. But idle_caps is then mutated so that the thread is marked idle, and its spin-locks are never touched by the garbage collector.
Potential fixes:
- Don't look at idle_caps in the garbage collector when we're touching the spin-locks; just do it for all capabilities. I don't think this does any harm.
- Don't mutate idle_caps after the call to requestSync; move that logic to before the call.
Of course, maybe I'm misunderstanding and this isn't a bug?
Edited by Douglas Wilson