setNumCapabilities can cause threads to get stuck in gcWorkerThread

I have a patch with some instrumentation (Phab:D4339) that shows that threads sometimes do not leave gcWorkerThread until the following GC.

I suspect it's caused by idle_caps being mutated in scheduleDoGC after the call to requestSync. A thread enters yieldCapability, sees that it is not idle, and so enters gcWorkerThread. idle_caps is then mutated so that the thread is marked idle, its spin locks are not touched by the garbage collector, and it stays in gcWorkerThread.
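To make the suspected interleaving concrete, here is a minimal toy model of it. It is a sketch under assumptions, not RTS code: `simulate`, its parameters, and the choice of capability 0 as the GC leader are all hypothetical, and the idle flags are modelled as plain lists of booleans.

```python
# Toy model only: the names echo the report, but none of this is RTS code.

def simulate(n_caps, idle_at_sync, idle_after_sync, leader=0):
    """Return the capabilities left waiting in the GC worker loop.

    idle_at_sync:    idle flags as seen by threads that yield after the sync
    idle_after_sync: idle flags after the leader has mutated them
    """
    # Step 1: every non-leader capability yields, reads the flags as they
    # are at sync time, and enters the worker loop if it is not idle.
    in_worker = {c for c in range(n_caps)
                 if c != leader and not idle_at_sync[c]}

    # Step 2: the leader mutates the flags (modelled by idle_after_sync).
    # Step 3: after the GC, the leader releases only those capabilities
    # whose flag currently says "not idle".
    released = {c for c in range(n_caps)
                if c != leader and not idle_after_sync[c]}

    # Anything that entered the worker loop but was not released is stuck.
    return sorted(in_worker - released)
```

With three capabilities, `simulate(3, [False, False, False], [False, False, True])` returns `[2]`: capability 2 entered the worker loop while it was marked non-idle, was marked idle afterwards, and is skipped at release time. If the flags do not change between the two steps, the result is empty.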

Potential fixes:

- Don't look at idle_caps in the garbage collector when touching the spin locks; touch them for all capabilities instead. I don't think this does any harm.
- Don't mutate idle_caps after the call to requestSync; move that logic to before the call.
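The two proposals can be compared in the same kind of toy model. Again, this is only a sketch under assumptions: all function names are hypothetical, capability 0 is assumed to be the GC leader, and "releasing" a capability stands in for the collector touching its spin locks.

```python
# Toy model only: hypothetical names, not the RTS implementation.

def workers(n_caps, idle, leader=0):
    """Non-leader capabilities whose idle flag is False."""
    return {c for c in range(n_caps) if c != leader and not idle[c]}

def stuck_current(n_caps, idle_at_sync, idle_final):
    # Suspected current behaviour: threads decide using the flags at sync
    # time, but the release step uses the flags as mutated afterwards.
    entered = workers(n_caps, idle_at_sync)
    released = workers(n_caps, idle_final)
    return sorted(entered - released)

def stuck_release_all(n_caps, idle_at_sync, idle_final):
    # First proposal: ignore the idle flags at release time and release
    # every non-leader capability. idle_final is deliberately unused.
    entered = workers(n_caps, idle_at_sync)
    released = workers(n_caps, [False] * n_caps)
    return sorted(entered - released)

def stuck_decide_before_sync(n_caps, idle_final):
    # Second proposal: the flags are final before the sync is requested,
    # so the entry decision and the release step see the same values.
    entered = workers(n_caps, idle_final)
    released = workers(n_caps, idle_final)
    return sorted(entered - released)
```

In this model both proposals leave no capability stuck, whereas `stuck_current(3, [False, False, False], [False, False, True])` returns `[2]`. Note that the first proposal also releases capabilities that never entered the worker loop; the model does not say whether that is harmless in the real RTS.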

Of course, maybe I'm misunderstanding and this isn't a bug?

Edited Mar 10, 2019 by Douglas Wilson
