Parallel GC scheduling problems

The parallel GC uses its own gang of threads, separate from the threads that run the program (the mutator threads). This causes a performance loss, severe in some cases, especially when the number of GC threads and the number of mutator threads each equal the number of processor cores. In that case, when the GC starts up, the OS has to schedule N GC threads onto N cores that are all already running mutator threads. It has to choose to bump the mutator threads off to make room for the new GC threads, but at least on Linux it doesn't always do this promptly, and there can be a delay of as much as 50ms while the scheduler sorts things out. The benchmarks I've used to test the parallel GC so far have mostly been single-threaded programs, so this problem only emerged recently.

Really we ought to be using the mutator threads as GC threads too. This is complicated slightly by the fact that, if not all cores are busy, some of the mutator threads might not be awake when we GC. Perhaps we should bite the bullet and try to set affinity masks.
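
For reference, here is a rough sketch of what setting an affinity mask looks like on Linux, using sched_getaffinity and sched_setaffinity from glibc. This is not RTS code: the helper names first_allowed_core and pin_current_thread_to_core are made up for illustration, and whether pinning GC threads to cores actually avoids the scheduling delay described above is an untested assumption.

```c
#define _GNU_SOURCE   /* exposes sched_setaffinity and the CPU_* macros in glibc */
#include <sched.h>

/* Hypothetical helper: return the lowest-numbered core the calling thread
 * is currently allowed to run on, or -1 on error. */
int first_allowed_core(void)
{
    cpu_set_t allowed;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int core = 0; core < CPU_SETSIZE; core++) {
        if (CPU_ISSET(core, &allowed))
            return core;
    }
    return -1;
}

/* Hypothetical helper: restrict the calling thread to a single core.
 * Returns 0 on success and -1 on failure. */
int pin_current_thread_to_core(int core)
{
    cpu_set_t mask;

    if (core < 0 || core >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&mask);        /* start from an empty set of cores */
    CPU_SET(core, &mask);   /* allow exactly one core */

    /* The kernel rejects the call if the core is not available to us. */
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

Each thread would call pin_current_thread_to_core with its own core number when it starts. Once a thread is pinned, the scheduler no longer gets to choose where it runs, which is the point here, but it also means a pinned thread cannot migrate away from a core that some other process is keeping busy.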

If this is affecting you, try turning off the parallel GC, or reducing the number of threads it uses, with e.g. +RTS -g1.

Trac metadata

Trac field	Value
Version	6.10.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	high
Resolution	Unresolved
Component	Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
