Parallel GC with more -N than physical processors gives dramatic slowdown (75 times)

Running hlint 1.6.21 (http://hackage.haskell.org/package/hlint) with GHC 6.12.1 on a Windows laptop with two cores, I get:

timer hlint src +RTS -N1 -qg = 1.344 seconds
timer hlint src +RTS -N2 -qg = 1.000 seconds
timer hlint src +RTS -N3 -qg = 0.984 seconds
timer hlint src +RTS -N4 -qg = 1.016 seconds
timer hlint src +RTS -N1 = 1.344 seconds
timer hlint src +RTS -N2 = 0.969 seconds
timer hlint src +RTS -N3 = 76.563 seconds

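At -N1, -qg has no effect (as expected).

At -N2, parallel GC has a small positive effect: 0.969 seconds, versus 1.000 seconds with -qg. I repeated the benchmarks many times, and the effect is consistent.

At -N3, -qg is essential: without it the run takes 76.563 seconds instead of 0.984 seconds.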
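For anyone wanting to reproduce the table above, here is a minimal sketch of a timing harness in Python. It is not the `timer` tool used for the numbers in this report. The helper names (`rts_command`, `time_command`, `benchmark`) are made up for illustration, and it assumes `hlint` is on the PATH and is run from a directory containing `src`.

```python
import subprocess
import time


def rts_command(base_cmd, n, disable_parallel_gc):
    """Build a command line such as ['hlint', 'src', '+RTS', '-N3', '-qg']."""
    flags = ["+RTS", f"-N{n}"]
    if disable_parallel_gc:
        flags.append("-qg")  # -qg on its own turns parallel GC off
    return list(base_cmd) + flags


def time_command(cmd):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def benchmark(base_cmd, ns=(1, 2, 3), runner=time_command):
    """Time base_cmd for every -N value in ns, with and without -qg.

    Returns a dict keyed by (n, disable_parallel_gc). The runner argument
    is a hypothetical hook so the timing step can be swapped out.
    """
    results = {}
    for disable_parallel_gc in (True, False):
        for n in ns:
            cmd = rts_command(base_cmd, n, disable_parallel_gc)
            results[(n, disable_parallel_gc)] = runner(cmd)
    return results
```

Calling `benchmark(["hlint", "src"])` would run each configuration once; since single runs are noisy, repeating each one several times and comparing the minimum or median is probably more reliable.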

The result seems to be that if you oversubscribe the parallel garbage collector (more GC threads than physical processors), it slows down catastrophically. People often use -N with a number higher than their processor count, since it nicely allows IO and computation to be interleaved. The GC should probably reduce the number of threads it uses when it sees heavy contention.
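To make the suggestion concrete, here is a rough sketch of the simplest form such a policy could take: cap the number of GC threads at the number of processors, regardless of -N. This is only an illustration in Python, not how the GHC runtime is implemented. The function name `effective_gc_threads` is hypothetical, and a static cap is a simpler stand-in for the contention-driven adjustment suggested above. Note also that `os.cpu_count()` reports logical processors, which may exceed the number of physical cores.

```python
import os


def effective_gc_threads(requested_n, available_cores=None):
    """Return how many GC threads to use for a requested -N value.

    Hypothetical policy: never use more GC threads than there are cores,
    and always use at least one.
    """
    if available_cores is None:
        # Assumption: logical processor count is a good enough proxy.
        available_cores = os.cpu_count() or 1
    return max(1, min(requested_n, available_cores))
```

On the two-core laptop from this report, `effective_gc_threads(3, 2)` would give 2, so a -N3 run would collect with the same number of GC threads as -N2 while still keeping three threads available for interleaving IO and computation.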

This is a performance regression from GHC 6.10.4, where HLint worked fine with +RTS -N3. I only caught the regression because my test suite started to take forever to run.

Trac metadata

Trac field	Value
Version	6.12.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
