Parallel GC with more -N than physical processors gives dramatic slowdown (75 times)
Running hlint 1.6.21 (http://hackage.haskell.org/package/hlint) with GHC 6.12.1 on a Windows laptop with two cores, I get:
timer hlint src +RTS -N1 -qg = 1.344 seconds
timer hlint src +RTS -N2 -qg = 1.000 seconds
timer hlint src +RTS -N3 -qg = 0.984 seconds
timer hlint src +RTS -N4 -qg = 1.016 seconds
timer hlint src +RTS -N1 = 1.344 seconds
timer hlint src +RTS -N2 = 0.969 seconds
timer hlint src +RTS -N3 = 76.563 seconds
At -N1, -qg has no effect (as expected)
At -N2, -qg makes a small but repeatable difference (1.000 seconds with it, 0.969 without; I repeated the benchmarks many times, so the effect is there)
At -N3, -qg is essential or it takes forever
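For anyone wanting to reproduce the matrix above, a loop along the following lines may help. This is only a sketch: the `timer` tool used in the report is replaced by the shell's `time` as an assumption, `bench_commands` and `run_bench` are hypothetical helper names, and it assumes `hlint` is on the PATH and that `src` is the directory being linted.

```shell
#!/usr/bin/env bash
# Sketch: list and optionally time the hlint runs from the report.
# Assumption: `time` stands in for the `timer` tool used in the report.

# Print one command line per (-N, -qg) combination that was measured.
bench_commands() {
  for n in 1 2 3 4; do
    echo "hlint src +RTS -N$n -qg"
  done
  for n in 1 2 3; do
    echo "hlint src +RTS -N$n"
  done
}

# Time each command in turn. Word splitting of $cmd is intentional.
run_bench() {
  bench_commands | while read -r cmd; do
    echo "== $cmd"
    ( time $cmd ) 2>&1
  done
}

# Uncomment to actually run the benchmarks:
# run_bench
```

Calling `bench_commands` alone only prints the command lines, which is useful for checking the matrix before spending over a minute on the -N3 run without -qg.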
The result seems to be that if the parallel garbage collector is given more threads than there are physical cores, performance collapses. People often use -N with a number higher than their processor count, since it nicely allows IO and computation to be interleaved. The GC should probably reduce the number of threads it uses if it sees lots of contention.
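Until the RTS does something like this itself, a user-side workaround is either to pass -qg or to clamp -N to the core count before launching the program. The following is a minimal sketch of the clamping approach, not the RTS change proposed here; `clamp_n` is a hypothetical helper, and the core count has to be supplied by the caller (for example from `nproc` on systems with GNU coreutils, or from the `NUMBER_OF_PROCESSORS` environment variable on Windows).

```shell
# Sketch: clamp a requested -N value to the number of available cores.
# Usage: clamp_n REQUESTED CORES
clamp_n() {
  if [ "$1" -gt "$2" ]; then
    echo "$2"   # requested more threads than cores: fall back to the core count
  else
    echo "$1"   # request fits within the available cores: keep it
  fi
}

# Example (commented out): request -N3 on a two-core machine.
# hlint src +RTS -N"$(clamp_n 3 2)"
```

Note that clamping gives up the IO interleaving that motivates a higher -N in the first place, so passing -qg while keeping the higher -N may be the better stopgap for workloads like the one measured above.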
This is a performance regression from GHC 6.10.4, where HLint worked fine with +RTS -N3. I only caught this regression as my test suite started to take forever to run.
Trac metadata
| Trac field | Value |
|---|---|
| Version | 6.12.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |