Super-linear slowdown of parallel builds on a 40-core machine
I'm seeing slowdowns in parallel builds of a (simple!!) 6-module project when I
build it on a 40-core server that I use for work. For any given ghc invocation
with -jn, once n>10, I start to see a super-linear slowdown as a function of n.
Here are some basic numbers:
| flag | compile time |
|---|---|
| -j1 | 0m2.693s |
| -j4 | 0m2.507s |
| -j10 | 0m2.763s |
| -j25 | 0m12.634s |
| -j30 | 0m39.154s |
| -j40 | 0m57.511s |
| -j60 | 2m21.821s |
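
To make the trend easier to read, here is a small sketch that converts the timings above to seconds and divides each by the -j1 time. The helper names (`to_seconds`, `slowdowns`) are made up for illustration; the numbers are copied straight from the table.

```python
import re

# Wall-clock timings copied from the table above, in the format printed by `time`.
TIMINGS = {
    1: "0m2.693s",
    4: "0m2.507s",
    10: "0m2.763s",
    25: "0m12.634s",
    30: "0m39.154s",
    40: "0m57.511s",
    60: "2m21.821s",
}


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m21.821s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised timing: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdowns(timings):
    """Return each run's wall-clock time divided by the -j1 time."""
    baseline = to_seconds(timings[1])
    return {n: to_seconds(t) / baseline for n, t in timings.items()}


if __name__ == "__main__":
    for n, factor in slowdowns(TIMINGS).items():
        print(f"-j{n:<3} {to_seconds(TIMINGS[n]):8.3f}s  {factor:6.2f}x vs -j1")
```

By this measure the -j40 build is about 21x slower than -j1 and the -j60 build about 53x slower, even though the amount of work is the same.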
These timings are another 2-4x worse if ghc is invoked indirectly via cabal-install / Setup.hs.
According to the Linux utility latencytop, 100% of ghc's CPU time was spent on user-space lock contention when I did the -j40 invocation.
The timing in the -j40 case stayed the same even when ghc was also passed -O0 (and -fforce-recomp to ensure it did the same work each time).
A bit of experimentation makes me believe that *any* cabalized project on a 40 core machine will exhibit this performance issue.
```
cabal clean
cabal configure --ghc-options="-j"
cabal build -j1
```
should be enough to trigger the lock contention.
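
For anyone who wants to collect comparable numbers on their own machine, here is a rough sketch of a timing sweep that calls ghc directly rather than going through cabal. It assumes `ghc` is on the PATH and that the project's entry module is called `Main.hs`, which is a placeholder; the function names are illustrative only. It is not the script used for the timings above.

```python
import subprocess
import time


def ghc_command(jobs, main="Main.hs"):
    """Build a `ghc --make` invocation for a given -j value.

    `Main.hs` is a placeholder for the project's entry module.
    -fforce-recomp makes every run recompile all modules, and -O0
    turns optimisation off, matching the flags mentioned above.
    """
    return ["ghc", "--make", f"-j{jobs}", "-O0", "-fforce-recomp", main]


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(job_counts, make_cmd=ghc_command, cwd=None):
    """Time one build per -j value and return {jobs: seconds}."""
    return {n: time_command(make_cmd(n), cwd=cwd) for n in job_counts}


# Example (run from the project directory):
#   for n, secs in sweep([1, 4, 10, 25, 30, 40, 60]).items():
#       print(f"-j{n}: {secs:.3f}s")
```

Running each setting several times and taking the median would give steadier numbers, since a single run on a shared server can be noisy.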
That said, I'll try to cook up a minimal repro whose source I can share posthaste.