Poor performance for "sequential" forkIOs
The ghcide
developers have noticed that overall performance is quite poor with the default Shake scheduler (see #18221, https://github.com/ndmitchell/shake/pull/751, https://github.com/digital-asset/ghcide/issues/503). My understanding of the default scheduler is need
ing a target results in a new thread being forkIO
'd and then the caller blocking on an MVar
(or some other sort of blocking synchronization primitive).
Intriguingly, they have noticed that rather using forkOn
instead of forkIO
improves the situation markedly. @wz1000 provided two eventlogs of similar workloads under the forkIO
and forkOn
schemes. Comparing the two, the problem becomes pretty clear:
ghcide
with forkOn
Here we see a single hover event when running with the forkOn
scheme, where all threads are forked on capability 0:
On the whole it's clear that there is little parallelism available here; we only ever have one core working at any given moment.
ghcide
with forkIO
Here we see a single hover event with the much less efficient forkIO
scheme:
Zooming in further reveals what is happening:
Here the blue lines here denote thread creations; the red lines denote thread migrations. It seems that every time a Shake action need
s a rule, we kick off a new Haskell thread, push it to another capability's run queue, and (nearly) immediately park the capability we are running on, resulting a "stair-stepping" effect in the eventlog. Between each pair of "stairs" in this eventlog there is an idle period of anywhere from 10 to 500 microseconds.
In text, the events above look like this:
...
41577543192: cap 0: running thread 15902
41577622290: cap 0: creating thread 15903
41577631265: cap 0: stopping thread 15902 (thread finished)
41577632790: cap 0: running thread 15903
41577637059: cap 0: stopping thread 15903 (thread yielding)
41577637660: cap 0: running thread 15903
41577671853: cap 0: creating thread 15904
41577674842: cap 0: stopping thread 15903 (thread finished)
41577675399: cap 0: running thread 15904
41577681143: cap 0: stopping thread 15904 (thread yielding)
41577681726: cap 0: running thread 15904
41577710211: cap 0: creating thread 15905
41577713041: cap 0: stopping thread 15904 (thread yielding)
41577714797: cap 0: migrating thread 15905 to cap 1
41577730170: cap 0: running thread 15904
41577731107: cap 0: stopping thread 15904 (thread finished)
41577788360: cap 1: running thread 15905
...
To service this one hover request there are likely over 400 such context switches. Assuming an average of 100us per context switch, this would result in nearly 40ms of lost time. Terrible.