Poor performance for "sequential" forkIOs
ghcide developers have noticed that overall performance is quite poor with the default Shake scheduler (see #18221, https://github.com/ndmitchell/shake/pull/751, https://github.com/digital-asset/ghcide/issues/503). My understanding of the default scheduler is
needing a target results in a new thread being
forkIO'd and then the caller blocking on an
MVar (or some other sort of blocking synchronization primitive).
Intriguingly, they have noticed that rather using
forkOn instead of
forkIO improves the situation markedly. @wz1000 provided two eventlogs of similar workloads under the
forkOn schemes. Comparing the two, the problem becomes pretty clear:
Here we see a single hover event when running with the
forkOn scheme, where all threads are forked on capability 0:
On the whole it's clear that there is little parallelism available here; we only ever have one core working at any given moment.
Here we see a single hover event with the much less efficient
Zooming in further reveals what is happening:
Here the blue lines here denote thread creations; the red lines denote thread migrations. It seems that every time a Shake action
needs a rule, we kick off a new Haskell thread, push it to another capability's run queue, and (nearly) immediately park the capability we are running on, resulting a "stair-stepping" effect in the eventlog. Between each pair of "stairs" in this eventlog there is an idle period of anywhere from 10 to 500 microseconds.
In text, the events above look like this:
... 41577543192: cap 0: running thread 15902 41577622290: cap 0: creating thread 15903 41577631265: cap 0: stopping thread 15902 (thread finished) 41577632790: cap 0: running thread 15903 41577637059: cap 0: stopping thread 15903 (thread yielding) 41577637660: cap 0: running thread 15903 41577671853: cap 0: creating thread 15904 41577674842: cap 0: stopping thread 15903 (thread finished) 41577675399: cap 0: running thread 15904 41577681143: cap 0: stopping thread 15904 (thread yielding) 41577681726: cap 0: running thread 15904 41577710211: cap 0: creating thread 15905 41577713041: cap 0: stopping thread 15904 (thread yielding) 41577714797: cap 0: migrating thread 15905 to cap 1 41577730170: cap 0: running thread 15904 41577731107: cap 0: stopping thread 15904 (thread finished) 41577788360: cap 1: running thread 15905 ...
To service this one hover request there are likely over 400 such context switches. Assuming an average of 100us per context switch, this would result in nearly 40ms of lost time. Terrible.