Long CPU-IDLE pauses during -threaded default (moving, parallel) GC
Summary
A picture summarizes it best.
In words: during a rather modest load test, I observe frequent GC pauses lasting tens of milliseconds which aren't even CPU-intensive ("GC waiting" status in ThreadScope).
Steps to reproduce
The test is currently far from trivial, and probably deserves another iteration or two to minimize it.
There are 4 moving parts:
- a PostgreSQL database (version Debian 11.8-1.pgdg90+1),
- a PostgREST 7.0.1 server (the profiling target),
- an HTTP load generator,
- a custom bpftrace script for profile-sampling at GC time only.
I put all the scripts I use, and some of the results, into this gist.
So, the rough recipe for repro:
git clone https://gist.github.com/ulidtko/b5821533af6f760cde58bd1d0dcefce8 GHC#18415
cd GHC#18415
less README.md # quick pointers to set up the parts
sudo -v; RPS=1000 DURATION=20s ./run.sh
These long idle pauses reproduce rather reliably, even at 200 RPS load. At 1000 RPS load (ThinkPad laptop, 4/5 saturated CPUs at this rate), they become really bad; in one instance, I saw a 500 ms GC pause like this.
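To compare the pauses at the two load levels mentioned here, the recipe above could be wrapped in a small loop. This is only a sketch, assuming `./run.sh` reads `RPS` and `DURATION` from the environment exactly as in the command above; `RUN_SH` and `sweep` are hypothetical names introduced for illustration and are not part of the gist.

```shell
#!/bin/sh
# Sketch: repeat the load test at several request rates.
# Assumption: run.sh takes RPS and DURATION from the environment,
# as in the original one-liner. RUN_SH and sweep are hypothetical names.
RUN_SH="${RUN_SH:-./run.sh}"

# sweep DURATION RPS...
# Runs the driver script once per RPS value and stops at the first failure.
sweep() {
    duration="$1"
    shift
    for rps in "$@"; do
        echo "== RPS=$rps DURATION=$duration =="
        RPS="$rps" DURATION="$duration" "$RUN_SH" || return 1
    done
}
```

For example, `sudo -v; sweep 20s 200 1000` would run the 200 RPS and 1000 RPS cases back to back, producing one set of results per rate.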
Expected behavior
To state the obvious, GC threads can of course use whatever CPU is required, but they should not all be idle at the same time (all 4, in this case) for extended periods such as the observed tens of milliseconds or more.
Environment
Linux x86_64, GHC:
- 8.10.1
- 8.8.3, too (virtually the same results; doesn't look like a regression)
cc @bgamari