Stabilise nofib runtime measurements
With D4989 (cf. #15357 (closed)) having hit nofib master, there are still many benchmarks that are unstable in one way or another. I identified three causes of instability in #5793##15999 (closed). With system overhead mostly out of the equation, there are still two related tasks left:
- Identify benchmarks with GC wibbles. Plan: Look at how the productivity rate changes as the gen 0 heap size increases. A GC-sensitive benchmark should have a non-monotonic or discontinuous productivity-rate-over-nursery-size curve. Then fix these by iterating `main` often enough for the curve to become smooth and monotone.
- Now, all benchmarks should have a monotonically decreasing instruction count for increasing nursery sizes. If not, maybe there's another class of benchmarks I didn't identify yet in #5793. Of these benchmarks, there are a few, like `real/eff/CS`, that still have highly code layout-sensitive runtimes. Fix these 'microbenchmarks' by hiding them behind a flag.
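
The curve checks in both tasks could be automated along the following lines. This is only a sketch under assumptions: the function name `curve_issues`, the sample format, and the `jump_factor` threshold are all hypothetical and not part of nofib. It assumes each benchmark has already been measured at several nursery sizes (for example by running it with `+RTS -A<size> -s` and recording the reported productivity, or by recording instruction counts) and that the results are available as `(nursery_size, value)` pairs.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, float]  # (nursery size in bytes, measured value)


def curve_issues(
    samples: Sequence[Sample],
    direction: str = "any",      # "any", "increasing" or "decreasing"
    jump_factor: float = 5.0,    # hypothetical threshold, needs tuning
) -> List[Tuple[str, int]]:
    """Flag nursery sizes at which the curve is non-monotonic or jumps.

    Returns a list of (kind, nursery_size) pairs, where kind is
    "non-monotonic" or "jump" and nursery_size is the size at the
    end of the offending step.
    """
    pts = sorted(samples)
    # Change in the measured value between neighbouring nursery sizes.
    steps = [(size, b - a) for (_, a), (size, b) in zip(pts, pts[1:])]

    ups = [size for size, d in steps if d > 0]
    downs = [size for size, d in steps if d < 0]
    if direction == "increasing":
        bad = downs
    elif direction == "decreasing":
        bad = ups
    else:
        # No expected direction: report the minority direction, if mixed.
        bad = min(ups, downs, key=len) if ups and downs else []
    issues = [("non-monotonic", size) for size in bad]

    # Crude discontinuity test: a step far larger than the median step.
    magnitudes = sorted(abs(d) for _, d in steps)
    if magnitudes:
        median = magnitudes[len(magnitudes) // 2]
        if median > 0:
            issues += [
                ("jump", size)
                for size, d in steps
                if abs(d) > jump_factor * median
            ]
    return issues


# Productivity curve: any mix of rising and falling steps is suspicious.
productivity = [(1 << 20, 0.50), (2 << 20, 0.60), (4 << 20, 0.40), (8 << 20, 0.70)]
print(curve_issues(productivity))

# Instruction counts: expected to decrease as the nursery grows.
instructions = [(1 << 20, 1000.0), (2 << 20, 900.0), (4 << 20, 950.0)]
print(curve_issues(instructions, direction="decreasing"))
```

A benchmark for which this returns a non-empty list would be a candidate for more iterations of `main`. The median-based jump test is a deliberately simple stand-in for a proper discontinuity criterion, and measurement noise would have to be accounted for before trusting individual flags.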