Lessons to be learned from the paper: Investigating Magic Numbers: Improving the Inlining Heuristic in the Glasgow Haskell Compiler
The paper Investigating Magic Numbers: Improving the Inlining Heuristic in the Glasgow Haskell Compiler by Hollenbeck, O'Boyle, and Steuwer (Haskell Symposium 2022) applied search to the parameter space of the inliner and seems to have found quite impressive improvements.
In particular, the paper found a set of inlining magic numbers which achieves a 12% speedup over the defaults of ghc-8.10.3 on the test suites of 10 particular packages.
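To make "a set of inlining magic numbers" concrete, here is a minimal sketch of how one candidate parameter set could be turned into a GHC invocation. It assumes the magic numbers are exposed through GHC's `-funfolding-*` flags (for example `-funfolding-use-threshold` and `-funfolding-creation-threshold`). The helper name `ghc_inlining_flags`, the file `Main.hs`, and the numeric values are placeholders for illustration, not the values found in the paper.

```python
def ghc_inlining_flags(params):
    """Turn a mapping of unfolding parameter names to values into GHC flags.

    Keys are the flag suffixes after "-funfolding-"; sorting keeps the
    resulting command line deterministic.
    """
    return [f"-funfolding-{name}={value}" for name, value in sorted(params.items())]


# Hypothetical candidate point in the search space (placeholder values).
candidate = {
    "use-threshold": 100,
    "creation-threshold": 900,
}

flags = ghc_inlining_flags(candidate)
# Main.hs is a placeholder module name.
print(" ".join(["ghc", "-O2", *flags, "Main.hs"]))
```

A search over the inliner's parameter space would then amount to generating many such candidates and measuring each resulting build.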
To me the obvious questions not answered in the paper are:
-
Compile Times. There is no mention of compile times at all in the paper. It's generally accepted among GHC devs that being more lenient with inlining improves runtime performance at the cost of increased compile time. If we want to change the set of magic inlining numbers, we need to characterize not just the runtime but also the compile time differences and make an informed decision based on the full picture.
-
nofib vs package tests. The paper heavily criticizes nofib on a number of points (some warranted, some not) and comes up with quite a clever alternative approach to benchmarking. I'm not sure if it's feasible to maintain a benchmark suite for GHC based on the principles in the paper.
It would be quite informative to compare the results between their benchmarking approach and nofib. At the very least this should give us a good idea how stable these results are between different ways of evaluation.
If nofib and these packages give us very different results, the next question would be whether this set of packages is more (or less) representative of typical Haskell code. But that's a whole other question which isn't easy to answer at all.
-
Reproduction. In general it would be good to try to reproduce their results, at least on nofib, with one or more recent versions of GHC. But reproducing their methodology with newer versions of GHC would also be nice. Things often change between versions, and it would be interesting to see how stable these results are across GHC versions and maybe even package versions, and whether they are still as beneficial for 9.4/master.
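One way to get the "full picture" asked for in the compile-time point is to report compile-time and runtime ratios side by side for the same set of packages. The following is a rough sketch under stated assumptions: measurements are already available as `(compile_seconds, run_seconds)` pairs per package, ratios are averaged with a geometric mean, and the package names and numbers are made up for illustration rather than measured.

```python
import math


def geomean(ratios):
    """Geometric mean, a common way to average ratios across benchmarks."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def summarize(baseline, candidate):
    """Compare two measurement sets keyed by package name.

    Each value is a (compile_seconds, run_seconds) pair. A ratio below 1.0
    means the candidate settings are faster than the baseline.
    """
    common = sorted(baseline.keys() & candidate.keys())
    compile_ratio = geomean(candidate[p][0] / baseline[p][0] for p in common)
    run_ratio = geomean(candidate[p][1] / baseline[p][1] for p in common)
    return compile_ratio, run_ratio


# Made-up numbers purely for illustration; these are not measurements.
baseline = {"pkg-a": (10.0, 2.0), "pkg-b": (40.0, 8.0)}
candidate = {"pkg-a": (12.0, 1.8), "pkg-b": (44.0, 7.2)}

compile_ratio, run_ratio = summarize(baseline, candidate)
print(f"compile time x{compile_ratio:.3f}, run time x{run_ratio:.3f}")
```

The same summary could be computed once for nofib and once for the package test suites, and again per GHC version, which would show directly how stable the trade-off is across evaluation methods and compiler releases.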