More accurate unboxing
This patch implements a fix for #20817. It ensures that * The final strictness signature for a function accurately reflects the unboxing done by the wrapper See Note [Finalising boxity for demand signatures] and Note [Finalising boxity for let-bound Ids] * A much better "layer-at-a-time" implementation of the budget for how many worker arguments we can have See Note [Worker argument budget] Generally this leads to a bit more worker/wrapper generation, because instead of aborting entirely if the budget is exceeded (and then lying about boxity), we unbox a bit. Binary sizes in increase slightly (around 1.8%) because of the increase in worker/wrapper generation. The big effects are to GHC.Ix, GHC.Show, GHC.IO.Handle.Internals. If we did a better job of dropping dead code, this effect might go away. Some nofib perf improvements: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- VSD +1.8% -0.5% 0.017 0.017 0.0% awards +1.8% -0.1% +2.3% +2.3% 0.0% banner +1.7% -0.2% +0.3% +0.3% 0.0% bspt +1.8% -0.1% +3.1% +3.1% 0.0% eliza +1.8% -0.1% +1.2% +1.2% 0.0% expert +1.7% -0.1% +9.6% +9.6% 0.0% fannkuch-redux +1.8% -0.4% -9.3% -9.3% 0.0% kahan +1.8% -0.1% +22.7% +22.7% 0.0% maillist +1.8% -0.9% +21.2% +21.6% 0.0% nucleic2 +1.7% -5.1% +7.5% +7.6% 0.0% pretty +1.8% -0.2% 0.000 0.000 0.0% reverse-complem +1.8% -2.5% +12.2% +12.2% 0.0% rfib +1.8% -0.2% +2.5% +2.5% 0.0% scc +1.8% -0.4% 0.000 0.000 0.0% simple +1.7% -1.3% +17.0% +17.0% +7.4% spectral-norm +1.8% -0.1% +6.8% +6.7% 0.0% sphere +1.7% -2.0% +13.3% +13.3% 0.0% tak +1.8% -0.2% +3.3% +3.3% 0.0% x2n1 +1.8% -0.4% +8.1% +8.1% 0.0% -------------------------------------------------------------------------------- Min +1.1% -5.1% -23.6% -23.6% 0.0% Max +1.8% +0.0% +36.2% +36.2% +7.4% Geometric Mean +1.7% -0.1% +6.8% +6.8% +0.1% Compiler allocations in CI have a geometric mean of +0.1%; many small decreases but there are three bigger increases (7%), all because we do more worker/wrapper than before, so there is simply more code to compile. That's OK. Perf benchmarks in perf/should_run improve in allocation by a geo mean of -0.2%, which is good. None get worse. T12996 improves by -5.8% Metric Decrease: T12996 Metric Increase: T18282 T18923 T9630
Showing
- compiler/GHC/Core/Opt/DmdAnal.hs 381 additions, 27 deletionscompiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Pipeline.hs 3 additions, 2 deletionscompiler/GHC/Core/Opt/Pipeline.hs
- compiler/GHC/Core/Opt/WorkWrap.hs 46 additions, 25 deletionscompiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs 12 additions, 322 deletionscompiler/GHC/Core/Opt/WorkWrap/Utils.hs
- compiler/GHC/Types/Demand.hs 9 additions, 3 deletionscompiler/GHC/Types/Demand.hs
- testsuite/tests/stranal/should_compile/T20817.hs 8 additions, 0 deletionstestsuite/tests/stranal/should_compile/T20817.hs
- testsuite/tests/stranal/should_compile/T20817.stderr 358 additions, 0 deletionstestsuite/tests/stranal/should_compile/T20817.stderr
- testsuite/tests/stranal/should_compile/all.T 2 additions, 1 deletiontestsuite/tests/stranal/should_compile/all.T
- testsuite/tests/stranal/sigs/T19871.stderr 4 additions, 4 deletionstestsuite/tests/stranal/sigs/T19871.stderr
Loading
Please register or sign in to comment