Simplifier improvements
This MR started as: allow the simplifer to do more in one pass, arising from places I could see the simplifier taking two iterations where one would do. But it turned into a larger project, because these changes unexpectedly made inlining blow up, especially join points in deeply-nested cases. The main changes are below. There are also many new or rewritten Notes. Avoiding simplifying repeatedly ~~~~~~~~~~~~~~~ See Note [Avoiding simplifying repeatedly] * The SimplEnv now has a seInlineDepth field, which says how deep in unfoldings we are. See Note [Inline depth] in Simplify.Env. Currently used only for the next point: avoiding repeatedly simplifying coercions. * Avoid repeatedly simplifying coercions. see Note [Avoid re-simplifying coercions] in Simplify.Iteration As you'll see from the Note, this makes use of the seInlineDepth. * Allow Simplify.Iteration.simplAuxBind to inline used-once things. This is another part of Note [Post-inline for single-use things], and is really good for reducing simplifier iterations in situations like case K e of { K x -> blah } wher x is used once in blah. * Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case elimination. Note [Case elim in exprIsConApp_maybe] * Improve the case-merge transformation: - Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts` and friends. See Note [Merge Nested Cases] in GHC.Core.Utils. - Add a new case for `tagToEnum#`; see wrinkle (MC3). - Add a new case to look through join points: see wrinkle (MC4) postInlineUnconditionally ~~~~~~~~~~~~~~~~~~~~~~~~~ * Allow Simplify.Utils.postInlineUnconditionally to inline variables that are used exactly once. See Note [Post-inline for single-use things]. * Do not postInlineUnconditionally join point, ever. Doing so does not reduce allocation, which is the main point, and with join points that are used a lot it can bloat code. See point (1) of Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration. * Do not postInlineUnconditionally a strict (demanded) binding. It will not allocate a thunk (it'll turn into a case instead) so again the main point of inlining it doesn't hold. Better to check per-call-site. * Improve occurrence analyis for bottoming function calls, to help postInlineUnconditionally. See Note [Bottoming function calls] in GHC.Core.Opt.OccurAnal Inlining generally ~~~~~~~~~~~~~~~~~~ * In GHC.Core.Opt.Simplify.Utils.interestingCallContext, use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case. See Note [Seq is boring] Also, wrinkle (SB1), inline in that `seq` context only for INLINE functions (UnfWhen guidance). * In GHC.Core.Opt.Simplify.Utils.interestingArg, - return ValueArg for OtherCon [c1,c2, ...], but - return NonTrivArg for OtherCon [] This makes a function a little less likely to inline if all we know is that the argument is evaluated, but nothing else. * isConLikeUnfolding is no longer true for OtherCon {}. This propagates to exprIsConLike. Con-like-ness has /positive/ information. Join points ~~~~~~~~~~~ * Be very careful about inlining join points. See these two long Notes Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline * When making join points, don't do so if the join point is so small it will immediately be inlined; check uncondInlineJoin. * In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining heuristics for join points. In general we /do not/ want to inline join points /even if they are small/. See Note [Duplicating join points] GHC.Core.Opt.Simplify.Iteration. But sometimes we do: see Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function. * Do not add an unfolding to a join point at birth. This is a tricky one and has a long Note [Do not add unfoldings to join points at birth] It shows up in two places - In `mkDupableAlt` do not add an inlining - (trickier) In `simplLetUnfolding` don't add an unfolding for a fresh join point I am not fully satisifed with this, but it works and is well documented. * In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise having a non-inlined join point. Performance changes ~~~~~~~~~~~~~~~~~~~ * Binary sizes fall by around 2.6%, according to nofib. * Compile times improve slightly. Here are the figures over 1%. I investiate the biggest differnce in T18304. It's a very small module, just a few hundred nodes. The large percentage difffence is due to a single function that didn't quite inline before, and does now, making code size a bit bigger. I decided gains outweighed the losses. Metrics: compile_time/bytes allocated (changes over +/- 1%) ------------------------------------------------ CoOpt_Singletons(normal) -9.2% GOOD LargeRecord(normal) -23.5% GOOD MultiComponentModulesRecomp(normal) +1.2% MultiLayerModulesTH_OneShot(normal) +4.1% BAD PmSeriesS(normal) -3.8% PmSeriesV(normal) -1.5% T11195(normal) -1.3% T12227(normal) -20.4% GOOD T12545(normal) -3.2% T12707(normal) -2.1% GOOD T13253(normal) -1.2% T13253-spj(normal) +8.1% BAD T13386(normal) -3.1% GOOD T14766(normal) -2.6% GOOD T15164(normal) -1.4% T15304(normal) +1.2% T15630(normal) -8.2% T15630a(normal) NEW T15703(normal) -14.7% GOOD T16577(normal) -2.3% GOOD T17516(normal) -39.7% GOOD T18140(normal) +1.2% T18223(normal) -17.1% GOOD T18282(normal) -5.0% GOOD T18304(normal) +10.8% BAD T18923(normal) -2.9% GOOD T1969(normal) +1.0% T19695(normal) -1.5% T20049(normal) -12.7% GOOD T21839c(normal) -4.1% GOOD T3064(normal) -1.5% T3294(normal) +1.2% BAD T4801(normal) +1.2% T5030(normal) -15.2% GOOD T5321Fun(normal) -2.2% GOOD T6048(optasm) -16.8% GOOD T783(normal) -1.2% T8095(normal) -6.0% GOOD T9630(normal) -4.7% GOOD T9961(normal) +1.9% BAD WWRec(normal) -1.4% info_table_map_perf(normal) -1.3% parsing001(normal) +1.5% geo. mean -2.0% minimum -39.7% maximum +10.8% * Runtimes generally improve. In the testsuite perf/should_run gives: Metrics: runtime/bytes allocated ------------------------------------------ Conversions(normal) -0.3% T13536a(optasm) -41.7% GOOD T4830(normal) -0.1% haddock.Cabal(normal) -0.1% haddock.base(normal) -0.1% haddock.compiler(normal) -0.1% geo. mean -0.8% minimum -41.7% maximum +0.0% * For runtime, nofib is a better test. The news is mostly good. Here are the number more than +/- 0.1%: # bytes allocated ==========================++========== imaginary/digits-of-e1 || -14.40% imaginary/digits-of-e2 || -4.41% imaginary/paraffins || -0.17% imaginary/rfib || -0.15% imaginary/wheel-sieve2 || -0.10% real/compress || -0.47% real/fluid || -0.10% real/fulsom || +0.14% real/gamteb || -1.47% real/gg || -0.20% real/infer || +0.24% real/pic || -0.23% real/prolog || -0.36% real/scs || -0.46% real/smallpt || +4.03% shootout/k-nucleotide || -20.23% shootout/n-body || -0.42% shootout/spectral-norm || -0.13% spectral/boyer2 || -3.80% spectral/constraints || -0.27% spectral/hartel/ida || -0.82% spectral/mate || -20.34% spectral/para || +0.46% spectral/rewrite || +1.30% spectral/sphere || -0.14% ==========================++========== geom mean || -0.59% real/smallpt has a huge nest of local definitions, and I could not pin down a reason for a regression. But there are three big wins! Metric Decrease: CoOpt_Singletons LargeRecord T12227 T12707 T13386 T13536a T14766 T15703 T16577 T17516 T18223 T18282 T18923 T21839c T20049 T5321Fun T5030 T6048 T8095 T9630 T783 Metric Increase: MultiLayerModulesTH_OneShot T13253-spj T18304 T18698a T9961 T3294
Showing
- compiler/GHC/Core.hs 42 additions, 2 deletionscompiler/GHC/Core.hs
- compiler/GHC/Core/Coercion/Opt.hs 4 additions, 4 deletionscompiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Opt/OccurAnal.hs 23 additions, 2 deletionscompiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/Simplify.hs 2 additions, 1 deletioncompiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs 40 additions, 11 deletionscompiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Inline.hs 261 additions, 64 deletionscompiler/GHC/Core/Opt/Simplify/Inline.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs 353 additions, 198 deletionscompiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs 226 additions, 306 deletionscompiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/Stats.hs 7 additions, 0 deletionscompiler/GHC/Core/Opt/Stats.hs
- compiler/GHC/Core/Unfold.hs 36 additions, 13 deletionscompiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unfold/Make.hs 7 additions, 5 deletionscompiler/GHC/Core/Unfold/Make.hs
- compiler/GHC/Core/Utils.hs 233 additions, 3 deletionscompiler/GHC/Core/Utils.hs
- compiler/GHC/IfaceToCore.hs 1 addition, 1 deletioncompiler/GHC/IfaceToCore.hs
- compiler/GHC/Types/Id.hs 7 additions, 1 deletioncompiler/GHC/Types/Id.hs
- compiler/GHC/Types/Tickish.hs 2 additions, 0 deletionscompiler/GHC/Types/Tickish.hs
Loading
Please register or sign in to comment