Skip to content
  • Simon Peyton Jones's avatar
    Simplifier improvements · e026bdf2
    Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
    This MR started as: allow the simplifer to do more in one pass,
    arising from places I could see the simplifier taking two iterations
    where one would do.  But it turned into a larger project, because
    these changes unexpectedly made inlining blow up, especially join
    points in deeply-nested cases.
    
    The main changes are below.  There are also many new or rewritten Notes.
    
    Avoiding simplifying repeatedly
    ~~~~~~~~~~~~~~~
    See Note [Avoiding simplifying repeatedly]
    
    * The SimplEnv now has a seInlineDepth field, which says how deep
      in unfoldings we are.  See Note [Inline depth] in Simplify.Env.
      Currently used only for the next point: avoiding repeatedly
      simplifying coercions.
    
    * Avoid repeatedly simplifying coercions.
      see Note [Avoid re-simplifying coercions] in Simplify.Iteration
      As you'll see from the Note, this makes use of the seInlineDepth.
    
    * Allow Simplify.Iteration.simplAuxBind to inline used-once things.
      This is another part of Note [Post-inline for single-use things], and
      is really good for reducing simplifier iterations in situations like
          case K e of { K x -> blah }
      wher x is used once in blah.
    
    * Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
      elimination.  Note [Case elim in exprIsConApp_maybe]
    
    * Improve the case-merge transformation:
      - Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts`
        and friends.  See Note [Merge Nested Cases] in GHC.Core.Utils.
      - Add a new case for `tagToEnum#`; see wrinkle (MC3).
      - Add a new case to look through join points: see wrinkle (MC4)
    
    postInlineUnconditionally
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    * Allow Simplify.Utils.postInlineUnconditionally to inline variables
      that are used exactly once. See Note [Post-inline for single-use things].
    
    * Do not postInlineUnconditionally join point, ever.
      Doing so does not reduce allocation, which is the main point,
      and with join points that are used a lot it can bloat code.
      See point (1) of Note [Duplicating join points] in
      GHC.Core.Opt.Simplify.Iteration.
    
    * Do not postInlineUnconditionally a strict (demanded) binding.
      It will not allocate a thunk (it'll turn into a case instead)
      so again the main point of inlining it doesn't hold.  Better
      to check per-call-site.
    
    * Improve occurrence analyis for bottoming function calls, to help
      postInlineUnconditionally.  See Note [Bottoming function calls]
      in GHC.Core.Opt.OccurAnal
    
    Inlining generally
    ~~~~~~~~~~~~~~~~~~
    * In GHC.Core.Opt.Simplify.Utils.interestingCallContext,
      use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case.
      See Note [Seq is boring]  Also, wrinkle (SB1), inline in that
      `seq` context only for INLINE functions (UnfWhen guidance).
    
    * In GHC.Core.Opt.Simplify.Utils.interestingArg,
      - return ValueArg for OtherCon [c1,c2, ...], but
      - return NonTrivArg for OtherCon []
      This makes a function a little less likely to inline if all we
      know is that the argument is evaluated, but nothing else.
    
    * isConLikeUnfolding is no longer true for OtherCon {}.
      This propagates to exprIsConLike.  Con-like-ness has /positive/
      information.
    
    Join points
    ~~~~~~~~~~~
    * Be very careful about inlining join points.
      See these two long Notes
        Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration
        Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline
    
    * When making join points, don't do so if the join point is so small
      it will immediately be inlined; check uncondInlineJoin.
    
    * In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining
      heuristics for join points. In general we /do not/ want to inline
      join points /even if they are small/.  See Note [Duplicating join points]
      GHC.Core.Opt.Simplify.Iteration.
    
      But sometimes we do: see Note [Inlining join points] in
      GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function.
    
    * Do not add an unfolding to a join point at birth.  This is a tricky one
      and has a long Note [Do not add unfoldings to join points at birth]
      It shows up in two places
      - In `mkDupableAlt` do not add an inlining
      - (trickier) In `simplLetUnfolding` don't add an unfolding for a
        fresh join point
      I am not fully satisifed with this, but it works and is well documented.
    
    * In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise
      having a non-inlined join point.
    
    Performance changes
    ~~~~~~~~~~~~~~~~~~~
    * Binary sizes fall by around 2.6%, according to nofib.
    
    * Compile times improve slightly. Here are the figures over 1%.
    
      I investiate the biggest differnce in T18304. It's a very small module, just
      a few hundred nodes. The large percentage difffence is due to a single
      function that didn't quite inline before, and does now, making code size a
      bit bigger.  I decided gains outweighed the losses.
    
        Metrics: compile_time/bytes allocated (changes over +/- 1%)
        ------------------------------------------------
               CoOpt_Singletons(normal)   -9.2% GOOD
                    LargeRecord(normal)  -23.5% GOOD
    MultiComponentModulesRecomp(normal)   +1.2%
    MultiLayerModulesTH_OneShot(normal)   +4.1%  BAD
                      PmSeriesS(normal)   -3.8%
                      PmSeriesV(normal)   -1.5%
                         T11195(normal)   -1.3%
                         T12227(normal)  -20.4% GOOD
                         T12545(normal)   -3.2%
                         T12707(normal)   -2.1% GOOD
                         T13253(normal)   -1.2%
                     T13253-spj(normal)   +8.1%  BAD
                         T13386(normal)   -3.1% GOOD
                         T14766(normal)   -2.6% GOOD
                         T15164(normal)   -1.4%
                         T15304(normal)   +1.2%
                         T15630(normal)   -8.2%
                        T15630a(normal)          NEW
                         T15703(normal)  -14.7% GOOD
                         T16577(normal)   -2.3% GOOD
                         T17516(normal)  -39.7% GOOD
                         T18140(normal)   +1.2%
                         T18223(normal)  -17.1% GOOD
                         T18282(normal)   -5.0% GOOD
                         T18304(normal)  +10.8%  BAD
                         T18923(normal)   -2.9% GOOD
                          T1969(normal)   +1.0%
                         T19695(normal)   -1.5%
                         T20049(normal)  -12.7% GOOD
                        T21839c(normal)   -4.1% GOOD
                          T3064(normal)   -1.5%
                          T3294(normal)   +1.2%  BAD
                          T4801(normal)   +1.2%
                          T5030(normal)  -15.2% GOOD
                       T5321Fun(normal)   -2.2% GOOD
                          T6048(optasm)  -16.8% GOOD
                           T783(normal)   -1.2%
                          T8095(normal)   -6.0% GOOD
                          T9630(normal)   -4.7% GOOD
                          T9961(normal)   +1.9%  BAD
                          WWRec(normal)   -1.4%
            info_table_map_perf(normal)   -1.3%
                     parsing001(normal)   +1.5%
    
                              geo. mean   -2.0%
                              minimum    -39.7%
                              maximum    +10.8%
    
    * Runtimes generally improve. In the testsuite perf/should_run gives:
       Metrics: runtime/bytes allocated
       ------------------------------------------
                 Conversions(normal)   -0.3%
                     T13536a(optasm)  -41.7% GOOD
                       T4830(normal)   -0.1%
               haddock.Cabal(normal)   -0.1%
                haddock.base(normal)   -0.1%
            haddock.compiler(normal)   -0.1%
    
                           geo. mean   -0.8%
                           minimum    -41.7%
                           maximum     +0.0%
    
    * For runtime, nofib is a better test.  The news is mostly good.
      Here are the number more than +/- 0.1%:
    
        # bytes allocated
        ==========================++==========
           imaginary/digits-of-e1 ||  -14.40%
           imaginary/digits-of-e2 ||   -4.41%
              imaginary/paraffins ||   -0.17%
                   imaginary/rfib ||   -0.15%
           imaginary/wheel-sieve2 ||   -0.10%
                    real/compress ||   -0.47%
                       real/fluid ||   -0.10%
                      real/fulsom ||   +0.14%
                      real/gamteb ||   -1.47%
                          real/gg ||   -0.20%
                       real/infer ||   +0.24%
                         real/pic ||   -0.23%
                      real/prolog ||   -0.36%
                         real/scs ||   -0.46%
                     real/smallpt ||   +4.03%
            shootout/k-nucleotide ||  -20.23%
                  shootout/n-body ||   -0.42%
           shootout/spectral-norm ||   -0.13%
                  spectral/boyer2 ||   -3.80%
             spectral/constraints ||   -0.27%
              spectral/hartel/ida ||   -0.82%
                    spectral/mate ||  -20.34%
                    spectral/para ||   +0.46%
                 spectral/rewrite ||   +1.30%
                  spectral/sphere ||   -0.14%
        ==========================++==========
                        geom mean ||   -0.59%
    
        real/smallpt has a huge nest of local definitions, and I
        could not pin down a reason for a regression.  But there are
        three big wins!
    
    Metric Decrease:
        CoOpt_Singletons
        LargeRecord
        T12227
        T12707
        T13386
        T13536a
        T14766
        T15703
        T16577
        T17516
        T18223
        T18282
        T18923
        T21839c
        T20049
        T5321Fun
        T5030
        T6048
        T8095
        T9630
        T783
    Metric Increase:
        MultiLayerModulesTH_OneShot
        T13253-spj
        T18304
        T18698a
        T9961
        T3294
    e026bdf2