  1. Feb 25, 2024
  2. Jan 13, 2024
  3. Sep 29, 2022
  4. Sep 28, 2022
    • Improve aggressive specialisation · 2a53ac18
      Simon Peyton Jones authored and Marge Bot committed
      This patch fixes #21286, by not unboxing dictionaries in
      worker/wrapper (ever). The main payload is tiny:
      
      * In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox
        dictionaries in `get_dmd`.  See Note [Do not unbox class dictionaries]
        in that module
      
      * I also found that imported wrappers were being fruitlessly
        specialised, so I fixed that too, in canSpecImport.
        See Note [Specialising imported functions] point (2).
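      
      As a rough illustration of why unboxing a dictionary can get in the way
      of specialisation, here is a hand-written sketch. `NumDict`,
      `sumSquaresDict` and `sumSquaresWorker` are hypothetical names that model
      dictionary passing in ordinary Haskell; they are not what GHC generates.
      
      ```hs
      -- Source-level view: the `Num a` constraint becomes a dictionary
      -- argument once the function is desugared.
      sumSquares :: Num a => [a] -> a
      sumSquares = foldr (\x acc -> x * x + acc) 0
      {-# INLINABLE sumSquares #-}
      
      -- Hypothetical stand-in for a class dictionary (illustration only).
      data NumDict a = NumDict
        { dPlus  :: a -> a -> a
        , dTimes :: a -> a -> a
        , dZero  :: a
        }
      
      -- Dictionary-passing form: the dictionary is still a single argument,
      -- so a specialiser could replace it with a known dictionary.
      sumSquaresDict :: NumDict a -> [a] -> a
      sumSquaresDict d = foldr (\x acc -> dPlus d (dTimes d x x) acc) (dZero d)
      
      -- What a worker with the dictionary unboxed would look like: the
      -- methods arrive as separate arguments and no dictionary is left.
      sumSquaresWorker :: (a -> a -> a) -> (a -> a -> a) -> a -> [a] -> a
      sumSquaresWorker plus times zero =
        foldr (\x acc -> plus (times x x) acc) zero
      
      -- A concrete dictionary for `Int`, again purely illustrative.
      intDict :: NumDict Int
      intDict = NumDict (+) (*) 0
      ```
      
      All three forms compute the same result; the difference is only in
      whether a dictionary argument remains for the specialiser to act on.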
      
      In doing due diligence in the testsuite I fixed a number of
      other things:
      
      * Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make,
        and Note [Inline specialisations] in GHC.Core.Opt.Specialise,
        and remove duplication between the two. The new Note describes
        how we specialise functions with an INLINABLE pragma.
      
        And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`.
      
      * Improve Note [Worker/wrapper for INLINABLE functions] in
        GHC.Core.Opt.WorkWrap.
      
        And (critically) make an actual change, which is to propagate the
        user-written pragma from the original function to the wrapper; see
        `mkStrWrapperInlinePrag`.
      
      * Write new Note [Specialising imported functions] in
        GHC.Core.Opt.Specialise
      
      All this has a big effect on some compile times. This is
      compiler/perf, showing only changes over 1%:
      
      Metrics: compile_time/bytes allocated
      -------------------------------------
                      LargeRecord(normal)  -50.2% GOOD
                 ManyConstructors(normal)   +1.0%
      MultiLayerModulesTH_OneShot(normal)   +2.6%
                        PmSeriesG(normal)   -1.1%
                           T10547(normal)   -1.2%
                           T11195(normal)   -1.2%
                           T11276(normal)   -1.0%
                          T11303b(normal)   -1.6%
                           T11545(normal)   -1.4%
                           T11822(normal)   -1.3%
                           T12150(optasm)   -1.0%
                           T12234(optasm)   -1.2%
                           T13056(optasm)   -9.3% GOOD
                           T13253(normal)   -3.8% GOOD
                           T15164(normal)   -3.6% GOOD
                           T16190(normal)   -2.1%
                           T16577(normal)   -2.8% GOOD
                           T16875(normal)   -1.6%
                           T17836(normal)   +2.2%
                          T17977b(normal)   -1.0%
                           T18223(normal)  -33.3% GOOD
                           T18282(normal)   -3.4% GOOD
                           T18304(normal)   -1.4%
                          T18698a(normal)   -1.4% GOOD
                          T18698b(normal)   -1.3% GOOD
                           T19695(normal)   -2.5% GOOD
                            T5837(normal)   -2.3%
                            T9630(normal)  -33.0% GOOD
                            WWRec(normal)   -9.7% GOOD
                   hard_hole_fits(normal)   -2.1% GOOD
                           hie002(normal)   +1.6%
      
                                geo. mean   -2.2%
                                minimum    -50.2%
                                maximum     +2.6%
      
      I diligently investigated some of the big drops.
      
      * Caused by not doing w/w for dictionaries:
          T13056, T15164, WWRec, T18223
      
      * Caused by not fruitlessly specialising wrappers
          LargeRecord, T9630
      
      For runtimes, here is perf/should_run:
      
      Metrics: runtime/bytes allocated
      --------------------------------
                     T12990(normal)   -3.8%
                      T5205(normal)   -1.3%
                      T9203(normal)  -10.7% GOOD
              haddock.Cabal(normal)   +0.1%
               haddock.base(normal)   -1.1%
           haddock.compiler(normal)   -0.3%
              lazy-bs-alloc(normal)   -0.2%
      ------------------------------------------
                          geo. mean   -0.3%
                          minimum    -10.7%
                          maximum     +0.1%
      
      I did not investigate exactly what happens in T9203.
      
      Nofib is a wash:
      
      +-------------------------------++--+-----------+-----------+
      |                               ||  | tsv (rel) | std. err. |
      +===============================++==+===========+===========+
      |                     real/anna ||  |    -0.13% |      0.0% |
      |                      real/fem ||  |    +0.13% |      0.0% |
      |                   real/fulsom ||  |    -0.16% |      0.0% |
      |                     real/lift ||  |    -1.55% |      0.0% |
      |                  real/reptile ||  |    -0.11% |      0.0% |
      |                  real/smallpt ||  |    +0.51% |      0.0% |
      |          spectral/constraints ||  |    +0.20% |      0.0% |
      |               spectral/dom-lt ||  |    +1.80% |      0.0% |
      |               spectral/expert ||  |    +0.33% |      0.0% |
      +===============================++==+===========+===========+
      |                     geom mean ||  |           |           |
      +-------------------------------++--+-----------+-----------+
      
      I spent quite some time investigating dom-lt, but it's pretty
      complicated.  See my note on !7847.  Conclusion: it's just a delicate
      inlining interaction, and we have plenty of those.
      
      Metric Decrease:
          LargeRecord
          T13056
          T13253
          T15164
          T16577
          T18223
          T18282
          T18698a
          T18698b
          T19695
          T9630
          WWRec
          hard_hole_fits
          T9203
  5. Sep 27, 2022
    • Demand: Clear distinction between Call SubDmd and eval Dmd (#21717) · aeafdba5
      Sebastian Graf authored
      In #21717 we saw a reportedly unsound strictness signature due to an unsound
      definition of plusSubDmd on Calls. This patch contains a description and the fix
      to the unsoundness as outlined in `Note [Call SubDemand vs. evaluation Demand]`.
      
      This fix means we also get rid of the special handling of `-fpedantic-bottoms`
      in eta-reduction. Thanks to less strict and actually sound strictness results,
      we will no longer eta-reduce the problematic cases in the first place, even
      without `-fpedantic-bottoms`.
      
      So fixing the unsoundness also makes our eta-reduction code simpler with fewer
      hacks to explain. But there is another, more unfortunate side-effect:
      We *unfix* #21085, but fortunately we have a new fix ready:
      See `Note [mkCall and plusSubDmd]`.
      
      There's another change:
      I decided to make `Note [SubDemand denotes at least one evaluation]` a lot
      simpler by using `plusSubDmd` (instead of `lubPlusSubDmd`) even if both argument
      demands are lazy. That leads to less precise results, but in turn rids us
      of the need for 4 different `OpMode`s and the complication of
      `Note [Manual specialisation of lub*Dmd/plus*Dmd]`. The result is simpler code
      that is in line with the paper draft on Demand Analysis.
      
      I left the abandoned idea in `Note [Unrealised opportunity in plusDmd]` for
      posterity. The fallout in terms of regressions is negligible, as the testsuite
      and NoFib show.
      
      ```
              Program         Allocs    Instrs
      --------------------------------------------------------------------------------
               hidden          +0.2%     -0.2%
               linear          -0.0%     -0.7%
      --------------------------------------------------------------------------------
                  Min          -0.0%     -0.7%
                  Max          +0.2%     +0.0%
       Geometric Mean          +0.0%     -0.0%
      ```
      
      Fixes #21717.
  6. Aug 25, 2022
    • Fix arityType: -fpedantic-bottoms, join points, etc · a90298cc
      Simon Peyton Jones authored
      This MR fixes #21694, #21755.  It also makes sure that #21948 and
      fix to #21694.
      
      * For #21694 the underlying problem was that we were calling arityType
        on an expression that had free join points.  This is a Bad Bad Idea.
        See Note [No free join points in arityType].
      
      * To make "no free join points in arityType" work out I had to avoid
        trying to use eta-expansion for runRW#. This entailed a few changes
        in the Simplifier's treatment of runRW#.  See
        GHC.Core.Opt.Simplify.Iteration Note [No eta-expansion in runRW#]
      
      * I also made andArityType work correctly with -fpedantic-bottoms;
        see Note [Combining case branches: andWithTail].
      
      * Rewrote Note [Combining case branches: optimistic one-shot-ness]
      
      * arityType previously treated join points differently to other
        let-bindings. This patch makes them uniform; arityType analyses
        the RHS of all bindings to get its ArityType, and extends am_sigs.
      
        I realised that, now we have am_sigs giving the ArityType for
        let-bound Ids, we don't need the (pre-dating) special code in
        arityType for join points. But instead we need to extend the env for
        Rec bindings, which we weren't doing before.  More uniform now.  See
        Note [arityType for let-bindings].
      
        This meant we could get rid of ae_joins, and in fact get rid of
        EtaExpandArity altogether.  Simpler.
      
      * And finally, it was the strange treatment of join-point Ids in
        arityType (involving a fake ABot type) that led to a serious bug:
        #21755.  Fixed by this refactoring, which treats them uniformly;
        but without breaking #18328.
      
        In fact, the arity for recursive join bindings is pretty tricky;
        see the long Note [Arity for recursive join bindings]
        in GHC.Core.Opt.Simplify.Utils.  That led to more refactoring,
        including deciding that an Id could have an Arity that is bigger
        than its JoinArity; see Note [Invariants on join points], item
        2(b) in GHC.Core
      
      * Make sure that the "demand threshold" for join points in DmdAnal
        is no bigger than the join-arity.  In GHC.Core.Opt.DmdAnal see
        Note [Demand signatures are computed for a threshold arity based on idArity]
      
      * I moved GHC.Core.Utils.exprIsDeadEnd into GHC.Core.Opt.Arity,
        where it more properly belongs.
      
      * Remove an old, redundant hack in FloatOut.  The old Note was
        Note [Bottoming floats: eta expansion] in GHC.Core.Opt.SetLevels.
      
      Compile time improves very slightly on average:
      
      Metrics: compile_time/bytes allocated
      ---------------------------------------------------------------------------------------
        T18223(normal) ghc/alloc    725,808,720    747,839,216  +3.0%  BAD
        T6048(optasm)  ghc/alloc    105,006,104    101,599,472  -3.2% GOOD
        geo. mean                                          -0.2%
        minimum                                            -3.2%
        maximum                                            +3.0%
      
      For some reason Windows was better
      
         T10421(normal) ghc/alloc    125,888,360    124,129,168  -1.4% GOOD
         T18140(normal) ghc/alloc     85,974,520     83,884,224  -2.4% GOOD
        T18698b(normal) ghc/alloc    236,764,568    234,077,288  -1.1% GOOD
         T18923(normal) ghc/alloc     75,660,528     73,994,512  -2.2% GOOD
          T6048(optasm) ghc/alloc    112,232,512    108,182,520  -3.6% GOOD
        geo. mean                                          -0.6%
      
      I had a quick look at T18223 but it is knee deep in coercions and
      the size of everything looks similar before and after.  I decided
      to accept that 3% increase in exchange for goodness elsewhere.
      
      Metric Decrease:
          T10421
          T18140
          T18698b
          T18923
          T6048
      
      Metric Increase:
          T18223
  7. Jun 27, 2022
    • Don't mark lambda binders as OtherCon · ac7a7fc8
      Andreas Klebinger authored and Marge Bot committed
      We used to put OtherCon unfoldings on lambda binders of workers
      and sometimes also join points/specializations with the
      assumption that since the wrapper would force these arguments
      once we execute the RHS they would indeed be in WHNF.
      
      This was wrong for reasons detailed in #21472. So now we purge
      evaluated unfoldings from *all* lambda binders.
      
      This fixes #21472, but at the cost of sometimes not using as efficient a
      calling convention. It can also change inlining behaviour as some
      occurrences will no longer look like value arguments when they did
      before.
      
      As a consequence we also change how we compute CBV information for
      arguments slightly. We now *always* determine the CBV convention
      for arguments during tidy. Earlier in the pipeline we merely mark
      functions as candidates for having their arguments treated as CBV.
      
      As before the process is described in the relevant notes:
      Note [CBV Function Ids]
      Note [Attaching CBV Marks to ids]
      Note [Never put `OtherCon` unfoldings on lambda binders]
      
      -------------------------
      Metric Decrease:
          T12425
          T13035
          T18223
          T18223
          T18923
          MultiLayerModulesTH_OneShot
      Metric Increase:
          WWRec
      -------------------------
  8. Jun 20, 2022
    • Simplify: Take care with eta reduction in recursive RHSs (#21652) · 49fb2f9b
      Sebastian Graf authored
      Similar to the fix to #20836 in CorePrep, we now track the set of enclosing
      recursive binders in the SimplEnv and SimpleOptEnv.
      See Note [Eta reduction in recursive RHSs] for details.
      
      I also updated Note [Arity robustness] with the insights Simon and I had in a
      call discussing the issue.
      
      Fixes #21652.
      
      Unfortunately, we get a 5% ghc/alloc regression in T16577. That is due to
      additional eta reduction in GHC.Read.choose1 and the resulting ANF-isation
      of a large list literal at the top-level that didn't happen before (presumably
      because it was too interesting to float to the top-level). There's not much we
      can do about that.
      
      Metric Increase:
          T16577
  9. May 30, 2022
    • A bunch of changes related to eta reduction · 6656f016
      Simon Peyton Jones authored and Marge Bot committed
      This is a large collection of changes all relating to eta
      reduction, originally triggered by #18993, but there followed
      a long saga.
      
      Specifics:
      
      * Move state-hack stuff from GHC.Types.Id (where it never belonged)
        to GHC.Core.Opt.Arity (which seems much more appropriate).
      
      * Add a crucial mkCast in the Cast case of
        GHC.Core.Opt.Arity.eta_expand; helps with T18223
      
      * Add clarifying notes about eta-reducing to PAPs.
        See Note [Do not eta reduce PAPs]
      
      * I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
        where it properly belongs.  See Note [Eta reduce PAPs]
      
      * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
        when eta-expansion is wanted, to make wantEtaExpansion, and call that
        same function in GHC.Core.Opt.Simplify.simplStableUnfolding.  It was
        previously inconsistent, but it's doing the same thing.
      
      * I did a substantial refactor of ArityType; see Note [ArityType].
        This allowed me to do away with the somewhat mysterious takeOneShots;
        more generally it allows arityType to describe the function, leaving
        its clients to decide how to use that information.
      
        I made ArityType abstract, so that clients have to use functions
        to access it.
      
      * Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
        mkLam before) aware of the floats that the simplifier builds up, so
        that it can still do eta-reduction even if there are some floats.
        (Previously that would not happen.)  That means passing the floats
        to rebuildLam, and an extra check when eta-reducing (etaFloatOk).
      
      * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
        in the idDemandInfo of the binder, as well as the CallArity info. The
        occurrence analyser did this but we were failing to take advantage here.
      
        In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity;
        see Note [Combining arityType with demand info], and functions
        idDemandOneShots and combineWithDemandOneShots.
      
        (These changes partly drove my refactoring of ArityType.)
      
      * In GHC.Core.Opt.Arity.findRhsArity
        * I'm now taking account of the demand on the binder to give
          extra one-shot info.  E.g. if the fn is always called with two
          args, we can give better one-shot info on the binders
          than if we just look at the RHS.
      
        * Don't do any fixpointing in the non-recursive
          case -- simple short cut.
      
        * Trim arity inside the loop. See Note [Trim arity inside the loop]
      
      * Make SimpleOpt respect the eta-reduction flag
        (Some associated refactoring here.)
      
      * I made the CallCtxt which the Simplifier uses distinguish between
        recursive and non-recursive right-hand sides.
           data CallCtxt = ... | RhsCtxt RecFlag | ...
        It affects only one thing:
           - We call an RHS context interesting only if it is non-recursive
             see Note [RHS of lets] in GHC.Core.Unfold
      
      * Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
        See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.
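      
      The `CallCtxt` change in the list above can be pictured with a reduced
      model. This is a sketch only: the type below has just two constructors,
      and `interestingRhsCtxt` is a hypothetical helper, not a function in GHC.
      
      ```hs
      -- Whether a binding is recursive.
      data RecFlag = Recursive | NonRecursive
        deriving (Eq, Show)
      
      -- Reduced stand-in for the Simplifier's call context; the other
      -- constructors (the `...` in the text above) are left out.
      data CallCtxt
        = BoringCtxt
        | RhsCtxt RecFlag
        deriving (Eq, Show)
      
      -- Hypothetical helper: an RHS context counts as interesting only
      -- when the binding is non-recursive.
      interestingRhsCtxt :: CallCtxt -> Bool
      interestingRhsCtxt (RhsCtxt NonRecursive) = True
      interestingRhsCtxt _                      = False
      ```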
      
      Other incidental changes
      
      * Fix a fairly long-standing outright bug in the ApplyToVal case of
        GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the
        tail of 'dmds' in the recursive call, which meant the demands were All
        Wrong.  I have no idea why this has not caused problems before now.
      
      * Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg
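      
      The `dmds` bug described above follows a common pattern: a recursive
      walk over two lists that forgets to advance one of them. The sketch
      below reproduces that shape with hypothetical functions (`pairBuggy`,
      `pairFixed`); it is not the code of `mkDupableContWithDmds`.
      
      ```hs
      -- Buggy shape: the recursive call passes the whole `dmds` list again,
      -- so every argument is paired with the first demand.
      pairBuggy :: [arg] -> [dmd] -> [(arg, dmd)]
      pairBuggy (a : as) dmds@(d : _) = (a, d) : pairBuggy as dmds
      pairBuggy _        _            = []
      
      -- Fixed shape: the recursive call takes the tail of the demands.
      pairFixed :: [arg] -> [dmd] -> [(arg, dmd)]
      pairFixed (a : as) (d : ds) = (a, d) : pairFixed as ds
      pairFixed _        _        = []
      ```
      
      With arguments `"ab"` and demands `[1, 2]`, the buggy version pairs both
      arguments with `1`, while the fixed version pairs `'b'` with `2`.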
      
      Metrics: compile_time/bytes allocated
                                     Test    Metric       Baseline      New value Change
      ---------------------------------------------------------------------------------------
      MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,743,297,692  2,619,762,992  -4.5% GOOD
                           T18223(normal) ghc/alloc  1,103,161,360    972,415,992 -11.9% GOOD
                            T3064(normal) ghc/alloc    201,222,500    184,085,360  -8.5% GOOD
                            T8095(normal) ghc/alloc  3,216,292,528  3,254,416,960  +1.2%
                            T9630(normal) ghc/alloc  1,514,131,032  1,557,719,312  +2.9%  BAD
                       parsing001(normal) ghc/alloc    530,409,812    525,077,696  -1.0%
      
      geo. mean                                 -0.1%
      
      Nofib:
             Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
               banner          +0.0%     +0.4%     -8.9%     -8.7%      0.0%
          exact-reals          +0.0%     -7.4%    -36.3%    -37.4%      0.0%
       fannkuch-redux          +0.0%     -0.1%     -1.0%     -1.0%      0.0%
                 fft2          -0.1%     -0.2%    -17.8%    -19.2%      0.0%
                fluid          +0.0%     -1.3%     -2.1%     -2.1%      0.0%
                   gg          -0.0%     +2.2%     -0.2%     -0.1%      0.0%
        spectral-norm          +0.1%     -0.2%      0.0%      0.0%      0.0%
                  tak          +0.0%     -0.3%     -9.8%     -9.8%      0.0%
                 x2n1          +0.0%     -0.2%     -3.2%     -3.2%      0.0%
      --------------------------------------------------------------------------------
                  Min          -3.5%     -7.4%    -58.7%    -59.9%      0.0%
                  Max          +0.1%     +2.2%    +32.9%    +32.9%      0.0%
       Geometric Mean          -0.0%     -0.1%    -14.2%    -14.8%     -0.0%
      
      Metric Decrease:
          MultiLayerModulesTH_OneShot
          T18223
          T3064
          T15185
          T14766
      Metric Increase:
          T9630
  10. May 03, 2022
    • Assume at least one evaluation for nested SubDemands (#21081, #21133) · 15ffe2b0
      Sebastian Graf authored
      See the new `Note [SubDemand denotes at least one evaluation]`.
      
      A demand `n :* sd` on a let binder `x=e` now means
      
      > "`x` was evaluated `n` times and in any program trace it is evaluated, `e` is
      >  evaluated deeply in sub-demand `sd`."
      
      The "any time it is evaluated" premise is what this patch adds. As a result,
      we get better nested strictness. For example (T21081)
      ```hs
      f :: (Bool, Bool) -> (Bool, Bool)
      f pr = (case pr of (a,b) -> a /= b, True)
      -- before: <MP(L,L)>
      -- after:  <MP(SL,SL)>
      
      g :: Int -> (Bool, Bool)
      g x = let y = let z = odd x in (z,z) in f y
      ```
      The change in demand signature "before" to "after" allows us to case-bind `z`
      here.
      
      Similarly good things happen for the `sd` in call sub-demands `Cn(sd)`, which
      allows for more eta-reduction (although that is only sound with
      `-fno-pedantic-bottoms`).
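      
      To see why eta-reduction and bottoms interact at all, it may help to
      compare a lambda with its eta-reduced form when the function being
      called is itself bottom. The definitions below are written by hand for
      illustration (all names are hypothetical) and are not taken from the
      patch or its tests.
      
      ```hs
      -- A function-typed bottom: forcing `bottomFun` itself diverges.
      bottomFun :: Int -> Int
      bottomFun = error "bottomFun is bottom"
      {-# NOINLINE bottomFun #-}
      
      -- Eta-expanded form: a lambda is already a value, so `seq` on it
      -- succeeds without touching `bottomFun`.
      viaLambda :: Int -> Int
      viaLambda = \x -> bottomFun x
      
      -- Eta-reduced form: forcing it forces `bottomFun` and diverges.
      viaReduced :: Int -> Int
      viaReduced = bottomFun
      
      -- Forcing the lambda to WHNF is fine.
      lambdaIsValue :: Bool
      lambdaIsValue = viaLambda `seq` True
      ```
      
      By contrast, evaluating ``viaReduced `seq` True`` raises the error, so
      the two forms are observably different under `seq`.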
      
      We also fix #21085, a surprising inconsistency with `Poly` to `Call` sub-demand
      expansion.
      
      In an attempt to fix a regression caused by less inlining due to eta-reduction
      in T15426, I eta-expanded the definition of `elemIndex` and `elemIndices`, thus
      fixing #21345 on the go.
      
      The main point of this patch is that it fixes #21081 and #21133.
      
      Annoyingly, I discovered that more precise demand signatures for join points can
      transform a program into a lazier program if that join point gets floated to the
      top-level, see #21392. There is no simple fix at the moment, but !5349 might.
      Thus, we accept a ~5% regression in `MultiLayerModulesTH_OneShot`, where #21392
      bites us in `addListToUniqDSet`. T21392 reliably reproduces the issue.
      
      Surprisingly, ghc/alloc perf on Windows improves much more than on other jobs, by
      0.4% in the geometric mean and by 2% in T16875.
      
      Metric Increase:
          MultiLayerModulesTH_OneShot
      Metric Decrease:
          T16875
  11. Mar 16, 2022
    • Demand: Let `Boxed` win in `lubBoxity` (#21119) · 1575c4a5
      Sebastian Graf authored and Marge Bot committed
      Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic
      in terms of Boxity analysis. "Unsoundly" in the sense that we sometimes unbox
      parameters that we had better not unbox. Examples are #18907 and T19871.absent.
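      
      The difference between the two choices can be written down as a small
      standalone model. The functions `lubBoxityOld` and `lubBoxityNew` are
      hypothetical names for this sketch; they only mirror the behaviour
      described in this message.
      
      ```hs
      -- Two-point boxity domain.
      data Boxity = Boxed | Unboxed
        deriving (Eq, Show)
      
      -- Old, optimistic behaviour: `Unboxed` wins unless both sides are `Boxed`.
      lubBoxityOld :: Boxity -> Boxity -> Boxity
      lubBoxityOld Boxed Boxed = Boxed
      lubBoxityOld _     _     = Unboxed
      
      -- New, conservative behaviour: `Boxed` wins unless both sides are `Unboxed`.
      lubBoxityNew :: Boxity -> Boxity -> Boxity
      lubBoxityNew Unboxed Unboxed = Unboxed
      lubBoxityNew _       _       = Boxed
      ```
      
      The two functions differ only on mixed inputs, which is exactly where
      one code path wants the box and another does not.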
      
      Until now, we thought that this hack pulled its weight because it worked around
      some shortcomings of the phase separation between Boxity analysis and CPR
      analysis. But it is a gross hack which caused regressions itself that needed all
      kinds of fixes and workarounds. See for example #20767. It became impossible to
      work with in !7599, so I want to remove it.
      
      For example, at the moment, `lubDmd B dmd` will not unbox `dmd`,
      but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of
      the lattice, it's hardly justifiable to get a better demand when `lub`bing with
      `A`.
      
      The consequence of letting `Boxed` win in `lubBoxity` is that we *would* regress
       #2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR
      are able to communicate better. Fortunately, that is not the case since I could
      tweak the other source of optimism in Boxity analysis that is described in
      `Note [Unboxed demand on function bodies returning small products]` so that
      we *recursively* assume unboxed demands on function bodies returning small
      products. See the updated Note.
      
      `Note [Boxity for bottoming functions]` describes why we need bottoming
      functions to have signatures that say that they deeply unbox their arguments.
      In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox
      recursive data constructors. This is in line with our handling of them in CPR.
      I updated `Note [Which types are unboxed?]` to reflect that.
      
      In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler
      implementation (at least to think about). We can also drop the very ad-hoc
      definition of `deferAfterPreciseException` and its Note in favor of the
      simple, intuitive definition we used to have.
      
      Metric Decrease:
          T16875
          T18223
          T18698a
          T18698b
          hard_hole_fits
      Metric Increase:
          LargeRecord
          MultiComponentModulesRecomp
          T15703
          T8095
          T9872d
      
      Out of all the regressions, only the one in T9872d doesn't vanish in a perf
      build, where the compiler is bootstrapped with -O2 and thus SpecConstr.
      Reason for regressions:
      
        * T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed.
          That is because the context is passed to a function argument, for
          example in `liftCoSubstTyVarBndrUsing`.
        * In T15703, LargeRecord and T8095, we get a bit more allocations in
          `expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed.
          In both cases that guards against reboxing in some code paths.
        * The same is true for MultiComponentModulesRecomp, where we get less unboxing
          in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations
          actually *improve* by over 4%!
      
      Results on NoFib:
      
      --------------------------------------------------------------------------------
              Program         Allocs    Instrs
      --------------------------------------------------------------------------------
               awards          -0.4%     +0.3%
            cacheprof          -0.3%     +2.4%
                  fft          -1.5%     -5.1%
             fibheaps          +1.2%     +0.8%
                fluid          -0.3%     -0.1%
                  ida          +0.4%     +0.9%
         k-nucleotide          +0.4%     -0.1%
           last-piece         +10.5%    +13.9%
                 lift          -4.4%     +3.5%
              mandel2         -99.7%    -99.8%
                 mate          -0.4%     +3.6%
               parser          -1.0%     +0.1%
               puzzle         -11.6%     +6.5%
      reverse-complem          -3.0%     +2.0%
                  scs          -0.5%     +0.1%
               sphere          -0.4%     -0.2%
            wave4main          -8.2%     -0.3%
      --------------------------------------------------------------------------------
      Summary excludes mandel2 because of excessive bias
                  Min         -11.6%     -5.1%
                  Max         +10.5%    +13.9%
       Geometric Mean          -0.2%     +0.3%
      --------------------------------------------------------------------------------
      
      Not bad for a bug fix.
      
      The regression in `last-piece` could become a win if SpecConstr would work on
      non-recursive functions. The regression in `fibheaps` is due to
      `Note [Reboxed crud for bottoming calls]`, e.g., #21128.
  12. Feb 12, 2022
    • Tag inference work. · 0e93023e
      Andreas Klebinger authored and Matthew Pickering committed
      This does three major things:
      * Enforce the invariant that all strict fields must contain tagged
      pointers.
      * Try to predict the tag on bindings in order to omit tag checks.
      * Allows functions to pass arguments unlifted (call-by-value).
      
      The first is "simply" achieved by wrapping any constructor allocations with
      a case which will evaluate the respective strict bindings.
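      
      A source-level analogy for that wrapping is sketched below. In STG a
      `case` on a variable evaluates it; in source Haskell `seq` plays that
      role. `Pair`, `mkPairEvaluated` and `firstField` are hypothetical names
      used only for this illustration.
      
      ```hs
      -- A constructor with one strict field and one lazy field.
      data Pair = MkPair !Int Bool
      
      -- Evaluate the value destined for the strict field before allocating
      -- the constructor, so the field holds an evaluated value.
      mkPairEvaluated :: Int -> Bool -> Pair
      mkPairEvaluated x y = x `seq` MkPair x y
      
      -- Read the strict field back.
      firstField :: Pair -> Int
      firstField (MkPair n _) = n
      ```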
      
      The prediction is done by a new data flow analysis based on the STG
      representation of a program. This also helps us to avoid generating
      redundant cases for the above invariant.
      
      StrictWorkers are created by W/W directly and SpecConstr indirectly.
      See the Note [Strict Worker Ids]
      
      Other minor changes:
      
      * Add StgUtil module containing a few functions needed by, but
        not specific to the tag analysis.
      
      -------------------------
      Metric Decrease:
      	T12545
      	T18698b
      	T18140
      	T18923
              LargeRecord
      Metric Increase:
              LargeRecord
      	ManyAlternatives
      	ManyConstructors
      	T10421
      	T12425
      	T12707
      	T13035
      	T13056
      	T13253
      	T13253-spj
      	T13379
      	T15164
      	T18282
      	T18304
      	T18698a
      	T1969
      	T20049
      	T3294
      	T4801
      	T5321FD
      	T5321Fun
      	T783
      	T9233
      	T9675
      	T9961
      	T19695
      	WWRec
      -------------------------
  13. Oct 24, 2021
    • DmdAnal: Implement Boxity Analysis (#19871) · 3bab222c
      Sebastian Graf authored and Marge Bot committed
      This patch fixes some abundant reboxing of `DynFlags` in
      `GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
      of #19407) by introducing a Boxity analysis to GHC, done as part of demand
      analysis. This allows us to accurately capture ad-hoc unboxing decisions previously
      made in worker/wrapper in demand analysis now, where the boxity info can
      propagate through demand signatures.
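      
      For readers unfamiliar with reboxing, here is a generic sketch of the
      problem, assuming a small pair-based example rather than the `DynFlags`
      case above. All names are hypothetical, and `workerUnboxed` is written
      by hand to show what an unboxing worker would have to do.
      
      ```hs
      -- An opaque consumer that needs its argument as a boxed pair.
      consume :: (Int, Int) -> Int
      consume p = fst p
      {-# NOINLINE consume #-}
      
      -- The function inspects both components (so unboxing looks attractive),
      -- but one branch passes the pair on in boxed form.
      sumOrFirst :: (Int, Int) -> Int
      sumOrFirst p@(a, b)
        | a > b     = a + b
        | otherwise = consume p
      
      -- Hand-written picture of a worker that takes the components unboxed.
      workerUnboxed :: Int -> Int -> Int
      workerUnboxed a b
        | a > b     = a + b
        | otherwise = consume (a, b)  -- reboxing: a fresh pair is allocated
      ```
      
      Whether unboxing pays off here depends on which branch is taken, which
      is the kind of information a boxity analysis has to weigh.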
      
      See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
      `Note [No lazy, Unboxed demand in demand signature]`, but
      `Note [Finalising boxity for demand signature]` is probably a better entry-point.
      
      To support the fix for #19407, I had to change (what was)
      `Note [Add demands for strict constructors]` a bit
      (now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
      it in `finaliseBoxity` (which is only called from demand analysis) instead of
      `wantToUnboxArg`.
      
      I also had to resurrect `Note [Product demands for function body]` and rename
      it to `Note [Unboxed demand on function bodies returning small products]` to
      avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
      See the updated Note for details.
      
      A nice side-effect is that the worker/wrapper transformation no longer needs to
      look at strictness info and other bits such as `InsideInlineableFun` flags
      (needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
      boxity info from argument demands and interprets them with a severely simplified
      `wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
      to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
      which would be awkward to export.
      
      I spent some time figuring out why `T16197` failed prior to my
      amendments to `Note [Unboxing evaluated arguments]`. Having figured it
      out, I minimised it a bit and added `T16197b`, which simply compares computed
      strictness signatures and thus should be far simpler to eyeball.
      
      The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
      field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
      and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
      to those functions stayed the same. We can bear such an increase here, as we
      recently improved it by -68% (in b760c1f7).
      T18698* regress slightly because there is more unboxing of dictionaries
      happening and that causes Lint (mostly) to allocate more.
      
      Fixes #19871, #19407, #4267, #16859, #18907 and #13331.
      
      Metric Increase:
          T11545
          T18698a
          T18698b
      
      Metric Decrease:
          T12425
          T16577
          T18223
          T18282
          T4267
          T9961
      3bab222c
  14. Oct 20, 2021
    • Bignum: allow Integer predicates to inline (#20361) · 758e0d7b
      Sylvain Henry authored and Marge Bot committed
      T17516 allocations increase by 48% because Integer's predicates are
      inlined in some Ord instance methods. These methods become too big to be
      inlined, although they probably should be; this is tracked in #20516.
      
      Metric Increase:
          T17516
      758e0d7b
  15. Sep 30, 2021
    • Nested CPR light unleashed (#18174) · c261f220
      Sebastian Graf authored and Marge Bot committed
      This patch enables worker/wrapper for nested constructed products, as described
      in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
      there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
      since !5338. CPR analysis already handles applications in batches since !5753.
      This patch just needs to flip a few more switches:
      
      1. In `cprTransformDataConWork`, we need to look at the field expressions
         and their `CprType`s to see whether the evaluation of the expressions
         terminates quickly (= is in HNF) or if they are put in strict fields.
         If that is the case, then we retain their CPR info and may unbox nestedly
         later on. More details in `Note [Nested CPR]`.
      2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
      3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
         fields to the `Unbox`.
      4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
         `cprTransformDataConWork` for workers and treat wrappers by analysing their
         unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
      5. I deactivated worker/wrappering for recursive DataCons and wrote a function
         `isRecDataCon` to detect them. We really don't want to give `repeat` or
         `replicate` the Nested CPR property.
         See Note [CPR for recursive data structures] for which kind of recursive
         DataCons we target.
      6. Fix a couple of tests and their outputs.
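
      The combined effect of these switches can be pictured as a hand-written
      worker/wrapper split. This is only a sketch under assumptions: `sumProd`,
      `sumProdWorker` and `sumProdWrapper` are hypothetical names, and the Core
      that GHC actually produces will differ in detail.

      ```hs
      {-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}
      module NestedCprSketch where

      import GHC.Exts (Int (I#), Int#, (*#), (+#))

      -- | Source function: returns a pair whose fields are already evaluated
      -- (because of the bang patterns), so the CPR info of the fields can be
      -- retained in addition to that of the pair itself.
      sumProd :: Int -> Int -> (Int, Int)
      sumProd x y =
        let !s = x + y
            !p = x * y
        in (s, p)

      -- | Hand-written worker: returns an unboxed tuple of unboxed Ints, so
      -- neither the pair nor the two Int boxes are allocated here.
      sumProdWorker :: Int# -> Int# -> (# Int#, Int# #)
      sumProdWorker x y = (# x +# y, x *# y #)

      -- | Hand-written wrapper: unboxes the arguments, calls the worker and
      -- reboxes both layers of the result. Once it is inlined at a call site,
      -- the boxes can often cancel against a surrounding `case`.
      sumProdWrapper :: Int -> Int -> (Int, Int)
      sumProdWrapper (I# x) (I# y) =
        case sumProdWorker x y of
          (# s, p #) -> (I# s, I# p)
      {-# INLINE sumProdWrapper #-}
      ```

      Without the nested part, a worker would return `(# Int, Int #)` and still
      allocate a box for each component.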
      
      I also documented that CPR can destroy sharing and lead to asymptotic increase
      in allocations (which is tracked by #13331/#19326) in
      `Note [CPR for data structures can destroy sharing]`.
      
      Nofib results:
      ```
      --------------------------------------------------------------------------------
              Program         Allocs    Instrs
      --------------------------------------------------------------------------------
         ben-raytrace          -3.1%     -0.4%
         binary-trees          +0.8%     -2.9%
         digits-of-e2          +5.8%     +1.2%
                event          +0.8%     -2.1%
       fannkuch-redux          +0.0%     -1.4%
                 fish           0.0%     -1.5%
               gamteb          -1.4%     -0.3%
              mkhprog          +1.4%     +0.8%
           multiplier          +0.0%     -1.9%
                  pic          -0.6%     -0.1%
              reptile         -20.9%    -17.8%
            wave4main          +4.8%     +0.4%
                 x2n1        -100.0%     -7.6%
      --------------------------------------------------------------------------------
                  Min         -95.0%    -17.8%
                  Max          +5.8%     +1.2%
       Geometric Mean          -2.9%     -0.4%
      ```
      The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
      refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
      regress because of that. Ultimately there are no great improvements due to
      Nested CPR alone, but at least it's a win.
      Binary sizes decrease by 0.6%.
      
      There are a significant number of metric decreases. The most notable ones (>1%):
      ```
             ManyAlternatives(normal) ghc/alloc   771656002.7   762187472.0  -1.2%
             ManyConstructors(normal) ghc/alloc  4191073418.7  4114369216.0  -1.8%
            MultiLayerModules(normal) ghc/alloc  3095678333.3  3128720704.0  +1.1%
                    PmSeriesG(normal) ghc/alloc    50096429.3    51495664.0  +2.8%
                    PmSeriesS(normal) ghc/alloc    63512989.3    64681600.0  +1.8%
                    PmSeriesV(normal) ghc/alloc    62575424.0    63767208.0  +1.9%
                       T10547(normal) ghc/alloc    29347469.3    29944240.0  +2.0%
                      T11303b(normal) ghc/alloc    46018752.0    47367576.0  +2.9%
                       T12150(optasm) ghc/alloc    81660890.7    82547696.0  +1.1%
                       T12234(optasm) ghc/alloc    59451253.3    60357952.0  +1.5%
                       T12545(normal) ghc/alloc  1705216250.7  1751278952.0  +2.7%
                       T12707(normal) ghc/alloc   981000472.0   968489800.0  -1.3% GOOD
                       T13056(optasm) ghc/alloc   389322664.0   372495160.0  -4.3% GOOD
                       T13253(normal) ghc/alloc   337174229.3   341954576.0  +1.4%
                       T13701(normal) ghc/alloc  2381455173.3  2439790328.0  +2.4%  BAD
                         T14052(ghci) ghc/alloc  2162530642.7  2139108784.0  -1.1%
                       T14683(normal) ghc/alloc  3049744728.0  2977535064.0  -2.4% GOOD
                       T14697(normal) ghc/alloc   362980213.3   369304512.0  +1.7%
                       T15164(normal) ghc/alloc  1323102752.0  1307480600.0  -1.2%
                       T15304(normal) ghc/alloc  1304607429.3  1291024568.0  -1.0%
                       T16190(normal) ghc/alloc   281450410.7   284878048.0  +1.2%
                       T16577(normal) ghc/alloc  7984960789.3  7811668768.0  -2.2% GOOD
                       T17516(normal) ghc/alloc  1171051192.0  1153649664.0  -1.5%
                       T17836(normal) ghc/alloc  1115569746.7  1098197592.0  -1.6%
                      T17836b(normal) ghc/alloc    54322597.3    55518216.0  +2.2%
                       T17977(normal) ghc/alloc    47071754.7    48403408.0  +2.8%
                      T17977b(normal) ghc/alloc    42579133.3    43977392.0  +3.3%
                       T18923(normal) ghc/alloc    71764237.3    72566240.0  +1.1%
                        T1969(normal) ghc/alloc   784821002.7   773971776.0  -1.4% GOOD
                        T3294(normal) ghc/alloc  1634913973.3  1614323584.0  -1.3% GOOD
                        T4801(normal) ghc/alloc   295619648.0   292776440.0  -1.0%
                      T5321FD(normal) ghc/alloc   278827858.7   276067280.0  -1.0%
                        T5631(normal) ghc/alloc   586618202.7   577579960.0  -1.5%
                        T5642(normal) ghc/alloc   494923048.0   487927208.0  -1.4%
                        T5837(normal) ghc/alloc    37758061.3    39261608.0  +4.0%
                        T9020(optasm) ghc/alloc   257362077.3   254672416.0  -1.0%
                        T9198(normal) ghc/alloc    49313365.3    50603936.0  +2.6%  BAD
                        T9233(normal) ghc/alloc   704944258.7   685692712.0  -2.7% GOOD
                        T9630(normal) ghc/alloc  1476621560.0  1455192784.0  -1.5%
                        T9675(optasm) ghc/alloc   443183173.3   433859696.0  -2.1% GOOD
                       T9872a(normal) ghc/alloc  1720926653.3  1693190072.0  -1.6% GOOD
                       T9872b(normal) ghc/alloc  2185618061.3  2162277568.0  -1.1% GOOD
                       T9872c(normal) ghc/alloc  1765842405.3  1733618088.0  -1.8% GOOD
         TcPlugin_RewritePerf(normal) ghc/alloc  2388882730.7  2365504696.0  -1.0%
                        WWRec(normal) ghc/alloc   607073186.7   597512216.0  -1.6%
      
                        T9203(normal) run/alloc   107284064.0   102881832.0  -4.1%
                haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0  -1.1%
                 haddock.base(normal) run/alloc 25660521653.3 25370321824.0  -1.1%
             haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0  -1.0%
      ```
      The biggest exception to the rule is T13701 which seems to fluctuate as usual
      (not unlike T12545). T14697 has a similar quality, being a generated
      multi-module test. T5837 is small enough that it similarly doesn't measure
      anything significant besides module loading overhead.
      T13253 simply does one additional round of Simplification due to Nested CPR.
      
      There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
      (@mpickering and I) were simply unable to reproduce locally. @mpickering tried
      to run the CI script in a local Docker container and actually found that T9198
      and PmSeriesG *improved*. In MRs that were rebased on top of this one, like
      !4229, I did not experience such increases. Let's not get hung up on these
      regression tests; they were meant to test for asymptotic regressions.
      
      The build-cabal test improves by 1.2% in -O0.
      
      Metric Increase:
          T10421
          T12234
          T12545
          T13035
          T13056
          T13701
          T14697
          T18923
          T5837
          T9198
      Metric Decrease:
          ManyConstructors
          T12545
          T12707
          T13056
          T14683
          T16577
          T18223
          T1969
          T3294
          T9203
          T9233
          T9675
          T9872a
          T9872b
          T9872c
          T9961
          TcPlugin_RewritePerf
      c261f220
  16. Jun 05, 2021
    • Avoid useless w/w split, take 2 · ea9a4ef6
      Simon Peyton Jones authored and Marge Bot committed
      This commit:
      
          commit c6faa42b
          Author: Simon Peyton Jones <simonpj@microsoft.com>
          Date:   Mon Mar 9 10:20:42 2020 +0000
      
          Avoid useless w/w split
      
          This patch is just a tidy-up for the post-strictness-analysis
          worker wrapper split.  Consider
      
             f x = x
      
          Strictness analysis does not lead to a w/w split, so the
          obvious thing is to leave it 100% alone.  But actually, because
          the RHS is small, we ended up adding a StableUnfolding for it.
      
          There is some reason to do this if we choose /not/ to do w/w
          on the grounds that the function is small.  See
          Note [Don't w/w inline small non-loop-breaker things]
      
          But there is no reason if we would not have done w/w anyway.
      
          This patch just moves the conditional to later.  Easy.
      
      turns out to have a bug in it.  Instead of /moving/ the conditional,
      I /duplicated/ it.  Then in a subsequent unrelated tidy-up
      (087ac4eb) I removed the second (redundant) test!
      
      This patch does what I originally intended.
      
      There is also a small refactoring in GHC.Core.Unfold, to make the
      code clearer, but with no change in behaviour.
      
      It does, however, have a generally good effect on compile times,
      because we aren't dealing with so many silly stable unfoldings.
      Here are the non-zero changes:
      
      Metrics: compile_time/bytes allocated
      -------------------------------------
                                               Baseline
                           Test    Metric         value     New value Change
      ---------------------------------------------------------------------------
       ManyAlternatives(normal) ghc/alloc   791969344.0   792665048.0  +0.1%
       ManyConstructors(normal) ghc/alloc  4351126824.0  4358303528.0  +0.2%
              PmSeriesG(normal) ghc/alloc    50362552.0    50482208.0  +0.2%
              PmSeriesS(normal) ghc/alloc    63733024.0    63619912.0  -0.2%
                 T10421(normal) ghc/alloc   121224624.0   119695448.0  -1.3% GOOD
                T10421a(normal) ghc/alloc    85256392.0    83714224.0  -1.8%
                 T10547(normal) ghc/alloc    29253072.0    29258256.0  +0.0%
                 T10858(normal) ghc/alloc   189343152.0   187972328.0  -0.7%
                 T11195(normal) ghc/alloc   281208248.0   279727584.0  -0.5%
                 T11276(normal) ghc/alloc   141966952.0   142046224.0  +0.1%
                T11303b(normal) ghc/alloc    46228360.0    46259024.0  +0.1%
                 T11545(normal) ghc/alloc  2663128768.0  2667412656.0  +0.2%
                 T11822(normal) ghc/alloc   138686944.0   138760176.0  +0.1%
                 T12227(normal) ghc/alloc   482836000.0   475421056.0  -1.5% GOOD
                 T12234(optasm) ghc/alloc    60710520.0    60781808.0  +0.1%
                 T12425(optasm) ghc/alloc   104089000.0   104022424.0  -0.1%
                 T12545(normal) ghc/alloc  1711759416.0  1705711528.0  -0.4%
                 T12707(normal) ghc/alloc   991541120.0   991921776.0  +0.0%
                 T13035(normal) ghc/alloc   108199872.0   108370704.0  +0.2%
                 T13056(optasm) ghc/alloc   414642544.0   412580384.0  -0.5%
                 T13253(normal) ghc/alloc   361701272.0   355838624.0  -1.6%
             T13253-spj(normal) ghc/alloc   157710168.0   157397768.0  -0.2%
                 T13379(normal) ghc/alloc   370984400.0   371345888.0  +0.1%
                 T13701(normal) ghc/alloc  2439764144.0  2441351984.0  +0.1%
                   T14052(ghci) ghc/alloc  2154090896.0  2156671400.0  +0.1%
                 T15164(normal) ghc/alloc  1478517688.0  1440317696.0  -2.6% GOOD
                 T15630(normal) ghc/alloc   178053912.0   172489808.0  -3.1%
                 T16577(normal) ghc/alloc  7859948896.0  7854524080.0  -0.1%
                 T17516(normal) ghc/alloc  1271520128.0  1202096488.0  -5.5% GOOD
                 T17836(normal) ghc/alloc  1123320632.0  1123922480.0  +0.1%
                T17836b(normal) ghc/alloc    54526280.0    54576776.0  +0.1%
                T17977b(normal) ghc/alloc    42706752.0    42730544.0  +0.1%
                 T18140(normal) ghc/alloc   108834568.0   108693816.0  -0.1%
                 T18223(normal) ghc/alloc  5539629264.0  5579500872.0  +0.7%
                 T18304(normal) ghc/alloc    97589720.0    97196944.0  -0.4%
                 T18478(normal) ghc/alloc   770755472.0   771232888.0  +0.1%
                T18698a(normal) ghc/alloc   408691160.0   374364992.0  -8.4% GOOD
                T18698b(normal) ghc/alloc   492419768.0   458809408.0  -6.8% GOOD
                 T18923(normal) ghc/alloc    72177032.0    71368824.0  -1.1%
                  T1969(normal) ghc/alloc   803523496.0   804655112.0  +0.1%
                  T3064(normal) ghc/alloc   198411784.0   198608512.0  +0.1%
                  T4801(normal) ghc/alloc   312416688.0   312874976.0  +0.1%
               T5321Fun(normal) ghc/alloc   325230680.0   325474448.0  +0.1%
                  T5631(normal) ghc/alloc   592064448.0   593518968.0  +0.2%
                  T5837(normal) ghc/alloc    37691496.0    37710904.0  +0.1%
                   T783(normal) ghc/alloc   404629536.0   405064432.0  +0.1%
                  T9020(optasm) ghc/alloc   266004608.0   266375592.0  +0.1%
                  T9198(normal) ghc/alloc    49221336.0    49268648.0  +0.1%
                  T9233(normal) ghc/alloc   913464984.0   742680256.0 -18.7% GOOD
                  T9675(optasm) ghc/alloc   552296608.0   466322000.0 -15.6% GOOD
                 T9872a(normal) ghc/alloc  1789910616.0  1793924472.0  +0.2%
                 T9872b(normal) ghc/alloc  2315141376.0  2310338056.0  -0.2%
                 T9872c(normal) ghc/alloc  1840422424.0  1841567224.0  +0.1%
                 T9872d(normal) ghc/alloc   556713248.0   556838432.0  +0.0%
                  T9961(normal) ghc/alloc   383809160.0   384601600.0  +0.2%
                  WWRec(normal) ghc/alloc   773751272.0   753949608.0  -2.6% GOOD
      
      Residency goes down too:
      
      Metrics: compile_time/max_bytes_used
      ------------------------------------
                                   Baseline
                 Test  Metric         value     New value Change
      -----------------------------------------------------------
       T10370(optasm) ghc/max    42058448.0    39481672.0  -6.1%
       T11545(normal) ghc/max    43641392.0    43634752.0  -0.0%
       T15304(normal) ghc/max    29895824.0    29439032.0  -1.5%
       T15630(normal) ghc/max     8822568.0     8772328.0  -0.6%
      T18698a(normal) ghc/max    13882536.0    13787112.0  -0.7%
      T18698b(normal) ghc/max    14714112.0    13836408.0  -6.0%
        T1969(normal) ghc/max    24724128.0    24733496.0  +0.0%
        T3064(normal) ghc/max    14041152.0    14034768.0  -0.0%
        T3294(normal) ghc/max    32769248.0    32760312.0  -0.0%
        T9630(normal) ghc/max    41605120.0    41572184.0  -0.1%
        T9675(optasm) ghc/max    18652296.0    17253480.0  -7.5%
      
      Metric Decrease:
          T10421
          T12227
          T15164
          T17516
          T18698a
          T18698b
          T9233
          T9675
          WWRec
      
      Metric Increase:
          T12545
      ea9a4ef6
  17. Mar 03, 2021
  18. Dec 12, 2020
  19. Nov 20, 2020
    • Demand: Interleave usage and strictness demands (#18903) · 0aec78b6
      Sebastian Graf authored and Marge Bot committed
      As outlined in #18903, interleaving usage and strictness demands not
      only means a more compact demand representation, but also allows us to
      express demands that we weren't easily able to express before.
      
      Call demands are *relative* in the sense that a call demand `Cn(cd)`
      on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
      result is used according to `cd`". Example from #18903:
      
      ```hs
      h :: Int -> Int
      h m =
        let g :: Int -> (Int,Int)
            g 1 = (m, 0)
            g n = (2 * n, 2 `div` n)
            {-# NOINLINE g #-}
        in case m of
          1 -> 0
          2 -> snd (g m)
          _ -> uncurry (+) (g m)
      ```
      
      Without the interleaved representation, we would just get `L` for the
      strictness demand on `g`. Now we can express that whenever `g` is called,
      the second component of its result is used strictly, by giving `g` the
      demand `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the
      division, for example.
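
      For reference, the example can be evaluated by hand. The module below
      repeats `h` unchanged (only the module name `DemandSketch` is an
      assumption) and records in comments what each branch returns:

      ```hs
      module DemandSketch where

      h :: Int -> Int
      h m =
        let g :: Int -> (Int,Int)
            g 1 = (m, 0)
            g n = (2 * n, 2 `div` n)
            {-# NOINLINE g #-}
        in case m of
          1 -> 0
          2 -> snd (g m)
          _ -> uncurry (+) (g m)

      -- h 1 == 0   g is not called at all
      -- h 2 == 1   only the second component of g 2 = (4, 1) is used
      -- h 3 == 6   both components of g 3 = (6, 0) are used
      ```

      The second component is forced in every branch that calls `g`, while the
      first is only needed in the last branch.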
      
      Fixes #18903.
      While fixing regressions, I also discovered and fixed #18957.
      
      Metric Decrease:
          T13253-spj
      0aec78b6
  20. Nov 13, 2020
    • Arity: Emit "Exciting arity" warning only after second iteration (#18937) · 197d59fa
      Sebastian Graf authored and Marge Bot committed
      See Note [Exciting arity] for why we emit the warning at all and why we
      now only do so after the second iteration.
      
      Fixes #18937.
      197d59fa
    • Arity: Rework `ArityType` to fix monotonicity (#18870) · 63fa3997
      Sebastian Graf authored and Marge Bot committed
      As we found out in #18870, `andArityType` is not monotone, with
      potentially severe consequences for termination of fixed-point
      iteration. That showed in an abundance of "Exciting arity" DEBUG
      messages that are emitted whenever we do more than one step in
      fixed-point iteration.
      
      The solution necessitates also recording `OneShotInfo` info for
      `ABot` arity type. Thus we get the following definition for `ArityType`:
      
      ```hs
      data ArityType = AT [OneShotInfo] Divergence
      ```
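
      To make the new shape concrete, here is a toy model of the definition
      above. `OneShotInfo` and `Divergence` are simplified stand-ins for GHC's
      types, and `arityTypeArity` and `divergingExample` are hypothetical names
      written for this sketch, so none of it should be read as GHC's actual code:

      ```hs
      module ArityTypeSketch where

      -- | Simplified stand-in: is a lambda known to be called at most once?
      data OneShotInfo = NoOneShotInfo | OneShotLam
        deriving (Eq, Show)

      -- | Simplified stand-in: does the body definitely diverge once all
      -- arguments are supplied?
      data Divergence = Diverges | Dunno
        deriving (Eq, Show)

      -- | One `OneShotInfo` per lambda, plus the divergence of the body.
      data ArityType = AT [OneShotInfo] Divergence
        deriving (Eq, Show)

      -- | Hypothetical helper: the arity is the number of recorded lambdas.
      arityTypeArity :: ArityType -> Int
      arityTypeArity (AT oneShots _) = length oneShots

      -- | Models something like `\x y -> error "boom"`: a diverging body that
      -- still carries one-shot info for each of its two lambdas.
      divergingExample :: ArityType
      divergingExample = AT [NoOneShotInfo, OneShotLam] Diverges
      ```

      The point of the single constructor is that the diverging case no longer
      has to drop the per-lambda `OneShotInfo`.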
      
      The majority of changes in this patch are the result of refactoring use
      sites of `ArityType` to match the new definition.
      
      The regression test `T18870` asserts that we indeed don't emit any DEBUG
      output anymore for a function where we previously would have.
      Similarly, there's a regression test `T18937` for #18937, which we
      expect to be broken for now.
      
      Fixes #18870.
      63fa3997
  21. Oct 18, 2020
    • Testsuite: Add dead arity analysis tests · 451455fd
      Sebastian Graf authored and Marge Bot committed
      We didn't seem to test these old tests at all, judging from their
      expected output.
      451455fd
    • Arity: Record arity types for non-recursive lets · 6b3eb06a
      Sebastian Graf authored and Marge Bot committed
      In #18793, we saw a compelling example which requires us to look at
      non-recursive let-bindings during arity analysis and unleash their arity
      types at use sites.
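
      A hypothetical example of the kind of binding this is about (not the one
      from #18793; `f` and `g` are made-up names):

      ```hs
      module LetAritySketch where

      f :: Int -> Int -> Int
      f x =
        let g = \a b -> a * b + x   -- non-recursive let with two lambdas
        in g x                      -- partial application: still waits for `b`
      ```

      Seeing that `g x` still expects one more argument requires looking at the
      arity type of `g`'s right-hand side; with that information, `f` can be
      treated as taking two arguments rather than one.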
      
      After the refactoring in the previous patch, the needed change is quite
      simple and very local to `arityType`'s defn for non-recursive `Let`.
      
      Apart from that, we had to get rid of the second item of
      `Note [Dealing with bottoms]`, which was entirely a safety measure and
      hindered optimistic fixed-point iteration.
      
      Fixes #18793.
      
      The following metric increases are all caused by this commit and are a
      result of the fact that we just do more work now:
      
      Metric Increase:
          T3294
          T12545
          T12707
      6b3eb06a
  22. Feb 23, 2016
  23. Oct 07, 2014
  24. Jul 20, 2011