Commits · e8034d15f04539ec16e827bd10c6d9b111566592 · Reinier Maas / GHC

Feb 25, 2024

ghc-internal: Move modules into GHC.Internal.* namespace · d8d6ad8c

Ben Gamari authored 1 year ago

Bumps haddock submodule due to testsuite output changes.

d8d6ad8c

Jan 13, 2024

Arity: Require called *exactly once* for eta exp with -fpedantic-bottoms (#24296) · 42bee5aa

Sebastian Graf authored 1 year ago and

Marge Bot committed 1 year ago

In #24296, we had a program in which we eta expanded away an error despite the
presence of `-fpedantic-bottoms`.
This was caused by turning called *at least once* lambdas into one-shot lambdas,
while with `-fpedantic-bottoms` it is only sound to eta expand over lambdas that
are called *exactly* once.
An example can be found in `Note [Combining arity type with demand info]`.

Fixes #24296.
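
To make concrete why eta expansion can "expand away" an error, here is a minimal sketch. It is not the program from #24296 or from the Note; the names `unexpanded`, `expanded` and `throwsWhenForced` are hypothetical and chosen only for illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Written with one argument: applying it to a positive number is already
-- an error, before any second argument is supplied.
unexpanded :: Int -> Int -> Int
unexpanded x = if x > 0 then error "boom" else \y -> x + y

-- The same function after manual eta expansion to two arguments.
-- A partial application is now just a function value, so the error is
-- only reached once the second argument arrives.
expanded :: Int -> Int -> Int
expanded x y = if x > 0 then error "boom" y else x + y

-- Report whether forcing a value to weak head normal form throws.
throwsWhenForced :: a -> IO Bool
throwsWhenForced v = do
  r <- try (evaluate (v `seq` ())) :: IO (Either SomeException ())
  pure (either (const True) (const False) r)
```

Loaded into GHCi without optimisation, `throwsWhenForced (unexpanded 1)` should report an exception while `throwsWhenForced (expanded 1)` does not. In an optimised build the compiler may itself eta-expand `unexpanded` unless `-fpedantic-bottoms` is given, which is the difference the flag is meant to preserve.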

42bee5aa

Sep 29, 2022

Demand: Format Call SubDemands `Cn(sd)` as `C(n,sd)` (#22231) · 5a535172

Sebastian Graf authored 2 years ago

Justification in #22231. Short form: In a demand like `1C1(C1(L))`
it was too easy to confuse which `1` belongs to which `C`. Now
that should be more obvious.

Fixes #22231
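
The notational change can be illustrated with a toy pretty-printer. This is only a sketch: `SubDemand`, `ppOld` and `ppNew` are hypothetical stand-ins and are much simpler than GHC's actual demand types.

```haskell
-- Toy model of call sub-demands; NOT GHC's real data type.
data SubDemand
  = TopSub                -- rendered as L
  | Call Char SubDemand   -- a call with cardinality n and result sub-demand sd

-- Old rendering: Cn(sd)
ppOld :: SubDemand -> String
ppOld TopSub      = "L"
ppOld (Call n sd) = "C" ++ [n] ++ "(" ++ ppOld sd ++ ")"

-- New rendering: C(n,sd)
ppNew :: SubDemand -> String
ppNew TopSub      = "L"
ppNew (Call n sd) = "C(" ++ [n] ++ "," ++ ppNew sd ++ ")"

-- The sub-demand part of the demand quoted in the commit message.
example :: SubDemand
example = Call '1' (Call '1' TopSub)

-- A full demand is an evaluation cardinality followed by a sub-demand.
oldDemand, newDemand :: String
oldDemand = '1' : ppOld example   -- "1C1(C1(L))"
newDemand = '1' : ppNew example   -- "1C(1,C(1,L))"
```

In the new form each cardinality sits inside the parentheses of the `C` it belongs to, so the leading `1` is visibly the evaluation cardinality of the whole demand.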

5a535172

Sep 28, 2022

Improve aggressive specialisation · 2a53ac18

Simon Peyton Jones authored 2 years ago and

Marge Bot committed 2 years ago

This patch fixes #21286, by not unboxing dictionaries in
worker/wrapper (ever). The main payload is tiny:

* In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox
  dictionaries in `get_dmd`.  See Note [Do not unbox class dictionaries]
  in that module

* I also found that imported wrappers were being fruitlessly
  specialised, so I fixed that too, in canSpecImport.
  See Note [Specialising imported functions] point (2).

In doing due diligence in the testsuite I fixed a number of
other things:

* Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make,
  and Note [Inline specialisations] in GHC.Core.Opt.Specialise,
  and remove duplication between the two. The new Note describes
  how we specialise functions with an INLINABLE pragma.

  And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`.

* Improve Note [Worker/wrapper for INLINABLE functions] in
  GHC.Core.Opt.WorkWrap.

  And (critically) make an actual change which is to propagate the
  user-written pragma from the original function to the wrapper; see
  `mkStrWrapperInlinePrag`.

* Write new Note [Specialising imported functions] in
  GHC.Core.Opt.Specialise

All this has a big effect on some compile times. This is
compiler/perf, showing only changes over 1%:

Metrics: compile_time/bytes allocated
-------------------------------------
                LargeRecord(normal)  -50.2% GOOD
           ManyConstructors(normal)   +1.0%
MultiLayerModulesTH_OneShot(normal)   +2.6%
                  PmSeriesG(normal)   -1.1%
                     T10547(normal)   -1.2%
                     T11195(normal)   -1.2%
                     T11276(normal)   -1.0%
                    T11303b(normal)   -1.6%
                     T11545(normal)   -1.4%
                     T11822(normal)   -1.3%
                     T12150(optasm)   -1.0%
                     T12234(optasm)   -1.2%
                     T13056(optasm)   -9.3% GOOD
                     T13253(normal)   -3.8% GOOD
                     T15164(normal)   -3.6% GOOD
                     T16190(normal)   -2.1%
                     T16577(normal)   -2.8% GOOD
                     T16875(normal)   -1.6%
                     T17836(normal)   +2.2%
                    T17977b(normal)   -1.0%
                     T18223(normal)  -33.3% GOOD
                     T18282(normal)   -3.4% GOOD
                     T18304(normal)   -1.4%
                    T18698a(normal)   -1.4% GOOD
                    T18698b(normal)   -1.3% GOOD
                     T19695(normal)   -2.5% GOOD
                      T5837(normal)   -2.3%
                      T9630(normal)  -33.0% GOOD
                      WWRec(normal)   -9.7% GOOD
             hard_hole_fits(normal)   -2.1% GOOD
                     hie002(normal)   +1.6%

                          geo. mean   -2.2%
                          minimum    -50.2%
                          maximum     +2.6%

I diligently investigated some of the big drops.

* Caused by not doing w/w for dictionaries:
    T13056, T15164, WWRec, T18223

* Caused by not fruitlessly specialising wrappers
    LargeRecord, T9630

For runtimes, here is perf/should_run:

Metrics: runtime/bytes allocated
--------------------------------
               T12990(normal)   -3.8%
                T5205(normal)   -1.3%
                T9203(normal)  -10.7% GOOD
        haddock.Cabal(normal)   +0.1%
         haddock.base(normal)   -1.1%
     haddock.compiler(normal)   -0.3%
        lazy-bs-alloc(normal)   -0.2%
------------------------------------------
                    geo. mean   -0.3%
                    minimum    -10.7%
                    maximum     +0.1%

I did not investigate exactly what happens in T9203.

Nofib is a wash:

+-------------------------------++--+-----------+-----------+
|                               ||  | tsv (rel) | std. err. |
+===============================++==+===========+===========+
|                     real/anna ||  |    -0.13% |      0.0% |
|                      real/fem ||  |    +0.13% |      0.0% |
|                   real/fulsom ||  |    -0.16% |      0.0% |
|                     real/lift ||  |    -1.55% |      0.0% |
|                  real/reptile ||  |    -0.11% |      0.0% |
|                  real/smallpt ||  |    +0.51% |      0.0% |
|          spectral/constraints ||  |    +0.20% |      0.0% |
|               spectral/dom-lt ||  |    +1.80% |      0.0% |
|               spectral/expert ||  |    +0.33% |      0.0% |
+===============================++==+===========+===========+
|                     geom mean ||  |           |           |
+-------------------------------++--+-----------+-----------+

I spent quite some time investigating dom-lt, but it's pretty
complicated.  See my note on !7847.  Conclusion: it's just a delicate
inlining interaction, and we have plenty of those.

Metric Decrease:
    LargeRecord
    T13056
    T13253
    T15164
    T16577
    T18223
    T18282
    T18698a
    T18698b
    T19695
    T9630
    WWRec
    hard_hole_fits
    T9203

2a53ac18

Sep 27, 2022

Demand: Clear distinction between Call SubDmd and eval Dmd (#21717) · aeafdba5

Sebastian Graf authored 2 years ago

In #21717 we saw a reportedly unsound strictness signature due to an unsound
definition of plusSubDmd on Calls. This patch contains a description and the fix
to the unsoundness as outlined in `Note [Call SubDemand vs. evaluation Demand]`.

This fix means we also get rid of the special handling of `-fpedantic-bottoms`
in eta-reduction. Thanks to less strict and actually sound strictness results,
we will no longer eta-reduce the problematic cases in the first place, even
without `-fpedantic-bottoms`.

So fixing the unsoundness also makes our eta-reduction code simpler with fewer
hacks to explain. But there is another, more unfortunate side-effect:
We *unfix* #21085, but fortunately we have a new fix ready:
See `Note [mkCall and plusSubDmd]`.

There's another change:
I decided to make `Note [SubDemand denotes at least one evaluation]` a lot
simpler by using `plusSubDmd` (instead of `lubPlusSubDmd`) even if both argument
demands are lazy. That leads to less precise results, but in turn rids us
of the need for 4 different `OpMode`s and the complication of
`Note [Manual specialisation of lub*Dmd/plus*Dmd]`. The result is simpler code
that is in line with the paper draft on Demand Analysis.

I left the abandoned idea in `Note [Unrealised opportunity in plusDmd]` for
posterity. The fallout in terms of regressions is negligible, as the testsuite
and NoFib show.

```
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
         hidden          +0.2%     -0.2%
         linear          -0.0%     -0.7%
--------------------------------------------------------------------------------
            Min          -0.0%     -0.7%
            Max          +0.2%     +0.0%
 Geometric Mean          +0.0%     -0.0%
```

Fixes #21717.

aeafdba5

Aug 25, 2022

Fix arityType: -fpedantic-bottoms, join points, etc · a90298cc

Simon Peyton Jones authored 2 years ago

This MR fixes #21694, #21755.  It also makes sure that #21948 and
fix to #21694.

* For #21694 the underlying problem was that we were calling arityType
  on an expression that had free join points.  This is a Bad Bad Idea.
  See Note [No free join points in arityType].

* To make "no free join points in arityType" work out I had to avoid
  trying to use eta-expansion for runRW#. This entailed a few changes
  in the Simplifier's treatment of runRW#.  See
  GHC.Core.Opt.Simplify.Iteration Note [No eta-expansion in runRW#]

* I also made andArityType work correctly with -fpedantic-bottoms;
  see Note [Combining case branches: andWithTail].

* Rewrote Note [Combining case branches: optimistic one-shot-ness]

* arityType previously treated join points differently to other
  let-bindings. This patch makes them uniform; arityType analyses
  the RHS of all bindings to get its ArityType, and extends am_sigs.

  I realised that, now we have am_sigs giving the ArityType for
  let-bound Ids, we don't need the (pre-dating) special code in
  arityType for join points. But instead we need to extend the env for
  Rec bindings, which we weren't doing before.  More uniform now.  See
  Note [arityType for let-bindings].

  This meant we could get rid of ae_joins, and in fact get rid of
  EtaExpandArity altogether.  Simpler.

* And finally, it was the strange treatment of join-point Ids in
  arityType (involving a fake ABot type) that led to a serious bug:
  #21755.  Fixed by this refactoring, which treats them uniformly;
  but without breaking #18328.

  In fact, the arity for recursive join bindings is pretty tricky;
  see the long Note [Arity for recursive join bindings]
  in GHC.Core.Opt.Simplify.Utils.  That led to more refactoring,
  including deciding that an Id could have an Arity that is bigger
  than its JoinArity; see Note [Invariants on join points], item
  2(b) in GHC.Core

* Make sure that the "demand threshold" for join points in DmdAnal
  is no bigger than the join-arity.  In GHC.Core.Opt.DmdAnal see
  Note [Demand signatures are computed for a threshold arity based on idArity]

* I moved GHC.Core.Utils.exprIsDeadEnd into GHC.Core.Opt.Arity,
  where it more properly belongs.

* Remove an old, redundant hack in FloatOut.  The old Note was
  Note [Bottoming floats: eta expansion] in GHC.Core.Opt.SetLevels.

Compile time improves very slightly on average:

Metrics: compile_time/bytes allocated
---------------------------------------------------------------------------------------
  T18223(normal) ghc/alloc    725,808,720    747,839,216  +3.0%  BAD
  T6048(optasm)  ghc/alloc    105,006,104    101,599,472  -3.2% GOOD
  geo. mean                                          -0.2%
  minimum                                            -3.2%
  maximum                                            +3.0%

For some reason Windows was better

   T10421(normal) ghc/alloc    125,888,360    124,129,168  -1.4% GOOD
   T18140(normal) ghc/alloc     85,974,520     83,884,224  -2.4% GOOD
  T18698b(normal) ghc/alloc    236,764,568    234,077,288  -1.1% GOOD
   T18923(normal) ghc/alloc     75,660,528     73,994,512  -2.2% GOOD
    T6048(optasm) ghc/alloc    112,232,512    108,182,520  -3.6% GOOD
  geo. mean                                          -0.6%

I had a quick look at T18223 but it is knee deep in coercions and
the size of everything looks similar before and after.  I decided
to accept that 3% increase in exchange for goodness elsewhere.

Metric Decrease:
    T10421
    T18140
    T18698b
    T18923
    T6048

Metric Increase:
    T18223

a90298cc

Jun 27, 2022

Don't mark lambda binders as OtherCon · ac7a7fc8

Andreas Klebinger authored 2 years ago and

Marge Bot committed 2 years ago

We used to put OtherCon unfoldings on lambda binders of workers
and sometimes also join points/specializations with the
assumption that since the wrapper would force these arguments
once we execute the RHS they would indeed be in WHNF.

This was wrong for reasons detailed in #21472. So now we purge
evaluated unfoldings from *all* lambda binders.

This fixes #21472, but at the cost of sometimes not using as efficient a
calling convention. It can also change inlining behaviour as some
occurrences will no longer look like value arguments when they did
before.

As a consequence we also change how we compute CBV information for
arguments slightly. We now *always* determine the CBV convention
for arguments during tidy. Earlier in the pipeline we merely mark
functions as candidates for having their arguments treated as CBV.

As before the process is described in the relevant notes:
Note [CBV Function Ids]
Note [Attaching CBV Marks to ids]
Note [Never put `OtherCon` unfoldings on lambda binders]

-------------------------
Metric Decrease:
    T12425
    T13035
    T18223
    T18923
    MultiLayerModulesTH_OneShot
Metric Increase:
    WWRec
-------------------------

ac7a7fc8

Jun 20, 2022

Simplify: Take care with eta reduction in recursive RHSs (#21652) · 49fb2f9b

Sebastian Graf authored 2 years ago

Similar to the fix to #20836 in CorePrep, we now track the set of enclosing
recursive binders in the SimplEnv and SimpleOptEnv.
See Note [Eta reduction in recursive RHSs] for details.

I also updated Note [Arity robustness] with the insights Simon and I had in a
call discussing the issue.

Fixes #21652.
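
As a rough illustration of why eta reduction needs care inside a recursive right-hand side, consider the smallest possible case. This is a hypothetical sketch with made-up names, not the example from the Note.

```haskell
-- A recursive binding whose right-hand side is a lambda that calls itself.
-- As a value it is a function closure, so forcing it to WHNF terminates.
spin :: Int -> Int
spin = \x -> spin x

-- What naive eta reduction of `\x -> spin x` to `spin` would produce inside
-- spin's own right-hand side: the binding now refers directly to itself,
-- so forcing it to WHNF never yields a value.
spinReduced :: Int -> Int
spinReduced = spinReduced

-- Forcing only the function value (without calling it) is safe for `spin`.
spinIsValue :: Bool
spinIsValue = spin `seq` True
```

Tracking the enclosing recursive binders lets the simplifier recognise that the call it is about to eta-reduce targets the very binding being defined.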

Unfortunately, we get a 5% ghc/alloc regression in T16577. That is due to
additional eta reduction in GHC.Read.choose1 and the resulting ANF-isation
of a large list literal at the top-level that didn't happen before (presumably
because it was too interesting to float to the top-level). There's not much we
can do about that.

Metric Increase:
    T16577

49fb2f9b

May 30, 2022

A bunch of changes related to eta reduction · 6656f016

Simon Peyton Jones authored 3 years ago and

Marge Bot committed 2 years ago

This is a large collection of changes all relating to eta
reduction, originally triggered by #18993, but there followed
a long saga.

Specifics:

* Move state-hack stuff from GHC.Types.Id (where it never belonged)
  to GHC.Core.Opt.Arity (which seems much more appropriate).

* Add a crucial mkCast in the Cast case of
  GHC.Core.Opt.Arity.eta_expand; helps with T18223

* Add clarifying notes about eta-reducing to PAPs.
  See Note [Do not eta reduce PAPs]

* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
  where it properly belongs.  See Note [Eta reduce PAPs]

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
  when eta-expansion is wanted, to make wantEtaExpansion, and call that
  same function in GHC.Core.Opt.Simplify.simplStableUnfolding.  It was
  previously inconsistent, but it's doing the same thing.

* I did a substantial refactor of ArityType; see Note [ArityType].
  This allowed me to do away with the somewhat mysterious takeOneShots;
  more generally it allows arityType to describe the function, leaving
  its clients to decide how to use that information.

  I made ArityType abstract, so that clients have to use functions
  to access it.

* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
  mkLam before) aware of the floats that the simplifier builds up, so
  that it can still do eta-reduction even if there are some floats.
  (Previously that would not happen.)  That means passing the floats
  to rebuildLam, and an extra check when eta-reducing (etaFloatOk).

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
  in the idDemandInfo of the binder, as well as the CallArity info. The
  occurrence analyser did this but we were failing to take advantage here.

  In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity;
  see Note [Combining arityType with demand info], and functions
  idDemandOneShots and combineWithDemandOneShots.

  (These changes partly drove my refactoring of ArityType.)

* In GHC.Core.Opt.Arity.findRhsArity
  * I'm now taking account of the demand on the binder to give
    extra one-shot info.  E.g. if the fn is always called with two
    args, we can give better one-shot info on the binders
    than if we just look at the RHS.

  * Don't do any fixpointing in the non-recursive
    case -- simple short cut.

  * Trim arity inside the loop. See Note [Trim arity inside the loop]

* Make SimpleOpt respect the eta-reduction flag
  (Some associated refactoring here.)

* I made the CallCtxt which the Simplifier uses distinguish between
  recursive and non-recursive right-hand sides.
     data CallCtxt = ... | RhsCtxt RecFlag | ...
  It affects only one thing:
     - We call an RHS context interesting only if it is non-recursive
       see Note [RHS of lets] in GHC.Core.Unfold

* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
  See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.

Other incidental changes

* Fix a fairly long-standing outright bug in the ApplyToVal case of
  GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the
  tail of 'dmds' in the recursive call, which meant the demands were All
  Wrong.  I have no idea why this has not caused problems before now.

* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg

Metrics: compile_time/bytes allocated
                               Test    Metric       Baseline      New value Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,743,297,692  2,619,762,992  -4.5% GOOD
                     T18223(normal) ghc/alloc  1,103,161,360    972,415,992 -11.9% GOOD
                      T3064(normal) ghc/alloc    201,222,500    184,085,360  -8.5% GOOD
                      T8095(normal) ghc/alloc  3,216,292,528  3,254,416,960  +1.2%
                      T9630(normal) ghc/alloc  1,514,131,032  1,557,719,312  +2.9%  BAD
                 parsing001(normal) ghc/alloc    530,409,812    525,077,696  -1.0%

geo. mean                                 -0.1%

Nofib:
       Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
         banner          +0.0%     +0.4%     -8.9%     -8.7%      0.0%
    exact-reals          +0.0%     -7.4%    -36.3%    -37.4%      0.0%
 fannkuch-redux          +0.0%     -0.1%     -1.0%     -1.0%      0.0%
           fft2          -0.1%     -0.2%    -17.8%    -19.2%      0.0%
          fluid          +0.0%     -1.3%     -2.1%     -2.1%      0.0%
             gg          -0.0%     +2.2%     -0.2%     -0.1%      0.0%
  spectral-norm          +0.1%     -0.2%      0.0%      0.0%      0.0%
            tak          +0.0%     -0.3%     -9.8%     -9.8%      0.0%
           x2n1          +0.0%     -0.2%     -3.2%     -3.2%      0.0%
--------------------------------------------------------------------------------
            Min          -3.5%     -7.4%    -58.7%    -59.9%      0.0%
            Max          +0.1%     +2.2%    +32.9%    +32.9%      0.0%
 Geometric Mean          -0.0%     -0.1%    -14.2%    -14.8%     -0.0%

Metric Decrease:
    MultiLayerModulesTH_OneShot
    T18223
    T3064
    T15185
    T14766
Metric Increase:
    T9630

6656f016

May 03, 2022

Assume at least one evaluation for nested SubDemands (#21081, #21133) · 15ffe2b0

Sebastian Graf authored 3 years ago

See the new `Note [SubDemand denotes at least one evaluation]`.

A demand `n :* sd` on a let binder `x=e` now means

> "`x` was evaluated `n` times and in any program trace it is evaluated, `e` is
>  evaluated deeply in sub-demand `sd`."

The "any time it is evaluated" premise is what this patch adds. As a result,
we get better nested strictness. For example (T21081)
```hs
f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a,b) -> a /= b, True)
-- before: <MP(L,L)>
-- after:  <MP(SL,SL)>

g :: Int -> (Bool, Bool)
g x = let y = let z = odd x in (z,z) in f y
```
The change in demand signature "before" to "after" allows us to case-bind `z`
here.

Similarly good things happen for the `sd` in call sub-demands `Cn(sd)`, which
allows for more eta-reduction (albeit only sound with `-fno-pedantic-bottoms`).

We also fix #21085, a surprising inconsistency with `Poly` to `Call` sub-demand
expansion.

In an attempt to fix a regression caused by less inlining due to eta-reduction
in T15426, I eta-expanded the definition of `elemIndex` and `elemIndices`, thus
fixing #21345 on the go.

The main point of this patch is that it fixes #21081 and #21133.

Annoyingly, I discovered that more precise demand signatures for join points can
transform a program into a lazier program if that join point gets floated to the
top-level, see #21392. There is no simple fix at the moment, but !5349 might provide one.
Thus, we accept a ~5% regression in `MultiLayerModulesTH_OneShot`, where #21392
bites us in `addListToUniqDSet`. T21392 reliably reproduces the issue.

Surprisingly, ghc/alloc perf on Windows improves much more than on other jobs, by
0.4% in the geometric mean and by 2% in T16875.

Metric Increase:
    MultiLayerModulesTH_OneShot
Metric Decrease:
    T16875

15ffe2b0

Mar 16, 2022

Demand: Let `Boxed` win in `lubBoxity` (#21119) · 1575c4a5

Sebastian Graf authored 3 years ago and

Marge Bot committed 3 years ago

Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic
in terms of Boxity analysis. "Unsoundly" in the sense that we sometimes unbox
parameters that we had better not unbox. Examples are #18907 and T19871.absent.

Until now, we thought that this hack pulled its weight because it worked around
some shortcomings of the phase separation between Boxity analysis and CPR
analysis. But it is a gross hack which caused regressions itself that needed all
kinds of fixes and workarounds. See for example #20767. It became impossible to
work with in !7599, so I want to remove it.

For example, at the moment, `lubDmd B dmd` will not unbox `dmd`,
but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of
the lattice, it's hardly justifiable to get a better demand when `lub`bing with
`A`.
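
The two policies can be sketched on a two-point lattice. This is a toy model under the assumption that boxity is just a flag; `lubBoxityOld` and `lubBoxityNew` are hypothetical names, and GHC's real definition lives alongside the rest of the demand lattice.

```haskell
-- Whether an argument may be passed unboxed.
data Boxity = Boxed | Unboxed
  deriving (Eq, Show)

-- Old, optimistic policy: a single Unboxed branch is enough to unbox.
lubBoxityOld :: Boxity -> Boxity -> Boxity
lubBoxityOld Boxed Boxed = Boxed
lubBoxityOld _     _     = Unboxed

-- New, conservative policy: unbox only if every branch agrees.
lubBoxityNew :: Boxity -> Boxity -> Boxity
lubBoxityNew Unboxed Unboxed = Unboxed
lubBoxityNew _       _       = Boxed
```

With the new policy, a `case` in which one branch needs the box and the other does not keeps the argument boxed, which avoids reboxing in the branch that needs it.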

The consequence of letting `Boxed` win in `lubBoxity` is that we *would* regress
 #2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR
are able to communicate better. Fortunately, that is not the case since I could
tweak the other source of optimism in Boxity analysis that is described in
`Note [Unboxed demand on function bodies returning small products]` so that
we *recursively* assume unboxed demands on function bodies returning small
products. See the updated Note.

`Note [Boxity for bottoming functions]` describes why we need bottoming
functions to have signatures that say that they deeply unbox their arguments.
In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox
recursive data constructors. This is in line with our handling of them in CPR.
I updated `Note [Which types are unboxed?]` to reflect that.

In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler
implementation (at least to think about). We can also drop the very ad-hoc
definition of `deferAfterPreciseException` and its Note in favor of the
simple, intuitive definition we used to have.

Metric Decrease:
    T16875
    T18223
    T18698a
    T18698b
    hard_hole_fits
Metric Increase:
    LargeRecord
    MultiComponentModulesRecomp
    T15703
    T8095
    T9872d

Out of all the regressions, only the one in T9872d doesn't vanish in a perf
build, where the compiler is bootstrapped with -O2 and thus SpecConstr.
Reason for regressions:

  * T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed.
    That is because the context is passed to a function argument, for
    example in `liftCoSubstTyVarBndrUsing`.
  * In T15703, LargeRecord and T8095, we get a bit more allocations in
    `expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed.
    In both cases that guards against reboxing in some code paths.
  * The same is true for MultiComponentModulesRecomp, where we get less unboxing
    in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations
    actually *improve* by over 4%!

Results on NoFib:

--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
         awards          -0.4%     +0.3%
      cacheprof          -0.3%     +2.4%
            fft          -1.5%     -5.1%
       fibheaps          +1.2%     +0.8%
          fluid          -0.3%     -0.1%
            ida          +0.4%     +0.9%
   k-nucleotide          +0.4%     -0.1%
     last-piece         +10.5%    +13.9%
           lift          -4.4%     +3.5%
        mandel2         -99.7%    -99.8%
           mate          -0.4%     +3.6%
         parser          -1.0%     +0.1%
         puzzle         -11.6%     +6.5%
reverse-complem          -3.0%     +2.0%
            scs          -0.5%     +0.1%
         sphere          -0.4%     -0.2%
      wave4main          -8.2%     -0.3%
--------------------------------------------------------------------------------
Summary excludes mandel2 because of excessive bias
            Min         -11.6%     -5.1%
            Max         +10.5%    +13.9%
 Geometric Mean          -0.2%     +0.3%
--------------------------------------------------------------------------------

Not bad for a bug fix.

The regression in `last-piece` could become a win if SpecConstr would work on
non-recursive functions. The regression in `fibheaps` is due to
`Note [Reboxed crud for bottoming calls]`, e.g., #21128.

1575c4a5

Feb 12, 2022

Tag inference work. · 0e93023e

Andreas Klebinger authored 3 years ago and

Matthew Pickering committed 3 years ago

This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).

The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
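
A source-level analogy may help here. The actual wrapping happens on the STG representation, so the following is only a sketch of the idea; `Pair`, `mkPair` and `mkPairExplicit` are hypothetical names.

```haskell
-- A constructor with one strict and one lazy field.
data Pair = MkPair !Int Int

-- What the programmer writes.
mkPair :: Int -> Int -> Pair
mkPair a b = MkPair a b

-- Roughly what the invariant amounts to: the strict field is evaluated
-- before the constructor is allocated, so the stored pointer refers to an
-- evaluated (and hence tagged) value. The lazy field is left untouched.
mkPairExplicit :: Int -> Int -> Pair
mkPairExplicit a b = a `seq` MkPair a b

-- Reading the strict field never has to enter a thunk.
firstField :: Pair -> Int
firstField (MkPair a _) = a
```

Because consumers such as `firstField` can rely on the field being evaluated, they can skip the check that would otherwise be needed before using it.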

The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redundant cases for the above invariant.

StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]

Other minor changes:

* Add StgUtil module containing a few functions needed by, but
  not specific to the tag analysis.

-------------------------
Metric Decrease:
    T12545
    T18698b
    T18140
    T18923
    LargeRecord
Metric Increase:
    LargeRecord
    ManyAlternatives
    ManyConstructors
    T10421
    T12425
    T12707
    T13035
    T13056
    T13253
    T13253-spj
    T13379
    T15164
    T18282
    T18304
    T18698a
    T1969
    T20049
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
    T9233
    T9675
    T9961
    T19695
    WWRec
-------------------------

0e93023e

Oct 24, 2021

DmdAnal: Implement Boxity Analysis (#19871) · 3bab222c

Sebastian Graf authored 3 years ago and

Marge Bot committed 3 years ago

This patch fixes some abundant reboxing of `DynFlags` in
`GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
of #19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This allows us to accurately capture ad-hoc unboxing decisions previously
made in worker/wrapper in demand analysis now, where the boxity info can
propagate through demand signatures.
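
Reboxing is easiest to see in a hand-written worker/wrapper pair. The sketch below is not GHC output; `keepBoxed`, `original`, `wrapper` and `worker` are hypothetical names, and the split is done manually to show where the extra allocation comes from.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (>#))

-- Stand-in for any consumer that needs its argument as a boxed Int.
keepBoxed :: Int -> [Int]
keepBoxed n = [n]
{-# NOINLINE keepBoxed #-}

-- The source function: strict in n, but n also flows to keepBoxed.
original :: Int -> [Int]
original n = if n > 0 then keepBoxed n else []

-- A wrapper that unboxes the argument before calling the worker.
wrapper :: Int -> [Int]
wrapper (I# n#) = worker n#

-- The worker receives the raw Int#, so it has to allocate a fresh box
-- (I# n#) before calling keepBoxed. That allocation is the reboxing.
worker :: Int# -> [Int]
worker n# = if isTrue# (n# ># 0#) then keepBoxed (I# n#) else []
```

A boxity analysis that sees the boxed use in `keepBoxed n` can record that unboxing `n` would not pay off, and that information can then travel through the demand signature to callers.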

See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
`Note [No lazy, Unboxed demand in demand signature]`, but
`Note [Finalising boxity for demand signature]` is probably a better entry-point.

To support the fix for #19407, I had to change (what was)
`Note [Add demands for strict constructors]` a bit
(now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
it in `finaliseBoxity` (which is only called from demand analysis) instead of
`wantToUnboxArg`.

I also had to resurrect `Note [Product demands for function body]` and rename
it to `Note [Unboxed demand on function bodies returning small products]` to
avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
See the updated Note for details.

A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as `InsideInlineableFun` flags
(needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
boxity info from argument demands and interprets them with a severely simplified
`wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
which would be awkward to export.

I spent some time figuring out the reason for why `T16197` failed prior to my
amendments to `Note [Unboxing evaluated arguments]`. After having it figured
out, I minimised it a bit and added `T16197b`, which simply compares computed
strictness signatures and thus should be far simpler to eyeball.

The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently improved it by -68% (in b760c1f7).
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.

Fixes #19871, #19407, #4267, #16859, #18907 and #13331.

Metric Increase:
    T11545
    T18698a
    T18698b

Metric Decrease:
    T12425
    T16577
    T18223
    T18282
    T4267
    T9961

3bab222c

Oct 20, 2021

Bignum: allow Integer predicates to inline (#20361) · 758e0d7b

Sylvain Henry authored 3 years ago and

Marge Bot committed 3 years ago

T17516 allocations increase by 48% because Integer's predicates are
inlined in some Ord instance methods. These methods become too big to be
inlined while they probably should be: this is tracked in #20516.

Metric Increase:
    T17516

758e0d7b

Sep 30, 2021

Nested CPR light unleashed (#18174) · c261f220

Sebastian Graf authored 3 years ago and

Marge Bot committed 3 years ago

This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:

1. In `cprTransformDataConWork`, we need to look at the field expressions
   and their `CprType`s to see whether the evaluation of the expressions
   terminates quickly (= is in HNF) or if they are put in strict fields.
   If that is the case, then we retain their CPR info and may unbox nestedly
   later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
   fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
   `cprTransformDataConWork` for workers and treat wrappers by analysing their
   unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
   `isRecDataCon` to detect them. We really don't want to give `repeat` or
   `replicate` the Nested CPR property.
   See Note [CPR for recursive data structures] for which kind of recursive
   DataCons we target.
6. Fix a couple of tests and their outputs.
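
As a rough source-level picture of what worker/wrapper for a nested constructed
product amounts to, here is a hand-written sketch. The names `SPair`, `mk`,
`wmk` and `mkViaWorker` are hypothetical and do not come from the patch; the
worker and wrapper are written out by hand rather than taken from GHC's output,
and whether GHC performs this split on a given program depends on version and
flags. The fields are strict, matching the condition in item 1.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Hypothetical product type with strict fields, so that the fields of a
-- constructed result are known to be evaluated.
data SPair = SPair !Int !Int
  deriving (Eq, Show)

-- Source function: the result is a constructed product whose fields are
-- themselves constructed products (boxed Ints).
mk :: Int -> SPair
mk n = SPair (n + 1) (n * 2)

-- Hand-written stand-in for a worker: both the outer SPair box and the
-- inner Int boxes are gone from the result.
wmk :: Int# -> (# Int#, Int# #)
wmk n = (# n +# 1#, n *# 2# #)

-- Hand-written stand-in for a wrapper: reboxes the worker's result.
mkViaWorker :: Int -> SPair
mkViaWorker (I# n) = case wmk n of
  (# a, b #) -> SPair (I# a) (I# b)
```

Without the nested part, only the outer `SPair` box would be returned unboxed
and the two `Int` fields would still be allocated by the worker.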

I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.

Nofib results:
```
--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
   ben-raytrace          -3.1%     -0.4%
   binary-trees          +0.8%     -2.9%
   digits-of-e2          +5.8%     +1.2%
          event          +0.8%     -2.1%
 fannkuch-redux          +0.0%     -1.4%
           fish           0.0%     -1.5%
         gamteb          -1.4%     -0.3%
        mkhprog          +1.4%     +0.8%
     multiplier          +0.0%     -1.9%
            pic          -0.6%     -0.1%
        reptile         -20.9%    -17.8%
      wave4main          +4.8%     +0.4%
           x2n1        -100.0%     -7.6%
--------------------------------------------------------------------------------
            Min         -95.0%    -17.8%
            Max          +5.8%     +1.2%
 Geometric Mean          -2.9%     -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.

There are a significant number of metric decreases. The most notable ones (>1%):
```
       ManyAlternatives(normal) ghc/alloc   771656002.7   762187472.0  -1.2%
       ManyConstructors(normal) ghc/alloc  4191073418.7  4114369216.0  -1.8%
      MultiLayerModules(normal) ghc/alloc  3095678333.3  3128720704.0  +1.1%
              PmSeriesG(normal) ghc/alloc    50096429.3    51495664.0  +2.8%
              PmSeriesS(normal) ghc/alloc    63512989.3    64681600.0  +1.8%
              PmSeriesV(normal) ghc/alloc    62575424.0    63767208.0  +1.9%
                 T10547(normal) ghc/alloc    29347469.3    29944240.0  +2.0%
                T11303b(normal) ghc/alloc    46018752.0    47367576.0  +2.9%
                 T12150(optasm) ghc/alloc    81660890.7    82547696.0  +1.1%
                 T12234(optasm) ghc/alloc    59451253.3    60357952.0  +1.5%
                 T12545(normal) ghc/alloc  1705216250.7  1751278952.0  +2.7%
                 T12707(normal) ghc/alloc   981000472.0   968489800.0  -1.3% GOOD
                 T13056(optasm) ghc/alloc   389322664.0   372495160.0  -4.3% GOOD
                 T13253(normal) ghc/alloc   337174229.3   341954576.0  +1.4%
                 T13701(normal) ghc/alloc  2381455173.3  2439790328.0  +2.4%  BAD
                   T14052(ghci) ghc/alloc  2162530642.7  2139108784.0  -1.1%
                 T14683(normal) ghc/alloc  3049744728.0  2977535064.0  -2.4% GOOD
                 T14697(normal) ghc/alloc   362980213.3   369304512.0  +1.7%
                 T15164(normal) ghc/alloc  1323102752.0  1307480600.0  -1.2%
                 T15304(normal) ghc/alloc  1304607429.3  1291024568.0  -1.0%
                 T16190(normal) ghc/alloc   281450410.7   284878048.0  +1.2%
                 T16577(normal) ghc/alloc  7984960789.3  7811668768.0  -2.2% GOOD
                 T17516(normal) ghc/alloc  1171051192.0  1153649664.0  -1.5%
                 T17836(normal) ghc/alloc  1115569746.7  1098197592.0  -1.6%
                T17836b(normal) ghc/alloc    54322597.3    55518216.0  +2.2%
                 T17977(normal) ghc/alloc    47071754.7    48403408.0  +2.8%
                T17977b(normal) ghc/alloc    42579133.3    43977392.0  +3.3%
                 T18923(normal) ghc/alloc    71764237.3    72566240.0  +1.1%
                  T1969(normal) ghc/alloc   784821002.7   773971776.0  -1.4% GOOD
                  T3294(normal) ghc/alloc  1634913973.3  1614323584.0  -1.3% GOOD
                  T4801(normal) ghc/alloc   295619648.0   292776440.0  -1.0%
                T5321FD(normal) ghc/alloc   278827858.7   276067280.0  -1.0%
                  T5631(normal) ghc/alloc   586618202.7   577579960.0  -1.5%
                  T5642(normal) ghc/alloc   494923048.0   487927208.0  -1.4%
                  T5837(normal) ghc/alloc    37758061.3    39261608.0  +4.0%
                  T9020(optasm) ghc/alloc   257362077.3   254672416.0  -1.0%
                  T9198(normal) ghc/alloc    49313365.3    50603936.0  +2.6%  BAD
                  T9233(normal) ghc/alloc   704944258.7   685692712.0  -2.7% GOOD
                  T9630(normal) ghc/alloc  1476621560.0  1455192784.0  -1.5%
                  T9675(optasm) ghc/alloc   443183173.3   433859696.0  -2.1% GOOD
                 T9872a(normal) ghc/alloc  1720926653.3  1693190072.0  -1.6% GOOD
                 T9872b(normal) ghc/alloc  2185618061.3  2162277568.0  -1.1% GOOD
                 T9872c(normal) ghc/alloc  1765842405.3  1733618088.0  -1.8% GOOD
   TcPlugin_RewritePerf(normal) ghc/alloc  2388882730.7  2365504696.0  -1.0%
                  WWRec(normal) ghc/alloc   607073186.7   597512216.0  -1.6%

                  T9203(normal) run/alloc   107284064.0   102881832.0  -4.1%
          haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0  -1.1%
           haddock.base(normal) run/alloc 25660521653.3 25370321824.0  -1.1%
       haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0  -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.

There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top of this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests; they were meant to test for asymptotic regressions.

The build-cabal test improves by 1.2% in -O0.

Metric Increase:
    T10421
    T12234
    T12545
    T13035
    T13056
    T13701
    T14697
    T18923
    T5837
    T9198
Metric Decrease:
    ManyConstructors
    T12545
    T12707
    T13056
    T14683
    T16577
    T18223
    T1969
    T3294
    T9203
    T9233
    T9675
    T9872a
    T9872b
    T9872c
    T9961
    TcPlugin_RewritePerf

c261f220

Jun 05, 2021

Avoid useless w/w split, take 2 · ea9a4ef6

Simon Peyton Jones authored 3 years ago and

Marge Bot committed 3 years ago

This commit:

    commit c6faa42b
    Author: Simon Peyton Jones <simonpj@microsoft.com>
    Date:   Mon Mar 9 10:20:42 2020 +0000

    Avoid useless w/w split

    This patch is just a tidy-up for the post-strictness-analysis
    worker wrapper split.  Consider

       f x = x

    Strictness analysis does not lead to a w/w split, so the
    obvious thing is to leave it 100% alone.  But actually, because
    the RHS is small, we ended up adding a StableUnfolding for it.

    There is some reason to do this if we choose /not/ to do w/w
    on the grounds that the function is small.  See
    Note [Don't w/w inline small non-loop-breaker things]

    But there is no reason if we would not have done w/w anyway.

    This patch just moves the conditional to later.  Easy.

turns out to have a bug in it.  Instead of /moving/ the conditional,
I /duplicated/ it.  Then in a subsequent unrelated tidy-up
(087ac4eb) I removed the second (redundant) test!

This patch does what I originally intended.
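
For readers unfamiliar with the split being discussed, the following is a
hand-written sketch, not GHC output. `poly`, `wpoly` and `polyViaWorker` are
hypothetical names chosen for illustration; `f` is the function from the
quoted message. In practice a function as small as `poly` would probably be
inlined rather than split, which is the situation
Note [Don't w/w inline small non-loop-breaker things] is about; it is kept tiny
here only to show the shape.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Strict in its argument: the Int box can be taken apart once, up front.
poly :: Int -> Int
poly x = x * x + 1

-- Hand-written stand-in for the worker, operating on the unboxed Int#.
wpoly :: Int# -> Int#
wpoly x = (x *# x) +# 1#

-- Hand-written stand-in for the wrapper: unbox, call the worker, rebox.
polyViaWorker :: Int -> Int
polyViaWorker (I# x) = I# (wpoly x)

-- The function from the commit message: there is nothing to unbox, so a
-- worker/wrapper split would gain nothing.
f :: a -> a
f x = x
```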

There is also a small refactoring in GHC.Core.Unfold, to make the
code clearer, but with no change in behaviour.

It does, however, have a generally good effect on compile times,
because we aren't dealing with so many silly stable unfoldings.
Here are the non-zero changes:

Metrics: compile_time/bytes allocated
-------------------------------------
                                         Baseline
                     Test    Metric         value     New value Change
---------------------------------------------------------------------------
 ManyAlternatives(normal) ghc/alloc   791969344.0   792665048.0  +0.1%
 ManyConstructors(normal) ghc/alloc  4351126824.0  4358303528.0  +0.2%
        PmSeriesG(normal) ghc/alloc    50362552.0    50482208.0  +0.2%
        PmSeriesS(normal) ghc/alloc    63733024.0    63619912.0  -0.2%
           T10421(normal) ghc/alloc   121224624.0   119695448.0  -1.3% GOOD
          T10421a(normal) ghc/alloc    85256392.0    83714224.0  -1.8%
           T10547(normal) ghc/alloc    29253072.0    29258256.0  +0.0%
           T10858(normal) ghc/alloc   189343152.0   187972328.0  -0.7%
           T11195(normal) ghc/alloc   281208248.0   279727584.0  -0.5%
           T11276(normal) ghc/alloc   141966952.0   142046224.0  +0.1%
          T11303b(normal) ghc/alloc    46228360.0    46259024.0  +0.1%
           T11545(normal) ghc/alloc  2663128768.0  2667412656.0  +0.2%
           T11822(normal) ghc/alloc   138686944.0   138760176.0  +0.1%
           T12227(normal) ghc/alloc   482836000.0   475421056.0  -1.5% GOOD
           T12234(optasm) ghc/alloc    60710520.0    60781808.0  +0.1%
           T12425(optasm) ghc/alloc   104089000.0   104022424.0  -0.1%
           T12545(normal) ghc/alloc  1711759416.0  1705711528.0  -0.4%
           T12707(normal) ghc/alloc   991541120.0   991921776.0  +0.0%
           T13035(normal) ghc/alloc   108199872.0   108370704.0  +0.2%
           T13056(optasm) ghc/alloc   414642544.0   412580384.0  -0.5%
           T13253(normal) ghc/alloc   361701272.0   355838624.0  -1.6%
       T13253-spj(normal) ghc/alloc   157710168.0   157397768.0  -0.2%
           T13379(normal) ghc/alloc   370984400.0   371345888.0  +0.1%
           T13701(normal) ghc/alloc  2439764144.0  2441351984.0  +0.1%
             T14052(ghci) ghc/alloc  2154090896.0  2156671400.0  +0.1%
           T15164(normal) ghc/alloc  1478517688.0  1440317696.0  -2.6% GOOD
           T15630(normal) ghc/alloc   178053912.0   172489808.0  -3.1%
           T16577(normal) ghc/alloc  7859948896.0  7854524080.0  -0.1%
           T17516(normal) ghc/alloc  1271520128.0  1202096488.0  -5.5% GOOD
           T17836(normal) ghc/alloc  1123320632.0  1123922480.0  +0.1%
          T17836b(normal) ghc/alloc    54526280.0    54576776.0  +0.1%
          T17977b(normal) ghc/alloc    42706752.0    42730544.0  +0.1%
           T18140(normal) ghc/alloc   108834568.0   108693816.0  -0.1%
           T18223(normal) ghc/alloc  5539629264.0  5579500872.0  +0.7%
           T18304(normal) ghc/alloc    97589720.0    97196944.0  -0.4%
           T18478(normal) ghc/alloc   770755472.0   771232888.0  +0.1%
          T18698a(normal) ghc/alloc   408691160.0   374364992.0  -8.4% GOOD
          T18698b(normal) ghc/alloc   492419768.0   458809408.0  -6.8% GOOD
           T18923(normal) ghc/alloc    72177032.0    71368824.0  -1.1%
            T1969(normal) ghc/alloc   803523496.0   804655112.0  +0.1%
            T3064(normal) ghc/alloc   198411784.0   198608512.0  +0.1%
            T4801(normal) ghc/alloc   312416688.0   312874976.0  +0.1%
         T5321Fun(normal) ghc/alloc   325230680.0   325474448.0  +0.1%
            T5631(normal) ghc/alloc   592064448.0   593518968.0  +0.2%
            T5837(normal) ghc/alloc    37691496.0    37710904.0  +0.1%
             T783(normal) ghc/alloc   404629536.0   405064432.0  +0.1%
            T9020(optasm) ghc/alloc   266004608.0   266375592.0  +0.1%
            T9198(normal) ghc/alloc    49221336.0    49268648.0  +0.1%
            T9233(normal) ghc/alloc   913464984.0   742680256.0 -18.7% GOOD
            T9675(optasm) ghc/alloc   552296608.0   466322000.0 -15.6% GOOD
           T9872a(normal) ghc/alloc  1789910616.0  1793924472.0  +0.2%
           T9872b(normal) ghc/alloc  2315141376.0  2310338056.0  -0.2%
           T9872c(normal) ghc/alloc  1840422424.0  1841567224.0  +0.1%
           T9872d(normal) ghc/alloc   556713248.0   556838432.0  +0.0%
            T9961(normal) ghc/alloc   383809160.0   384601600.0  +0.2%
            WWRec(normal) ghc/alloc   773751272.0   753949608.0  -2.6% GOOD

Residency goes down too:

Metrics: compile_time/max_bytes_used
------------------------------------
                             Baseline
           Test  Metric         value     New value Change
-----------------------------------------------------------
 T10370(optasm) ghc/max    42058448.0    39481672.0  -6.1%
 T11545(normal) ghc/max    43641392.0    43634752.0  -0.0%
 T15304(normal) ghc/max    29895824.0    29439032.0  -1.5%
 T15630(normal) ghc/max     8822568.0     8772328.0  -0.6%
T18698a(normal) ghc/max    13882536.0    13787112.0  -0.7%
T18698b(normal) ghc/max    14714112.0    13836408.0  -6.0%
  T1969(normal) ghc/max    24724128.0    24733496.0  +0.0%
  T3064(normal) ghc/max    14041152.0    14034768.0  -0.0%
  T3294(normal) ghc/max    32769248.0    32760312.0  -0.0%
  T9630(normal) ghc/max    41605120.0    41572184.0  -0.1%
  T9675(optasm) ghc/max    18652296.0    17253480.0  -7.5%

Metric Decrease:
    T10421
    T12227
    T15164
    T17516
    T18698a
    T18698b
    T9233
    T9675
    WWRec

Metric Increase:
    T12545

ea9a4ef6

Mar 03, 2021

DmdAnal: Better syntax for demand signatures (#19016) · 3630b9ba

Sebastian Graf authored 4 years ago and

Marge Bot committed 4 years ago

The update of the Outputable instance resulted in a slew of
documentation changes within Notes that used the old syntax.
The most important doc changes are to `Note [Demand notation]`
and the user's guide.

Fixes #19016.

3630b9ba

Dec 12, 2020

Demand: Simplify `CU(U)` to `U` (#19005) · 3aae036e

Sebastian Graf authored 4 years ago and

Marge Bot committed 4 years ago

Both sub-demands encode the same information.
This is a trivial change and already affects a few regression tests
(e.g. `T5075`), so no separate regression test is necessary.

3aae036e

Nov 20, 2020

Demand: Interleave usage and strictness demands (#18903) · 0aec78b6

Sebastian Graf authored 4 years ago and

Marge Bot committed 4 years ago

As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.

Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:

```hs
h :: Int -> Int
h m =
  let g :: Int -> (Int,Int)
      g 1 = (m, 0)
      g n = (2 * n, 2 `div` n)
      {-# NOINLINE g #-}
  in case m of
    1 -> 0
    2 -> snd (g m)
    _ -> uncurry (+) (g m)
```

Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that whenever
`g` is called, the second component of its result is used strictly, by
giving `g` the demand `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR
to unbox the division, for example.

Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.

Metric Decrease:
    T13253-spj

0aec78b6

Nov 13, 2020

Arity: Emit "Exciting arity" warning only after second iteration (#18937) · 197d59fa
Sebastian Graf authored 4 years ago and Marge Bot committed 4 years ago
```
See Note [Exciting arity] for why we emit the warning at all and why we
now only do so after the second iteration.

Fixes #18937.
```
197d59fa

Arity: Rework `ArityType` to fix monotonicity (#18870) · 63fa3997

Sebastian Graf authored 4 years ago and

Marge Bot committed 4 years ago

As we found out in #18870, `andArityType` is not monotone, with
potentially severe consequences for termination of fixed-point
iteration. That showed in an abundance of "Exciting arity" DEBUG
messages that are emitted whenever we do more than one step in
fixed-point iteration.

The solution necessitates also recording `OneShotInfo` info for
`ABot` arity type. Thus we get the following definition for `ArityType`:

```
data ArityType = AT [OneShotInfo] Divergence
```

The majority of changes in this patch are the result of refactoring use
sites of `ArityType` to match the new definition.
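
To make the shape of that definition concrete, here is a toy model. It is
an assumption-laden sketch and not GHC's code: the two enumerations are cut
down to two constructors each, and `arityOf`, `bestOneShot`, `cheapPrefix` and
`combine` are hypothetical helpers that only loosely imitate how two case
branches might be combined. It makes no claim about the monotonicity property
the patch establishes.

```haskell
-- Toy model only; GHC's real types carry more cases than these.
data OneShotInfo = NoOneShotInfo | OneShotLam
  deriving (Eq, Show)

data Divergence = Diverges | Dunno
  deriving (Eq, Show)

-- Same shape as the definition in the commit message.
data ArityType = AT [OneShotInfo] Divergence
  deriving (Eq, Show)

-- The arity is the number of recorded lambdas.
arityOf :: ArityType -> Int
arityOf (AT oss _) = length oss

-- Hypothetical: prefer one-shot information when either side has it.
bestOneShot :: OneShotInfo -> OneShotInfo -> OneShotInfo
bestOneShot OneShotLam _ = OneShotLam
bestOneShot NoOneShotInfo os = os

-- Hypothetical: keep only the leading one-shot lambdas.
cheapPrefix :: ArityType -> ArityType
cheapPrefix (AT oss _) = AT (takeWhile (== OneShotLam) oss) Dunno

-- Hypothetical combination of two branches: a branch that diverges with no
-- lambdas left does not constrain the other branch.
combine :: ArityType -> ArityType -> ArityType
combine (AT (o1 : os1) d1) (AT (o2 : os2) d2) =
  let AT os d = combine (AT os1 d1) (AT os2 d2)
  in AT (bestOneShot o1 o2 : os) d
combine (AT [] Diverges) at2 = at2
combine at1 (AT [] Diverges) = at1
combine (AT [] Dunno) at2 = cheapPrefix at2
combine at1 (AT [] Dunno) = cheapPrefix at1
```

In this model a diverging arity type such as `AT [OneShotLam] Diverges` still
carries its list of `OneShotInfo`, which is the extra information the new
representation records.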

The regression test `T18870` asserts that we indeed don't emit any DEBUG
output anymore for a function where we previously would have.
Similarly, there's a regression test `T18937` for #18937, which we
expect to be broken for now.

Fixes #18870.

63fa3997

Oct 18, 2020

Testsuite: Add dead arity analysis tests · 451455fd
Sebastian Graf authored 4 years ago and Marge Bot committed 4 years ago
```
We didn't seem to test these old tests at all, judging from their
expected output.
```
451455fd

Arity: Record arity types for non-recursive lets · 6b3eb06a

Sebastian Graf authored 4 years ago and

Marge Bot committed 4 years ago

In #18793, we saw a compelling example which requires us to look at
non-recursive let-bindings during arity analysis and unleash their arity
types at use sites.

After the refactoring in the previous patch, the needed change is quite
simple and very local to `arityType`'s definition for non-recursive `Let`.
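
A small illustration of the kind of program this concerns follows. It is a
hypothetical example, not the one from #18793, and the names are made up.
The first function has one syntactic parameter and returns a let-bound lambda;
if the arity of `go` is known at its use site, the whole function can be seen
to take two arguments, which corresponds to the second, eta-expanded form.

```haskell
-- Written with a single syntactic parameter; the body is a let-bound lambda.
scaleThenAdd :: Int -> Int -> Int
scaleThenAdd k =
  let go = \y -> k * 2 + y  -- non-recursive let; `go` takes one argument
  in go

-- What eta-expansion to arity 2 would correspond to at the source level.
scaleThenAddEta :: Int -> Int -> Int
scaleThenAddEta k y =
  let go = \z -> k * 2 + z
  in go y
```

Both definitions compute the same results; the difference is only in how many
arguments a call needs to supply before any work happens.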

Apart from that, we had to get rid of the second item of
`Note [Dealing with bottoms]`, which was entirely a safety measure and
hindered optimistic fixed-point iteration.

Fixes #18793.

The following metric increases are all caused by this commit and a
result of the fact that we just do more work now:

Metric Increase:
    T3294
    T12545
    T12707

6b3eb06a

Feb 23, 2016
- Testsuite: delete Windows line endings [skip ci] (#11631) · d5e8b394
  Thomas Miedema authored 9 years ago
  
  d5e8b394
Oct 07, 2014

Delete __GLASGOW_HASKELL__ ifdefs for stage0 < 7.6. · b3e5a7b5

Thomas Miedema authored 10 years ago

Summary:
My understanding is that ghc 7.10 should be buildable with the last 3 versions
of ghc, i.e. 7.6, 7.8 and 7.10 itself.

Test Plan: x

Reviewers: austin

Reviewed By: austin

Subscribers: hvr, simonmar, ezyang, carter, thomie

Differential Revision: https://phabricator.haskell.org/D254

b3e5a7b5

Jul 20, 2011
- Move tests from tests/ghc-regress/* to just tests/* · 16514f27
  David Terei authored 13 years ago
  
  16514f27