- Jan 08, 2024
-
-
A separate commit so that the rename is obvious to Git(Lab)
-
- Aug 09, 2023
-
-
A minor improvement to pretty-printing
-
- Apr 26, 2023
-
-
In #23208 we observed that the demand signature of a binder occuring in a RULE wasn't unleashed, leading to a transitively used binder being discarded as absent. The solution was to use the same code path that we already use for handling exported bindings. See the changes to `Note [Absence analysis for stable unfoldings and RULES]` for more details. I took the chance to factor out the old notion of a `PlusDmdArg` (a pair of a `VarEnv Demand` and a `Divergence`) into `DmdEnv`, which fits nicely into our existing framework. As a result, I had to touch quite a few places in the code. This refactoring exposed a few small bugs around correct handling of bottoming demand environments. As a result, some strictness signatures now mention uniques that weren't there before which caused test output changes to T13143, T19969 and T22112. But these tests compared whole -ddump-simpl listings which is a very fragile thing to begin with. I changed what exactly they test for based on the symptoms in the corresponding issues. There is a single regression in T18894 because we are more conservative around stable unfoldings now. Unfortunately it is not easily fixed; let's wait until there is a concrete motivation before invest more time. Fixes #23208.
-
- Jan 11, 2023
-
-
As #22725 shows, in worker/wrapper we must add the void argument /last/, not first. See GHC.Core.Opt.WorkWrap.Utils Note [Worker/wrapper needs to add void arg last]. That led me to to study GHC.Core.Opt.SpecConstr Note [SpecConstr needs to add void args first] which suggests the opposite! And indeed I think it's the other way round for SpecConstr -- or more precisely the void arg must precede the "extra_bndrs". That led me to some refactoring of GHC.Core.Opt.SpecConstr.calcSpecInfo.
-
- Sep 28, 2022
-
-
I finally got tired of the way that IfaceUnfolding reflected a previous structure of unfoldings, not the current one. This MR refactors UnfoldingSource and IfaceUnfolding to be simpler and more consistent. It's largely just a refactor, but in UnfoldingSource (which moves to GHC.Types.Basic, since it is now used in IfaceSyn too), I distinguish between /user-specified/ and /system-generated/ stable unfoldings. data UnfoldingSource = VanillaSrc | StableUserSrc -- From a user-specified pragma | StableSystemSrc -- From a system-generated unfolding | CompulsorySrc This has a minor effect in CSE (see the use of isisStableUserUnfolding in GHC.Core.Opt.CSE), which I tripped over when working on specialisation, but it seems like a Good Thing to know anyway.
-
- Jun 27, 2022
-
-
We used to put OtherCon unfoldings on lambda binders of workers and sometimes also join points/specializations with with the assumption that since the wrapper would force these arguments once we execute the RHS they would indeed be in WHNF. This was wrong for reasons detailed in #21472. So now we purge evaluated unfoldings from *all* lambda binders. This fixes #21472, but at the cost of sometimes not using as efficient a calling convention. It can also change inlining behaviour as some occurances will no longer look like value arguments when they did before. As consequence we also change how we compute CBV information for arguments slightly. We now *always* determine the CBV convention for arguments during tidy. Earlier in the pipeline we merely mark functions as candidates for having their arguments treated as CBV. As before the process is described in the relevant notes: Note [CBV Function Ids] Note [Attaching CBV Marks to ids] Note [Never put `OtherCon` unfoldigns on lambda binders] ------------------------- Metric Decrease: T12425 T13035 T18223 T18223 T18923 MultiLayerModulesTH_OneShot Metric Increase: WWRec -------------------------
-
- May 05, 2022
-
-
The main fix is that in addVoidWorkerArg we now add the argument to the front. This fixes #21448. ------------------------- Metric Decrease: T16875 -------------------------
-
- Mar 16, 2022
-
-
Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic in terms ob Boxity analysis. "Unsoundly" in the sense that we sometimes unbox parameters that we better shouldn't unbox. Examples are #18907 and T19871.absent. Until now, we thought that this hack pulled its weight becuase it worked around some shortcomings of the phase separation between Boxity analysis and CPR analysis. But it is a gross hack which caused regressions itself that needed all kinds of fixes and workarounds. See for example #20767. It became impossible to work with in !7599, so I want to remove it. For example, at the moment, `lubDmd B dmd` will not unbox `dmd`, but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of the lattice, it's hardly justifiable to get a better demand when `lub`bing with `A`. The consequence of letting `Boxed` win in `lubBoxity` is that we *would* regress #2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR are able to communicate better. Fortunately, that is not the case since I could tweak the other source of optimism in Boxity analysis that is described in `Note [Unboxed demand on function bodies returning small products]` so that we *recursively* assume unboxed demands on function bodies returning small products. See the updated Note. `Note [Boxity for bottoming functions]` describes why we need bottoming functions to have signatures that say that they deeply unbox their arguments. In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox recursive data constructors. This is in line with our handling of them in CPR. I updated `Note [Which types are unboxed?]` to reflect that. In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler implementation (at least to think about). We can also drop the very ad-hoc definition of `deferAfterPreciseException` and its Note in favor of the simple, intuitive definition we used to have. Metric Decrease: T16875 T18223 T18698a T18698b hard_hole_fits Metric Increase: LargeRecord MultiComponentModulesRecomp T15703 T8095 T9872d Out of all the regresions, only the one in T9872d doesn't vanish in a perf build, where the compiler is bootstrapped with -O2 and thus SpecConstr. Reason for regressions: * T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed. That is because the context is passed to a function argument, for example in `liftCoSubstTyVarBndrUsing`. * In T15703, LargeRecord and T8095, we get a bit more allocations in `expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed. In both cases that guards against reboxing in some code paths. * The same is true for MultiComponentModulesRecomp, where we get less unboxing in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations actually *improve* by over 4%! Results on NoFib: -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- awards -0.4% +0.3% cacheprof -0.3% +2.4% fft -1.5% -5.1% fibheaps +1.2% +0.8% fluid -0.3% -0.1% ida +0.4% +0.9% k-nucleotide +0.4% -0.1% last-piece +10.5% +13.9% lift -4.4% +3.5% mandel2 -99.7% -99.8% mate -0.4% +3.6% parser -1.0% +0.1% puzzle -11.6% +6.5% reverse-complem -3.0% +2.0% scs -0.5% +0.1% sphere -0.4% -0.2% wave4main -8.2% -0.3% -------------------------------------------------------------------------------- Summary excludes mandel2 because of excessive bias Min -11.6% -5.1% Max +10.5% +13.9% Geometric Mean -0.2% +0.3% -------------------------------------------------------------------------------- Not bad for a bug fix. The regression in `last-piece` could become a win if SpecConstr would work on non-recursive functions. The regression in `fibheaps` is due to `Note [Reboxed crud for bottoming calls]`, e.g., #21128.
-
- Feb 12, 2022
-
-
This does three major things: * Enforce the invariant that all strict fields must contain tagged pointers. * Try to predict the tag on bindings in order to omit tag checks. * Allows functions to pass arguments unlifted (call-by-value). The former is "simply" achieved by wrapping any constructor allocations with a case which will evaluate the respective strict bindings. The prediction is done by a new data flow analysis based on the STG representation of a program. This also helps us to avoid generating redudant cases for the above invariant. StrictWorkers are created by W/W directly and SpecConstr indirectly. See the Note [Strict Worker Ids] Other minor changes: * Add StgUtil module containing a few functions needed by, but not specific to the tag analysis. ------------------------- Metric Decrease: T12545 T18698b T18140 T18923 LargeRecord Metric Increase: LargeRecord ManyAlternatives ManyConstructors T10421 T12425 T12707 T13035 T13056 T13253 T13253-spj T13379 T15164 T18282 T18304 T18698a T1969 T20049 T3294 T4801 T5321FD T5321Fun T783 T9233 T9675 T9961 T19695 WWRec -------------------------
-
- Oct 24, 2021
-
-
This patch fixes some abundant reboxing of `DynFlags` in `GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic of #19407) by introducing a Boxity analysis to GHC, done as part of demand analysis. This allows to accurately capture ad-hoc unboxing decisions previously made in worker/wrapper in demand analysis now, where the boxity info can propagate through demand signatures. See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in `Note [No lazy, Unboxed demand in demand signature]`, but `Note [Finalising boxity for demand signature]` is probably a better entry-point. To support the fix for #19407, I had to change (what was) `Note [Add demands for strict constructors]` a bit (now `Note [Unboxing evaluated arguments]`). In particular, we now take care of it in `finaliseBoxity` (which is only called from demand analaysis) instead of `wantToUnboxArg`. I also had to resurrect `Note [Product demands for function body]` and rename it to `Note [Unboxed demand on function bodies returning small products]` to avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again. See the updated Note for details. A nice side-effect is that the worker/wrapper transformation no longer needs to look at strictness info and other bits such as `InsideInlineableFun` flags (needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects boxity info from argument demands and interprets them with a severely simplified `wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys` which would be awkward to export. I spent some time figuring out the reason for why `T16197` failed prior to my amendments to `Note [Unboxing evaluated arguments]`. After having it figured out, I minimised it a bit and added `T16197b`, which simply compares computed strictness signatures and thus should be far simpler to eyeball. The 12% ghc/alloc regression in T11545 is because of the additional `Boxity` field in `Poly` and `Prod` that results in more allocation during `lubSubDmd` and `plusSubDmd`. I made sure in the ticky profiles that the number of calls to those functions stayed the same. We can bear such an increase here, as we recently improved it by -68% (in b760c1f7). T18698* regress slightly because there is more unboxing of dictionaries happening and that causes Lint (mostly) to allocate more. Fixes #19871, #19407, #4267, #16859, #18907 and #13331. Metric Increase: T11545 T18698a T18698b Metric Decrease: T12425 T16577 T18223 T18282 T4267 T9961
-
- Sep 28, 2021
-
-
This flag is used to remove the output of core stats per binding in Core dumps.
-
- Sep 08, 2021
-
-
- Jun 27, 2021
-
-
`mkWWargs`'s job was pushing casts inwards and doing eta expansion to match the arity with the number of argument demands we w/w for. Nowadays, we use the Simplifier to eta expand to arity. In fact, in recent years we have even seen the eta expansion done by w/w as harmful, see Note [Don't eta expand in w/w]. If a function hasn't enough manifest lambdas, don't w/w it! What purpose does `mkWWargs` serve in this world? Not a great one, it turns out! I could remove it by pulling some important bits, notably Note [Freshen WW arguments] and Note [Join points and beta-redexes]. Result: We reuse the freshened binder names of the wrapper in the worker where possible (see testuite changes), much nicer! In order to avoid scoping errors due to lambda-bound unfoldings in worker arguments, we zap those unfoldings now. In doing so, we fix #19766. Fixes #19874.
-
- Apr 20, 2021
-
-
Sebastian Graf authored
In another small step towards bringing a manageable variant of Nested CPR into GHC, this patch refactors worker/wrapper to be able to exploit Nested CPR signatures. See the new Note [Worker/wrapper for CPR]. The nested code path is currently not triggered, though, because all signatures that we annotate are still flat. So purely a refactoring. I am very confident that it works, because I ripped it off !1866 95% unchanged. A few test case outputs changed, but only it's auxiliary names only. I also added test cases for #18109 and #18401. There's a 2.6% metric increase in T13056 after a rebase, caused by an additional Simplifier run. It appears b1d0b9c saw a similar additional iteration. I think it's just a fluke. Metric Increase: T13056
-
- Mar 20, 2021
-
-
While fixing #19232, it became increasingly clear that the vestigial hack described in `Note [Optimistic field binder CPR]` is complicated and causes reboxing. Rather than make the hack worse, this patch gets rid of it completely in favor of giving deeply unboxed parameters the Nested CPR property. Example: ```hs f :: (Int, Int) -> Int f p = case p of (x, y) | x == y = x | otherwise = y ``` Based on `p`'s `idDemandInfo` `1P(1P(L),1P(L))`, we can see that both fields of `p` will be available unboxed. As a result, we give `p` the nested CPR property `1(1,1)`. When analysing the `case`, the field CPRs are transferred to the binders `x` and `y`, respectively, so that we ultimately give `f` the CPR property. I took the liberty to do a bit of refactoring: - I renamed `CprResult` ("Constructed product result result") to plain `Cpr`. - I Introduced `FlatConCpr` in addition to (now nested) `ConCpr` and and according pattern synonym that rewrites flat `ConCpr` to `FlatConCpr`s, purely for compiler perf reasons. - Similarly for performance reasons, we now store binders with a Top signature in a separate `IntSet`, see `Note [Efficient Top sigs in SigEnv]`. - I moved a bit of stuff around in `GHC.Core.Opt.WorkWrap.Utils` and introduced `UnboxingDecision` to replace the `Maybe DataConPatContext` type we used to return from `wantToUnbox`. - Since the `Outputable Cpr` instance changed anyway, I removed the leading `m` which we used to emit for `ConCpr`. It's just noise, especially now that we may output nested CPRs. Fixes #19398.
-
- Mar 03, 2021
-
-
The update of the Outputable instance resulted in a slew of documentation changes within Notes that used the old syntax. The most important doc changes are to `Note [Demand notation]` and the user's guide. Fixes #19016.
-
- Dec 14, 2020
-
-
Ben Gamari authored
During the compilation of programs GHC very frequently deals with the `Type` type, which is a synonym of `TYPE 'LiftedRep`. This patch teaches GHC to avoid expanding the `Type` synonym (and other nullary type synonyms) during type comparisons, saving a good amount of work. This optimisation is described in `Note [Comparing nullary type synonyms]`. To maximize the impact of this optimisation, we introduce a few special-cases to reduce `TYPE 'LiftedRep` to `Type`. See `Note [Prefer Type over TYPE 'LiftedPtrRep]`. Closes #17958. Metric Decrease: T18698b T1969 T12227 T12545 T12707 T14683 T3064 T5631 T5642 T9020 T9630 T9872a T13035 haddock.Cabal haddock.base
-
Ben Gamari authored
This was inadvertently merged. This reverts commit 7e9debd4.
-
Ben Gamari authored
During the compilation of programs GHC very frequently deals with the `Type` type, which is a synonym of `TYPE 'LiftedRep`. This patch teaches GHC to avoid expanding the `Type` synonym (and other nullary type synonyms) during type comparisons, saving a good amount of work. This optimisation is described in `Note [Comparing nullary type synonyms]`. To maximize the impact of this optimisation, we introduce a few special-cases to reduce `TYPE 'LiftedRep` to `Type`. See `Note [Prefer Type over TYPE 'LiftedPtrRep]`. Closes #17958. Metric Decrease: T18698b T1969 T12227 T12545 T12707 T14683 T3064 T5631 T5642 T9020 T9630 T9872a T13035 haddock.Cabal haddock.base
-
- Nov 20, 2020
-
-
As outlined in #18903, interleaving usage and strictness demands not only means a more compact demand representation, but also allows us to express demands that we weren't easily able to express before. Call demands are *relative* in the sense that a call demand `Cn(cd)` on `g` says "`g` is called `n` times. *Whenever `g` is called*, the result is used according to `cd`". Example from #18903: ```hs h :: Int -> Int h m = let g :: Int -> (Int,Int) g 1 = (m, 0) g n = (2 * n, 2 `div` n) {-# NOINLINE g #-} in case m of 1 -> 0 2 -> snd (g m) _ -> uncurry (+) (g m) ``` Without the interleaved representation, we would just get `L` for the strictness demand on `g`. Now we are able to express that whenever `g` is called, its second component is used strictly in denoting `g` by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the division, for example. Fixes #18903. While fixing regressions, I also discovered and fixed #18957. Metric Decrease: T13253-spj
-
- Oct 14, 2020
-
-
There are two signficant changes here: * Ticket #18815 showed that we were missing some opportunities for preInlineUnconditionally. The one-line fix is in the code for GHC.Core.Opt.Simplify.Utils.preInlineUnconditionally, which now switches off only for INLINE pragmas. I expanded Note [Stable unfoldings and preInlineUnconditionally] to explain. * When doing this I discovered a way in which preInlineUnconditionally was occasionally /too/ eager. It's all explained in Note [Occurrences in stable unfoldings] in GHC.Core.Opt.OccurAnal, and the one-line change adding markAllMany to occAnalUnfolding. I also got confused about what NoUserInline meant, so I've renamed it to NoUserInlinePrag, and changed its pretty-printing slightly. That led to soem error messate wibbling, and touches quite a few files, but there is no change in functionality. I did a nofib run. As expected, no significant changes. Program Size Allocs ---------------------------------------- sphere -0.0% -0.4% ---------------------------------------- Min -0.0% -0.4% Max -0.0% +0.0% Geometric Mean -0.0% -0.0% I'm allowing a max-residency increase for T10370, which seems very irreproducible. (See comments on !4241.) There is always sampling error for max-residency measurements; and in any case the change shows up on some platforms but not others. Metric Increase: T10370
-
- Jul 28, 2020
-
-
Specifically: #13253 exponential inlining #10421 ditto #18140 strict constructors #18282 another nested-function call case This patch makes one really significant changes: change the way that mkDupableCont handles StrictArg. The details are explained in GHC.Core.Opt.Simplify Note [Duplicating StrictArg]. Specific changes * In mkDupableCont, when making auxiliary bindings for the other arguments of a call, add extra plumbing so that we don't forget the demand on them. Otherwise we haev to wait for another round of strictness analysis. But actually all the info is to hand. This change affects: - Make the strictness list in ArgInfo be [Demand] instead of [Bool], and rename it to ai_dmds. - Add as_dmd to ValArg - Simplify.makeTrivial takes a Demand - mkDupableContWithDmds takes a [Demand] There are a number of other small changes 1. For Ids that are used at most once in each branch of a case, make the occurrence analyser record the total number of syntactic occurrences. Previously we recorded just OneBranch or MultipleBranches. I thought this was going to be useful, but I ended up barely using it; see Note [Note [Suppress exponential blowup] in GHC.Core.Opt.Simplify.Utils Actual changes: * See the occ_n_br field of OneOcc. * postInlineUnconditionally 2. I found a small perf buglet in SetLevels; see the new function GHC.Core.Opt.SetLevels.hasFreeJoin 3. Remove the sc_cci field of StrictArg. I found I could get its information from the sc_fun field instead. Less to get wrong! 4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler invariant: they line up with the value arguments beyond ai_args This allowed a bit of nice refactoring; see isStrictArgInfo, lazyArgcontext, strictArgContext There is virtually no difference in nofib. (The runtime numbers are bogus -- I tried a few manually.) Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fft +0.0% -2.0% -48.3% -49.4% 0.0% multiplier +0.0% -2.2% -50.3% -50.9% 0.0% -------------------------------------------------------------------------------- Min -0.4% -2.2% -59.2% -60.4% 0.0% Max +0.0% +0.1% +3.3% +4.9% 0.0% Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0% Test T18282 is an existing example of these deeply-nested strict calls. We get a big decrease in compile time (-85%) because so much less inlining takes place. Metric Decrease: T18282
-
- Jul 23, 2020
-
-
There's one backwards compatibility issue: GHC.Prim no longer exports Void#, we now manually re-export it from GHC.Exts.
-
- Jul 13, 2020
-
-
Simon Peyton Jones authored
Ticket #18282 showed that the result discount given by conSize was massively too large. This patch reduces that discount to a constant 10, which just balances the cost of the constructor application itself. Note [Constructor size and result discount] elaborates, as does the ticket #18282. Reducing result discount reduces inlining, which affects perf. I found that I could increase the unfoldingUseThrehold from 80 to 90 in compensation; in combination with the result discount change I get these overall nofib numbers: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- boyer -0.2% +5.4% -3.2% -3.4% 0.0% cichelli -0.1% +5.9% -11.2% -11.7% 0.0% compress2 -0.2% +9.6% -6.0% -6.8% 0.0% cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0% gamteb -0.2% +2.6% -13.8% -14.4% 0.0% genfft -0.1% -1.6% -29.5% -29.9% 0.0% gg -0.0% -2.2% -17.2% -17.8% -20.0% life -0.1% -2.2% -62.3% -63.4% 0.0% mate +0.0% +1.4% -5.1% -5.1% -14.3% parser -0.2% -2.1% +7.4% +6.7% 0.0% primetest -0.2% -12.8% -14.3% -14.2% 0.0% puzzle -0.2% +2.1% -10.0% -10.4% 0.0% rsa -0.2% -11.7% -3.7% -3.8% 0.0% simple -0.2% +2.8% -36.7% -38.3% -2.2% wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9% -------------------------------------------------------------------------------- Min -0.4% -19.2% -62.3% -63.4% -42.9% Max +0.3% +9.6% +7.4% +11.0% +16.7% Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7% I'm ok with these numbers, remembering that this change removes an *exponential* increase in code size in some in-the-wild cases. I investigated compress2. The difference is entirely caused by this function no longer inlining WriteRoutines.$woutputCodes = \ (w :: [CodeEvent]) -> let result_s1Sr = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of (# ww1, ww2 #) -> (ww1, ww2) in (# case result_s1Sr of (x, _) -> map @Int @Char WriteRoutines.outputCodes1 x , case result_s1Sr of { (_, y) -> y } #) It was right on the cusp before, driven by the excessive result discount. Too bad! Happily, the compiler/perf tests show a number of improvements: T12227 compiler bytes-alloc -6.6% T12545 compiler bytes-alloc -4.7% T13056 compiler bytes-alloc -3.3% T15263 runtime bytes-alloc -13.1% T17499 runtime bytes-alloc -14.3% T3294 compiler bytes-alloc -1.1% T5030 compiler bytes-alloc -11.7% T9872a compiler bytes-alloc -2.0% T9872b compiler bytes-alloc -1.2% T9872c compiler bytes-alloc -1.5% Metric Decrease: T12227 T12545 T13056 T15263 T17499 T3294 T5030 T9872a T9872b T9872c
-
- Jun 10, 2020
-
-
The cast worker/wrapper transformation transforms x = e |> co into y = e x = y |> co This is done by the simplifier, but we were being careless about transferring IdInfo from x to y, and about what to do if x is a NOINLNE function. This resulted in a series of bugs: #17673, #18093, #18078. This patch fixes all that: * Main change is in GHC.Core.Opt.Simplify, and the new prepareBinding function, which does this cast worker/wrapper transform. See Note [Cast worker/wrappers]. * There is quite a bit of refactoring around prepareRhs, makeTrivial etc. It's nicer now. * Some wrappers from strictness and cast w/w, notably those for a function with a NOINLINE, should inline very late. There wasn't really a mechanism for that, which was an existing bug really; so I invented a new finalPhase = Phase (-1). It's used for all simplifier runs after the user-visible phase 2,1,0 have run. (No new runs of the simplifier are introduced thereby.) See new Note [Compiler phases] in GHC.Types.Basic; the main changes are in GHC.Core.Opt.Driver * Doing this made me trip over two places where the AnonArgFlag on a FunTy was being lost so we could end up with (Num a -> ty) rather than (Num a => ty) - In coercionLKind/coercionRKind - In contHoleType in the Simplifier I fixed the former by defining mkFunctionType and using it in coercionLKind/RKind. I could have done the same for the latter, but the information is almost to hand. So I fixed the latter by - adding sc_hole_ty to ApplyToVal (like ApplyToTy), - adding as_hole_ty to ValArg (like TyArg) - adding sc_fun_ty to StrictArg Turned out I could then remove ai_type from ArgInfo. This is just moving the deck chairs around, but it worked out nicely. See the new Note [AnonArgFlag] in GHC.Types.Var * When looking at the 'arity decrease' thing (#18093) I discovered that stable unfoldings had a much lower arity than the actual optimised function. That's what led to the arity-decrease message. Simple solution: eta-expand. It's described in Note [Eta-expand stable unfoldings] in GHC.Core.Opt.Simplify * I also discovered that unsafeCoerce wasn't being inlined if the context was boring. So (\x. f (unsafeCoerce x)) would create a thunk -- yikes! I fixed that by making inlineBoringOK a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold. I also found that unsafeCoerceName was unused, so I removed it. I made a test case for #18078, and a very similar one for #17673. The net effect of all this on nofib is very modest, but positive: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- anna -0.4% -0.1% -3.1% -3.1% 0.0% fannkuch-redux -0.4% -0.3% -0.1% -0.1% 0.0% maillist -0.4% -0.1% -7.8% -1.0% -14.3% primetest -0.4% -15.6% -7.1% -6.6% 0.0% -------------------------------------------------------------------------------- Min -0.9% -15.6% -13.3% -14.2% -14.3% Max -0.3% 0.0% +12.1% +12.4% 0.0% Geometric Mean -0.4% -0.2% -2.3% -2.2% -0.1% All following metric decreases are compile-time allocation decreases between -1% and -3%: Metric Decrease: T5631 T13701 T14697 T15164
-
- May 13, 2020
-
-
Instead, look through expandable unfoldings in `cprTransform`. See the new Note [CPR for expandable unfoldings]: ``` Long static data structures (whether top-level or not) like xs = x1 : xs1 xs1 = x2 : xs2 xs2 = x3 : xs3 should not get CPR signatures, because they * Never get WW'd, so their CPR signature should be irrelevant after analysis (in fact the signature might even be harmful for that reason) * Would need to be inlined/expanded to see their constructed product * Recording CPR on them blows up interface file sizes and is redundant with their unfolding. In case of Nested CPR, this blow-up can be quadratic! But we can't just stop giving DataCon application bindings the CPR property, for example fac 0 = 1 fac n = n * fac (n-1) fac certainly has the CPR property and should be WW'd! But FloatOut will transform the first clause to lvl = 1 fac 0 = lvl If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a CPR signature to extrapolate into a CPR transformer ('cprTransform'). So instead we keep on cprAnal'ing through *expandable* unfoldings for these arity 0 bindings via 'cprExpandUnfolding_maybe'. In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one for each data declaration. It's wasteful to attach CPR signatures to each of them (and intractable in case of Nested CPR). ``` Fixes #18154.
-
- Feb 12, 2020
-
-
Krzysztof Gogolewski authored
We now always show "forall {a}. T" for inferred variables, previously this was controlled by -fprint-explicit-foralls. This implements part 1 of https://github.com/ghc-proposals/ghc-proposals/pull/179. Part of GHC ticket #16320. Furthermore, when printing a levity restriction error, we now display the HsWrap of the expression. This lets users see the full elaboration with -fprint-typechecker-elaboration (see also #17670)
-
Sebastian Graf authored
The reasons for that can be found in the wiki: https://gitlab.haskell.org/ghc/ghc/wikis/nested-cpr/split-off-cpr We now run CPR after demand analysis (except for after the final demand analysis run just before code gen). CPR got its own dump flags (`-ddump-cpr-anal`, `-ddump-cpr-signatures`), but not its own flag to activate/deactivate. It will run with `-fstrictness`/`-fworker-wrapper`. As explained on the wiki page, this step is necessary for a sane Nested CPR analysis. And it has quite positive impact on compiler performance: Metric Decrease: T9233 T9675 T9961 T15263
-
- Jan 31, 2020
-
-
Ömer Sinan Ağacan authored
This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs *after* CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] | Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 | | GHC HEAD | This MR | Diff | | --------------- | -------------- | -------------- | ------ | | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% | | Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% | | Max residency | 421,225,624 | 424,324,264 | +0.73% | * -O1 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% | | Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% | | Max residency | 982,643,440 | 991,663,432 | +0.91% | * -O2 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% | | Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% | | Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% | Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
-
- Jan 08, 2020
-
-
This brings the pretty-printer for Core in line with how visible type applications are normally printed: namely, with no whitespace after the `@` character (i.e., `f @a` instead of `f @ a`). While I'm in town, I also give the same treatment to type abstractions (i.e., `\(@a)` instead of `\(@ a)`) and coercion applications (i.e., `f @~x` instead of `f @~ x`). Fixes #17643.
-
- Dec 12, 2018
-
-
Simon Peyton Jones authored
This patch collects a few improvements triggered by Trac #15696, and fixing Trac #16029 * Stop making toCleanDmd behave specially for unlifted types. This special case was the cause of stupid behaviour in Trac #16029. And to my joy I discovered the let/app invariant rendered it unnecessary. (Maybe the special case pre-dated the let/app invariant.) Result: less special-case handling in the compiler, and better perf for the compiled code. * In WwLib.mkWWstr_one, treat seqDmd like U(AAA). It was not being so treated before, which again led to stupid code. * Update and improve Notes There are .stderr test wibbles because we get slightly different strictness signatures for an argumment of unlifted type: <L,U> rather than <S,U> for Int# <S,U> rather than <S(S),U(U)> for Int
-
- Jun 07, 2018
-
-
Simon Peyton Jones authored
Back in 1999 I put this ad-hoc code in the Case-handling code for occAnal: occAnal env (Case scrut bndr ty alts) = ... -- Note [Case binder usage] -- ~~~~~~~~~~~~~~~~~~~~~~~~ -- The case binder gets a usage of either "many" or "dead", never "one". -- Reason: we like to inline single occurrences, to eliminate a binding, -- but inlining a case binder *doesn't* eliminate a binding. -- We *don't* want to transform -- case x of w { (p,q) -> f w } -- into -- case x of w { (p,q) -> f (p,q) } tag_case_bndr usage bndr = (usage', setIdOccInfo bndr final_occ_info) where occ_info = lookupDetails usage bndr usage' = usage `delDetails` bndr final_occ_info = case occ_info of IAmDead -> IAmDead _ -> noOccInfo But the comment looks wrong -- the bad inlining will not happen -- and I think it relates to some long-ago version of the simplifier. So I simply removed the special case, which gives more accurate occurrence-info to the case binder. Interestingly I got a slight improvement in nofib binary sizes. -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- cacheprof -0.1% +0.2% -0.7% -1.2% +8.6% -------------------------------------------------------------------------------- Min -0.2% 0.0% -14.5% -30.5% 0.0% Max -0.1% +0.2% +10.0% +10.0% +25.0% Geometric Mean -0.2% +0.0% -1.9% -5.4% +0.3% I have no idea if the improvement in runtime is real. I did look at the tiny increase in allocation for cacheprof and concluded that it was unimportant (I forget the details). Also the more accurate occ-info for the case binder meant that some inlining happens in one pass that previously took successive passes for the test dependent/should_compile/dynamic-paper (which has a known Russel-paradox infinite loop in the simplifier). In short, a small win: less ad-hoc complexity and slightly smaller binaries.
-
- Apr 20, 2018
-
-
Simon Peyton Jones authored
This patch has a single significant change: strictness wrapper functions are inlined earlier, in phase 2 rather than phase 0. As shown by Trac #15056, this gives a better chance for RULEs to fire. Before this change, a function that would have inlined early without strictness analyss was instead inlining late. Result: applying "optimisation" made the program worse. This does not make too much difference in nofib, but I've stumbled over the problem more than once, so even a "no-change" result would be quite acceptable. Here are the headlines: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- cacheprof -0.5% -0.5% +2.5% +2.5% 0.0% fulsom -1.0% +2.6% -0.1% -0.1% 0.0% mate -0.6% +2.4% -0.9% -0.9% 0.0% veritas -0.7% -23.2% 0.002 0.002 0.0% -------------------------------------------------------------------------------- Min -1.4% -23.2% -12.5% -15.3% 0.0% Max +0.6% +2.6% +4.4% +4.3% +19.0% Geometric Mean -0.7% -0.2% -1.4% -1.7% +0.2% * A worthwhile reduction in binary size. * Runtimes are not to be trusted much but look as if they are moving the right way. * A really big win in veritas, described in comment:1 of Trac #15056; more fusion rules fired. * I investigated the losses in 'mate' and 'fulsom'; see #15056.
-
- Jan 03, 2018
-
-
Simon Peyton Jones authored
See Trac #14626, comment:4. We want to maintain evaluted-ness info on Ids into the code generateor for two reasons (see Note [Preserve evaluated-ness in CorePrep] in CorePrep) - DataToTag magic - Potentially using it in the codegen (this is Gabor's current work) But it was all being done very inconsistently, and actually outright wrong -- the DataToTag magic hasn't been working for years. This patch tidies it all up, with Notes to match.
-
- Sep 12, 2017
-
-
Joachim Breitner authored
the worker/wrapper creates an artificial INLINE pragma, which caused CSE to not do its work. We now recognize such artificial pragmas by using `NoUserInline` instead of `Inline` as the `InlineSpec`. Differential Revision: https://phabricator.haskell.org/D3939
-
- Mar 06, 2017
-
-
Simon Peyton Jones authored
This fixes Trac #13369. It turned out that I really had got the bottoming-float code wrong, again. The new story is explained in Note [Bottoming floats], esp item (3), and Note [Floating from a RHS]. I didn't make a regression test; it's hard to to so. Nofib result are good -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- banner -2.2% -4.6% 0.00 0.00 +0.0% bspt -1.3% -1.6% 0.01 0.01 +0.0% cacheprof -1.8% -0.3% +3.7% +3.7% -0.9% digits-of-e2 -1.0% -1.5% -0.5% -0.5% +0.0% expert -1.3% -0.2% 0.00 0.00 +0.0% n-body -1.1% -0.2% +0.1% +0.1% +0.0% veritas -2.9% -0.1% 0.00 0.00 +0.0% -------------------------------------------------------------------------------- Min -2.9% -4.6% -7.4% -7.4% -19.8% Max -1.0% +0.0% +5.2% +5.1% +10.0% Geometric Mean -1.2% -0.1% +0.5% +0.5% -0.1% I /think/ all this is due to this error-floating change; but it's possible that some was due to commit "Fix CSE (again) on literal strings" a couple of commits earlier.
-
- Mar 02, 2017
-
-
David Feuer authored
Reviewers: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3260
-
- Feb 06, 2017
-
-
Disabling worker/wrapper for NOINLINE things can cause unnecessary reboxing of values. Consider {-# NOINLINE f #-} f :: Int -> a f x = error (show x) g :: Bool -> Bool -> Int -> Int g True True p = f p g False True p = p + 1 g b False p = g b True p the strictness analysis will discover f and g are strict, but because f has no wrapper, the worker for g will rebox p. So we get $wg x y p# = let p = I# p# in -- Yikes! Reboxing! case x of False -> case y of False -> $wg False True p# True -> +# p# 1# True -> case y of False -> $wg True True p# True -> case f p of { } g x y p = case p of (I# p#) -> $wg x y p# Now, in this case the reboxing will float into the True branch, an so the allocation will only happen on the error path. But it won't float inwards if there are multiple branches that call (f p), so the reboxing will happen on every call of g. Disaster. Solution: do worker/wrapper even on NOINLINE things; but move the NOINLINE pragma to the worker. Test Plan: make test TEST="13143" Reviewers: simonpj, bgamari, dfeuer, austin Reviewed By: simonpj, bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3046
-