1. 23 Apr, 2017 1 commit
  2. 22 Apr, 2017 1 commit
  3. 06 Apr, 2017 1 commit
  4. 01 Apr, 2017 2 commits
  5. 31 Mar, 2017 1 commit
    • Fix space leaks in simplifier (#13426) · e13419c5
      rwbarton authored
      The Join points commit (8d5cf8bf) introduced a space leak
      somewhere in the simplifier. The extra strictness added in this commit
      fixes the leak. Unfortunately I don't really understand the details.
      
      Moreover, the extra strictness appears to increase overall allocations
      in some cases, even though it decreases peak heap size in others.
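The commit does not say which bindings were made stricter, so the following is only a generic, hypothetical illustration of the kind of leak that extra strictness can cure, not the simplifier change itself. A lazily accumulated value builds up a chain of unevaluated thunks; forcing it at each step keeps the live heap small. With -O, GHC's strictness analysis would usually make the lazy version strict as well, so the difference shows most clearly without optimisation.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: without optimisation, 'acc + x' is never forced inside
-- the loop, so a chain of (+) thunks as long as the input builds up on the heap.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Extra strictness: the bang pattern forces the accumulator at every step,
-- so only one evaluated Int is live at a time.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```

The trade-off the commit mentions shows up here in miniature: forcing a value eagerly only pays off if it would have been demanded anyway; otherwise it is wasted work.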
      
      Test Plan: harbormaster
      
      Reviewers: austin, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3399
  6. 29 Mar, 2017 1 commit
  7. 26 Mar, 2017 1 commit
  8. 22 Mar, 2017 1 commit
  9. 13 Mar, 2017 1 commit
  10. 06 Mar, 2017 1 commit
  11. 03 Mar, 2017 3 commits
    • TcTypeable: Try to reuse KindReps · a694cee7
      Ben Gamari authored
      Here we rework the TcTypeable implementation to reuse KindRep bindings
      when possible. This is an attempt at minimizing the impact of Typeable
      binding generation by reducing the number of bindings that we produce.
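The reuse idea can be sketched as a lookup table from representation to binding name: before emitting a new binding for a kind representation, check whether an identical one has already been emitted. The `Rep` type, `bindRep` and the `$krep` naming scheme below are hypothetical simplifications, not TcTypeable's actual data structures.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-in for a kind representation.
data Rep
  = RepCon String  -- a kind constructor such as Type
  | RepFun Rep Rep -- k1 -> k2
  deriving (Eq, Ord, Show)

-- Maps each representation already emitted to the name of its binding.
type Bindings = Map.Map Rep String

-- Return the binding name for a representation, reusing an existing
-- binding when an identical representation was emitted before.
bindRep :: Rep -> Bindings -> (String, Bindings)
bindRep r env =
  case Map.lookup r env of
    Just name -> (name, env) -- reuse: no new binding is produced
    Nothing ->
      let name = "$krep" ++ show (Map.size env)
       in (name, Map.insert r name env)
```

The real implementation also has to decide which representations are worth sharing and where the shared bindings live; this sketch only shows the lookup-before-emit idea.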
      
      It turns out that this produces some pretty reasonable improvements in
      compiler allocations. It seems to erase most of the increases
      initially introduced by TTypeable in the testsuite. Moreover, nofib
      shows,
      ```
              -1 s.d.                -----          -3.555%
              +1 s.d.                -----          +1.937%
              Average                -----          -0.847%
      ```
      
      Here are a few of the high-scorers (ignore last column, which is for
      D3219),
      ```
      veritas    Types      88800920   -18.945%   -21.480%
      veritas    Tactics   540766744   -27.256%   -27.338%
      sched      Main      567013384    -4.947%    -5.358%
      listcompr  Main      532300000    -4.273%    -4.572%
      listcopy   Main      537785392    -4.382%    -4.635%
      anna       BaseDefs 1984225032   -10.639%   -10.832%
      ```
      As expected, these tend to be modules with either very many or very
      large types.
      
      Test Plan: Validate
      
      Reviewers: austin, dfeuer
      
      Subscribers: simonmar, dfeuer, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3166
    • testsuite: Bump down allocations for T12707 · 9808ebc8
      Ben Gamari authored
    • Fix T12234 stat mistakes · 57d969ec
      David Feuer authored
      I goofed up updating the expected and recent historical results
      here. They should be right now.
      
      Reviewers: austin, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3264
  12. 02 Mar, 2017 1 commit
  13. 28 Feb, 2017 2 commits
    • The Early Inline Patch · 2effe18a
      Simon Peyton Jones authored
      This very small patch switches on sm_inline even in the InitialPhase
      (aka "gentle" phase).   There is no reason not to... and the results
      are astonishing.
      
      I think the performance of GHC itself improves by about 5%; and some
      programs get much smaller, quicker.  Result: across-the-board
      improvements in compile-time performance.  Here are the changes in
      perf/compiler; the numbers are decreases in compiler bytes-allocated:
      
        3%   T5837
        7%   parsing001
        9%   T12234
        35%  T9020
        9%   T3064
        13%  T9961
        20%  T13056
        5%   T9872d
        5%   T9872c
        5%   T9872b
        7%   T9872a
        5%   T783
        35%  T12227
        20%  T1969
      
      Plus in perf/should_run
      
        5%   lazy-bs-alloc
      
      It wasn't as easy as it sounds: I did a raft of preparatory work in
      earlier patches.  But it's great!
      
      Reviewers: austin, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3203
    • defef527
  14. 23 Feb, 2017 1 commit
    • testsuite: Bump a performance tests · 050f05df
      Ben Gamari authored
      T5321Fun, T3064, and T12707 are failing, but only on Darwin. I suspect this is
      creep from Typeable, pushed over the edge by some of Simon's recent
      commits. Unfortunately the tree brokenness due to the recent submodule bumps
      makes it difficult to pin down.
  15. 22 Feb, 2017 1 commit
  16. 21 Feb, 2017 1 commit
  17. 20 Feb, 2017 1 commit
  18. 18 Feb, 2017 3 commits
    • Disable Typeable binding generation for unboxed sums · 42ff5d97
      Ben Gamari authored
      These things are simply too expensive to generate at the moment. More
      work is needed here; see #13276 and #13261.
    • Type-indexed Typeable · 8fa4bf9a
      Ben Gamari authored
      This at long last realizes the ideas for type-indexed Typeable discussed in
      "A Reflection on Types" (#11011). The general sketch of the project is
      described on the Wiki (Typeable/BenGamari). The core idea is that we are
      adding a type index to `TypeRep`,
      
          data TypeRep (a :: k)
      
      This index allows the typechecker to reason about the type represented by the `TypeRep`.
      This indexed representation is exposed as the `Type.Reflection` module, which also
      provides a number of pattern synonyms for inspecting `TypeRep`s,
      
      ```lang=haskell
      pattern TRFun :: forall k (fun :: k). ()
                    => forall (r1 :: RuntimeRep) (r2 :: RuntimeRep)
                              (arg :: TYPE r1) (res :: TYPE r2).
                       (k ~ Type, fun ~~ (arg -> res))
                    => TypeRep arg
                    -> TypeRep res
                    -> TypeRep fun
      
      pattern TRApp :: forall k2 (t :: k2). ()
                    => forall k1 (a :: k1 -> k2) (b :: k1). (t ~ a b)
                    => TypeRep a -> TypeRep b -> TypeRep t
      
      -- | Pattern match on a type constructor.
      pattern TRCon :: forall k (a :: k). TyCon -> TypeRep a
      
      -- | Pattern match on a type constructor including its instantiated kind
      -- variables.
      pattern TRCon' :: forall k (a :: k). TyCon -> [SomeTypeRep] -> TypeRep a
      ```
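As far as I know, these pattern synonyms were renamed before release: the `Type.Reflection` module that shipped with GHC 8.2 exports them as `App`, `Con`, `Con'` and `Fun`. A minimal sketch, assuming those released names, that walks down a type application to its head type constructor; `headTyConName` is an illustrative name.

```haskell
{-# LANGUAGE PolyKinds, ScopedTypeVariables, TypeApplications #-}

import Type.Reflection

-- Walk down the spine of a type application (e.g. Either Int Bool) to its
-- head type constructor. The signature is kind-polymorphic because the
-- recursive call is on the function part, whose kind is k1 -> k2.
headTyConName :: forall k (a :: k). TypeRep a -> Maybe String
headTyConName (App f _) = headTyConName f
headTyConName (Con c)   = Just (tyConName c)
headTyConName _         = Nothing -- any other shape, e.g. function types
```

For example, `headTyConName (typeRep @(Either Int Bool))` peels off both arguments before reaching `Either`.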
      
      In addition, we give the user access to the kind of a `TypeRep` (#10343),
      
          typeRepKind :: TypeRep (a :: k) -> TypeRep k
      
      Moreover, all of this plays nicely with 8.2's levity polymorphism, including the
      newly levity polymorphic (->) type constructor.
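A small sketch of `typeRepKind`, assuming the released `Type.Reflection` API. Because the representation is indexed, the kind comes back with a statically known type, and it can also be recovered under the `SomeTypeRep` existential; `maybeKind` and `kindOfSome` are illustrative names only.

```haskell
{-# LANGUAGE TypeApplications #-}

import Data.Kind (Type)
import Type.Reflection

-- The index makes the result kind static: this signature only typechecks
-- because Maybe :: Type -> Type.
maybeKind :: TypeRep (Type -> Type)
maybeKind = typeRepKind (typeRep @Maybe)

-- Under the existential, the kind of whatever type is held can still be
-- recovered, wrapped up again as a SomeTypeRep.
kindOfSome :: SomeTypeRep -> SomeTypeRep
kindOfSome (SomeTypeRep r) = SomeTypeRep (typeRepKind r)
```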
      
      Library changes
      ---------------
      
      The primary change here is the introduction of a Type.Reflection module to base.
      This module provides access to the new type-indexed TypeRep introduced in this
      patch. We also continue to provide the unindexed Data.Typeable interface, whose
      TypeRep is simply a type synonym for the existentially quantified SomeTypeRep,
      
          data SomeTypeRep where SomeTypeRep :: TypeRep a -> SomeTypeRep
      
      Naturally, this change also touched Data.Dynamic, which can now export the
      Dynamic data constructor. Moreover, I removed a blanket reexport of
      Data.Typeable from Data.Dynamic (which itself doesn't even import Data.Typeable
      now).
      
      We also add a kind-heterogeneous type equality type, (:~~:), to
      Data.Type.Equality.
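With the `Dynamic` constructor exported, a `fromDynamic`-style function can be written directly against the indexed representation. This is a sketch assuming the released `Data.Dynamic` and `Type.Reflection` APIs; `fromDynamicSketch` is a hypothetical name, and `Data.Dynamic` already provides `fromDynamic` for real use. Matching on `HRefl` from the new `(:~~:)` type is what convinces the typechecker that the two types coincide.

```haskell
{-# LANGUAGE GADTs, ScopedTypeVariables, TypeApplications, TypeOperators #-}

import Data.Dynamic (Dynamic (..), toDyn)
import Data.Type.Equality ((:~~:) (HRefl))
import Type.Reflection

-- Unpack the existential TypeRep stored in the Dynamic and compare it with
-- the requested type. On a match, HRefl brings the equality a ~ b into
-- scope, so the stored value can be returned at type b.
fromDynamicSketch :: forall b. Typeable b => Dynamic -> Maybe b
fromDynamicSketch (Dynamic rep x) =
  case eqTypeRep rep (typeRep @b) of
    Just HRefl -> Just x
    Nothing    -> Nothing
```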
      
      Implementation
      --------------
      
      The implementation strategy is described in Note [Grand plan for Typeable] in
      TcTypeable. None of it was difficult, but it did exercise a number of parts of
      the new levity polymorphism story that had not been exercised before, which
      took some sorting out.
      
      The rough idea is that we augment the TyCon produced for each type constructor
      with information about the constructor's kind (which we call a KindRep). This
      allows us to reconstruct the monomorphic result kind of a particular
      instantiation of a type constructor given its kind arguments.
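A heavily simplified, hypothetical model of that reconstruction: the kind is stored with references to the type constructor's kind variables, and a particular instantiation substitutes its kind arguments for them. The real `KindRep` type in GHC.Types has more constructors and a different encoding; `ToyKindRep` and `instantiate` are illustrative names only.

```haskell
-- Hypothetical toy kind representation.
data ToyKindRep
  = TKVar Int                -- the i-th kind variable of the type constructor
  | TKType                   -- the kind Type
  | TKFun ToyKindRep ToyKindRep -- k1 -> k2
  deriving (Eq, Show)

-- Substitute the kind arguments of one instantiation into the stored kind,
-- yielding the monomorphic kind of that instantiation. (Partial via (!!):
-- assumes every variable index has a corresponding argument.)
instantiate :: [ToyKindRep] -> ToyKindRep -> ToyKindRep
instantiate args (TKVar i)   = args !! i
instantiate _    TKType      = TKType
instantiate args (TKFun a b) = TKFun (instantiate args a) (instantiate args b)
```

For example, `Proxy :: forall k. k -> Type` would be stored as `TKFun (TKVar 0) TKType`; instantiating it at `k = Type -> Type` gives `TKFun (TKFun TKType TKType) TKType`, i.e. `(Type -> Type) -> Type`.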
      
      Unfortunately, all of this takes a fair amount of work to generate and send
      through the compilation pipeline. In particular, the KindReps can get quite
      large. Moreover, the simplifier will float out various pieces of them,
      resulting in numerous top-level bindings. Consequently we mark the KindRep
      bindings as noinline, ensuring that the float-outs don't make it into the
      interface file. This is important since there is generally little benefit to
      inlining KindReps and they would otherwise strongly affect compiler performance.
      
      Performance
      -----------
      
      Initially I was hoping to also clear up the remaining holes in Typeable's
      coverage by adding support for both unboxed tuples (#12409) and unboxed sums
      (#13276). While the former was fairly straightforward, the latter ended up being
      quite difficult: although the implementation can support them easily, enabling
      this support causes thousands of Typeable bindings to be emitted to GHC.Types,
      as each arity-N sum tycon brings with it N promoted datacons, each of which has
      a KindRep whose size itself scales with N. Doing this was simply too
      expensive to be practical; consequently I've disabled support for the time
      being.
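A rough count shows why the sums blow up. Assuming, as described above, that the arity-N sum tycon contributes N promoted data constructors, each with a KindRep of size roughly proportional to N (with some constant c), and writing M for the largest supported sum arity (a parameter not stated here), the total KindRep size grows cubically in M:

$$
\mathrm{size}(N) \approx c\,N \cdot N = c\,N^{2},
\qquad
\sum_{N=2}^{M} c\,N^{2} = c\left(\frac{M(M+1)(2M+1)}{6} - 1\right) \approx \frac{c}{3}\,M^{3}
$$

For instance, with M = 20 the bracketed sum is already 2869.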
      
      Even after disabling sums, this change regresses compiler performance far more
      than I would like. In particular, there are several testcases in the testsuite
      that consist mostly of types and regress noticeably in compiler allocations,
      in some cases by 30% or more. These include (considering the "bytes allocated" metric),
      
       * T1969:  +10%
       * T10858: +23%
       * T3294:  +19%
       * T5631:  +41%
       * T6048:  +23%
       * T9675:  +20%
       * T9872a: +5.2%
       * T9872d: +12%
       * T9233:  +10%
       * T10370: +34%
       * T12425: +30%
       * T12234: +16%
       * T13035: +17%
       * T4029:  +6.1%
      
      I've spent quite some time chasing down the source of this regression and while
      I was able to make some improvements, I think this approach of generating
      Typeable bindings at time of type definition is doomed to give us unnecessarily
      large compile-time overhead.
      
      In the future I think we should consider moving some or all of the Typeable
      binding generation logic back to the solver (where it was prior to
      91c6b1f5). I've opened #13261 documenting this
      proposal.
    • Generalize kind of the (->) tycon · b207b536
      Ben Gamari authored
      This generalizes the kind of `(->)`, as discussed in #11714.
      
      This involves a few things,
      
       * Generalizing the kind of `funTyCon`, adding two new `RuntimeRep`
         binders,
         ```lang=haskell
         (->) :: forall (r1 :: RuntimeRep) (r2 :: RuntimeRep).
                 TYPE r1 -> TYPE r2 -> *
         ```

       * Unsaturated applications of `(->)` are expressed as explicit
         `TyConApp`s

       * Saturated applications of `(->)` are expressed as `FunTy`, as they are
         currently

       * Saturated applications of `(->)` in coercions are expressed by a new
         `FunCo` constructor

       * `splitTyConApp` needs to ensure that `FunTy`s are split to a
         `TyConApp` of `(->)` with the appropriate `RuntimeRep` arguments

       * Teach CoreLint to check that all saturated applications of `(->)` are
         represented with `FunTy`
      At the moment I assume that `Constraint ~ *`, which is an annoying
      source of complexity. This will be simplified once D3023 is resolved.
      
      Also, this introduces two known regressions,
      
      `tcfail181`, `T10403`
      =====================
      These tests show the instance,

          instance Monad ((->) r) -- Defined in ‘GHC.Base’

      in their error messages only when -fprint-potential-instances is used. This
      is because its instance head now mentions 'LiftedRep, which is not in scope.
      I'm not entirely sure of the right way to fix this so I'm just accepting
      the new output for now.
      
      T5963 (Typeable)
      ================
      
      T5963 is now broken since Data.Typeable.Internals.mkFunTy computes its
      fingerprint without the RuntimeRep variables that (->) expects. This
      will be fixed with the merge of D2010.
      
      Haddock performance
      ===================
      
      The `haddock.base` and `haddock.Cabal` tests regress in allocations by
      about 20%. This certainly hurts, but it's also not entirely unexpected:
      the size of every function type grows with this patch and Haddock has a
      lot of functions in its heap.
  19. 14 Feb, 2017 1 commit
    • Check local type family instances against all imported ones · bedcb716
      rwbarton authored
      We previously checked type family instance declarations
      in a module for consistency with all instances that we happened
      to have read into the EPS or HPT. It was possible to arrange that
      an imported type family instance (used by an imported function)
      was in a module whose interface file was never read during
      compilation; then we wouldn't check consistency of local instances
      with this imported instance and as a result type safety was lost.
      
      With this patch, we still check consistency of local type family
      instances with all type family instances that we have loaded; but
      we make sure to load the interface files of all our imports that
      define family instances first. More selective consistency checking
      is left to #13102.
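The check can be sketched with a hypothetical toy model in which an instance is just a family name, a single argument and a right-hand side, all as strings, and two instances conflict when their left-hand sides are identical but their right-hand sides differ. Real consistency checking works up to unification of the instance heads, which this sketch ignores; `FamInst`, `conflicts` and `checkLocal` are illustrative names. The point from the patch is that `loaded` must contain the instances of every imported module that defines family instances, not just those whose interfaces happened to be read.

```haskell
-- Hypothetical toy type family instance: family F, argument, right-hand side.
data FamInst = FamInst
  { fiFamily :: String
  , fiArg    :: String
  , fiRhs    :: String
  }
  deriving (Eq, Show)

-- Same left-hand side, different right-hand side: type safety would be lost.
conflicts :: FamInst -> FamInst -> Bool
conflicts a b =
  fiFamily a == fiFamily b && fiArg a == fiArg b && fiRhs a /= fiRhs b

-- Check every local instance against every loaded (imported) instance and
-- report the conflicting pairs.
checkLocal :: [FamInst] -> [FamInst] -> [(FamInst, FamInst)]
checkLocal local loaded = [(l, i) | l <- local, i <- loaded, conflicts l i]
```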
      
      On the other hand, we can now safely assume when we import a module
      that it has been checked for consistency with its imports. So we
      can save checking in checkFamInstConsistency, and overall we should
      have less work to do now.
      
      This patch also adds a note describing the Plan for ensuring type
      family consistency.
      
      Test Plan: Two new tests added; harbormaster
      
      Reviewers: austin, simonpj, bgamari
      
      Reviewed By: simonpj, bgamari
      
      Subscribers: ggreif, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2992
  20. 08 Feb, 2017 1 commit
    • testsuite: Bump bytes allocated for T5837 · 0aa3f8d4
      Ben Gamari authored
      Simon decreased this earlier today but Harbormaster doesn't reproduce his
      number. I've done two things here:
      
       1. increased the allocations number to the Harbormaster value
       2. increased the acceptance threshold from 5% to 7%, since Simon saw a 6.6%
          change in his environment.
  21. 07 Feb, 2017 1 commit
    • Another improvement to SetLevels · b8f58d79
      Simon Peyton Jones authored
      In my recent commit
         commit 432f952e
         Float unboxed expressions by boxing
      I changed how float_me in lvlMFE worked.  That was right, but
      it exposed another bug: an error expression wasn't getting floated
      as it should from a case alternative.  And that led to a collection
      of minor improvements:
      
      * I found a much better way to cast it, by using lvlFloatRhs for
        top-level bindings as well as nested ones, which is
          (a) more consistent and
          (b) works correctly.
      
        See Note [Floating from a RHS]
      
      * I also found some delicacy in the "floating to the top" stuff, so I
        greatly elaborated the Note [Floating to the top].
      
      * I simplified the "bottoming-float" stuff; the change is in the treatment
        of bottoming lambdas (\x y. error blah), where we now float the
        (error blah) part instead of the whole lambda (which risks just making
        duplicate lambdas).  See Note [Bottoming floats], esp (2).
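At source level, the change to bottoming lambdas looks roughly like the schematic below; the real transformation happens on Core inside SetLevels, and `f`, `f'` and `lvlBlah` are hypothetical names. Both versions raise the same error when applied; the difference is only which expression becomes the shared top-level binding.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Before (schematically): the whole bottoming lambda is the float candidate.
f :: Int -> Int -> Int
f = \_ _ -> error "blah"

-- After (schematically): only the bottoming body is floated to its own
-- top-level binding, and the lambda stays where it was, referring to it.
lvlBlah :: Int
lvlBlah = error "blah"

f' :: Int -> Int -> Int
f' = \_ _ -> lvlBlah

-- Helper to observe that forcing an Int raises an ErrorCall.
isErrorCall :: Int -> IO Bool
isErrorCall x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)
```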
      
      Perf effects are minor.
      
      * perf/compiler/T13056 improved slightly (about 2%) in compiler
        allocations. Also T9233 improved by 1%.  I'm not sure why.
      
      * Some small nofib changes:
          - Generally some very small reductions in run-time
            allocation, except k-nucleotide, which halves for some
            reason.  (I did try to look but it's a big complicated
            function and it was far from obvious.  Had it been a loss
            I would have looked harder!)
      
      NB: there's a nearby patch "Do not inline bottoming things" that could
      also be responsible for either or both.  I didn't think it was worth
      more testing to distinguish.
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
                 grep          +0.1%     -0.2%      0.00      0.00     +0.0%
               mandel          -0.1%     -1.4%      0.13      0.13     +0.0%
         k-nucleotide          +0.1%    -51.6%     -1.0%     -1.0%     +0.0%
      --------------------------------------------------------------------------------
                  Min          -0.3%    -51.6%     -9.4%     -9.1%     -4.0%
                  Max          +0.2%     +0.0%    +31.8%    +32.7%     +0.0%
       Geometric Mean          -0.0%     -0.8%     +1.4%     +1.4%     -0.1%
  22. 06 Feb, 2017 1 commit
  23. 03 Feb, 2017 1 commit
    • Bump performance mark for T9020 · c2becee4
      Joachim Breitner authored
      According to the graph at perf.haskell.org, it has regressed due to join
      points, which moved it very very close to the +10% mark and hence made it
      fail just sometimes.
  24. 01 Feb, 2017 2 commits
  25. 24 Jan, 2017 1 commit
  26. 23 Jan, 2017 1 commit
    • Record evaluated-ness on workers and wrappers · 596dece7
      Simon Peyton Jones authored
      Summary:
      This patch is a refinement of the original commit (which
      was reverted):
      
        commit 6b976eb8
        Date:   Fri Jan 13 08:56:53 2017 +0000
            Record evaluated-ness on workers and wrappers
      
      In Trac #13027, comment:20, I noticed that wrappers created after
      demand analysis weren't recording the evaluated-ness of strict
      constructor arguments.  In the ticket that led to a (debatable)
      Lint error but in general the more we know about evaluated-ness
      the better we can optimise.
      
      This commit adds that info
        * both in the worker (on args)
        * and in the wrapper (on CPR result patterns).
      See Note [Record evaluated-ness in worker/wrapper] in WwLib
      
      On the way I defined Id.setCaseBndrEvald, and used it to shorten
      the code in a few other places.
      
      Then I added test T13077a to test the CPR aspect of this patch,
      but I found that Lint failed!
      
      Reason: simpleOptExpr was discarding evaluated-ness info on
      lambda binders because zapFragileIdInfo was discarding an
      Unfolding of (OtherCon _).  But actually that's a robust
      unfolding; there is no need to discard it. To fix this:
      
      * zapFragileIdInfo only zaps fragile unfoldings
      
      * Replace isClosedUnfolding with isFragileUnfolding (the latter
        is just the negation of the former, but the nomenclature is
        more consistent).  Better documentation too
             Note [Fragile unfoldings]
      
      * And Simplify.simplLamBndr can now look at isFragileUnfolding
        to decide whether to use the longer route of simplUnfolding.
      
      For some reason perf/compiler/T9233 improves in compile-time
      allocation by 10%.  Hooray!
      
      Nofib: essentially no change:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
            cacheprof          +0.0%     -0.3%     +0.9%     +0.4%     +0.0%
      --------------------------------------------------------------------------------
                  Min          +0.0%     -0.3%     -2.4%     -2.4%     +0.0%
                  Max          +0.0%     +0.0%     +9.8%    +11.4%     +2.4%
       Geometric Mean          +0.0%     -0.0%     +1.1%     +1.0%     +0.0%
  27. 22 Jan, 2017 1 commit
  28. 20 Jan, 2017 1 commit
    • Allow top-level string literals in Core (#8472) · d49b2bb2
      takano-akio authored
      This commit relaxes the invariants of the Core syntax so that a
      top-level variable can be bound to a primitive string literal of type
      Addr#.
      
      This commit:
      
      * Relaxes the invariants of Core, and allows top-level bindings whose
        type is Addr# as long as their RHS is either a primitive string literal or
        another variable.
      
      * Allows the simplifier and the full-laziness transformer to float out
        primitive string literals to the top level.
      
      * Introduces the new StgGenTopBinding type to accommodate top-level Addr#
        bindings.
      
      * Introduces a new type of labels in the object code, with the suffix "_bytes",
        for exported top-level Addr# bindings.
      
      * Makes some built-in rules more robust. This was necessary to keep them
        functional after the above changes.
      
      This is a continuation of D2554.
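A minimal source-level example of where such literals come from, assuming GHC's `MagicHash` syntax: a `String` built with `unpackCString#` from an `Addr#` literal. Top-level `Addr#` bindings are still not allowed in source Haskell; the change is to Core, where the simplifier and full laziness may now float `"hello"#` out to its own top-level binding. Compiling with `-O -ddump-simpl` is one way to inspect the resulting Core, though the exact output depends on the GHC version.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.CString (unpackCString#)

-- "hello"# is a primitive string literal of type Addr#; unpackCString#
-- turns the NUL-terminated bytes into an ordinary String.
greeting :: String
greeting = unpackCString# "hello"#
```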
      
      Rebasing notes:
      This had three slightly suspicious performance regressions:
      
      * T12425: bytes allocated regressed by roughly 5%
      * T4029: bytes allocated regressed by a bit over 1%
      * T13035: bytes allocated regressed by a bit over 5%
      
      These deserve additional investigation.
      
      Rebased by: bgamari.
      
      Test Plan: ./validate --slow
      
      Reviewers: goldfire, trofi, simonmar, simonpj, austin, hvr, bgamari
      
      Reviewed By: trofi, simonpj, bgamari
      
      Subscribers: trofi, simonpj, gridaphobe, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2605
      
      GHC Trac Issues: #8472
  29. 17 Jan, 2017 1 commit
  30. 12 Jan, 2017 1 commit
  31. 10 Jan, 2017 1 commit
  32. 06 Jan, 2017 2 commits
    • Add performance test for #13056 · 50881100
      Ryan Scott authored
      This performance regression was fixed by commit
      517d03e4 (#12234). Let's add a performance test
      to ensure that it doesn't break again.
    • Fix the implementation of the "push rules" · b4f2afe7
      Simon Peyton Jones authored
      Richard pointed out (comment:12 of Trac #13025) that my
      implementation of the coercion "push rules", newly added
      in exprIsConAppMaybe by commit b4c3a668, wasn't quite right.
      
      But in fact that means that the implementation of those same
      rules in Simplify.simplCast was wrong too.
      
      Hence this commit:
      
      * Refactor the push rules so they are implemented in just
        one place (CoreSubst.pushCoArgs, pushCoTyArg, pushCoValArg)
        The code in Simplify gets simpler, which is nice.
      
      * Fix the bug that Richard pointed out (to do with hetero-kinded
        coercions)
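For reference, the value-argument push rule in its textbook, homogeneous form moves a cast on a function past its argument; the hetero-kinded case Richard pointed out is more subtle and is not shown. Writing $e \triangleright \gamma$ for a cast, with $\gamma$ decomposed (via nth-coercions, cf. Coercion.mkNthCo) into its argument and result components:

$$
(e \triangleright \gamma)\; a \;\Longrightarrow\; \bigl(e\;(a \triangleright \mathsf{sym}\,\gamma_1)\bigr) \triangleright \gamma_2,
\qquad
\gamma : (t_1 \to t_2) \sim (s_1 \to s_2),\;
\gamma_1 : t_1 \sim s_1,\;
\gamma_2 : t_2 \sim s_2
$$

Both sides have type $s_2$: since $a : s_1$, the argument $a \triangleright \mathsf{sym}\,\gamma_1$ has type $t_1$, applying $e : t_1 \to t_2$ gives $t_2$, and the final cast by $\gamma_2$ yields $s_2$.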
      
      Then compiler performance worsened, which led me to discover
      two performance bugs:
      
      * The smart constructor Coercion.mkNthCo didn't have a case
        for ForAllCos, which meant we stupidly built a complicated
        coercion where a simple one would do
      
      * In OptCoercion there was one place where we used CoherenceCo
        (the data constructor) rather than mkCoherenceCo (the smart
        constructor), which meant that the stupid complicated
        coercion wasn't optimised away
      
      For reasons I don't fully understand, T5321Fun did 2% less compiler
      allocation after all this, which is good.