1. 03 Jul, 2015 1 commit
  2. 15 Jun, 2015 1 commit
  3. 11 Jun, 2015 1 commit
    • Simon Peyton Jones's avatar
      Another major improvement of "improvement" · ddbb97d0
      Simon Peyton Jones authored
      I wasn't very happy with my fix to Trac #10009. This is much better.
      
      The main idea is that the inert set now contains a "model", which
      embodies *all* the (nominal) equalities that we know about, with
      a view to exposing unifications.  This requires a lot fewer iterations
      of the solver than before.
      
      There are extensive comments in
       TcSMonad:  Note [inert_model: the inert model]
                  Note [Adding an inert canonical constraint the InertCans]
      
      The big changes are
      
        * New inert_model field in InertCans
      
        * Functions addInertEq, addInertCan deal with adding a
          constraint, maintaining the model
      
        * A nice improvement is that unification variables can
          unify with fmvs, so that from, say   alpha ~ fmv
          we get              alpha := fmv
          See Note [Orientation of equalities with fmvs] in TcFlatten
          It's still not perfect, as the Note explains
      
      New flag -fconstraint-solver-iterations=n, allows us to control
      the number of constraint solver iterations, and in particular
      will flag up when it's more than a small number.
      
      Performance is generally slightly better:
      T5837 is a lot better for some reason.
      ddbb97d0
  4. 01 Jun, 2015 1 commit
    • Herbert Valerio Riedel's avatar
      Re-center perf numbers for T5631 · 34dcf8a0
      Herbert Valerio Riedel authored
      7dd0ea74 seems to have tipped this one over,
      although 7dd0ea74 itself had only a minimal impact on my local system.
      
      Locally, I measured right before 7dd0ea74:
      
        Expected    T5631(normal) bytes allocated: 776121120 +/-5%
        Actual      T5631(normal) bytes allocated: 811973144
        Deviation   T5631(normal) bytes allocated:       4.6 %
      
      and at 7dd0ea74:
      
        Expected    T5631(normal) bytes allocated: 776121120 +/-5%
        Actual      T5631(normal) bytes allocated: 812288344
        Deviation   T5631(normal) bytes allocated:       4.7 %
      
      Reviewed By: austin
      
      Differential Revision: https://phabricator.haskell.org/D936
      34dcf8a0
  5. 22 May, 2015 1 commit
    • Simon Peyton Jones's avatar
      Fix a huge space leak in the mighty Simplifier · 45d9a15c
      Simon Peyton Jones authored
      This long-standing, terrible, adn somewhat subtle bug was exposed
      by Trac #10370, thanks to Reid Barton's brilliant test case (comment:3).
      
      The effect is large on the Trac #10370 test.
      Here is what the profile report says:
      
      Before:
       total time  =       24.35 secs   (24353 ticks @ 1000 us, 1 processor)
       total alloc = 11,864,360,816 bytes  (excludes profiling overheads)
      
      After:
       total time  =       21.16 secs   (21160 ticks @ 1000 us, 1 processor)
       total alloc = 7,947,141,136 bytes  (excludes profiling overheads)
      
      The /combined/ effect of the tidyOccName fix, plus this one, is dramtic
      for Trac #10370.  Here is what +RTS -s says:
      
      Before:
        15,490,210,952 bytes allocated in the heap
         1,783,919,456 bytes maximum residency (20 sample(s))
      
        MUT     time   30.117s  ( 31.383s elapsed)
        GC      time   90.103s  ( 90.107s elapsed)
        Total   time  120.843s  (122.065s elapsed)
      
      After:
         7,928,671,936 bytes allocated in the heap
            52,914,832 bytes maximum residency (25 sample(s))
      
        MUT     time   13.912s  ( 15.110s elapsed)
        GC      time    6.809s  (  6.808s elapsed)
        Total   time   20.789s  ( 21.954s elapsed)
      
      - Heap allocation halved
      - Residency cut by a factor of more than 30.
      - ELapsed time cut by a factor of 6
      
      Not bad!
      
      The details
      ~~~~~~~~~~~
      The culprit was SimplEnv.mkCoreSubst, which used mapVarEnv to do some
      impedence-matching from the substitituion used by the simplifier to
      the one used by CoreSubst.  But the impedence-mactching was recursive!
      
        mk_subst tv_env cv_env id_env
          = CoreSubst.mkSubst in_scope tv_env cv_env (mapVarEnv fiddle id_env)
      
        fiddle (DoneEx e)          = e
        fiddle (DoneId v)          = Var v
        fiddle (ContEx tv cv id e) = CoreSubst.substExpr (mk_subst tv cv id) e
      
      Inside fiddle, in the ContEx case, we may do another whole level of
      fiddle.  And so on.  Moreover, UniqFM (which is built on Data.IntMap) is
      strict, so the fiddling is done eagerly.  I didn't wok through all the
      details but the result is a gargatuan blow-up of entirely unnecessary work.
      
      Laziness would make this go away, I think, but I don't want to mess
      with IntMap.  And in any case, the impedence matching is a royal pain.
      
      In the end I simply ceased trying to use CoreSubst.substExpr in the
      simplifier, and instead just use simplExpr.  That does mean bit of
      duplication; e.g.  new code for simplRules.  But it's not a big deal
      and it's far more direct and easy to reason about.
      
      A bit of knock-on refactoring:
      
       * Data type ArgSummary moves to CoreUnfold.
      
       * interestingArg moves from CoreUnfold to SimplUtils, and gets a
         SimplEnv argument which can be used when we encounter a variable.
      
       * simplLamBndrs, addBndrRules move from SimplEnv to Simplify
         (because they now calls simplUnfolding, simplRules resp)
      
       * SimplUtils.substExpr, substUnfolding, mkCoreSubst die completely
      
       * In Simplify some several functions that were previously pure
         substitution-based functions are now monadic:
           - addBndrRules, simplRule
           - addCoerce, add_coerce in simplCast
      
       * In case 2c of Simplify.rebuildCase, there was a pretty disgusting
         expression-substitution taking place for 'rhs'; and we really don't
         want to make that monadic becuase 'rhs' can be big.
         Solution: reduce the arity of the rules for seq.
         See Note [User-defined RULES for seq] in MkId.
      45d9a15c
  6. 16 May, 2015 1 commit
  7. 10 Apr, 2015 1 commit
  8. 07 Apr, 2015 1 commit
  9. 30 Mar, 2015 1 commit
    • Joachim Breitner's avatar
      Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining this changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the hoople data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
      de1160be
  10. 23 Mar, 2015 1 commit
    • eir@cis.upenn.edu's avatar
      Do proper depth checking in the flattener to avoid looping. · c1edbdfd
      eir@cis.upenn.edu authored
      This implements (roughly) the plan put forward in comment:14:ticket:7788,
      fixing #7788, #8550, #9554, #10139, and addressing concerns raised in #10079.
      There are some regressions w.r.t. GHC 7.8, but only with pathological type
      families (like F a = F a). This also (hopefully -- don't have a test case)
      fixes #10158. Unsolved problems include #10184 and #10185, which are both
      known deficiencies of the approach used here.
      
      As part of this change, the plumbing around detecting infinite loops has
      changed. Instead of -fcontext-stack and -ftype-function-depth, we now have
      one combined -freduction-depth parameter. Setting it to 0 disbales the
      check, which is now the recommended way to get (terminating) code to
      typecheck in releases. (The number of reduction steps may well change between
      minor GHC releases!)
      
      This commit also introduces a new IntWithInf type in BasicTypes
      that represents an integer+infinity. This type is used in a few
      places throughout the code.
      
      Tests in
        indexed-types/should_fail/T7788
        indexed-types/should_fail/T8550
        indexed-types/should_fail/T9554
        indexed-types/should_compile/T10079
        indexed-types/should_compile/T10139
        typecheck/should_compile/T10184  (expected broken)
        typecheck/should_compile/T10185  (expected broken)
      
      This commit also changes performance testsuite numbers, for the better.
      c1edbdfd
  11. 22 Jan, 2015 3 commits
  12. 20 Jan, 2015 1 commit
  13. 19 Jan, 2015 1 commit
  14. 08 Jan, 2015 1 commit
  15. 07 Jan, 2015 1 commit
    • Edward Z. Yang's avatar
      Compress TypeMap TrieMap leaves with singleton constructor. · da64ab53
      Edward Z. Yang authored
      Suppose we have a handful H of entries in a TrieMap, each with a very large
      key, size K. If you fold over such a TrieMap you'd expect time O(H). That would
      certainly be true of an association list! But with TrieMap we actually have to
      navigate down a long singleton structure to get to the elements, so it takes
      time O(K*H).  The point of a TrieMap is that you need to navigate to the point
      where only one key remains, and then things should be fast.
      
      This is a starting point: we can improve the patch by generalizing the
      singleton constructor so it applies to CoercionMap and CoreMap; I'll do this
      in a later commit.
      
      Summary: Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
      
      Test Plan: validate
      
      Reviewers: simonpj, austin
      
      Subscribers: carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D606
      
      GHC Trac Issues: #9960
      da64ab53
  16. 06 Jan, 2015 1 commit
  17. 23 Dec, 2014 1 commit
    • Simon Peyton Jones's avatar
      Eliminate so-called "silent superclass parameters" · a6f0f5ab
      Simon Peyton Jones authored
      The purpose of silent superclass parameters was to solve the
      awkward problem of superclass dictinaries being bound to bottom.
      See THE PROBLEM in Note [Recursive superclasses] in TcInstDcls
      
      Although the silent-superclass idea worked,
      
        * It had non-local consequences, and had effects even in Haddock,
          where we had to discard silent parameters before displaying
          instance declarations
      
        * It had unexpected peformance costs, shown up by Trac #3064 and its
          test case.  In monad-transformer code, when constructing a Monad
          dictionary you had to pass an Applicative dictionary; and to
          construct that you neede a Functor dictionary. Yet these extra
          dictionaries were often never used.  (All this got much worse when
          we added Applicative as a superclass of Monad.) Test T3064
          compiled *far* faster after silent superclasses were eliminated.
      
        * It introduced new bugs.  For example SilentParametersOverlapping,
          T5051, and T7862, all failed to compile because of instance overlap
          directly because of the silent-superclass trick.
      
      So this patch takes a new approach, which I worked out with Dimitrios
      in the closing hours before Christmas.  It is described in detail
      in THE PROBLEM in Note [Recursive superclasses] in TcInstDcls.
      
      Seems to work great!
      
      Quite a bit of knock-on effect
      
       * The main implementation work is in tcSuperClasses in TcInstDcls
         Everything else is fall-out
      
       * IdInfo.DFunId no longer needs its n-silent argument
         * Ditto IDFunId in IfaceSyn
         * Hence interface file format changes
      
       * Now that DFunIds do not have silent superclass parameters, printing
         out instance declarations is simpler. There is tiny knock-on effect
         in Haddock, so that submodule is updated
      
       * I realised that when computing the "size of a dictionary type"
         in TcValidity.sizePred, we should be rather conservative about
         type functions, which can arbitrarily increase the size of a type.
         Hence the new datatype TypeSize, which has a TSBig constructor for
         "arbitrarily big".
      
       * instDFunType moves from TcSMonad to Inst
      
       * Interestingly, CmmNode and CmmExpr both now need a non-silent
         (Ord r) in a couple of instance declarations. These were previously
         silent but must now be explicit.
      
       * Quite a bit of wibbling in error messages
      a6f0f5ab
  18. 20 Dec, 2014 2 commits
  19. 19 Dec, 2014 1 commit
  20. 17 Dec, 2014 1 commit
    • eir@cis.upenn.edu's avatar
      Performance enhancements in TcFlatten. · 922168fd
      eir@cis.upenn.edu authored
      This commit fixes some performance regressions introduced by 0cc47eb9,
      adding more `Coercible` magic to the solver. See Note
      [flatten_many performance] in TcFlatten for more info.
      
      The improvements do not quite restore the old numbers. Given that
      the solver is really more involved now, I am accepting this regression.
      
      The way forward (I believe) would be to have *two* flatteners: one
      that deals only with nominal equalities and thus never checks roles,
      and the more general one. A nice design of keeping this performant
      without duplicating code eludes me, but someone else is welcome
      to take a stab.
      922168fd
  21. 11 Dec, 2014 2 commits
  22. 10 Dec, 2014 2 commits
  23. 08 Dec, 2014 2 commits
    • Gabor Greif's avatar
      catch some recent typos · 2515686f
      Gabor Greif authored
      2515686f
    • Simon Peyton Jones's avatar
      Revise the inert-set invariants again · 1d44261c
      Simon Peyton Jones authored
      In particular this patch
      
      - Accepts that rewriting with the inert CTyEqCans should be done recursively
        (hence removing the Bool result from flattenTyVarOuter)
      
      - Refines the kick-out criterion, in paticular to avoid kick-out of (a -f-> ty)
        when f cannot rewrite f.  This is true of Wanteds and hence reduces kick-outs
        of Wanteds, perhaps by a lot
      
      This stuff is not fully documented because the details are still settling, but
      it's looking good.
      
      (And it validates.)
      
      This patch includes the testsuite wibbles.  perf/compiler/T5030 and
      T5837 both improve in bytes-allocated (by 11% and 13% resp), which is
      good.  I'm not sure which of today's short series of patches is
      responsible, nor do I mind much.  (One could find out if necessary.)
      1d44261c
  24. 03 Dec, 2014 1 commit
  25. 12 Nov, 2014 1 commit
  26. 07 Nov, 2014 1 commit
    • David Feuer's avatar
      Improve Applicative definitions · abba3812
      David Feuer authored
      Generally clean up things relating to Applicative and Monad in `GHC.Base`
      and `Control.Applicative` to make `Applicative` feel like a bit more of a
      first-class citizen rather than just playing second fiddle to `Monad`. Use
      `coerce` and GND to improve performance and clarity.
      
      Change the default definition of `(*>)` to use `(<$)`, in case the
      `Functor` instance optimizes that.
      
      Moreover, some manually written instances are made into compiler-derived
      instances.
      
      Finally, this also adds a few AMP-related laws to the `Applicative` docstring.
      
      NOTE: These changes result in a 13% decrease in allocation for T9020
      
      Reviewed By: ekmett, hvr
      
      Differential Revision: https://phabricator.haskell.org/D432
      abba3812
  27. 06 Nov, 2014 2 commits
  28. 05 Nov, 2014 1 commit
  29. 04 Nov, 2014 3 commits
  30. 27 Oct, 2014 1 commit
  31. 19 Oct, 2014 1 commit
    • Krzysztof Gogolewski's avatar
      Python 3 support, second attempt (Trac #9184) · d576fc38
      Krzysztof Gogolewski authored
      Summary:
      This is a fixup of https://phabricator.haskell.org/D233
      
      The only difference is in findTFiles (first commit), which
      previously broke Windows runner; now I translated literally
      instead attempting to improve it, and checked it works.
      
      Test Plan:
      I used validate under 2,3 on Linux and under 2 on msys2.
      On Windows I've seen a large number of failures, but they don't
      seem to be connected with the patch.
      
      Reviewers: hvr, simonmar, thomie, austin
      
      Reviewed By: austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D310
      
      GHC Trac Issues: #9184
      d576fc38