1. 07 Jul, 2015 1 commit
  2. 25 Jun, 2015 1 commit
  3. 20 Jun, 2015 1 commit
  4. 16 Jun, 2015 1 commit
  5. 26 May, 2015 1 commit
  6. 07 Apr, 2015 1 commit
  7. 06 Apr, 2015 1 commit
  8. 30 Mar, 2015 1 commit
    • Joachim Breitner's avatar
      Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining this changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the hoople data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
      de1160be
  9. 02 Mar, 2015 1 commit
  10. 19 Feb, 2015 1 commit
  11. 13 Jan, 2015 1 commit
  12. 15 Dec, 2014 1 commit
    • Carter Schonwald's avatar
      Changing prefetch primops to have a `seq`-like interface · f44333ea
      Carter Schonwald authored
      Summary:
      The current primops for prefetching do not properly work in pure code;
      namely, the primops are not 'hoisted' into the correct call sites based
      on when arguments are evaluated. Instead, they should use a `seq`-like
      interface, which will cause it to be evaluated when the needed term is.
      
      See #9353 for the full discussion.
      
      Test Plan: updated tests for pure prefetch in T8256 to reflect the design changes in #9353
      
      Reviewers: simonmar, hvr, ekmett, austin
      
      Reviewed By: ekmett, austin
      
      Subscribers: merijn, thomie, carter, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D350
      
      GHC Trac Issues: #9353
      f44333ea
  13. 10 Dec, 2014 1 commit
  14. 25 Nov, 2014 1 commit
    • Simon Marlow's avatar
      Make clearNursery free · e22bc0de
      Simon Marlow authored
      Summary:
      clearNursery resets all the bd->free pointers of nursery blocks to
      make the blocks empty.  In profiles we've seen clearNursery taking
      significant amounts of time particularly with large -N and -A values.
      
      This patch moves the work of clearNursery to the point at which we
      actually need the new block, thereby introducing an invariant that
      blocks to the right of the CurrentNursery pointer still need their
      bd->free pointer reset.  This should make things faster overall,
      because we don't need to clear blocks that we don't use.
      
      Test Plan: validate
      
      Reviewers: AndreasVoellmy, ezyang, austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D318
      e22bc0de
  15. 21 Oct, 2014 1 commit
  16. 14 Aug, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
      
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      
      which provide efficient implementations of the popular
      count-leading-zero and count-trailing-zero respectively
      (see testcase for a pure Haskell reference implementation).
      
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
      
      Test Plan: validate and succesful tests on i686 and amd64
      
      Reviewers: rwbarton, simonmar, ezyang, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D144
      
      GHC Trac Issues: #9340
      e0c1767d
  17. 11 Aug, 2014 1 commit
  18. 09 Aug, 2014 1 commit
  19. 28 Jun, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Simplify .gitignore files · 767b9ddf
      Herbert Valerio Riedel authored
      
      
      It's a bit confusing to have .gitignore files spread all over the
      filesystem. This commit tries to consolidate those into one .gitignore
      file per component. Moreover, we try to describe files to be ignored which
      happen to have a common identifying pattern by glob patterns.
      Signed-off-by: Herbert Valerio Riedel's avatarHerbert Valerio Riedel <hvr@gnu.org>
      767b9ddf
  20. 30 May, 2014 1 commit
  21. 14 May, 2014 1 commit
  22. 10 May, 2014 1 commit
  23. 03 May, 2014 1 commit
  24. 19 Apr, 2014 1 commit
  25. 29 Mar, 2014 2 commits
    • Joachim Breitner's avatar
      CopySmallArrayStressTest needs random · c3108234
      Joachim Breitner authored
      c3108234
    • tibbe's avatar
      Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size with 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
      90329b6c
  26. 22 Mar, 2014 2 commits
    • tibbe's avatar
      Enable popcnt test now when segfault is fixed · 16d04d90
      tibbe authored
      The fix was to ghc-prim.
      16d04d90
    • tibbe's avatar
      codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would requiring fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implication for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard...
      1eece456
  27. 13 Mar, 2014 1 commit
  28. 11 Mar, 2014 2 commits
  29. 08 Feb, 2014 1 commit
  30. 28 Nov, 2013 1 commit
  31. 11 Oct, 2013 2 commits
  32. 02 Oct, 2013 2 commits
  33. 18 Sep, 2013 2 commits
  34. 15 Sep, 2013 1 commit