  1. 07 Jul, 2015 1 commit
  2. 25 Jun, 2015 1 commit
  3. 20 Jun, 2015 1 commit
  4. 16 Jun, 2015 1 commit
  5. 14 Jun, 2015 1 commit
  6. 26 May, 2015 1 commit
  7. 07 Apr, 2015 1 commit
  8. 06 Apr, 2015 1 commit
  9. 30 Mar, 2015 1 commit
    • Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining these changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
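
      As a rough illustration of what "larger consecutive jump tables" can
      mean, the sketch below groups sorted case labels into dense runs and
      emits a jump table only for runs that are long enough. This is not the
      algorithm in CmmSwitch; `Plan`, `denseRuns`, `planSwitch` and the two
      thresholds are hypothetical names chosen for this example, and the
      labels are assumed to be distinct.

      ```haskell
      import Data.List (sort)

      -- A hypothetical, simplified plan for one group of case labels.
      data Plan
        = JumpTable Integer Integer  -- lowest and highest label covered by one table
        | IfChain [Integer]          -- labels that are tested one by one
        deriving (Eq, Show)

      -- Split sorted labels into runs whose neighbours differ by at most maxGap.
      denseRuns :: Integer -> [Integer] -> [[Integer]]
      denseRuns maxGap = foldr step []
        where
          step x ((y : ys) : rest)
            | y - x <= maxGap = (x : y : ys) : rest
          step x rest = [x] : rest

      -- Runs with at least minSize labels become jump tables (assumed policy);
      -- shorter runs fall back to a chain of comparisons.
      planSwitch :: Integer -> Int -> [Integer] -> [Plan]
      planSwitch maxGap minSize = map toPlan . denseRuns maxGap . sort
        where
          toPlan run@(lo : _)
            | length run >= minSize = JumpTable lo (last run)
          toPlan run = IfChain run
      ```

      For instance, `planSwitch 1 4 [1,2,3,10,11,12,13,100]` yields
      `[IfChain [1,2,3], JumpTable 10 13, IfChain [100]]`.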
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the Hoopl data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
  10. 02 Mar, 2015 1 commit
  11. 20 Feb, 2015 1 commit
    • Add a bizarre corner-case to cgExpr (Trac #9964) · 9c78d09e
      Simon Peyton Jones authored
      David Feuer managed to tickle a corner case in the
      code generator. See Note [Scrutinising VoidRep]
      in StgCmmExpr.
      
      I rejigged the comments in that area of the code generator
        Note [Dodgy unsafeCoerce 1]
        Note [Dodgy unsafeCoerce 2]
      but I can't say I fully understand them, alas.
  12. 19 Feb, 2015 1 commit
  13. 13 Jan, 2015 1 commit
  14. 06 Jan, 2015 1 commit
    • Major patch to add -fwarn-redundant-constraints · 32973bf3
      Simon Peyton Jones authored
      The idea was prompted by Trac #9939, but it was Christmas, so I did
      some recreational programming that went much further.
      
      The idea is to warn when a constraint in a user-supplied context is
      redundant.  Everything is described in detail in
        Note [Tracking redundant constraints]
      in TcSimplify.
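
      A minimal sketch of the kind of signature the warning is aimed at. The
      functions `smaller` and `keep` are hypothetical examples, not taken
      from the patch or its tests, and the exact warning text is not shown
      because it depends on the GHC version.

      ```haskell
      {-# OPTIONS_GHC -fwarn-redundant-constraints #-}

      -- 'Eq a' is implied by 'Ord a' (Eq is a superclass of Ord), so listing
      -- it in the user-supplied context is redundant.
      smaller :: (Eq a, Ord a) => a -> a -> Bool
      smaller x y = x < y

      -- 'Show a' is never needed by the body, so it is redundant as well.
      keep :: Show a => a -> a
      keep x = x
      ```

      Both definitions still compile; the flag only asks the compiler to
      report contexts like these.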
      
      Main changes:
      
       * The new ic_status field in an implication, of type ImplicStatus.
         It replaces ic_insol, and includes information about redundant
         constraints.
      
       * New function TcSimplify.setImplicationStatus sets the ic_status.
      
       * TcSigInfo has sig_report_redundant field to say whether a
         redundant constraint should be reported; and similarly
         the FunSigCtxt constructor of UserTypeCtxt
      
       * EvBinds has a field eb_is_given, to record whether it is a given
         or wanted binding. Some consequential changes to creating an evidence
         binding (so that we record whether it is given or wanted).
      
       * AbsBinds field abs_ev_binds is now a *list* of TcEvBinds;
         see Note [Typechecking plan for instance declarations] in
         TcInstDcls
      
       * Some significant changes to the type checking of instance
         declarations; Note [Typechecking plan for instance declarations]
         in TcInstDcls.
      
       * I found that TcErrors.relevantBindings was failing to zonk the
         origin of the constraint it was looking at, and hence failing to
         find some relevant bindings.  Easy to fix, and orthogonal to
         everything else, but hard to disentangle.
      
      Some minor refactoring:
      
       * TcMType.newSimpleWanteds moves to Inst, renamed as newWanteds
      
       * TcClassDcl and TcInstDcls now have their own code for typechecking
         a method body, rather than sharing a single function. The shared
         function (was TcClassDcl.tcInstanceMethodBody) didn't have much code
         and the differences were growing confusing.
      
       * Add new function TcRnMonad.pushLevelAndCaptureConstraints, and
         use it
      
       * Add new function Bag.catBagMaybes, and use it in TcSimplify
  15. 17 Dec, 2014 1 commit
  16. 16 Dec, 2014 1 commit
    • Debug test case and test suite way · c6306140
      Peter Wortmann authored
      Adds a test way for debug (-g -dannot-lint) as well as a test covering
      basic source tick functionality.
      
      The debug way fails for a number of test cases because of annotation
      linting: Tracing simplification (e.g. rule firings) will see
      duplicated output, and sometimes expression matching might take so
      long that the test case times out. We blacklist these tests.
      
      (From Phabricator D169)
  17. 15 Dec, 2014 1 commit
    • Changing prefetch primops to have a `seq`-like interface · f44333ea
      Carter Schonwald authored
      Summary:
      The current primops for prefetching do not properly work in pure code;
      namely, the primops are not 'hoisted' into the correct call sites based
      on when arguments are evaluated. Instead, they should use a `seq`-like
      interface, which will cause the prefetch to be evaluated when the needed term is.
      
      See #9353 for the full discussion.
      
      Test Plan: updated tests for pure prefetch in T8256 to reflect the design changes in #9353
      
      Reviewers: simonmar, hvr, ekmett, austin
      
      Reviewed By: ekmett, austin
      
      Subscribers: merijn, thomie, carter, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D350
      
      GHC Trac Issues: #9353
  18. 10 Dec, 2014 1 commit
  19. 25 Nov, 2014 1 commit
    • Make clearNursery free · e22bc0de
      Simon Marlow authored
      Summary:
      clearNursery resets all the bd->free pointers of nursery blocks to
      make the blocks empty.  In profiles we've seen clearNursery taking
      significant amounts of time particularly with large -N and -A values.
      
      This patch moves the work of clearNursery to the point at which we
      actually need the new block, thereby introducing an invariant that
      blocks to the right of the CurrentNursery pointer still need their
      bd->free pointer reset.  This should make things faster overall,
      because we don't need to clear blocks that we don't use.
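
      The idea can be modelled in a few lines. This is only a sketch of the
      invariant described above, not the RTS code (which is written in C);
      `Block`, `Nursery`, `clearEager`, `clearLazy` and `advance` are
      hypothetical names, and the blocks already used before the current one
      are left out of the model.

      ```haskell
      -- Hypothetical model of a nursery block: only the 'free' offset matters.
      data Block = Block { start :: Int, free :: Int }
        deriving (Eq, Show)

      -- The current block plus the blocks to its right.
      data Nursery = Nursery { current :: Block, toRight :: [Block] }
        deriving (Eq, Show)

      -- Make a block empty by pointing 'free' back at its start.
      resetBlock :: Block -> Block
      resetBlock b = b { free = start b }

      -- Old behaviour (modelled): reset every block up front.
      clearEager :: Nursery -> Nursery
      clearEager (Nursery c rs) = Nursery (resetBlock c) (map resetBlock rs)

      -- New behaviour (modelled): reset only the block we are about to use.
      -- Invariant: blocks in 'toRight' may still carry a stale 'free'.
      clearLazy :: Nursery -> Nursery
      clearLazy (Nursery c rs) = Nursery (resetBlock c) rs

      -- Moving to the next block is the point where its reset finally happens.
      advance :: Nursery -> Maybe Nursery
      advance (Nursery _ [])       = Nothing
      advance (Nursery _ (b : bs)) = Just (Nursery (resetBlock b) bs)
      ```

      In this model `clearLazy` does constant work regardless of how many
      blocks there are, and blocks that are never reached by `advance` are
      never touched.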
      
      Test Plan: validate
      
      Reviewers: AndreasVoellmy, ezyang, austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D318
  20. 21 Oct, 2014 1 commit
  21. 25 Aug, 2014 1 commit
  22. 23 Aug, 2014 1 commit
  23. 14 Aug, 2014 1 commit
    • Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
      
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      
      which provide efficient implementations of the popular
      count-leading-zeros and count-trailing-zeros operations, respectively
      (see testcase for a pure Haskell reference implementation).
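
      The referenced test case is not reproduced here. As a stand-in, the
      following sketch shows one way to write such a reference in pure
      Haskell using only `Data.Bits`. The names `clzRef`, `ctzRef`, `clz8`
      and `ctz8` are made up for this example, and returning the bit width
      for a zero argument is an assumption of the sketch.

      ```haskell
      import Data.Bits (FiniteBits, finiteBitSize, testBit)
      import Data.Word (Word8)

      -- Count unset bits from the most significant end down to the first set bit.
      clzRef :: FiniteBits a => a -> Int
      clzRef x = length (takeWhile (not . testBit x) [n - 1, n - 2 .. 0])
        where n = finiteBitSize x

      -- Count unset bits from the least significant end up to the first set bit.
      ctzRef :: FiniteBits a => a -> Int
      ctzRef x = length (takeWhile (not . testBit x) [0 .. n - 1])
        where n = finiteBitSize x

      -- 8-bit specialisations, convenient for small experiments.
      clz8, ctz8 :: Word8 -> Int
      clz8 = clzRef
      ctz8 = ctzRef
      ```

      With these definitions `clz8 1` is 7, `ctz8 8` is 3, and both functions
      return 8 for an argument of 0.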
      
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
      
      Test Plan: validate and successful tests on i686 and amd64
      
      Reviewers: rwbarton, simonmar, ezyang, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D144
      
      GHC Trac Issues: #9340
  24. 11 Aug, 2014 1 commit
  25. 09 Aug, 2014 1 commit
  26. 01 Aug, 2014 1 commit
  27. 31 Jul, 2014 1 commit
    • Allow multiple entry points when allocating recursive groups (#9303) · da70f9ef
      Simon Marlow authored
      Summary:
      In this example we ended up with some code that was only reachable via
      an info table, because a branch had been optimised away by the native
      code generator.  The register allocator then got confused because it
      was only considering the first block of the proc to be an entry point,
      when actually any of the info tables are entry points.
      
      Test Plan: validate
      
      Reviewers: simonpj, austin
      
      Subscribers: simonmar, relrod, carter
      
      Differential Revision: https://phabricator.haskell.org/D88
  28. 28 Jun, 2014 1 commit
    • Simplify .gitignore files · 767b9ddf
      Herbert Valerio Riedel authored
      
      
      It's a bit confusing to have .gitignore files spread all over the
      filesystem. This commit tries to consolidate those into one .gitignore
      file per component. Moreover, we try to describe files to be ignored which
      happen to have a common identifying pattern by glob patterns.
      Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
  29. 06 Jun, 2014 1 commit
  30. 30 May, 2014 1 commit
  31. 14 May, 2014 1 commit
  32. 10 May, 2014 1 commit
  33. 03 May, 2014 1 commit
  34. 19 Apr, 2014 1 commit
  35. 29 Mar, 2014 2 commits
    • CopySmallArrayStressTest needs random · c3108234
      Joachim Breitner authored
    • Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or garbage collecting the array.
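
      A small usage sketch of the new types through the primops exported
      from `GHC.Exts`, assuming a GHC recent enough to provide them. The
      helper `secondOfPair` is a hypothetical name for this example: it
      allocates a two-element small array, overwrites the second slot,
      freezes the array and reads the slot back.

      ```haskell
      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts
        ( indexSmallArray#, newSmallArray#, unsafeFreezeSmallArray#, writeSmallArray# )
      import GHC.ST (ST (..), runST)

      -- Build a SmallMutableArray# of size 2 filled with x, store y at index 1,
      -- freeze it into a SmallArray# and return the element at index 1.
      secondOfPair :: a -> a -> a
      secondOfPair x y = runST (ST (\s0 ->
        case newSmallArray# 2# x s0 of
          (# s1, marr #) ->
            case writeSmallArray# marr 1# y s1 of
              s2 -> case unsafeFreezeSmallArray# marr s2 of
                (# s3, arr #) ->
                  case indexSmallArray# arr 1# of
                    (# v #) -> (# s3, v #)))
      ```

      The primops mirror the existing `Array#` ones, so code like this only
      differs from the `Array#` version in the names it calls.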
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
  36. 26 Mar, 2014 1 commit
    • Add flags to control memcpy and memset inlining · 11b31c3c
      tibbe authored
      This adds -fmax-inline-memcpy-insns and -fmax-inline-memset-insns.
      These flags control when we inline calls to memcpy/memset with
      statically known arguments. The flag naming style is taken from GCC
      and the same limit is used by both GCC and LLVM.
  37. 22 Mar, 2014 2 commits
    • Enable popcnt test now when segfault is fixed · 16d04d90
      tibbe authored
      The fix was to ghc-prim.
    • codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would require fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implication for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard...
  38. 13 Mar, 2014 1 commit