1. 29 Mar, 2014 1 commit
    • tibbe's avatar
      Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size with 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
      90329b6c
  2. 28 Mar, 2014 1 commit
    • tibbe's avatar
      Make copy array ops out-of-line by default · e54828bf
      tibbe authored
      This should reduce code size when there's little to gain from inlining
      these primops, while still retaining the inlining benefit when the
      size of the copy is known statically.
      e54828bf
  3. 27 Mar, 2014 1 commit
  4. 26 Mar, 2014 1 commit
    • tibbe's avatar
      Add flags to control memcpy and memset inlining · 11b31c3c
      tibbe authored
      This adds -fmax-inline-memcpy-insns and -fmax-inline-memset-insns.
      These flags control when we inline calls to memcpy/memset with
      statically known arguments. The flag naming style is taken from GCC
      and the same limit is used by both GCC and LLVM.
      11b31c3c
  5. 25 Mar, 2014 3 commits
  6. 24 Mar, 2014 10 commits
  7. 23 Mar, 2014 3 commits
  8. 22 Mar, 2014 4 commits
    • eir@cis.upenn.edu's avatar
      Fix #8917. · c99941cf
      eir@cis.upenn.edu authored
      FamInstEnv.normaliseTcApp should normalise arguments even when
      the top-level tycon isn't a type family. This was a regression
      from 7.6 -- not sure when it happened, but it was probably my
      fault. Fixed now, in any case.
      c99941cf
    • eir@cis.upenn.edu's avatar
      Remove redundant compatibility check. · b0bcbc04
      eir@cis.upenn.edu authored
      Previously, the closed type family compatibility check was
      done even when type-checking an interface file. But interface
      files now store compatibility info, so this check was redundant.
      b0bcbc04
    • eir@cis.upenn.edu's avatar
    • tibbe's avatar
      codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would requiring fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implication for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard means (i.e. virtHp). Since the clone primops might be
      either inline or out-of-line the heap check layout code now consults
      shouldInlinePrimOp to know whether a primop will be inlined.
      1eece456
  9. 21 Mar, 2014 1 commit
  10. 19 Mar, 2014 2 commits
  11. 18 Mar, 2014 1 commit
    • Simon Peyton Jones's avatar
      Make sure we occurrence-analyse unfoldings (fixes Trac #8892) · 87bbc69c
      Simon Peyton Jones authored
      For DFunUnfoldings we were failing to occurrence-analyse the unfolding,
      and that meant that a loop breaker wasn't marked as such, which in turn
      meant it was inlined away when it still had occurrence sites.  See
      Note [Occurrrence analysis of unfoldings] in CoreUnfold.
      
      This is a pretty long-standing bug, happily nailed by John Lato.
      87bbc69c
  12. 17 Mar, 2014 4 commits
  13. 16 Mar, 2014 1 commit
  14. 14 Mar, 2014 7 commits