1. 02 May, 2014 1 commit
    • Per-thread allocation counters and limits · b0534f78
      Simon Marlow authored
      This tracks the amount of memory allocated by each thread in a
      counter stored in the TSO.  Optionally, when the counter drops below
      zero (it counts down), the thread can be sent an asynchronous
      exception: AllocationLimitExceeded.  When this happens, the thread is
      given a small additional limit so that it can handle the exception.
      See documentation in GHC.Conc for more details.
      
      Allocation limits are similar to timeouts, but
      
        - timeouts use real time, not CPU time.  Allocation limits do not
          count anything while the thread is blocked or in foreign code.
      
        - timeouts don't re-trigger if the thread catches the exception;
          allocation limits do.
      
        - timeouts can catch non-allocating loops, if you use
          -fno-omit-yields.  This doesn't work for allocation limits.
      
      I couldn't measure any impact on benchmarks with these changes, even
      for nofib/smp.
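
      The countdown behaviour described above can be modelled in a few
      lines of C. This is only an illustrative sketch, not the RTS code:
      the struct, its fields, and demo_charge_allocation are hypothetical
      names, and incrementing a counter stands in for throwing
      AllocationLimitExceeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state; the real counter lives in the TSO. */
typedef struct {
    int64_t alloc_counter;   /* counts down; negative means limit exceeded */
    bool    limit_enabled;   /* whether crossing zero raises the exception */
    int64_t grace_bytes;     /* extra allowance granted when the limit trips */
    int     exceptions_sent; /* stands in for throwing the exception */
} demo_thread;

/* Charge `bytes` to the thread; returns true if the exception was raised. */
static bool demo_charge_allocation(demo_thread *t, int64_t bytes)
{
    t->alloc_counter -= bytes;
    if (t->limit_enabled && t->alloc_counter < 0) {
        t->exceptions_sent++;
        /* Small additional limit so the handler itself can allocate. */
        t->alloc_counter = t->grace_bytes;
        return true;
    }
    return false;
}
```

      Because the limit stays enabled in this model, a thread that catches
      the exception and keeps allocating trips it again once the grace
      allowance is used up, which mirrors the re-trigger behaviour noted
      in the list above.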
  2. 29 Apr, 2014 1 commit
    • Rts: Consistently use StgWord for sizes of bitmaps · 43b3bab3
      Arash Rouhani authored
      A long debate is in issue #8742, but the main motivation is that this
      allows for applying a patch to reuse the function scavenge_small_bitmap
      without changing the .o-file output.
      
      Similarly, I changed the types in rts/sm/Compact.c, so I can create
      a STATIC_INLINE function for the redundant code block:
      
              while (size > 0) {
                  if ((bitmap & 1) == 0) {
                      thread((StgClosure **)p);
                  }
                  p++;
                  bitmap = bitmap >> 1;
                  size--;
              }
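
      One way the loop above might be factored into a single inline
      helper is sketched here. The names demo_word, demo_visit_fn and
      demo_walk_small_bitmap are hypothetical; the real code calls
      thread() directly and uses StgWord and the STATIC_INLINE macro
      rather than the plain C stand-ins used here.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t demo_word;   /* stand-in for StgWord */

/* Hypothetical visitor; in rts/sm/Compact.c the loop calls thread(). */
typedef void (*demo_visit_fn)(demo_word *slot, void *ctx);

/* Walk `size` slots; a clear bit in `bitmap` marks a slot to visit. */
static inline void demo_walk_small_bitmap(demo_word *p, demo_word bitmap,
                                          demo_word size,
                                          demo_visit_fn visit, void *ctx)
{
    while (size > 0) {
        if ((bitmap & 1) == 0) {
            visit(p, ctx);
        }
        p++;
        bitmap = bitmap >> 1;
        size--;
    }
}

/* Example visitor: counts how many slots were visited. */
static void demo_count_slot(demo_word *slot, void *ctx)
{
    (void)slot;
    (*(size_t *)ctx)++;
}
```

      Using one word-sized type for both the bitmap and the size keeps the
      shift and the decrement at the same width at every call site.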
  3. 24 Apr, 2014 1 commit
  4. 22 Apr, 2014 2 commits
    • Be less untruthful about the prototypes of external functions · 5a31f231
      Colin Watson authored
      GHC's generated C code uses dummy prototypes for foreign imports.  At the
      moment these all claim to be (void), i.e. functions of zero arguments.  On
      most platforms this doesn't matter very much: calls to these functions put
      the parameters in the usual places anyway, and (with the exception of
      varargs) things just work.
      
      However, the ELFv2 ABI on ppc64 optimises stack allocation
      (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.html): a call to a
      function that has a prototype, is not varargs, and receives all parameters
      in registers rather than on the stack does not require the caller to
      allocate an argument save area.  The incorrect prototypes cause GCC to
      believe that all functions declared this way can be called without an
      argument save area, but if the callee has sufficiently many arguments then
      it will expect that area to be present, and will thus corrupt the caller's
      stack.  This happens in particular with calls to runInteractiveProcess in
      libraries/process/cbits/runProcess.c.
      
      The simplest fix appears to be to declare these external functions with an
      unspecified argument list rather than a void argument list.  This is no
      worse for platforms that don't care either way, and allows a successful
      bootstrap of GHC 7.8 on little-endian Linux ppc64 (which uses the ELFv2
      ABI).
      
      Fixes #8965
      Signed-off-by: Colin Watson <cjwatson@debian.org>
      Signed-off-by: Austin Seipp <austin@well-typed.com>
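
      The two declaration styles discussed above differ only in the
      parameter list. The sketch below shows both as function-pointer
      types and calls through the unspecified form by casting back to the
      real signature. It assumes pre-C23 semantics (in C23 an empty
      parameter list means the same as void), all names are hypothetical,
      and the call-site cast is an assumption about how a dummy
      declaration would be used. It does not reproduce the ppc64 stack
      corruption itself.

```c
/* Dummy declaration styles for an external function of unknown arity. */
typedef void (*demo_void_args_fn)(void);  /* claims: takes no arguments */
typedef void (*demo_unspec_args_fn)();    /* pre-C23: arguments unspecified */

/* Stand-in for a foreign function with several parameters. */
static long demo_sum3(long a, long b, long c)
{
    return a + b + c;
}

/* Call through the dummy type by casting back to the real signature. */
static long demo_call_sum3(void)
{
    demo_unspec_args_fn dummy = (demo_unspec_args_fn)demo_sum3;
    return ((long (*)(long, long, long))dummy)(1, 2, 3);
}
```

      With the unspecified form the compiler has no prototype to rely on,
      so it cannot conclude that the callee takes no stack arguments.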
    • ghc: initial AArch64 patches · c29bf984
      Colin Watson authored
      
      Signed-off-by: Austin Seipp <austin@well-typed.com>
  5. 29 Mar, 2014 2 commits
    • Add missing symbols to linker · 838bfb22
      tibbe authored
      The copy array family of primops were moved out-of-line.
    • Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or garbage-collecting the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
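
      The two-word saving can be checked with a little arithmetic. The
      sketch below is a rough model under stated assumptions: a one-word
      closure header (no profiling), one card byte per 128 elements, and
      8-byte words. The function names and constants are hypothetical
      and are not taken from the RTS headers.

```c
#include <stddef.h>

enum {
    DEMO_HEADER_WORDS = 1,   /* assumed: one-word header, no profiling */
    DEMO_CARD_ELEMS   = 128, /* assumed: elements covered by one card byte */
    DEMO_WORD_BYTES   = 8    /* assumed: 64-bit machine */
};

/* Words for an array with a card table: header, two length fields
   (element count and total size), the elements, and the card bytes
   rounded up to whole words. */
static size_t demo_array_words(size_t n)
{
    size_t cards = (n + DEMO_CARD_ELEMS - 1) / DEMO_CARD_ELEMS;
    size_t card_words = (cards + DEMO_WORD_BYTES - 1) / DEMO_WORD_BYTES;
    return DEMO_HEADER_WORDS + 2 + n + card_words;
}

/* Words for a small array: header, element count, and the elements. */
static size_t demo_small_array_words(size_t n)
{
    return DEMO_HEADER_WORDS + 1 + n;
}
```

      Under these assumptions a 16-element array needs 20 words with a
      card table and 18 without: the extra size field and one word of
      card table account for the difference.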
  6. 28 Mar, 2014 1 commit
    • Make copy array ops out-of-line by default · e54828bf
      tibbe authored
      This should reduce code size when there's little to gain from inlining
      these primops, while still retaining the inlining benefit when the
      size of the copy is known statically.
  7. 22 Mar, 2014 2 commits
    • Follow hs_popcntX changes in ghc-prim · 1a63f17f
      tibbe authored
    • codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would require fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implications for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard means (i.e. virtHp). Since the clone primops might be
      either inline or out-of-line the heap check layout code now consults
      shouldInlinePrimOp to know whether a primop will be inlined.
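
      The inlining decision described above can be pictured as a simple
      predicate. This is a sketch only: demo_should_inline_clone is a
      hypothetical name, not shouldInlinePrimOp itself, and the
      requirement that the element count be known at compile time is an
      assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a clone-array primop could be allocated inline.
   `len_known` says whether the element count is a compile-time literal;
   `max_inline_bytes` plays the role of -fmax-inline-alloc-size. */
static bool demo_should_inline_clone(bool len_known, size_t n_elems,
                                     size_t word_bytes,
                                     size_t max_inline_bytes)
{
    if (!len_known) {
        return false;   /* unknown size: fall back to the out-of-line call */
    }
    /* Payload only; the closure header is not counted. */
    return n_elems * word_bytes <= max_inline_bytes;
}
```

      A heap-check layout pass could call such a predicate first: an
      inlined primop allocates through the ordinary heap pointer, whereas
      an out-of-line one needs its own heap check around the call.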
  8. 13 Mar, 2014 1 commit
  9. 27 Feb, 2014 1 commit
  10. 17 Feb, 2014 2 commits
  11. 19 Jan, 2014 1 commit
  12. 13 Jan, 2014 1 commit
  13. 04 Dec, 2013 2 commits
    • Move the allocation of CAF blackholes into 'newCAF' (#8590) · 55c703b8
      parcs authored
      We now do the allocation of the blackhole indirection closure inside the
      RTS procedure 'newCAF' instead of generating the allocation code inline
      in the closure body of each CAF.  This slightly decreases code size in
      modules with a lot of CAFs.
      
      As a result of this change, for example, the size of DynFlags.o drops by
      ~60KB and HsExpr.o by ~100KB.
    • Untab ClosureTypes.h and ClosureFlags.c · 4f603db2
      parcs authored
  14. 22 Nov, 2013 3 commits
  15. 21 Nov, 2013 2 commits
    • Allow the linker to be used without retaining CAFs unconditionally · 5874f13f
      Simon Marlow authored
      This creates a new C API:
      
         initLinker_ (int retain_cafs)
      
      The old initLinker() was left as-is for backwards compatibility.  See
      documentation in Linker.h.
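
      The backwards-compatible shape of such an API might look like the
      sketch below. The demo_ names are hypothetical stand-ins for the
      declarations in Linker.h, and two details are assumptions: that the
      old entry point behaves as if CAFs are retained, and that a repeated
      call keeps the first choice.

```c
/* Hypothetical stand-ins; the real declarations are in Linker.h. */
static int demo_retain_cafs = -1;   /* -1 means "not initialised yet" */

/* New entry point: the caller chooses whether CAFs are retained. */
static void demo_initLinker_(int retain_cafs)
{
    if (demo_retain_cafs != -1) {
        return;   /* already initialised; keep the first choice */
    }
    demo_retain_cafs = retain_cafs ? 1 : 0;
}

/* Old entry point kept for existing callers; assumed to retain CAFs. */
static void demo_initLinker(void)
{
    demo_initLinker_(1);
}
```

      Keeping the old zero-argument function as a thin wrapper means
      existing callers need no changes, while new callers can opt out of
      retention.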
    • In the DEBUG rts, track when CAFs are GC'd · e82fa829
      Simon Marlow authored
      This resurrects some old code and makes it work again.  The idea is
      that we want to get an error message if we ever enter a CAF that has
      been GC'd, rather than following its indirection which will likely
      cause a segfault.  Without this patch, these bugs are hard to track
      down in gdb, because the IND_STATIC code overwrites R1 (the pointer to
      the CAF) with its indirectee before jumping into bad memory, so we've
      lost the address of the CAF that got GC'd.
      
      Some associated refactoring while I was here.
  16. 14 Nov, 2013 1 commit
    • Improve the shutdownHaskellAndSignal and add fast exit · a987b800
      Duncan Coutts authored
      This is the RTS part of a patch to base's topHandler to handle exiting
      by a signal.
      
      The intended behaviour is that on Unix, throwing ExitFailure (-sig)
      results in the process terminating with that signal. Previously
      shutdownHaskellAndSignal was only used for exiting with SIGINT due to
      the UserInterrupt exception.
      
      Improve shutdownHaskellAndSignal to do the signal part more carefully.
      In particular, it should now reliably terminate the process one way
      or another. Previously, if the signal was blocked, ignored or handled,
      shutdownHaskellAndSignal would actually return!
      
      Also, the topHandler code has two paths: a careful shutdown and a "fast
      exit" where it does not give finalisers a chance to run. We want to
      support that mode also when we want to exit by signal. So rather than
      the base code directly calling stg_exit as it did before, we have a
      fastExit bool parameter for both shutdownHaskellAnd{Exit,Signal}.
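
      The "terminate one way or another" idea can be sketched with plain
      POSIX calls. This is not the RTS implementation: demo_exit_by_signal
      and demo_child_termination_signal are hypothetical names, and the
      fallback exit status is arbitrary. The sketch restores the default
      disposition, unblocks the signal, sends it to the current process,
      and falls back to _exit if the process is somehow still running.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the calling process with `sig`; never returns. */
static void demo_exit_by_signal(int sig)
{
    struct sigaction dfl;
    sigset_t set;

    /* Undo any handler or SIG_IGN so the default action applies. */
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(sig, &dfl, NULL);

    /* Make sure the signal is not blocked. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    kill(getpid(), sig);

    /* Still alive (e.g. the default action does not terminate): bail out. */
    _exit(0xff);
}

/* Demo helper: fork a child that ignores and blocks `sig`, then exits by
   it. Returns the signal that killed the child, 0 if it exited normally,
   or -1 on error. */
static int demo_child_termination_signal(int sig)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        sigset_t set;
        signal(sig, SIG_IGN);
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_BLOCK, &set, NULL);
        demo_exit_by_signal(sig);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

      Exiting through _exit in the fallback path skips atexit handlers,
      which is in the spirit of the "fast exit" path but is a choice made
      for this sketch only.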
  17. 25 Oct, 2013 5 commits
  18. 12 Oct, 2013 1 commit
  19. 11 Oct, 2013 2 commits
  20. 01 Oct, 2013 2 commits
    • Remove use of R9, and fix associated bugs · 11b5ce55
      Simon Marlow authored
      We were passing the function address to stg_gc_prim_p in R9, which was
      wrong because the call was a high-level call and didn't declare R9 as
      a parameter.  Passing R9 as an argument is the right way, but
      unfortunately that exposed another bug: we were using the same macro
      in some low-level Cmm, where it is illegal to call functions with
      arguments (see Note [syntax of cmm files]).  So we now have low-level
      variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
  21. 23 Sep, 2013 6 commits