1. 04 May, 2014 1 commit
  2. 02 May, 2014 1 commit
    • Per-thread allocation counters and limits · b0534f78
      Simon Marlow authored
      This tracks the amount of memory allocated by each thread in a
      counter stored in the TSO.  Optionally, when the counter drops below
      zero (it counts down), the thread can be sent an asynchronous
      exception: AllocationLimitExceeded.  When this happens, the thread is
      given a small additional limit so that it can handle the exception.
      See the documentation in GHC.Conc for more details.
      
      Allocation limits are similar to timeouts, but
      
        - timeouts use real time, not CPU time.  Allocation limits do not
          count anything while the thread is blocked or in foreign code.
      
        - timeouts don't re-trigger if the thread catches the exception;
          allocation limits do.
      
        - timeouts can catch non-allocating loops, if you use
          -fno-omit-yields.  This doesn't work for allocation limits.
      
      I couldn't measure any impact on benchmarks with these changes, even
      for nofib/smp.
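
A minimal sketch of how a per-thread allocation limit might be used from Haskell. It assumes the functions as exposed by released versions of base (`setAllocationCounter`, `enableAllocationLimit` and `disableAllocationLimit` from `System.Mem`, and `AllocationLimitExceeded` from `Control.Exception`); the module layout at the time of this commit may have differed. `withAllocationBudget` and `bigFactorialDigits` are hypothetical names introduced only for this illustration.

```haskell
import Control.Exception (AllocationLimitExceeded (..), evaluate, try)
import Data.Int (Int64)
import System.Mem (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- | Run an action on the current thread with an allocation budget in bytes.
-- Hypothetical helper for illustration; it is not part of base.
withAllocationBudget :: Int64 -> IO a -> IO (Either AllocationLimitExceeded a)
withAllocationBudget bytes action = do
  setAllocationCounter bytes   -- the counter counts down from this value
  enableAllocationLimit        -- ask the RTS to throw once it drops below zero
  result <- try action         -- the exception is asynchronous, but 'try' catches it
  disableAllocationLimit       -- stop the limit from re-triggering afterwards
  return result

-- | An allocation-heavy computation: the partial products are large Integers.
bigFactorialDigits :: IO Int
bigFactorialDigits = evaluate (length (show (product [1 .. 3000 :: Integer])))

-- | Run one cheap action under a generous budget and one expensive action
-- under a tight budget, printing what happened in each case.
demo :: IO ()
demo = do
  small <- withAllocationBudget (1024 * 1024 * 1024) (return (42 :: Int))
  putStrLn (either (const "allocation limit exceeded") show small)
  big <- withAllocationBudget (64 * 1024) bigFactorialDigits
  putStrLn (either (const "allocation limit exceeded") show big)
```

Because the limit re-triggers if the thread keeps allocating after catching the exception, the sketch disables it again as soon as the guarded action finishes.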
  3. 01 Feb, 2014 1 commit
  4. 16 Jan, 2014 2 commits
  5. 18 Oct, 2013 2 commits
  6. 17 Oct, 2013 1 commit
  7. 16 Oct, 2013 1 commit
    • Generate (old + 0) instead of Sp in stack checks · 94125c97
      Jan Stolarek authored
      When compiling a function we can determine how much stack space it will
      use. We therefore need to perform only a single stack check at the beginning
      of a function to see if we have enough stack space. Instead of referring
      directly to Sp - as we used to do in the past - the code generator uses
      (old + 0) in the stack check. The stack layout phase then turns (old + 0) into Sp.
      
      The idea here is that, while we need to perform only one stack check for
      each function, we could in theory place more stack checks later in the
      function. They would be redundant, but not incorrect (in the sense that they
      should not change program behaviour). We need to make sure, however, that a
      stack check inserted after incrementing the stack pointer checks for a
      correspondingly smaller amount of stack space. This would not be the case if the code
      generator produced direct references to Sp. By referencing (old + 0) we make
      sure that we always check for a correct amount of stack: when converting
      (old + 0) to Sp, the stack layout phase takes into account changes already
      made to the stack pointer. The idea for this change came from observations made
      while debugging #8275.
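
The reasoning above can be sketched with a toy model. This is not GHC's Cmm representation: the types `Expr`, the functions `spRelative` and `requiredBelowSp`, and the convention that `spOff` counts how many bytes the stack pointer has already moved away from the base of the old area are all assumptions made for illustration.

```haskell
-- | Toy expression language standing in for the address part of a stack check.
data Expr
  = SpReg        -- a direct reference to the stack pointer
  | OldSlot Int  -- (old + n): offset n within the old area
  | Sub Expr Int -- e - k
  deriving (Eq, Show)

-- | Resolve an expression to an offset c such that the expression denotes
-- Sp + c, given how far the stack pointer has already moved (spOff).
-- This plays the role of the stack layout phase in this toy model.
spRelative :: Int -> Expr -> Int
spRelative _     SpReg       = 0
spRelative spOff (OldSlot n) = spOff - n
spRelative spOff (Sub e k)   = spRelative spOff e - k

-- | Bytes below the current Sp that a check of the form
-- "base - frame < SpLim" effectively demands.
requiredBelowSp :: Int -> Int -> Expr -> Int
requiredBelowSp spOff frame base = negate (spRelative spOff (Sub base frame))
```

In this model, a check for a 64-byte frame placed at function entry demands 64 bytes with either base. Placed after the stack pointer has already moved by 16 bytes, `requiredBelowSp 16 64 (OldSlot 0)` gives 48, whereas `requiredBelowSp 16 64 SpReg` still gives 64, which is the over-demanding check the commit avoids.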
  8. 12 Sep, 2013 1 commit
    • Improve sinking pass · ad15c2b4
      Jan Stolarek authored
      This commit does two things:
      
        * Allows duplicating global registers and literals by inlining
          them. Previously we would only inline a global register or literal
          if it was used only once.
      
        * Changes the method of determining conflicts between a node and an
          assignment. The new method has two advantages. First, it relies on
          the DefinerOfRegs and UserOfRegs typeclasses, so if the set of
          registers defined or used by a node should ever change, the
          `conflicts` function will use the changed definition. Second, it
          catches more cases than the previous one (namely CmmCall and
          CmmForeignCall), which is a step towards making it possible to run
          the sinking pass before stack layout (currently this doesn't work).
      
      This patch also adds a lot of comments that are the result of a roughly
      two-week investigation into how the sinking pass works and why it does
      what it does.
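
The typeclass-based conflict check described above can be illustrated with a simplified sketch. The classes below only borrow the names DefinerOfRegs and UserOfRegs; their methods, the `Expr` and `Node` types, and this `conflicts` function are assumptions for illustration and omit much of what the real pass considers (for example, conflicts through memory).

```haskell
import Data.List (intersect)

type Reg = String

data Expr = Lit Int | RegE Reg | Add Expr Expr

data Node
  = Assign Reg Expr          -- r = e
  | Call String [Expr] [Reg] -- call f(args), writing the listed result registers

-- | Things that read registers.
class UserOfRegs a where
  regsUsed :: a -> [Reg]

-- | Things that write registers.
class DefinerOfRegs a where
  regsDefined :: a -> [Reg]

instance UserOfRegs Expr where
  regsUsed (Lit _)   = []
  regsUsed (RegE r)  = [r]
  regsUsed (Add a b) = regsUsed a ++ regsUsed b

instance UserOfRegs Node where
  regsUsed (Assign _ e)    = regsUsed e
  regsUsed (Call _ args _) = concatMap regsUsed args

instance DefinerOfRegs Node where
  regsDefined (Assign r _)   = [r]
  regsDefined (Call _ _ res) = res

-- | Does moving the assignment (r = rhs) past the node change its meaning?
-- True if the node overwrites a register that rhs reads, or reads r itself.
conflicts :: (Reg, Expr) -> Node -> Bool
conflicts (r, rhs) node =
  not (null (regsDefined node `intersect` regsUsed rhs)) || r `elem` regsUsed node
```

Because `conflicts` only consults the two classes, adding a new kind of node means writing its instances; the conflict check itself does not need to change, which is the first advantage the commit message names.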
  9. 24 Jul, 2013 1 commit
    • Fix a bug in stack layout with safe foreign calls (#8083) · c2348859
      Simon Marlow authored
      We weren't properly tracking the number of stack arguments in the
      continuation of a foreign call.  It happened to work when the
      continuation was not a join point, but when it was a join point we
      were using the wrong amount of stack fixup.
  10. 24 Apr, 2013 1 commit
  11. 23 Jan, 2013 1 commit
  12. 30 Oct, 2012 2 commits
  13. 12 Oct, 2012 1 commit
  14. 08 Oct, 2012 1 commit
    • Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone; primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
  15. 24 Sep, 2012 1 commit
  16. 20 Sep, 2012 1 commit
  17. 16 Sep, 2012 1 commit
  18. 12 Sep, 2012 3 commits
  19. 31 Aug, 2012 1 commit
  20. 07 Aug, 2012 3 commits
  21. 06 Aug, 2012 1 commit
  22. 02 Aug, 2012 1 commit
  23. 30 Jul, 2012 3 commits
  24. 24 Jul, 2012 1 commit
  25. 20 Jul, 2012 1 commit
  26. 17 Jul, 2012 1 commit
  27. 13 Jul, 2012 1 commit
  28. 11 Jul, 2012 1 commit
  29. 09 Jul, 2012 1 commit
  30. 05 Jul, 2012 1 commit
  31. 04 Jul, 2012 1 commit