1. 29 Mar, 2014 1 commit
    • Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
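
As a rough illustration of how the new types are used from Haskell, the sketch below builds and reads a small array through the primops exported by `GHC.Exts`. The wrapper type `SmallArr` and the helpers `fill`, `smallFromList`, `indexSmall` and `sizeSmall` are hypothetical names introduced here; they are not part of this commit or of unordered-containers.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
import GHC.ST (ST (..), runST)

-- | Hypothetical boxed wrapper so the unlifted array can be passed around.
data SmallArr a = SmallArr (SmallArray# a)

-- | Write list elements into slots i, i+1, ... and stop at index n.
fill :: SmallMutableArray# s a -> Int# -> Int# -> [a] -> State# s -> State# s
fill _ _ _ [] s = s
fill marr n i (x : xs) s
  | isTrue# (i >=# n) = s
  | otherwise = fill marr n (i +# 1#) xs (writeSmallArray# marr i x s)

-- | Build an array of n elements (n >= 0 assumed) from a list.
-- Slots not covered by the list keep an error placeholder.
smallFromList :: Int -> [a] -> SmallArr a
smallFromList (I# n) xs = runST (ST (\s0 ->
  case newSmallArray# n (error "smallFromList: unset element") s0 of
    (# s1, marr #) ->
      case unsafeFreezeSmallArray# marr (fill marr n 0# xs s1) of
        (# s2, arr #) -> (# s2, SmallArr arr #)))

-- | Read one element. No bounds check is performed.
indexSmall :: SmallArr a -> Int -> a
indexSmall (SmallArr arr) (I# i) =
  case indexSmallArray# arr i of
    (# x #) -> x

-- | Number of elements in the array.
sizeSmall :: SmallArr a -> Int
sizeSmall (SmallArr arr) = I# (sizeofSmallArray# arr)

main :: IO ()
main = do
  let arr = smallFromList 3 "abc"
  print (sizeSmall arr)
  print (indexSmall arr 1)
```

The same operations exist for `Array#`; the difference described in the commit message lies in the heap representation (no card table), not in the shape of the API.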
  2. 22 Mar, 2014 1 commit
    • codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would require fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implications for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard means (i.e. virtHp). Since the clone primops might be
      either inline or out-of-line the heap check layout code now consults
      shouldInlinePrimOp to know whether a primop will be inlined.
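
The decision described above can be modelled with a small, self-contained sketch. It is a simplified model and not the code in StgCmmPrim: the names `Placement`, `payloadBytes`, `choosePlacement` and `needsOwnHeapCheck` are hypothetical, the threshold of 128 bytes is just an example value, and the use of `Maybe Int` assumes that a call site can only be inlined when its element count is known at compile time.

```haskell
module Main (main) where

-- | How a clone primop call site is compiled (hypothetical type).
data Placement = InlineAlloc | OutOfLine
  deriving (Eq, Show)

-- | Payload size in bytes. The closure header is excluded, matching
-- the way -fmax-inline-alloc-size is described above.
payloadBytes :: Int -> Int -> Int
payloadBytes wordSize nElems = wordSize * nElems

-- | Hypothetical stand-in for the inlining decision.
-- 'Nothing' models an element count that is not known at compile time.
choosePlacement :: Int -> Int -> Maybe Int -> Placement
choosePlacement maxInlineBytes wordSize (Just n)
  | payloadBytes wordSize n <= maxInlineBytes = InlineAlloc
choosePlacement _ _ _ = OutOfLine

-- | Out-of-line primops may allocate outside the code generator's
-- knowledge, so they get a heap check of their own. Inline ones
-- allocate through the usual mechanism and share the enclosing check.
needsOwnHeapCheck :: Placement -> Bool
needsOwnHeapCheck OutOfLine = True
needsOwnHeapCheck InlineAlloc = False

main :: IO ()
main = do
  let limit = 128 -- example threshold in bytes (assumed value)
      word = 8    -- word size on a 64-bit machine
  print (choosePlacement limit word (Just 16))
  print (choosePlacement limit word (Just 17))
  print (choosePlacement limit word Nothing)
  print (needsOwnHeapCheck (choosePlacement limit word (Just 17)))
```

With these example numbers, 16 one-word elements occupy exactly 128 bytes and stay within the threshold, while 17 elements fall back to the out-of-line path.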
  3. 17 Mar, 2014 1 commit
  4. 13 Mar, 2014 1 commit
  5. 11 Mar, 2014 2 commits
  6. 03 Feb, 2014 2 commits
    • Eliminate duplicate code in Cmm pipeline · dba9bf67
      Jan Stolarek authored
      The end of the Cmm pipeline used to be split into two alternative flows,
      depending on whether we did proc-point splitting or not. There
      was a lot of code duplication between these two branches. But it
      wasn't really necessary as the differences can be easily enclosed
      within an if-then-else. I observed no impact of this change on
      compilation performance.
    • Document deprecations in Hoopl · 526cbc7a
      Jan Stolarek authored
  7. 02 Feb, 2014 2 commits
  8. 01 Feb, 2014 3 commits
  9. 26 Jan, 2014 1 commit
  10. 16 Jan, 2014 4 commits
    • Allow the argument to 'reserve' to be a compile-time expression · 58e5843a
      Simon Marlow authored
      By using the constant-folder to reduce it to an integer.
    • Add a way to reserve temporary stack space in high-level Cmm · eaa37a0f
      Simon Marlow authored
      We occasionally need to reserve some temporary memory in a primop for
      passing to a foreign function.  We've been using the stack for this,
      but when we moved to high-level Cmm it became quite fragile because
      primops are in high-level Cmm and the stack is supposed to be under
      the control of the Cmm pipeline.
      
      So this change puts things on a firmer footing by adding a new Cmm
      construct 'reserve'.  e.g. in decodeFloat_Int#:
      
          reserve 2 = tmp {
      
            mp_tmp1  = tmp + WDS(1);
            mp_tmp_w = tmp;
      
            /* Perform the operation */
            ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
      
            r1 = W_[mp_tmp1];
            r2 = W_[mp_tmp_w];
          }
      
      reserve is described in CmmParse.y.
      
      Unfortunately the argument to reserve must be a compile-time constant.
      We might have to extend the parser to allow expressions with
      arithmetic operators if this is too restrictive.
      
      Note also that the return instruction for the procedure must be
      outside the scope of the reserved stack area, so we have to extract
      the values from the reserved area before we close the scope.  This
      means some more local variables (r1, r2 in the example above).  The
      generated code is more or less identical to what we had before though.
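
For readers more familiar with Haskell than Cmm, the same scoping discipline shows up in `Foreign.Marshal.Alloc.alloca`. The sketch below is only an analogue at the Haskell FFI level and does not use or demonstrate the Cmm `reserve` construct: it reserves temporary memory, passes it to the C library function `frexp`, and reads the result out before the scope closes. The helper name `splitDouble` is hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- C prototype: double frexp(double x, int *exp);
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- | Split a Double into mantissa and exponent via the C library.
-- The temporary CInt only lives inside the 'alloca' scope, so its
-- value is copied out with 'peek' before the scope ends.
splitDouble :: Double -> IO (Double, Int)
splitDouble x =
  alloca $ \expPtr -> do
    mant <- c_frexp (realToFrac x) expPtr
    e <- peek expPtr
    pure (realToFrac mant, fromIntegral e)

main :: IO ()
main = splitDouble 8.0 >>= print
```

As with `reserve`, returning the pointer itself from the scope would be a bug; only the values read from it may escape.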
    • Typo in comment · 11f5cd94
      Gabor Greif authored
    • Documentation on the stack layout algorithm · 78a506a9
      Simon Marlow authored
  11. 10 Jan, 2014 1 commit
  12. 28 Nov, 2013 3 commits
  13. 22 Nov, 2013 3 commits
  14. 03 Nov, 2013 1 commit
  15. 26 Oct, 2013 1 commit
  16. 25 Oct, 2013 2 commits
  17. 24 Oct, 2013 1 commit
  18. 18 Oct, 2013 4 commits
  19. 17 Oct, 2013 1 commit
  20. 16 Oct, 2013 3 commits
    • Remove unused code · a05ffbd9
      Jan Stolarek authored
      I am removing old loopification code that has been commented out
      for a long time. We now have loopification implemented in
      the code generator (see Note [Self-recursive tail calls]) so we
      won't need to resurrect this old code.
    • Trailing whitespaces · 738e2f12
      Jan Stolarek authored
    • Generate (old + 0) instead of Sp in stack checks · 94125c97
      Jan Stolarek authored
      When compiling a function we can determine how much stack space it will
      use. We therefore need to perform only a single stack check at the beginning
      of a function to see if we have enough stack space. Instead of referring
      directly to Sp - as we used to do in the past - the code generator uses
      (old + 0) in the stack check. The stack layout phase turns (old + 0) into Sp.
      
      The idea here is that, while we need to perform only one stack check for
      each function, we could in theory place more stack checks later in the
      function. They would be redundant, but not incorrect (in the sense that they
      should not change program behaviour). We need to make sure, however, that a
      stack check inserted after incrementing the stack pointer checks for a
      correspondingly smaller amount of stack space. This would not be the case if the code
      generator produced direct references to Sp. By referencing (old + 0) we make
      sure that we always check for the correct amount of stack: when converting
      (old + 0) to Sp the stack layout phase takes into account changes already
      made to the stack pointer. The idea for this change came from observations made
      while debugging #8275.
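
The argument above can be checked with a toy model. This is a simplified sketch under stated assumptions, not the stack layout code: the stack grows downwards, addresses are plain integers, and `(old + n)` is assumed to resolve to `Sp + (spOff - n)`, where `spOff` is the number of bytes the function has already pushed since entry. All names (`OldOffset`, `resolveOld`, `checkViaOld`, `checkViaSp`) are hypothetical.

```haskell
module Main (main) where

-- | Symbolic address (old + n), as it exists before stack layout.
newtype OldOffset = OldOffset Int

-- | Assumed layout rule: with spOff bytes already pushed since
-- function entry, (old + n) becomes Sp + (spOff - n).
resolveOld :: Int -> Int -> OldOffset -> Int
resolveOld sp spOff (OldOffset n) = sp + (spOff - n)

-- | Stack check phrased against (old + 0): the whole frame, measured
-- from the stack pointer at function entry, must stay above spLim.
checkViaOld :: Int -> Int -> Int -> Int -> Bool
checkViaOld spLim frame sp spOff =
  resolveOld sp spOff (OldOffset 0) - frame >= spLim

-- | The same check phrased directly against the current Sp.
checkViaSp :: Int -> Int -> Int -> Bool
checkViaSp spLim frame sp = sp - frame >= spLim

main :: IO ()
main = do
  let spLim = 1030   -- lowest usable stack address (example value)
      frame = 64     -- total stack the function needs
      entrySp = 1100 -- Sp at function entry
      pushed = 16    -- bytes already pushed at the later check
  print (checkViaOld spLim frame entrySp 0)
  print (checkViaOld spLim frame (entrySp - pushed) pushed)
  print (checkViaSp spLim frame (entrySp - pushed))
```

In this model the check through `(old + 0)` gives the same answer at entry and after 16 bytes have been pushed, whereas the direct `Sp` check at the later point demands the full frame again and reports a spurious overflow.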
  21. 12 Oct, 2013 2 commits