16 Aug, 2014
      Revert "Fix typos 'resizze'"
      this is z-encoding (as hvr tells me)
      This reverts commit 425d5178.
      Fix typos 'resizze'
      Implement {resize,shrink}MutableByteArray# primops
      The two new primops with the type-signatures
        resizeMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> (# State# s, MutableByteArray# s #)
        shrinkMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> State# s
      allow MutableByteArray#s to be resized in-place (when possible), and are useful
      for algorithms where memory is temporarily over-allocated. The motivating
      use-case is for implementing integer backends, where the final target size of
      the result is either N or N+1, and only known after the operation has been
      performed.
      A future commit will implement a stateful variant of the
      `sizeofMutableByteArray#` operation (see #9447 for details), since now the
      size of a `MutableByteArray#` may change over its lifetime (i.e. before
      it gets frozen or GCed).
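
      As an illustration of the over-allocate-then-shrink pattern described
      above, here is a minimal sketch that assumes a GHC recent enough to
      export these primops from GHC.Exts. The helper name `shrunkSize` and
      the sizes 16 and 8 are made up for the example and do not come from
      the commit.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.IO (IO (..))

-- | Hypothetical helper: over-allocate 16 bytes, shrink the array to
-- 8 bytes in place, freeze it, and report the size it ends up with.
shrunkSize :: IO Int
shrunkSize = IO $ \s0 ->
  case newByteArray# 16# s0 of
    (# s1, mba #) ->
      -- shrinkMutableByteArray# only threads the state token through;
      -- it mutates the existing array instead of returning a new one.
      case shrinkMutableByteArray# mba 8# s1 of
        s2 -> case unsafeFreezeByteArray# mba s2 of
          (# s3, ba #) -> (# s3, I# (sizeofByteArray# ba) #)

main :: IO ()
main = shrunkSize >>= print
```

      resizeMutableByteArray# would be used the same way, except that it
      returns a (possibly different) array which must replace the old one.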
      Test Plan: ./validate --slow
      Reviewers: ezyang, austin, simonmar
      Reviewed By: austin, simonmar
      Differential Revision: https://phabricator.haskell.org/D133
  30 Jun, 2014
      Re-add more primops for atomic ops on byte arrays
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
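
      A small usage sketch of some of the primops listed above, assuming
      they are imported from GHC.Exts. The helper name `counterDemo` and
      the values 5 and 7 are illustrative only. Offsets are in machine
      words, and 8 bytes is enough for one word on 32-bit and 64-bit
      targets.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.IO (IO (..))

-- | Hypothetical helper: store 0 in a one-word array, atomically add 5
-- and then 7, and return (value seen by the second add, final value).
counterDemo :: IO (Int, Int)
counterDemo = IO $ \s0 ->
  case newByteArray# 8# s0 of
    (# s1, mba #) ->
      case atomicWriteIntArray# mba 0# 0# s1 of
        s2 -> case fetchAddIntArray# mba 0# 5# s2 of
          -- fetchAddIntArray# returns the element's value *before* the add.
          (# s3, _ #) -> case fetchAddIntArray# mba 0# 7# s3 of
            (# s4, old #) -> case atomicReadIntArray# mba 0# s4 of
              (# s5, new #) -> (# s5, (I# old, I# new) #)

main :: IO ()
main = counterDemo >>= print
```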
  26 Jun, 2014
  24 Jun, 2014
      Add more primops for atomic ops on byte arrays
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
  29 Mar, 2014
      Add missing symbols to linker
      The copy array family of primops were moved out-of-line.
      Add SmallArray# and SmallMutableArray# types
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or GC'ing the array.
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      Fixes #8923.
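
      For readers unfamiliar with the new types, a minimal sketch of the
      SmallArray# API as exported from GHC.Exts. The helper name
      `smallDemo`, the array length 4 and the stored value 42 are
      assumptions made for the example.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.IO (IO (..))

-- | Hypothetical helper: build a 4-element small array filled with 0,
-- write 42 at index 2, freeze it, and return (element 2, array length).
smallDemo :: IO (Int, Int)
smallDemo = IO $ \s0 ->
  case newSmallArray# 4# (0 :: Int) s0 of
    (# s1, sma #) ->
      case writeSmallArray# sma 2# 42 s1 of
        s2 -> case unsafeFreezeSmallArray# sma s2 of
          -- indexSmallArray# returns its result in a unary unboxed tuple
          -- so the lookup can be forced without forcing the element.
          (# s3, sa #) -> case indexSmallArray# sa 2# of
            (# x #) -> (# s3, (x, I# (sizeofSmallArray# sa)) #)

main :: IO ()
main = smallDemo >>= print
```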
  22 Mar, 2014
      codeGen: inline allocation optimization for clone array primops
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would require fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      Allowing the same primop to be either inline or out-of-line has some
      implications for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard means (i.e. virtHp). Since the clone primops might be
      either inline or out-of-line the heap check layout code now consults
      shouldInlinePrimOp to know whether a primop will be inlined.
  17 Feb, 2014
  13 Jan, 2014
  21 Nov, 2013
      In the DEBUG rts, track when CAFs are GC'd
      This resurrects some old code and makes it work again.  The idea is
      that we want to get an error message if we ever enter a CAF that has
      been GC'd, rather than following its indirection which will likely
      cause a segfault.  Without this patch, these bugs are hard to track
      down in gdb, because the IND_STATIC code overwrites R1 (the pointer to
      the CAF) with its indirectee before jumping into bad memory, so we've
      lost the address of the CAF that got GC'd.
      Some associated refactoring while I was here.
  01 Oct, 2013
      Remove use of R9, and fix associated bugs
      We were passing the function address to stg_gc_prim_p in R9, which was
      wrong because the call was a high-level call and didn't declare R9 as
      a parameter.  Passing R9 as an argument is the right way, but
      unfortunately that exposed another bug: we were using the same macro
      in some low-level Cmm, where it is illegal to call functions with
      arguments (see Note [syntax of cmm files]).  So we now have low-level
      variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
  23 Sep, 2013
  21 Aug, 2013
  13 Jul, 2013
  10 Jul, 2013
  09 Jul, 2013
  22 Jun, 2013
  15 Jun, 2013
      Allow multiple C finalizers to be attached to a Weak#
      The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#.
      This new primop mutates an existing Weak# object and adds a new
      C finalizer to it.
      This change removes an invariant in MarkWeak.c, namely that the relative
      order of Weak# objects in the list needs to be preserved across GC. This
      makes it easier to split the list into per-generation structures.
      The patch also removes a race condition between two threads calling
      finalizeWeak# on the same WEAK object at the same time.
  18 Feb, 2013
  14 Feb, 2013
  01 Feb, 2013
  30 Jan, 2013
      STM: Only wake up once
      Previously, threads blocked on an STM retry would be sent a wakeup
      message each time an unpark was requested. This could result in the
      accumulation of a large number of wake-up messages, which would slow
      wake-up once the sleeping thread is finally scheduled.
      Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO
      which has been sent a wake-up message, allowing us to send only one
      wake-up message.
  16 Nov, 2012
      Add a write barrier for TVAR closures
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
  25 Oct, 2012
  23 Oct, 2012
  15 Oct, 2012
      Add a new traceMarker# primop for use in profiling output
      In time-based profiling visualisations (e.g. heap profiles and ThreadScope)
      it would be useful to be able to mark particular points in the execution and
      have those points in time marked in the visualisation.
      The traceMarker# primop currently emits an event into the eventlog. In
      principle it could be extended to do something in the heap profiling too.
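
      From user code the primop is normally reached through a wrapper. The
      sketch below assumes the `traceMarkerIO` wrapper from Debug.Trace in
      base; the function name `markedSum` and the marker strings are made
      up for the example.

```haskell
import Debug.Trace (traceMarkerIO)

-- | Hypothetical helper: sum 1..n, bracketing the work with two markers.
markedSum :: Int -> IO Int
markedSum n = do
  traceMarkerIO "sum: start"
  let total = sum [1 .. n]
  -- Force the result before emitting the closing marker; otherwise
  -- laziness would move the work to after "sum: done".
  total `seq` traceMarkerIO "sum: done"
  pure total

main :: IO ()
main = markedSum 1000 >>= print
```

      The markers only appear when the program is run with event logging
      enabled (for example `+RTS -l`); otherwise the calls have no visible
      effect.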
  08 Oct, 2012
      Produce new-style Cmm from the Cmm parser
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
        return (x,y);
      }
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
        jump %ENTRY_CODE(Sp(0)) [R1];
      Again, more details in Note [Syntax of .cmm files].
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      Some other changes in this batch:
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
       - CmmSink now does constant-folding (should fix #7219)
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
       - RET_DYN frames are removed from the RTS, lots of code goes away
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
  27 Apr, 2012
  27 Feb, 2012
  07 Dec, 2011
      Add new primtypes 'ArrayArray#' and 'MutableArrayArray#'
      The primitive array types, such as 'ByteArray#', have kind #, but are represented by pointers. They are boxed, but unpointed types (i.e., they cannot be 'undefined').
      The two categories of array types —[Mutable]Array# and [Mutable]ByteArray#— are containers for unboxed (and unpointed) as well as for boxed and pointed types.  So far, we lacked support for containers for boxed, unpointed types (i.e., containers for the primitive arrays themselves).  This is what the new primtypes provide.
      Containers for boxed, unpointed types are crucial for the efficient implementation of scattered nested arrays, which are central to the new DPH backend library dph-lifted-vseg.  Without such containers, we cannot eliminate all unboxing from the inner loops of traversals processing scattered nested arrays.
  29 Nov, 2011
      Make profiling work with multiple capabilities (+RTS -N)
      This means that both time and heap profiling work for parallel
      programs.  Main internal changes:
        - CCCS is no longer a global variable; it is now another
          pseudo-register in the StgRegTable struct.  Thus every
          Capability has its own CCCS.
        - There is a new built-in CCS called "IDLE", which records ticks for
          Capabilities in the idle state.  If you profile a single-threaded
          program with +RTS -N2, you'll see about 50% of time in "IDLE".
        - There is appropriate locking in rts/Profiling.c to protect the
          shared cost-centre-stack data structures.
      This patch does enough to get it working, but I have cut one big corner:
      the cost-centre-stack data structure is still shared amongst all
      Capabilities, which means that multiple Capabilities will race when
      updating the "allocations" and "entries" fields of a CCS.  Not only
      does this give unpredictable results, but it runs very slowly due to
      cache line bouncing.
      It is strongly recommended that you use -fno-prof-count-entries to
      disable the "entries" count when profiling parallel programs. (I shall
      add a note to this effect to the docs).
  02 Nov, 2011
      Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints
      User visible changes
      Flags renamed (the old ones are still accepted for now):
        OLD            NEW
        ---------      ------------
        -auto-all      -fprof-auto
        -auto          -fprof-exported
        -caf-all       -fprof-cafs
      New flags:
        -fprof-auto              Annotates all bindings (not just top-level
                                 ones) with SCCs
        -fprof-top               Annotates just top-level bindings with SCCs
        -fprof-exported          Annotates just exported bindings with SCCs
        -fprof-no-count-entries  Do not maintain entry counts when profiling
                                 (can make profiled code go faster; useful with
                                 heap profiling where entry counts are not used)
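
      For comparison with the automatic annotation flags above, a minimal
      sketch of a hand-written SCC. The function `mean` and the cost-centre
      name "mean_body" are hypothetical. Building with
      `ghc -prof -fprof-auto` would, as described above, annotate the
      bindings automatically in addition to the manual annotation.

```haskell
-- | Hypothetical example function carrying a manual cost centre.
-- Without -prof the pragma is ignored and the code runs unchanged.
mean :: [Double] -> Double
mean xs = {-# SCC "mean_body" #-} (sum xs / fromIntegral (length xs))

main :: IO ()
main = print (mean [1, 2, 3, 4])
```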
      Cost-centre stacks have a new semantics, which should in most cases
      result in more useful and intuitive profiles.  If you find this not to
      be the case, please let me know.  This is the area where I have been
      experimenting most, and the current solution is probably not the
      final version, however it does address all the outstanding bugs and
      seems to be better than GHC 7.2.
      Stack traces
      +RTS -xc now gives more information.  If the exception originates from
      a CAF (as is common, because GHC tends to lift exceptions out to the
      top-level), then the RTS walks up the stack and reports the stack in
      the enclosing update frame(s).
      Result: +RTS -xc is much more useful now - but you still have to
      compile for profiling to get it.  I've played around a little with
      adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
      quite accurately.
      I plan to add more facilities for stack tracing (e.g. in GHCi) in the
      future.
      Coverage (HPC)
       * derived instances are now coloured yellow if they weren't used
       * likewise record field names
       * entry counts are more accurate (hpc --fun-entry-count)
       * tab width is now correct (markup was previously off in source with
         tabs)
      Internal changes
      In Core, the Note constructor has been replaced by
              Tick (Tickish b) (Expr b)
      which is used to represent all the kinds of source annotation we
      support: profiling SCCs, HPC ticks, and GHCi breakpoints.
      Depending on the properties of the Tickish, different transformations
      apply to Tick.  See CoreUtils.mkTick for details.
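
      To make the shape of the new constructor concrete, here is a toy
      model of it. These types are simplified stand-ins, not GHC's actual
      definitions: the real Tickish and Expr take a binder type parameter
      and carry more fields, and the helpers `stripTicks` and `ticksOf`
      are hypothetical.

```haskell
-- Toy stand-ins for the three kinds of source annotation.
data Tickish
  = ProfNote String   -- profiling SCC, named by its cost centre
  | HpcTick Int       -- HPC coverage tick, identified by a tick number
  | Breakpoint Int    -- GHCi breakpoint, identified by a number
  deriving (Eq, Show)

data Expr
  = Var String
  | App Expr Expr
  | Tick Tickish Expr -- one constructor wraps every kind of annotation
  deriving (Eq, Show)

-- | Remove every annotation, leaving the underlying expression.
stripTicks :: Expr -> Expr
stripTicks (Tick _ e) = stripTicks e
stripTicks (App f x)  = App (stripTicks f) (stripTicks x)
stripTicks e          = e

-- | Collect the annotations, outermost first, left to right.
ticksOf :: Expr -> [Tickish]
ticksOf (Tick t e) = t : ticksOf e
ticksOf (App f x)  = ticksOf f ++ ticksOf x
ticksOf (Var _)    = []

example :: Expr
example = Tick (ProfNote "f") (App (Var "f") (Tick (HpcTick 3) (Var "x")))

main :: IO ()
main = do
  print (stripTicks example)
  print (ticksOf example)
```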
      This commit closes the following tickets, test cases to follow:
        - Close #2552: not a bug, but the behaviour is now more intuitive
          (test is T2552)
        - Close #680 (test is T680)
        - Close #1531 (test is result001)
        - Close #949 (test is T949)
        - Close #2466: test case has bitrotted (doesn't compile against current
          version of vector-space package)
  19 May, 2011