This project is mirrored from https://gitlab.haskell.org/ghc/ghc.git.
  1. 13 Sep, 2018 1 commit
  2. 05 Jun, 2018 1 commit
    • Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAN/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
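
      As a rough illustration of why the suffix helps, the sketch below
      uses a made-up enum (the type name, the numeric values and the
      helper are assumptions, not the RTS definitions) to show how code
      can tell from the closure type alone whether a frozen array is
      still on the mut_list:

      ```c
      #include <stdbool.h>

      /* Hypothetical subset of closure types; values are illustrative only. */
      typedef enum {
          MUT_ARR_PTRS_FROZEN_CLEAN,
          MUT_ARR_PTRS_FROZEN_DIRTY,
          SMALL_MUT_ARR_PTRS_FROZEN_CLEAN,
          SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      } SketchClosureType;

      /* Hypothetical helper: the *_DIRTY variants are the ones that are
         still on the mut_list, which the new names make explicit. */
      static bool frozen_array_on_mut_list(SketchClosureType t)
      {
          switch (t) {
          case MUT_ARR_PTRS_FROZEN_DIRTY:
          case SMALL_MUT_ARR_PTRS_FROZEN_DIRTY:
              return true;
          default:
              return false;
          }
      }
      ```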
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
  3. 02 Jun, 2018 1 commit
  4. 23 May, 2018 1 commit
    • Disable the SRT offset optimisation on MachO platforms · bf10456e
      Ben Gamari authored
      Unfortunately, this optimisation is infeasible on MachO platforms (e.g.
      Darwin) due to an object format limitation. Specifically, linking fails
      with errors of the form:
      
           error: unsupported relocation with subtraction expression, symbol
           '_integerzmgmp_GHCziIntegerziType_quotInteger_closure' can not be
           undefined in a subtraction expression
      
      Apparently MachO does not permit relocations' subtraction expressions to
      refer to undefined symbols. As far as I can tell this means that it is
      essentially impossible to express an offset between symbols living in
      different compilation units. This means that we likely can't use this
      optimisation on MachO platforms.
      
      Test Plan: Validate on Darwin
      
      Reviewers: simonmar, erikd
      
      Subscribers: rwbarton, thomie, carter, angerman
      
      GHC Trac Issues: #15169
      
      Differential Revision: https://phabricator.haskell.org/D4715
  5. 20 May, 2018 1 commit
    • Add HeapView functionality · ec22f7dd
      patrickdoc authored
      This pulls parts of Joachim Breitner's ghc-heap-view library inside GHC.
      The bits added are the C hooks into the RTS and a basic Haskell wrapper
      to these C hooks. The main reason for these to be added to GHC proper
      is that the code needs to be kept in sync with the closure types
      defined by the RTS. It is expected that the version of HeapView shipped
      with GHC will always work with that version of GHC and that extra
      functionality can be layered on top with a library like ghc-heap-view
      distributed via Hackage.
      
      Test Plan: validate
      
      Reviewers: simonmar, hvr, nomeata, austin, Phyx, bgamari, erikd
      
      Reviewed By: bgamari
      
      Subscribers: carter, patrickdoc, tmcgilchrist, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3055
  6. 16 May, 2018 4 commits
    • InfoTables: Fix #if uses introduced by D4634 · 3310f7f1
      Ben Gamari authored
    • Merge FUN_STATIC closure with its SRT · 838b6903
      Simon Marlow authored
      Summary:
      The idea here is to save a little code size and some work in the GC,
      by collapsing FUN_STATIC closures and their SRTs.
      
      This is (4) in a series; see D4632 for more details.
      
      There's a tradeoff here: more complexity in the compiler in exchange
      for a modest code size reduction (probably around 0.5%).
      
      Results:
      * GHC binary itself (statically linked) is 1% smaller
      * -0.2% binary sizes in nofib (-0.5% module sizes)
      
      Full nofib results comparing D4634 with this: P177 (ignore runtimes,
      these aren't stable on my laptop)
      
      Test Plan: validate, nofib
      
      Reviewers: bgamari, niteria, simonpj, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4637
    • Save a word in the info table on x86_64 · 2b0918c9
      Simon Marlow authored
      Summary:
      An info table with an SRT normally looks like this:
      
          StgWord64 srt_offset
          StgClosureInfo layout
          StgWord32 type
          StgWord32 has_srt
      
      But we only need 32 bits for srt_offset on x86_64, because the small
      memory model requires that code segments are at most 2GB. So we can
      optimise this to
      
          StgClosureInfo layout
          StgWord32 type
          StgWord32 srt_offset
      
      saving a word.  We can tell whether the info table has an SRT or not,
      because zero is not a valid srt_offset, so zero still indicates that
      there's no SRT.
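
      The saving can be checked with a small stand-alone sketch. The
      struct and field names below are assumptions made for the
      example (in particular the one-word `ClosureInfoSketch` union and
      the `type` field); they only mirror the two layouts described
      above and are not the RTS headers:

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical one-word stand-in for StgClosureInfo. */
      typedef union {
          struct { uint32_t ptrs, nptrs; } payload;
          uint64_t bitmap;
      } ClosureInfoSketch;

      /* Layout with a full-word SRT offset plus a separate has_srt field. */
      typedef struct {
          uint64_t          srt_offset;
          ClosureInfoSketch layout;
          uint32_t          type;
          uint32_t          has_srt;
      } InfoTableWide;

      /* Layout with a 32-bit SRT offset that replaces the has_srt field. */
      typedef struct {
          ClosureInfoSketch layout;
          uint32_t          type;
          uint32_t          srt_offset;   /* zero means "no SRT" */
      } InfoTableCompact;

      /* Zero is never a valid offset, so it doubles as the "no SRT" flag. */
      static bool compact_has_srt(const InfoTableCompact *info)
      {
          return info->srt_offset != 0;
      }
      ```

      With these definitions the compact layout is one 64-bit word
      smaller than the wide one.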
      
      Test Plan:
      * validate
      * For results, see D4632.
      
      Reviewers: bgamari, niteria, osa1, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4634
    • An overhaul of the SRT representation · eb8e692c
      Simon Marlow authored
      Summary:
      - Previously we would have a single big table of pointers per module,
        with a set of bitmaps to reference entries within it. The new
        representation is identical to a static constructor, which is much
        simpler for the GC to traverse, and we get to remove the complicated
        bitmap-traversal code from the GC.
      
      - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
        document it much better (see Note [SRTs]). This has been something
        I've wanted to do since we moved to the new code generator; I
        finally had the opportunity to finish it while on a transatlantic
        flight recently :)
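
      To make the difference concrete, here is a simplified sketch of
      the two traversals. All names (`visit_srt_bitmap`,
      `SrtObjectSketch`, `visit_srt_object`) and the fixed field count
      are assumptions for illustration; the real GC code and object
      layout are more involved:

      ```c
      #include <stddef.h>
      #include <stdint.h>

      typedef void (*VisitFn)(const void *closure, void *ctx);

      /* Old style (sketch): one shared table of pointers per module, with
         a bitmap selecting which entries belong to a given SRT. */
      static void visit_srt_bitmap(const void *const *table, uint64_t bitmap,
                                   VisitFn visit, void *ctx)
      {
          for (size_t i = 0; bitmap != 0; i++, bitmap >>= 1) {
              if (bitmap & 1) {
                  visit(table[i], ctx);
              }
          }
      }

      /* New style (sketch): the SRT looks like a static constructor, i.e.
         a header followed by a known number of pointer fields. */
      typedef struct {
          size_t      n_ptrs;    /* stand-in for the info-table layout */
          const void *ptrs[3];
      } SrtObjectSketch;

      static void visit_srt_object(const SrtObjectSketch *srt,
                                   VisitFn visit, void *ctx)
      {
          for (size_t i = 0; i < srt->n_ptrs; i++) {
              visit(srt->ptrs[i], ctx);
          }
      }

      /* Example visitor: counts the closures it is handed. */
      static void count_visit(const void *closure, void *ctx)
      {
          (void)closure;
          (*(size_t *)ctx)++;
      }
      ```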
      
      There is a series of 4 diffs:
      
      1. D4632 (this one), which does the bulk of the changes
      
      2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
      
      3. D4634 which takes advantage of D4632 and D4633 to save a word in
         info tables that have an SRT on x86_64. This is where most of the
         binary size improvement comes from.
      
      4. D4637 which makes a further optimisation to merge some SRTs with
         static FUN closures.  This adds some complexity and the benefits
         are fairly modest, so it's not clear yet whether we should do this.
      
      Results (after (3), on x86_64)
      
      - GHC itself (statically linked) is 5.2% smaller
      
      - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
      
      - I measured the overhead of traversing all the static objects in a
        major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
        first thing in `Main.main`.  The new version was 5-10% faster, but
        the results did vary quite a bit.
      
      - I'm not sure if there's a compile-time difference, the results are
        too unreliable.
      
      Test Plan: validate
      
      Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4632
  7. 10 May, 2018 1 commit
  8. 16 Apr, 2018 1 commit
  9. 05 Apr, 2018 1 commit
  10. 30 Mar, 2018 1 commit
  11. 19 Mar, 2018 1 commit
    • rts: Add --internal-counters RTS flag and several counters · 2918abf7
      Douglas Wilson authored
      The existing internal counters:
      * gc_alloc_block_sync
      * whitehole_spin
      * gen[0].sync
      * gen[1].sync
      
      are now not shown in the -s report unless --internal-counters is also passed.
      
      If --internal-counters is passed we now show the counters above, reformatted, as
      well as several other counters. In particular, we now count the yieldThread()
      calls that SpinLocks do as well as their spins.
      
      The added counters are:
      * gc_spin (spin and yield)
      * mut_spin (spin and yield)
      * whitehole_threadPaused (spin only)
      * whitehole_executeMessage (spin only)
      * whitehole_lockClosure (spin only)
      * waitForGcThreads (spin and yield)
      
      As well as the following, which are not SpinLock-like things:
      * any_work
      * no_work
      * scav_find_work
      
      See the Note for descriptions of what these counters are.
      
      We add busy_wait_nops in these loops along with the counter increment where it
      was absent.
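
      The idea of counting spins and yields separately can be sketched
      with C11 atomics. Everything named here (`CountingSpinLock`,
      `SKETCH_SPIN_LIMIT`, `csl_acquire_bounded`) is hypothetical, and
      `sched_yield()` stands in for the RTS's yieldThread(); the
      `max_yields` bound exists only so that the sketch can be
      exercised from a single thread:

      ```c
      #include <sched.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define SKETCH_SPIN_LIMIT 1000   /* assumption: spins before each yield */

      typedef struct {
          atomic_flag          held;
          atomic_uint_fast64_t spins;    /* failed attempts to take the lock */
          atomic_uint_fast64_t yields;   /* times we gave up the CPU */
      } CountingSpinLock;

      #define COUNTING_SPIN_LOCK_INIT { ATOMIC_FLAG_INIT, 0, 0 }

      /* Try to take the lock, yielding at most max_yields times.
         Returns true once the lock is held, false if we gave up. */
      static bool csl_acquire_bounded(CountingSpinLock *l, unsigned max_yields)
      {
          for (unsigned y = 0; ; y++) {
              for (int i = 0; i < SKETCH_SPIN_LIMIT; i++) {
                  if (!atomic_flag_test_and_set_explicit(&l->held,
                                                         memory_order_acquire)) {
                      return true;
                  }
                  atomic_fetch_add_explicit(&l->spins, 1, memory_order_relaxed);
              }
              if (y == max_yields) {
                  return false;
              }
              atomic_fetch_add_explicit(&l->yields, 1, memory_order_relaxed);
              sched_yield();
          }
      }

      static void csl_release(CountingSpinLock *l)
      {
          atomic_flag_clear_explicit(&l->held, memory_order_release);
      }
      ```

      An uncontended acquire leaves both counters at zero, which is why
      large values in either column point at contention.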
      
      Old internal counters output:
      ```
      gc_alloc_block_sync: 0
      whitehole_gc_spin: 0
      gen[0].sync: 0
      gen[1].sync: 0
      ```
      
      New internal counters output:
      ```
      Internal Counters:
                                                 Spins        Yields
          gc_alloc_block_sync                      323             0
          gc_spin                              9016713           752
          mut_spin                            57360944         47716
          whitehole_gc                               0           n/a
          whitehole_threadPaused                     0           n/a
          whitehole_executeMessage                   0           n/a
          whitehole_lockClosure                      0             0
          waitForGcThreads                           2           415
          gen[0].sync                                6             0
          gen[1].sync                                1             0
      
          any_work                                2017
          no_work                                 2014
          scav_find_work                          1004
      ```
      
      Test Plan:
      ./validate
      
      Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #3553, #9221
      
      Differential Revision: https://phabricator.haskell.org/D4302
  12. 23 Nov, 2017 1 commit
  13. 26 Sep, 2017 1 commit
  14. 19 Sep, 2017 1 commit
    • rts/RetainerProfile: Adding missing closure types to isRetainer · 6252292d
      Ben Gamari authored
      orzo in `#ghc` reported seeing a crash due to the retainer profiler encountering
      a BLOCKING_QUEUE closure, which isRetainer didn't know about. I performed an
      audit to make sure that all of the valid closure types were listed; they
      weren't. This is my guess of how they should appear.
      
      Test Plan: Validate
      
      Reviewers: simonmar, austin, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14235
      
      Differential Revision: https://phabricator.haskell.org/D3967
  15. 29 Apr, 2017 1 commit
  16. 24 Apr, 2017 1 commit
    • compiler/cmm/PprC.hs: constify labels in .rodata · b68697e5
      Sergei Trofimovich authored
      Consider one-line module
          module B (v) where v = "hello"
      in -fvia-C mode it generates code like
          static char gibberish_str[] = "hello";
      
      It resides in data section (precious resource on ia64!).
      The patch switches generator to emit:
          static const char gibberish_str[] = "hello";
      
      Other types of symbols that gained 'const' qualifier are:
      
      - info tables (from haskell and CMM)
      - static reference tables (from haskell and CMM)
      
      Cleanups along the way:
      
      - fixed info tables defined in .cmm to reside in .rodata
      - split out closure declaration into 'IC_' / 'EC_'
      - added label declaration (based on label type) right before
        each label definition (based on section type) so that C
        compiler could check if declaration and definition matches
        at definition site.
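
      The last cleanup can be pictured with the one-line module from
      above. This is only a sketch of the shape of the output, not the
      exact text PprC emits; the explicit array size and the
      `gibberish_len` helper are assumptions added to keep the example
      self-contained:

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Declaration, derived from the label type. */
      static const char gibberish_str[6];

      /* Definition, derived from the section type. If it disagreed with
         the declaration above (for example by dropping `const`), the C
         compiler would reject it right here. */
      static const char gibberish_str[6] = "hello";

      /* Hypothetical helper so that the symbol is used somewhere. */
      static size_t gibberish_len(void)
      {
          return strlen(gibberish_str);
      }
      ```
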
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
      
      Test Plan: ran testsuite on unregisterised x86_64 compiler
      
      Reviewers: simonmar, ezyang, austin, bgamari, erikd
      
      Reviewed By: bgamari, erikd
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #8996
      
      Differential Revision: https://phabricator.haskell.org/D3481
  17. 23 Apr, 2017 1 commit
  18. 04 Apr, 2017 1 commit
  19. 04 Feb, 2017 1 commit
    • Fix comment (old file names) in includes/ · 9984024a
      Takenobu Tani authored
      [skip ci]
      
      There were some old file names (.lhs, ...) in comments.
      
      * includes/rts/Bytecodes.h
        - ghc/compiler/ghci/ByteCodeGen.lhs -> ByteCodeAsm.hs
      
      * includes/rts/Constants.h
        - libraries/base/GHC/Conc.lhs -> libraries/base/GHC/Conc/Sync.hs
      
      * includes/rts/storage/FunTypes.h
        - utils/genapply/GenApply.hs -> utils/genapply/Main.hs
        - compiler/codeGen/CgCallConv.lhs -> compiler/codeGen/StgCmmLayout.hs
      
      * includes/stg/MiscClosures.h
        - compiler/codeGen/CgStackery.lhs -> compiler/codeGen/StgCmmArgRep.hs
        - HeapStackCheck.hc  -> HeapStackCheck.cmm
      
      Reviewers: bgamari, austin, simonmar, erikd
      
      Reviewed By: erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3074
  20. 13 Dec, 2016 1 commit
  21. 07 Dec, 2016 1 commit
    • Overhaul of Compact Regions (#12455) · 7036fde9
      Simon Marlow authored
      Summary:
      This commit makes various improvements and addresses some issues with
      Compact Regions (aka Compact Normal Forms).
      
      The most important fix is that compaction can now be interrupted.
      Compaction previously prevented GC from running until it was complete,
      which would be a problem in a multicore setting.  Now, we compact using a
      hand-written Cmm routine that can be interrupted at any point.  When a
      GC is triggered during a sharing-enabled compaction, the GC has to
      traverse and update the hash table, so this hash table is now stored
      in the StgCompactNFData object.
      
      Previously, compaction consisted of a deepseq using the NFData class,
      followed by a traversal in C code to copy the data.  This is now done
      in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
      longer use the NFData instances; instead, the Cmm routine evaluates
      components directly as it compacts.
      
      The new compaction is about 50% faster than the old one with no
      sharing, and a little faster on average with sharing (the cost of the
      hash table dominates when we're doing sharing).
      
      Static objects that don't (transitively) refer to any CAFs don't need
      to be copied into the compact region.  In particular this means we
      often avoid copying Char values and small Int values, because these
      are static closures in the runtime.
      
      Each Compact# object can support a single compactAdd# operation at any
      given time, so the Data.Compact library now enforces mutual exclusion
      using an MVar stored in the Compact object.
      
      We now get exceptions rather than killing everything with a barf()
      when we encounter an object that cannot be compacted (a function, or a
      mutable object).  We now also detect pinned objects, which can't be
      compacted either.
      
      The Data.Compact API has been refactored and cleaned up.  A new
      compactSize operation returns the size (in bytes) of the compact
      object.
      
      Most of the documentation is in the Haddock docs for the compact
      library, which I've expanded and improved here.
      
      Various comments in the code have been improved, especially the main
      Note [Compact Normal Forms] in rts/sm/CNF.c.
      
      I've added a few tests, and expanded a few of the tests that were
      there.  We now also run the tests with GHCi, and in a new test way
      that enables sanity checking (+RTS -DS).
      
      There's a benchmark in libraries/compact/tests/compact_bench.hs for
      measuring compaction speed and comparing sharing vs. no sharing.
      
      The field totalDataW in StgCompactNFData was unnecessary.
      
      Test Plan:
      * new unit tests
      * validate
      * tested manually that we can compact Data.Aeson data
      
      Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
      
      Subscribers: thomie, simonpj
      
      Differential Revision: https://phabricator.haskell.org/D2751
      
      GHC Trac Issues: #12455
  22. 06 Dec, 2016 1 commit
    • Overhaul GC stats · 24e6594c
      Simon Marlow authored
      Summary:
      Visible API changes:
      
      * The C struct `GCDetails` gives the stats about a single GC.  This is
        passed to the `gcDone()` callback if one is set via the
        RtsConfig. (previously we just passed a collection of values, so this
        is more extensible, at the expense of breaking the existing API)
      
      * `RTSStats` gives cumulative stats since the start of the program,
        and includes the `GCDetails` for the most recent GC.  This struct
        can be obtained via `getRTSStats()` (the old `getGCStats()` has been
        removed, and `getGCStatsEnabled()` has been renamed to
        `getRTSStatsEnabled()`)
      
      Improvements:
      
      * The per-GC stats and cumulative stats are now cleanly separated.
      
      * Inside the RTS we have a top-level `RTSStats` struct to keep all our
        stats in; previously this was just a collection of strangely-named
        variables.  This struct is mostly just copied in `getRTSStats()`, so
        the implementation of that function is a lot shorter.
      
      * Types are more consistent.  We use a uint64_t byte count for all
        memory values, and Time for all time values.
      
      * Names are more consistent.  We use a suffix `_bytes` for all byte
        counts and `_ns` for all time values.
      
      * We now collect information about the amount of memory in large
        objects and compact objects in `GCDetails`. (the latter was the reason
        I started doing this patch but it seems to have ballooned a bit!)
      
      * I fixed a bug in the calculation of the elapsed MUT time, and added
        an ASSERT to stop the calculations going wrong in the future.
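
      The split between per-GC and cumulative stats can be sketched as
      follows. The structs are a cut-down, hypothetical subset: the
      `Sketch` type names, the `TimeNs` typedef and the `record_gc`
      helper are assumptions for the example, and the field names only
      follow the `_bytes` / `_ns` convention described above rather
      than being copied from the RTS headers:

      ```c
      #include <stdint.h>

      typedef int64_t TimeNs;   /* stand-in for the RTS `Time` type */

      /* Stats about a single GC. */
      typedef struct {
          uint64_t live_bytes;
          uint64_t large_objects_bytes;
          uint64_t compact_bytes;
          TimeNs   elapsed_ns;
      } GCDetailsSketch;

      /* Cumulative stats since program start, plus the most recent GC. */
      typedef struct {
          uint32_t        gcs;
          uint64_t        max_live_bytes;
          TimeNs          gc_elapsed_ns;
          GCDetailsSketch gc;
      } RTSStatsSketch;

      /* Hypothetical helper: fold one GC's details into the totals. */
      static void record_gc(RTSStatsSketch *stats, const GCDetailsSketch *details)
      {
          stats->gcs++;
          if (details->live_bytes > stats->max_live_bytes) {
              stats->max_live_bytes = details->live_bytes;
          }
          stats->gc_elapsed_ns += details->elapsed_ns;
          stats->gc = *details;   /* keep the latest per-GC record */
      }
      ```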
      
      For now I kept the Haskell API in `GHC.Stats` the same, by
      impedance-matching with the new API.  We could either break that API
      and make it match the C API more closely, or we could add a new API
      and deprecate the old one.  Opinions welcome.
      
      This stuff is very easy to get wrong, and it's hard to test.  Reviews
      welcome!
      
      Test Plan:
      manual testing
      validate
      
      Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2756
  23. 29 Nov, 2016 1 commit
  24. 14 Nov, 2016 1 commit
    • Remove CONSTR_STATIC · 55d535da
      Simon Marlow authored
      Summary:
      We currently have two info tables for a constructor
      
      * XXX_con_info: the info table for a heap-resident instance of the
        constructor. It has type CONSTR, or one of the specialised types like
        CONSTR_1_0
      
      * XXX_static_info: the info table for a static instance of this
        constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
      
      I'm getting rid of the latter, and using the `con_info` info table for
      both static and dynamic constructors.  For rationale and more details
      see Note [static constructors] in SMRep.hs.
      
      I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
      `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
      distinction, and anyway HEAP_ALLOCED() does the same job.
      
      Test Plan: validate
      
      Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2690
      
      GHC Trac Issues: #12455
  25. 02 Nov, 2016 1 commit
  26. 19 Aug, 2016 1 commit
  27. 05 Aug, 2016 1 commit
    • codeGen: Remove binutils<2.17 hack, fixes T11758 · e3e2e49a
      avd authored
      There was a complication on the x86_64 platform, where pointers were 64
      bits, but the tools didn't support 64-bit relative relocations.  This
      was true before binutils 2.17, which nowadays is quite standard (even
      CentOS 5 is shipped with 2.17).
      
      Hacks were removed from x86 genSwitch and asm pretty printer. Also
      [x86-64-relative] note was dropped from
      includes/rts/storage/InfoTables.h as it's not referenced anywhere now.
      
      Reviewers: austin, simonmar, rwbarton, erikd, bgamari
      
      Reviewed By: simonmar, erikd, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2426
  28. 27 Jul, 2016 1 commit
  29. 20 Jul, 2016 1 commit
    • Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
  30. 10 Jun, 2016 2 commits
    • Rts flags cleanup · c88f31a0
      Simon Marlow authored
      * Remove unused/old flags from the structs
      * Update old comments
      * Add missing flags to GHC.RTS
      * Simplify GHC.RTS, remove C code and use hsc2hs instead
      * Make ParFlags unconditional, and add support to GHC.RTS
    • NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
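
      The capability-to-node assignment above is plain round-robin. A
      minimal sketch, where the function name `numa_node_of_cap` is an
      assumption and not an RTS symbol:

      ```c
      #include <stdint.h>

      /* Capability `cap` is placed on node cap % n_nodes, so consecutive
         capabilities are spread evenly across the available nodes. */
      static uint32_t numa_node_of_cap(uint32_t cap, uint32_t n_nodes)
      {
          return cap % n_nodes;
      }
      ```

      On a 2-node machine running with -N24, this puts the
      even-numbered capabilities on node 0 and the odd-numbered ones
      on node 1.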
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      was reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
  31. 17 May, 2016 1 commit
    • rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
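
      The shape of the two functions can be sketched like this. The
      closure struct, the lower-case function names and the 3-bit tag
      mask are assumptions for the example; only the const/non-const
      pairing reflects the change described above:

      ```c
      #include <stdint.h>

      /* Hypothetical stand-in for StgClosure. */
      typedef struct { int payload; } StgClosureSketch;

      #define SKETCH_TAG_MASK ((uintptr_t)7)   /* assumption: 3 low tag bits */

      /* Non-const in, non-const out: the original signature. */
      static inline StgClosureSketch *untag_closure(StgClosureSketch *p)
      {
          return (StgClosureSketch *)((uintptr_t)p & ~SKETCH_TAG_MASK);
      }

      /* Const in, const out: constness is never silently dropped. */
      static inline const StgClosureSketch *
      untag_const_closure(const StgClosureSketch *p)
      {
          return (const StgClosureSketch *)((uintptr_t)p & ~SKETCH_TAG_MASK);
      }
      ```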
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
  32. 12 May, 2016 1 commit
  33. 04 May, 2016 1 commit
  34. 29 Apr, 2016 1 commit
  35. 18 Apr, 2016 1 commit
  36. 12 Apr, 2016 1 commit
    • Allocate blocks in the GC in batches · f4446c5b
      Simon Marlow authored
      Avoids contention for the block allocator lock in the GC; this can be
      seen in the gc_alloc_block_sync counter emitted by +RTS -s.
      
      I experimented with this a while ago, and there was already
      commented-out code for it in GCUtils.c, but I've now improved it so that
      it doesn't result in significantly worse memory usage.
      
      * The old method of putting spare blocks on ws->part_list was wasteful;
        the spare blocks are now shared between all generations and retained
        between GCs.
      
      * repeated allocGroup() results in fragmentation, so I switched to using
        allocLargeChunk() instead which is fragmentation-friendly; we already
        use it for the same reason in nursery allocation.
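
      Why batching reduces contention can be shown with a toy model.
      Nothing here is RTS code: `SharedAllocator`, `GcWorkspace`,
      `SKETCH_BATCH` and the functions are hypothetical, and a counter
      stands in for actually taking the block allocator lock:

      ```c
      #include <stddef.h>

      #define SKETCH_BATCH 16   /* assumption: blocks fetched per lock */

      /* Shared state that would normally be protected by a lock. */
      typedef struct {
          size_t next_block;
          size_t lock_acquisitions;
      } SharedAllocator;

      /* Per-GC-thread cache of spare blocks. */
      typedef struct {
          size_t spare[SKETCH_BATCH];
          size_t n_spare;
      } GcWorkspace;

      /* Take the (modelled) lock once and grab a whole batch of blocks. */
      static void refill_spares(SharedAllocator *a, GcWorkspace *ws)
      {
          a->lock_acquisitions++;
          for (size_t i = 0; i < SKETCH_BATCH; i++) {
              ws->spare[i] = a->next_block++;
          }
          ws->n_spare = SKETCH_BATCH;
      }

      /* Hand out one block, touching shared state only when the cache
         of spare blocks is empty. */
      static size_t alloc_block_batched(SharedAllocator *a, GcWorkspace *ws)
      {
          if (ws->n_spare == 0) {
              refill_spares(a, ws);
          }
          return ws->spare[--ws->n_spare];
      }
      ```

      In this model 32 allocations take the lock twice instead of 32
      times.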