1. 27 Jul, 2015 1 commit
  2. 22 Jul, 2015 1 commit
    • Eliminate zero_static_objects_list() · b949c96b
      Simon Marlow authored
      In a workload with a large amount of code, zero_static_objects_list()
      takes a significant amount of time, and furthermore it is in the
      single-threaded part of the GC.
      This patch uses a slightly fiddly scheme for marking objects on the
      static object lists, using a flag in the low 2 bits that flips between
      two states to indicate whether an object has been visited during this
      GC or not.  We also have to take into account objects that have not
      been visited yet, which might appear at any time due to runtime linking.
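      The flag-flipping idea can be sketched in C as below. This is a simplified illustration, not the RTS code: StaticObj, visit_static, start_gc and the FLAG_* constants are hypothetical names. The low 2 bits of each link field hold a flag. The "visited" flag flips between two values at the start of every GC, so last GC's marks automatically read as "unvisited" and no zeroing pass is needed. A freshly linked object has flag bits 0, which also reads as "unvisited".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static object: only the link field matters here.
   The struct is at least 4-byte aligned, so the low 2 bits of a
   pointer to it are free to hold a flag. */
typedef struct StaticObj {
    uintptr_t link;   /* next pointer | 2-bit flag */
    int id;
} StaticObj;

#define FLAG_MASK ((uintptr_t)3)
#define FLAG_A    ((uintptr_t)1)
#define FLAG_B    ((uintptr_t)2)

/* Flag value meaning "visited during the current GC". */
static uintptr_t static_flag = FLAG_A;
static StaticObj *list_head = NULL;

/* Flip the flag: every mark left by the previous GC now reads as
   "not visited", without touching any object. */
static void start_gc(void) {
    static_flag = (static_flag == FLAG_A) ? FLAG_B : FLAG_A;
    list_head = NULL;
}

/* Push obj onto the static list unless it was already visited in this
   GC. Flag bits 0 (newly linked object) or the other flag value both
   count as "not yet visited". Returns 1 if pushed, 0 otherwise. */
static int visit_static(StaticObj *obj) {
    if ((obj->link & FLAG_MASK) == static_flag)
        return 0;
    obj->link = (uintptr_t)list_head | static_flag;
    list_head = obj;
    return 1;
}

/* Walk the list, masking off the flag bits to recover each pointer. */
static int list_length(void) {
    int n = 0;
    for (StaticObj *p = list_head; p != NULL;
         p = (StaticObj *)(p->link & ~FLAG_MASK))
        n++;
    return n;
}
```

      Flipping between two non-zero values, rather than setting and clearing one bit, is what removes the single-threaded clearing pass: the next GC reinterprets the old marks for free.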
      Test Plan: validate
      Reviewers: austin, bgamari, ezyang, rwbarton
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D1076
  3. 07 Apr, 2015 1 commit
  4. 25 Nov, 2014 1 commit
  5. 21 Oct, 2014 1 commit
  6. 10 Oct, 2014 1 commit
  7. 29 Sep, 2014 1 commit
  8. 28 Jul, 2014 1 commit
  9. 10 Jul, 2014 1 commit
  10. 30 May, 2014 1 commit
  11. 27 Apr, 2014 1 commit
  12. 09 Dec, 2013 1 commit
  13. 30 Nov, 2013 1 commit
  14. 21 Nov, 2013 2 commits
    • Allow the linker to be used without retaining CAFs unconditionally · 5874f13f
      Simon Marlow authored
      This creates a new C API:
         initLinker_ (int retain_cafs)
      The old initLinker() was left as-is for backwards compatibility.  See
      documentation in Linker.h.
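      The backwards-compatibility arrangement can be illustrated with the following sketch. It assumes that the old initLinker() corresponds to initLinker_(1), since the old behaviour retained CAFs unconditionally. The stub bodies and the linker_* globals are hypothetical and do not reflect what the real RTS linker does on initialisation; Linker.h remains the authoritative description.

```c
#include <assert.h>

/* Hypothetical linker state standing in for the real RTS internals. */
static int linker_initialised = 0;
static int linker_retain_cafs = 0;

/* New entry point: the caller decides whether CAFs in dynamically
   loaded code are retained unconditionally. */
void initLinker_(int retain_cafs) {
    linker_initialised = 1;
    linker_retain_cafs = retain_cafs;
}

/* Old entry point kept for existing callers: assumed here to keep the
   previous behaviour, i.e. CAFs are always retained. */
void initLinker(void) {
    initLinker_(1);
}
```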
    • In the DEBUG rts, track when CAFs are GC'd · e82fa829
      Simon Marlow authored
      This resurrects some old code and makes it work again.  The idea is
      that we want to get an error message if we ever enter a CAF that has
      been GC'd, rather than following its indirection which will likely
      cause a segfault.  Without this patch, these bugs are hard to track
      down in gdb, because the IND_STATIC code overwrites R1 (the pointer to
      the CAF) with its indirectee before jumping into bad memory, so we've
      lost the address of the CAF that got GC'd.
      Some associated refactoring while I was here.
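      The debugging idea can be sketched as follows: instead of leaving a dead CAF with a dangling indirection, a debug GC marks it, and entering it reports the CAF's own address. All names here (Caf, CafState, gc_caf, enter_caf) are hypothetical. The real RTS works on closures and info tables and would abort, whereas this sketch returns NULL so that it can be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { CAF_UNEVALUATED, CAF_EVALUATED, CAF_GCD } CafState;

/* Hypothetical CAF: after evaluation it points at its value. */
typedef struct Caf {
    CafState state;
    void *indirectee;
} Caf;

/* Debug-build GC: mark the dead CAF instead of leaving a stale
   indirection that would later be followed into freed memory. */
static void gc_caf(Caf *caf) {
    caf->state = CAF_GCD;
    caf->indirectee = NULL;
}

/* Entering a GC'd CAF reports the CAF's own address, which is lost
   once R1 has been overwritten with the indirectee. */
static void *enter_caf(Caf *caf) {
    if (caf->state == CAF_GCD) {
        fprintf(stderr, "error: entered GC'd CAF %p\n", (void *)caf);
        return NULL;
    }
    return caf->indirectee;
}
```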
  15. 02 Oct, 2013 1 commit
  16. 01 Oct, 2013 1 commit
  17. 04 Sep, 2013 1 commit
    • Don't move Capabilities in setNumCapabilities (#8209) · aa779e09
      Simon Marlow authored
      We have various problems with reallocating the array of Capabilities,
      due to threads in waitForReturnCapability that are already holding a
      pointer to a Capability.
      Rather than add more locking to make this safer, I decided it would be
      easier to ensure that we never move the Capabilities at all.  The
      capabilities array is now an array of pointers to Capability.  There
      are extra indirections, but it rarely matters: we don't often access
      Capabilities via the array, normally we already have a pointer to
      one.  I ran the parallel benchmarks and didn't see any difference.
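      The point of the change is that growing the array moves only pointers, never the Capability structs themselves. A minimal sketch of that layout follows. set_num_capabilities is a hypothetical stand-in for setNumCapabilities(), and it only handles growing because shrinking is out of scope here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Capability {
    unsigned int no;
    /* ... per-capability state would live here ... */
} Capability;

/* Array of pointers: reallocating it never moves a Capability. */
static Capability **capabilities = NULL;
static unsigned int n_capabilities = 0;

/* Grow to n capabilities. Only the pointer array is reallocated, so a
   Capability* already held by another thread stays valid.
   Returns 0 on success, -1 on allocation failure. */
static int set_num_capabilities(unsigned int n) {
    if (n <= n_capabilities)
        return 0;   /* shrinking is not handled in this sketch */
    Capability **arr = realloc(capabilities, n * sizeof *arr);
    if (arr == NULL)
        return -1;
    capabilities = arr;
    for (unsigned int i = n_capabilities; i < n; i++) {
        Capability *cap = malloc(sizeof *cap);
        if (cap == NULL) {
            n_capabilities = i;
            return -1;
        }
        cap->no = i;
        capabilities[i] = cap;
    }
    n_capabilities = n;
    return 0;
}
```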
  18. 22 Aug, 2013 1 commit
    • Really unload object code when it is safe to do so (#8039) · bdfefb3b
      Simon Marlow authored
      The next major GC after an unloadObj() will do a traversal of the heap
      to determine whether the object code can be removed from memory or
      not.  We'll keep doing these until it is safe to remove the object code.
      In my experiments with GHCi, the objects get unloaded immediately,
      which is a good sign: we're not accidentally holding on to any
      references anywhere in the GHC data structures.
      Changes relative to the patch earlier posted on the ticket:
       - fix two memory leaks discovered with Valgrind, after
         testing with tests/rts/linker_unload.c
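      The "keep checking at each major GC" scheme can be sketched as an address-range check. ObjectCode's fields, mark_reference, sweep_unloaded and major_gc are hypothetical. The simulated "heap references" are plain addresses rather than a real heap traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ObjectCode {
    uintptr_t start, end;   /* address range [start, end) of the code */
    int unload_requested;   /* set by a hypothetical unload request */
    int referenced;         /* set during the heap traversal */
    int freed;
} ObjectCode;

/* Called for each reference found while traversing the heap:
   flag any object code whose range contains the address. */
static void mark_reference(ObjectCode *objs, size_t n, uintptr_t addr) {
    for (size_t i = 0; i < n; i++)
        if (addr >= objs[i].start && addr < objs[i].end)
            objs[i].referenced = 1;
}

/* After the traversal: free unload-requested objects that nothing
   referenced. Referenced ones are retried at the next major GC. */
static size_t sweep_unloaded(ObjectCode *objs, size_t n) {
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (objs[i].unload_requested && !objs[i].freed && !objs[i].referenced) {
            objs[i].freed = 1;   /* real code would release the memory */
            freed++;
        }
        objs[i].referenced = 0;  /* reset for the next traversal */
    }
    return freed;
}

/* Simulated major GC: refs stands in for the pointers found in the heap. */
static size_t major_gc(ObjectCode *objs, size_t n,
                       const uintptr_t *refs, size_t nrefs) {
    for (size_t i = 0; i < nrefs; i++)
        mark_reference(objs, n, refs[i]);
    return sweep_unloaded(objs, n);
}
```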
  19. 21 Aug, 2013 1 commit
  20. 15 Jun, 2013 1 commit
  21. 13 May, 2013 1 commit
  22. 14 Feb, 2013 2 commits
    • Separate StablePtr and StableName tables (#7674) · 7e7a4e4d
      Simon Marlow authored
      To improve performance of StablePtr.
    • Simplify the allocation stats accounting · 65a0e1eb
      Simon Marlow authored
      We were doing it in two different ways and asserting that the results
      were the same.  In most cases they were, but I found one case where
      they weren't: the GC itself allocates some memory for running
      finalizers, and this memory was accounted for one way but not the other.
      It was simpler to remove the old way of counting allocation than to
      try to fix it up, so I did that.
  23. 17 Jan, 2013 1 commit
  24. 16 Nov, 2012 1 commit
    • Add a write barrier for TVAR closures · 6d784c43
      Simon Marlow authored
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
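      A generational write barrier of this kind can be sketched with a per-object dirty flag and a remembered set ("mutable list"). With the barrier, a minor GC scans only the TVars written since the last GC, not every TVar in the heap. The names (TVar, write_tvar, scavenge_mut_list) and the fixed-size list are hypothetical. A real barrier would also record only objects in older generations, and this sketch ignores that detail.

```c
#include <assert.h>
#include <stddef.h>

#define MUT_LIST_MAX 16

typedef struct TVar {
    void *current_value;
    int dirty;   /* already on the mutable list since the last GC? */
} TVar;

/* Hypothetical remembered set of TVars written since the last GC. */
static TVar *mut_list[MUT_LIST_MAX];
static size_t mut_list_len = 0;

/* Write with barrier: only the first write after a GC records the TVar. */
static void write_tvar(TVar *tv, void *value) {
    tv->current_value = value;
    if (!tv->dirty) {
        tv->dirty = 1;
        assert(mut_list_len < MUT_LIST_MAX);   /* fixed size in this sketch */
        mut_list[mut_list_len++] = tv;
    }
}

/* Minor GC: scan just the recorded TVars, then mark them clean.
   A real GC would also evacuate each current_value here. */
static size_t scavenge_mut_list(void) {
    size_t scanned = mut_list_len;
    for (size_t i = 0; i < mut_list_len; i++)
        mut_list[i]->dirty = 0;
    mut_list_len = 0;
    return scanned;
}
```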
  25. 01 Nov, 2012 1 commit
  26. 21 Sep, 2012 2 commits
  27. 18 Sep, 2012 1 commit
  28. 07 Sep, 2012 4 commits
  29. 21 Aug, 2012 2 commits
  30. 10 Jul, 2012 1 commit
    • Parallelise clearNurseries() in the parallel GC · 713cf473
      Simon Marlow authored
      The clearNurseries() operation resets the free pointer in each nursery
      block to the start of the block, emptying the nursery.  In the
      parallel GC this was done on the main GC thread, but that's bad
      because it accesses the bdescr of every nursery block, and moves all
      those cache lines onto the CPU of the main GC thread.  With large
      nurseries, this can be especially bad.  So instead we want to clear
      each nursery in its local GC thread.
      Thanks to Andreas Voellmy <andreas.voellmy@gmail.com> for identifying
      the issue.
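      The change amounts to each GC thread resetting its own nursery's blocks, so that each thread touches only its own block descriptors. A minimal pthreads sketch follows. bdescr here is a hypothetical two-field stand-in for the real block descriptor. The sketch also creates threads per call only to stay self-contained, whereas the RTS reuses its existing GC threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 4
#define BLOCKS_PER_NURSERY 8
#define BLOCK_SIZE 4096

/* Hypothetical block descriptor with only the fields used here. */
typedef struct bdescr {
    char *start;
    char *free;   /* next free byte in the block */
} bdescr;

typedef struct Nursery {
    bdescr blocks[BLOCKS_PER_NURSERY];
} Nursery;

static char heap[N_THREADS][BLOCKS_PER_NURSERY][BLOCK_SIZE];
static Nursery nurseries[N_THREADS];

/* Simulate some allocation in every nursery block. */
static void init_nurseries(void) {
    for (size_t t = 0; t < N_THREADS; t++)
        for (size_t i = 0; i < BLOCKS_PER_NURSERY; i++) {
            nurseries[t].blocks[i].start = heap[t][i];
            nurseries[t].blocks[i].free = heap[t][i] + 100;
        }
}

/* Reset every block of one nursery to empty. */
static void clear_nursery(Nursery *n) {
    for (size_t i = 0; i < BLOCKS_PER_NURSERY; i++)
        n->blocks[i].free = n->blocks[i].start;
}

static void *gc_thread(void *arg) {
    clear_nursery((Nursery *)arg);   /* each thread clears only its own */
    return NULL;
}

/* Clear all nurseries in parallel, one thread per nursery. */
static int clear_nurseries_parallel(void) {
    pthread_t tids[N_THREADS];
    size_t created = 0;
    int rc = 0;
    for (; created < N_THREADS; created++)
        if (pthread_create(&tids[created], NULL, gc_thread,
                           &nurseries[created]) != 0) {
            rc = -1;
            break;
        }
    for (size_t t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    return rc;
}
```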
      After this change and the previous patch to make the last GC a major
      one, I see these results for nofib/parallel on 8 cores:
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
         blackscholes          +0.0%     +0.0%     -3.7%     -3.3%     +0.3%
                coins          +0.0%     +0.0%     -5.1%     -5.0%     +0.4%
                 gray          +0.0%     +0.0%     -4.5%     -2.1%     +0.8%
               mandel          +0.0%     -0.0%     -7.6%     -5.1%     -2.3%
              matmult          +0.0%     +5.5%     -2.8%     -1.9%     -5.8%
              minimax          +0.0%     +0.0%    -10.6%    -10.5%     +0.0%
                nbody          +0.0%     -4.4%     +0.0%      0.07     +0.0%
               parfib          +0.0%     +1.0%     +0.5%     +0.9%     +0.0%
              partree          +0.0%     +0.0%     -2.4%     -2.5%     +1.7%
                 prsa          +0.0%     -0.2%     +1.8%     +4.2%     +0.0%
               queens          +0.0%     -0.0%     -1.8%     -1.4%     -4.8%
                  ray          +0.0%     -0.6%    -18.5%    -17.8%     +0.0%
             sumeuler          +0.0%     -0.0%     -3.7%     -3.7%     +0.0%
            transclos          +0.0%     -0.0%    -25.7%    -26.6%     +0.0%
                  Min          +0.0%     -4.4%    -25.7%    -26.6%     -5.8%
                  Max          +0.0%     +5.5%     +1.8%     +4.2%     +1.7%
       Geometric Mean          +0.0%     +0.1%     -6.3%     -6.1%     -0.7%
  31. 04 Apr, 2012 3 commits
    • Fix the timestamps in GC_START and GC_END events on the GC-initiating cap · 598109eb
      Mikolaj Konarski authored
      There was a discrepancy between GC times reported in +RTS -s
      and the timestamps of GC_START and GC_END events on the cap,
      on which +RTS -s stats for the given GC are based.
      This is fixed by posting the events with exactly the same timestamp
      as generated for the stat calculation. The calls posting the events
      are moved too, so that the events are emitted close to the time instant
      they claim to be emitted at. The GC_STATS_GHC event was moved too, ensuring
      it's emitted before the moved GC_END on all caps, which simplifies code in tools.
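      The fix takes each timestamp once and uses the same value both for the +RTS -s accounting and for the posted event. A minimal sketch of that pattern follows. The Event record, post_event_at, stat_start_gc and stat_end_gc are hypothetical and are not the RTS eventlog API.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t Time;   /* nanoseconds */

static Time get_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (Time)ts.tv_sec * 1000000000u + (Time)ts.tv_nsec;
}

/* Hypothetical event record and in-memory log. */
typedef struct { int type; Time ts; } Event;
enum { GC_START = 1, GC_END = 2 };

static Event event_log[16];
static int n_events = 0;
static Time gc_total_ns = 0;   /* what +RTS -s style stats would report */
static Time gc_start_ts;

static void post_event_at(int type, Time ts) {
    assert(n_events < 16);
    event_log[n_events].type = type;
    event_log[n_events].ts = ts;
    n_events++;
}

static void stat_start_gc(void) {
    gc_start_ts = get_time_ns();
    post_event_at(GC_START, gc_start_ts);   /* same timestamp as the stats */
}

static void stat_end_gc(void) {
    Time end = get_time_ns();
    gc_total_ns += end - gc_start_ts;
    post_event_at(GC_END, end);             /* same timestamp as the stats */
}
```

      Because both consumers read the same two values, the GC time derived from the eventlog matches the stats exactly, instead of differing by however long it took to reach a second clock read.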
    • Emit final heap alloc events and rearrange code to calculate alloc totals · 1f809ce6
      Duncan Coutts authored
      In stat_exit we want to emit a final EVENT_HEAP_ALLOCATED for each cap
      so that we get the same total allocation count as reported via +RTS -s.
      To do so we need to update the per-cap total_allocated counts.
      Previously we had a single calcAllocated(rtsBool) function that counted
      the large allocations and optionally the nurseries for all caps. The GC
      would always call it with false, and stat_exit() always with true.
      The reason for these two modes is that the GC counts the nurseries via
      clearNurseries() (which also updates the per-cap total_allocated
      counts), so it's only the stat_exit() path that needs to count them.
      We now split the calcAllocated() function into two: countLargeAllocated
      and updateNurseriesStats. As the name suggests, the latter now updates
      the per-cap total_allocated counts, in addition to returning a total.
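      The split can be pictured as below. The two function names come from the commit, but their signatures, the caps array, the nursery_used field and large_alloc_since_gc are assumptions made for illustration. stat_exit_total is a hypothetical stand-in for the part of stat_exit() that computes the final totals before posting the per-cap EVENT_HEAP_ALLOCATED events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_CAPS 2

/* Hypothetical per-cap counters. */
typedef struct Capability {
    uint64_t total_allocated;   /* running total reported per cap */
    uint64_t nursery_used;      /* bytes allocated in this cap's nursery */
} Capability;

static Capability caps[N_CAPS];
static uint64_t large_alloc_since_gc = 0;   /* hypothetical counter */

/* Large-object allocation only; no side effects on the caps. */
static uint64_t countLargeAllocated(void) {
    return large_alloc_since_gc;
}

/* Nursery allocation: returns the total and, unlike the old
   calcAllocated(), also updates each cap's total_allocated. */
static uint64_t updateNurseriesStats(void) {
    uint64_t total = 0;
    for (size_t i = 0; i < N_CAPS; i++) {
        caps[i].total_allocated += caps[i].nursery_used;
        total += caps[i].nursery_used;
    }
    return total;
}

/* Called once at exit, after which the per-cap totals are final and
   could be emitted as one heap-allocated event per cap. */
static uint64_t stat_exit_total(void) {
    return countLargeAllocated() + updateNurseriesStats();
}
```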
    • Add new eventlog events for various heap and GC statistics · 65aaa9b2
      Duncan Coutts authored
      They cover much the same info as is available via the GHC.Stats module
      or via the '+RTS -s' textual output, but via the eventlog and with a
      better sampling frequency.
      We have three new generic heap info events and two very GHC-specific
      ones. (The hope is that the general ones are usable by other implementations
      that use the same eventlog system, and are less sensitive to changes
      in GHC itself.)
      The general ones are:
       * total heap mem allocated since prog start, on a per-HEC basis
       * current size of the heap (MBlocks reserved from OS for the heap)
       * current size of live data in the heap
      Currently these are all emitted by GHC at GC time (live data only at
      major GC).
      The GHC specific ones are:
       * an event giving various static heap parameters:
         * number of generations (usually 2)
         * maximum heap size, if any
         * nursery size
         * MBlock and block sizes
       * an event emitted on each GC containing:
         * GC generation (usually just 0 or 1)
         * total bytes copied
         * bytes lost to heap slop and fragmentation
         * the number of threads in the parallel GC (1 for serial)
         * the maximum number of bytes copied by any par GC thread
         * the total number of bytes copied by all par GC threads
           (these last three can be used to calculate an estimate of the
            work balance in parallel GCs)
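      One natural work-balance estimate from the last three fields is the following. With n threads, total bytes copied T and per-thread maximum M, a perfectly even split gives M = T/n, so T/M = n, while one thread doing everything gives T/M = 1. Rescaling T/M from [1, n] to [0, 1] gives (T/M - 1)/(n - 1). The struct layout and field names below are hypothetical, and this is not necessarily the exact formula any GHC tool uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for the per-GC statistics listed above. */
typedef struct GcStatsEvent {
    uint32_t gen;              /* generation collected */
    uint64_t copied;           /* total bytes copied */
    uint64_t slop;             /* bytes lost to slop and fragmentation */
    uint32_t par_n_threads;    /* 1 for the serial GC */
    uint64_t par_max_copied;   /* most bytes copied by any one GC thread */
    uint64_t par_tot_copied;   /* bytes copied by all GC threads */
} GcStatsEvent;

/* 0.0 = one thread did all the copying, 1.0 = perfectly even split. */
static double work_balance(const GcStatsEvent *ev) {
    if (ev->par_n_threads <= 1 || ev->par_max_copied == 0)
        return 0.0;   /* serial GC, or nothing copied */
    double ratio = (double)ev->par_tot_copied / (double)ev->par_max_copied;
    return (ratio - 1.0) / (double)(ev->par_n_threads - 1);
}
```

      For example, with 4 threads and 4000 bytes copied in total, a maximum of 1000 gives 1.0, a maximum of 4000 gives 0.0, and a maximum of 2000 gives 1/3.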