This project is mirrored from https://gitlab.haskell.org/ghc/ghc.git.
  1. 18 Oct, 2019 2 commits
    • rts: Implement concurrent collection in the nonmoving collector · d7017446
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushes
      when (a) the chunk is filled or (b) the nonmoving collector enters its
      post-mark synchronization phase.
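      The accumulate-and-flush scheme above can be sketched in C. This is a
      minimal, single-threaded illustration rather than the RTS code: the
      names `UpdRemSetChunk`, `CHUNK_ENTRIES`, `CapSketch`,
      `upd_rem_set_push` and `upd_rem_set_flush`, the tiny chunk size, and the
      global list standing in for the collector's mark queue are all
      hypothetical, and a `MarkEntry` is reduced to a bare pointer.

      ```c
      #include <stdlib.h>

      #define CHUNK_ENTRIES 4   /* hypothetical; tiny so the flush is easy to see */

      typedef struct { void *p; } MarkEntrySketch;   /* just the object pointer */

      typedef struct UpdRemSetChunk {
          struct UpdRemSetChunk *next;   /* link in the collector's chunk list */
          size_t n;                      /* entries in use */
          MarkEntrySketch entries[CHUNK_ENTRIES];
      } UpdRemSetChunk;

      /* Chunks handed over to the collector (stand-in for its mark queue). */
      static UpdRemSetChunk *flushed_chunks = NULL;

      typedef struct {
          UpdRemSetChunk *accum;         /* this capability's accumulator chunk */
      } CapSketch;

      /* Hand the accumulator (if non-empty) over to the collector. */
      static void upd_rem_set_flush(CapSketch *cap) {
          UpdRemSetChunk *c = cap->accum;
          if (c != NULL && c->n > 0) {
              c->next = flushed_chunks;
              flushed_chunks = c;
              cap->accum = NULL;
          }
      }

      /* Record an object that the mutator may be about to hide from the marker. */
      static void upd_rem_set_push(CapSketch *cap, void *p) {
          if (cap->accum == NULL) {
              cap->accum = calloc(1, sizeof *cap->accum);
              if (cap->accum == NULL) abort();
          }
          cap->accum->entries[cap->accum->n++].p = p;
          if (cap->accum->n == CHUNK_ENTRIES)   /* case (a): the chunk is full */
              upd_rem_set_flush(cap);
      }
      ```

      Case (b) corresponds to the collector calling `upd_rem_set_flush` on
      every capability during the post-mark synchronization. A multi-threaded
      version would also need a lock or atomics around `flushed_chunks`.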
      
      While the write barrier touches a significant amount of code, it is
      conceptually straightforward: the mutator must ensure that the referee
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details:
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referee of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
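      The dirty-flag shortcut from the first point can be sketched as a
      deletion-style (snapshot-at-the-beginning) write barrier. Everything
      here is hypothetical and simplified: `MutCell` stands in for an object
      such as an `MVar`, `write_barrier_enabled` for the flag that is set
      while a concurrent mark is underway, and a fixed array for the update
      remembered set. The sketch does not model when or by whom the dirty
      flag is cleared again.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical: true while a concurrent mark is underway. */
      static bool write_barrier_enabled = false;

      /* Hypothetical stand-in for the update remembered set
         (silently drops entries past REMEMBERED_MAX; a real set would grow). */
      #define REMEMBERED_MAX 64
      static void *remembered[REMEMBERED_MAX];
      static size_t n_remembered = 0;

      static void remember(void *p) {
          if (p != NULL && n_remembered < REMEMBERED_MAX)
              remembered[n_remembered++] = p;
      }

      /* A mutable object with a dirty flag, loosely modelled on an MVar. */
      typedef struct {
          bool dirty;
          void *value;
      } MutCell;

      static void cell_write(MutCell *c, void *new_value) {
          if (!c->dirty) {
              /* First mutation: record the old referee so the concurrent
                 mark still sees it. Later writes skip the barrier. */
              if (write_barrier_enabled)
                  remember(c->value);
              c->dirty = true;
          }
          c->value = new_value;
      }
      ```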
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
      Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
    • rts/GC: Add an obvious assertion during block initialization · 697be2b6
      Ömer Sinan Ağacan authored
      Namely ensure that block descriptors are initialized with valid
      generation numbers.
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
  2. 27 Jun, 2018 1 commit
  3. 19 Mar, 2018 1 commit
    • rts: Add --internal-counters RTS flag and several counters · 2918abf7
      duog authored
      The existing internal counters:
      * gc_alloc_block_sync
      * whitehole_spin
      * gen[0].sync
      * gen[1].sync
      
      are no longer shown in the -s report unless --internal-counters is also passed.
      
      If --internal-counters is passed, we now show the counters above, reformatted, as
      well as several other counters. In particular, we now count the yieldThread()
      calls that SpinLocks make, as well as their spins.
      
      The added counters are:
      * gc_spin (spin and yield)
      * mut_spin (spin and yield)
      * whitehole_threadPaused (spin only)
      * whitehole_executeMessage (spin only)
      * whitehole_lockClosure (spin only)
      * waitForGcThreads (spin and yield)
      
      As well as the following, which are not SpinLock-like things:
      * any_work
      * no_work
      * scav_find_work
      
      See the Note for descriptions of what these counters are.
      
      We add busy_wait_nops in these loops along with the counter increment where it
      was absent.
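      A spin lock that counts both spins and yields, in the spirit of the
      counters listed above, might look like the following C11 sketch. The
      type `CountingSpinLock`, `SPIN_COUNT`, and the use of POSIX
      `sched_yield()` in place of the RTS's `yieldThread()` are assumptions
      for illustration; a comment marks where a `busy_wait_nop()` would go.

      ```c
      #include <sched.h>
      #include <stdatomic.h>

      #define SPIN_COUNT 1000   /* hypothetical: CAS attempts before yielding */

      typedef struct {
          atomic_int lock;               /* 0 = free, 1 = held */
          atomic_uint_fast64_t spin;     /* failed acquisition attempts */
          atomic_uint_fast64_t yield;    /* times the CPU was given up */
      } CountingSpinLock;

      static void acquire(CountingSpinLock *l) {
          for (;;) {
              for (int i = 0; i < SPIN_COUNT; i++) {
                  int expected = 0;
                  if (atomic_compare_exchange_strong(&l->lock, &expected, 1))
                      return;
                  /* Relaxed increments: the counters are statistics only. */
                  atomic_fetch_add_explicit(&l->spin, 1, memory_order_relaxed);
                  /* a busy_wait_nop() / CPU pause hint would go here */
              }
              atomic_fetch_add_explicit(&l->yield, 1, memory_order_relaxed);
              sched_yield();
          }
      }

      static void release(CountingSpinLock *l) {
          atomic_store_explicit(&l->lock, 0, memory_order_release);
      }
      ```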
      
      Old internal counters output:
      ```
      gc_alloc_block_sync: 0
      whitehole_gc_spin: 0
      gen[0].sync: 0
      gen[1].sync: 0
      ```
      
      New internal counters output:
      ```
      Internal Counters:
                                                 Spins        Yields
          gc_alloc_block_sync                      323             0
          gc_spin                              9016713           752
          mut_spin                            57360944         47716
          whitehole_gc                               0           n/a
          whitehole_threadPaused                     0           n/a
          whitehole_executeMessage                   0           n/a
          whitehole_lockClosure                      0             0
          waitForGcThreads                           2           415
          gen[0].sync                                6             0
          gen[1].sync                                1             0
      
          any_work                                2017
          no_work                                 2014
          scav_find_work                          1004
      ```
      
      Test Plan:
      ./validate
      
      Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #3553, #9221
      
      Differential Revision: https://phabricator.haskell.org/D4302
  4. 26 Sep, 2017 1 commit
  5. 23 Apr, 2017 1 commit
  6. 06 Dec, 2016 1 commit
    • Overhaul GC stats · 24e6594c
      Simon Marlow authored
      Summary:
      Visible API changes:
      
      * The C struct `GCDetails` gives the stats about a single GC.  This is
        passed to the `gcDone()` callback if one is set via the
        RtsConfig. (previously we just passed a collection of values, so this
        is more extensible, at the expense of breaking the existing API)
      
      * `RTSStats` gives cumulative stats since the start of the program,
        and includes the `GCDetails` for the most recent GC.  This struct
        can be obtained via `getRTSStats()` (the old `getGCStats()` has been
        removed, and `getGCStatsEnabled()` has been renamed to
        `getRTSStatsEnabled()`)
      
      Improvements:
      
      * The per-GC stats and cumulative stats are now cleanly separated.
      
      * Inside the RTS we have a top-level `RTSStats` struct to keep all our
        stats in, previously this was just a collection of strangely-named
        variables.  This struct is mostly just copied in `getRTSStats()`, so
        the implementation of that function is a lot shorter.
      
      * Types are more consistent.  We use a uint64_t byte count for all
        memory values, and Time for all time values.
      
      * Names are more consistent.  We use a suffix `_bytes` for all byte
        counts and `_ns` for all time values.
      
      * We now collect information about the amount of memory in large
        objects and compact objects in `GCDetails`. (the latter was the reason
        I started doing this patch but it seems to have ballooned a bit!)
      
      * I fixed a bug in the calculation of the elapsed MUT time, and added
        an ASSERT to stop the calculations going wrong in the future.
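      The separation between per-GC and cumulative statistics, together with
      the `_bytes`/`_ns` naming convention, can be illustrated with the C
      sketch below. The structs are hypothetical, cut-down stand-ins (named
      `GCDetailsSketch` and `RTSStatsSketch` to avoid confusion with the real
      `GCDetails` and `RTSStats`, whose full field lists are not reproduced
      here), and `Time` is assumed to count nanoseconds.

      ```c
      #include <stdint.h>

      typedef int64_t Time;   /* assumed here to count nanoseconds */

      /* Stats about a single GC (hypothetical, simplified). */
      typedef struct {
          uint32_t gen;                 /* generation collected */
          uint64_t allocated_bytes;     /* allocated since the previous GC */
          uint64_t live_bytes;
          uint64_t copied_bytes;
          Time     elapsed_ns;
      } GCDetailsSketch;

      /* Cumulative stats since program start, plus the most recent GC. */
      typedef struct {
          uint32_t gcs;
          uint64_t allocated_bytes;
          uint64_t max_live_bytes;
          uint64_t copied_bytes;
          Time     gc_elapsed_ns;
          GCDetailsSketch gc;           /* details of the latest GC */
      } RTSStatsSketch;

      /* Fold one GC's details into the cumulative totals. */
      static void stats_record_gc(RTSStatsSketch *s, const GCDetailsSketch *d) {
          s->gcs++;
          s->allocated_bytes += d->allocated_bytes;
          s->copied_bytes    += d->copied_bytes;
          s->gc_elapsed_ns   += d->elapsed_ns;
          if (d->live_bytes > s->max_live_bytes)
              s->max_live_bytes = d->live_bytes;
          s->gc = *d;
      }
      ```

      Keeping the latest `GCDetailsSketch` inside the cumulative struct means
      a single copy hands a caller both views at once.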
      
      For now I kept the Haskell API in `GHC.Stats` the same, by
      impedance-matching with the new API.  We could either break that API
      and make it match the C API more closely, or we could add a new API
      and deprecate the old one.  Opinions welcome.
      
      This stuff is very easy to get wrong, and it's hard to test.  Reviews
      welcome!
      
      Test Plan:
      manual testing
      validate
      
      Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2756
  7. 29 Nov, 2016 1 commit
  8. 27 Jul, 2016 1 commit
  9. 20 Jul, 2016 1 commit
    • Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
  10. 10 Jun, 2016 1 commit
    • Rts flags cleanup · c88f31a0
      Simon Marlow authored
      * Remove unused/old flags from the structs
      * Update old comments
      * Add missing flags to GHC.RTS
      * Simplify GHC.RTS, remove C code and use hsc2hs instead
      * Make ParFlags unconditional, and add support to GHC.RTS
  11. 04 May, 2016 1 commit
  12. 08 Jun, 2015 1 commit
    • Fix for CAF retention when dynamically loading & unloading code · 19ec6a84
      Simon Marlow authored
      In a situation where we have some statically-linked code and we want to
      load and unload a series of objects, we need the CAFs in the
      statically-linked code to be retained indefinitely, while the CAFs in
      the dynamically-linked code should be GC'd as normal, so that we can
      detect when the code is unloadable.  This was wrong before - we GC'd
      CAFs in the static code, leading to a crash in the rare case where we
      use a CAF, GC it, and then load a new object that uses it again.
      
      I also did some tidy up: RtsConfig now has a field keep_cafs to
      indicate whether we want CAFs to be retained in static code.
  13. 15 Dec, 2014 1 commit
  14. 25 Nov, 2014 1 commit
    • Make clearNursery free · e22bc0de
      Simon Marlow authored
      Summary:
      clearNursery resets all the bd->free pointers of nursery blocks to
      make the blocks empty.  In profiles we've seen clearNursery taking
      significant amounts of time particularly with large -N and -A values.
      
      This patch moves the work of clearNursery to the point at which we
      actually need the new block, thereby introducing an invariant that
      blocks to the right of the CurrentNursery pointer still need their
      bd->free pointer reset.  This should make things faster overall,
      because we don't need to clear blocks that we don't use.
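      The invariant can be sketched in C with a hypothetical, much simplified
      nursery: `BlockSketch`, `NurserySketch`, `clear_nursery_lazily` and
      `next_nursery_block` are illustrative names, and block addresses are
      modelled as word indices. A block's free pointer is reset only when
      `next_nursery_block` reaches it, so blocks that are never reached are
      never touched.

      ```c
      #include <stddef.h>

      #define NURSERY_BLOCKS 8
      #define BLOCK_WORDS    512

      typedef struct {
          size_t start;   /* first word of the block */
          size_t free;    /* next free word; == start when the block is empty */
      } BlockSketch;

      typedef struct {
          BlockSketch blocks[NURSERY_BLOCKS];
          size_t current;               /* index of CurrentNursery */
      } NurserySketch;

      /* After a GC: rewind to the first block and reset only that one.
         Invariant: blocks to the right of `current` may have stale `free`. */
      static void clear_nursery_lazily(NurserySketch *n) {
          n->current = 0;
          n->blocks[0].free = n->blocks[0].start;
      }

      /* Advance to the next block, resetting it just before first use.
         Returns 0 when the nursery is exhausted and a GC is needed. */
      static int next_nursery_block(NurserySketch *n) {
          if (n->current + 1 >= NURSERY_BLOCKS)
              return 0;
          n->current++;
          n->blocks[n->current].free = n->blocks[n->current].start;
          return 1;
      }
      ```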
      
      Test Plan: validate
      
      Reviewers: AndreasVoellmy, ezyang, austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D318
  15. 20 Aug, 2014 1 commit
  16. 04 Dec, 2013 1 commit
    • Move the allocation of CAF blackholes into 'newCAF' (#8590) · 55c703b8
      parcs authored
      We now do the allocation of the blackhole indirection closure inside the
      RTS procedure 'newCAF' instead of generating the allocation code inline
      in the closure body of each CAF.  This slightly decreases code size in
      modules with a lot of CAFs.
      
      As a result of this change, for example, the size of DynFlags.o drops by
      ~60KB and HsExpr.o by ~100KB.
  17. 22 Nov, 2013 1 commit
    • GHCi: Properly generate jump code for ARM (#8380) · 5bab1a57
      Austin Seipp authored
      This adds code for jumping to given addresses for ARM, written by Ben
      Gamari.
      
      However, when allocating new infotables for bytecode (which is where
      this jump code occurs), we need to be sure to flush the cache on the
      execute pointer returned from allocateExec(): unlike x86, processors
      such as ARM won't reliably read back newly written code or flush their
      caches automatically.
      
      So we add a new flushExec primitive to call out to GCC's
      __builtin___clear_cache primitive, which will properly generate the
      correct code (nothing on x86, and a call to libgcc's __clear_cache on
      ARM) and make sure we use it after writing the code out.
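      A minimal sketch of the flushExec() idea, using the GCC/Clang builtin
      named above. The helper `write_code` is hypothetical; a real caller
      would write into memory obtained from allocateExec() (or otherwise
      mapped executable), which this sketch does not do, so it only shows the
      copy-then-flush ordering.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical helper: copy generated code into place, then make it
         visible to instruction fetch. Per the message above, the builtin
         emits nothing on x86 and calls libgcc's __clear_cache on ARM. */
      static void write_code(void *dst, const void *code, size_t len) {
          memcpy(dst, code, len);
          __builtin___clear_cache((char *)dst, (char *)dst + len);
      }
      ```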
      Authored-by: Ben Gamari <bgamari.foss@gmail.com>
      Authored-by: Austin Seipp <austin@well-typed.com>
      Signed-off-by: Austin Seipp <austin@well-typed.com>
  18. 21 Nov, 2013 1 commit
    • In the DEBUG rts, track when CAFs are GC'd · e82fa829
      Simon Marlow authored
      This resurrects some old code and makes it work again.  The idea is
      that we want to get an error message if we ever enter a CAF that has
      been GC'd, rather than following its indirection which will likely
      cause a segfault.  Without this patch, these bugs are hard to track
      down in gdb, because the IND_STATIC code overwrites R1 (the pointer to
      the CAF) with its indirectee before jumping into bad memory, so we've
      lost the address of the CAF that got GC'd.
      
      Some associated refactoring while I was here.
  19. 15 Sep, 2013 1 commit
  20. 08 Sep, 2013 2 commits
  21. 15 Jun, 2013 1 commit
  22. 08 Jun, 2013 1 commit
  23. 14 Feb, 2013 1 commit
    • Simplify the allocation stats accounting · 65a0e1eb
      Simon Marlow authored
      We were doing it in two different ways and asserting that the results
      were the same.  In most cases they were, but I found one case where
      they weren't: the GC itself allocates some memory for running
      finalizers, and this memory was accounted for one way but not the
      other.
      
      It was simpler to remove the old way of counting allocation than to
      try to fix it up, so I did that.
  24. 21 Sep, 2012 1 commit
  25. 07 Sep, 2012 2 commits
    • Lots of nat -> StgWord changes · bf2d58c2
      Simon Marlow authored
    • Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
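      The property wanted here, an unsigned type exactly as wide as a machine
      word, can be checked at compile time. This C11 sketch uses the
      hypothetical name `WordSketch` and assumes `uintptr_t` is pointer-sized
      (true on mainstream platforms, though the C standard does not strictly
      guarantee it); it is not GHC's actual definition of StgWord.

      ```c
      #include <stdint.h>

      /* Hypothetical stand-in for StgWord: N bits on an N-bit machine. */
      typedef uintptr_t WordSketch;

      /* Fails to compile if the assumption above does not hold. */
      _Static_assert(sizeof(WordSketch) == sizeof(void *),
                     "a word must be as wide as a pointer");

      /* By contrast, unsigned long is only 32 bits on Windows x64 (LLP64),
         so it cannot serve as a pointer-sized word there. */
      ```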
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
  26. 19 Jun, 2012 1 commit
  27. 04 Apr, 2012 1 commit
    • Change the presentation of parallel GC work balance in +RTS -s · cd930da1
      Duncan Coutts authored
      Also rename internal variables to make the names match what they hold.
      The parallel GC work balance is calculated using the total amount of
      memory copied by all GC threads, and the maximum copied by any
      individual thread. You have serial GC when the max is the same as the
      total copied, and perfectly balanced GC when total/max == n_caps.
      
      Previously we presented this as the ratio total/max and told users
      that the serial value was 1 and the ideal value N, for N caps, e.g.
      
        Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2)
      
      The downside of this is that the user always has to keep in mind the
      number of cores being used. Our new presentation uses a normalised
      0 to 1 scale, shown as a percentage: 0% means completely serial and
      100% means perfect balance, e.g.
      
        Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
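      Assuming the scale is linear between the two stated endpoints
      (total/max = 1 maps to 0%, total/max = n_caps maps to 100%), the
      normalised figure is (total/max - 1) / (n_caps - 1) * 100. The C
      function below is a sketch of that formula with a hypothetical name,
      not the RTS source.

      ```c
      /* Normalised parallel GC work balance, as a percentage (hypothetical
         helper). total_copied: copied by all GC threads together;
         max_copied: most copied by any one thread; n_threads: GC threads. */
      static double gc_work_balance_pct(double total_copied, double max_copied,
                                        unsigned n_threads) {
          if (n_threads <= 1 || max_copied <= 0.0)
              return 0.0;   /* one thread or no copying: nothing to balance */
          /* total/max runs from 1 (serial) to n_threads (perfect balance);
             map that range linearly onto 0..100. */
          return (total_copied / max_copied - 1.0) / (n_threads - 1) * 100.0;
      }
      ```

      Applied to the old example (4045071 / 3846774 with an ideal of 2), this
      gives about 5.2%.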
  28. 09 Jan, 2012 1 commit
  29. 17 Oct, 2011 1 commit
  30. 06 Aug, 2011 1 commit
  31. 31 Jul, 2011 1 commit
    • Implement public interface for GC statistics. · 2088abaf
      Edward Z. Yang authored
      We add a new RTS flag -T that collects statistics without producing
      any output.  There is one new struct in rts/storage/GC.h: GCStats.  We
      add two new global counters current_residency and current_slop, which
      are useful for in-program GC statistics.
      
      See GHC.Stats in base for a Haskell interface to this functionality.
      Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
  32. 25 May, 2011 1 commit
  33. 02 Feb, 2011 3 commits
    • GC refactoring and cleanup · 18896fa2
      Simon Marlow authored
      Now we keep any partially-full blocks in the gc_thread[] structs after
      each GC, rather than moving them to the generation.  This should give
      us slightly better locality (though I wasn't able to measure any
      difference).
      
      Also in this patch: better sanity checking with THREADED.
    • A small GC optimisation · bef3da1e
      Simon Marlow authored
      Store the *number* of the destination generation in the Bdescr struct,
      so that in evacuate() we don't have to deref gen to get it.
      This is another improvement ported over from my GC branch.
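      The optimisation amounts to replacing a dependent load with a direct
      field read. In the hypothetical, cut-down C sketch below (`GenSketch`
      and `BlockDescSketch` are illustrative, not the real generation and
      block-descriptor definitions), `dest_gen_via_pointer` shows the old
      access pattern and `dest_gen_cached` the new one.

      ```c
      #include <stdint.h>

      typedef struct {
          uint32_t no;                  /* generation number */
      } GenSketch;

      typedef struct {
          GenSketch *dest;              /* generation to evacuate into */
          uint16_t   dest_no;           /* cached copy of dest->no */
      } BlockDescSketch;

      /* Before: evacuate() follows a pointer to learn the number. */
      static uint32_t dest_gen_via_pointer(const BlockDescSketch *bd) {
          return bd->dest->no;
      }

      /* After: the number is stored in the block descriptor itself. */
      static uint32_t dest_gen_cached(const BlockDescSketch *bd) {
          return bd->dest_no;
      }
      ```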
    • Remove the per-generation mutable lists · 32907722
      Simon Marlow authored
      Now that we use the per-capability mutable lists exclusively.
  34. 21 Dec, 2010 1 commit
    • Count allocations more accurately · db0c13a4
      Simon Marlow authored
      The allocation stats (+RTS -s etc.) used to count the slop at the end
      of each nursery block (except the last) as allocated space; now we
      count the allocated words accurately.  This should make allocation
      figures more predictable, too.
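      The difference between the two ways of counting can be sketched in C.
      `NurseryBlockUse`, `BLOCK_WORDS` and both functions are hypothetical;
      each block records how many words were actually allocated in it (its
      free pointer minus its start), and the old scheme treats every block
      except the last as completely full.

      ```c
      #include <stddef.h>

      #define BLOCK_WORDS 512   /* hypothetical block size, in words */

      typedef struct {
          size_t used_words;    /* free pointer minus block start */
      } NurseryBlockUse;

      /* Old scheme: all blocks but the last counted as full, slop included. */
      static size_t alloc_words_old(const NurseryBlockUse *b, size_t n) {
          if (n == 0)
              return 0;
          return (n - 1) * BLOCK_WORDS + b[n - 1].used_words;
      }

      /* New scheme: sum the words actually allocated in each block. */
      static size_t alloc_words_new(const NurseryBlockUse *b, size_t n) {
          size_t total = 0;
          for (size_t i = 0; i < n; i++)
              total += b[i].used_words;
          return total;
      }
      ```

      For three blocks using 500, 510 and 100 words, the old scheme reports
      1124 words and the new one 1110; the 14-word gap is the slop at the end
      of the first two blocks.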
      
      This has the side effect of reducing the apparent allocations by a
      small amount (~1%), so remember to take this into account when looking
      at nofib results.
  35. 17 Jun, 2010 1 commit