1. 15 Apr, 2020 2 commits
  2. 17 Mar, 2020 1 commit
    • Ömer Sinan Ağacan's avatar
      Update sanity checking for TSOs: · 92327e3a
      Ömer Sinan Ağacan authored
      - Remove an invalid assumption about GC checking what_next field. The GC
        doesn't care about what_next at all, if a TSO is reachable then all
        its pointers are followed (other than global_tso, which is only
        followed by compacting GC).
      
      - Remove checkSTACK in checkTSO: TSO stacks will be visited in
        checkHeapChain, or checkLargeObjects etc.
      
      - Add an assertion in checkTSO to check that the global_link field is
        sane.
      
      - Did some refactor to remove forward decls in checkGlobalTSOList and
        added braces around single-statement if statements.
      92327e3a
  3. 22 Oct, 2019 2 commits
  4. 21 Oct, 2019 3 commits
    • Ben Gamari's avatar
    • Ben Gamari's avatar
      rts: Implement concurrent collection in the nonmoving collector · bd8e3ff4
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushed
      when it (a) is filled, or (b) when the nonmoving collector enters its
      post-mark synchronization phase.
      
      While the write barrier touches a significant amount of code it is
      conceptually straightforward: the mutator must ensure that the referee
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details:
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referee of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
      Co-Authored-by: Ömer Sinan Ağacan's avatarÖmer Sinan Ağacan <omer@well-typed.com>
      bd8e3ff4
    • Ömer Sinan Ağacan's avatar
      rts: Non-concurrent mark and sweep · 68e0647f
      Ömer Sinan Ağacan authored
      This implements the core heap structure and a serial mark/sweep
      collector which can be used to manage the oldest-generation heap.
      This is the first step towards a concurrent mark-and-sweep collector
      aimed at low-latency applications.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      The basic heap structure used in this design is heavily inspired by
      
          K. Ueno & A. Ohori. "A fully concurrent garbage collector for
          functional programs on multicore processors." /ACM SIGPLAN Notices/
          Vol. 51. No. 9 (presented by ICFP 2016)
      
      This design is intended to allow both marking and sweeping
      concurrent to execution of a multi-core mutator. Unlike the Ueno design,
      which requires no global synchronization pauses, the collector
      introduced here requires a stop-the-world pause at the beginning and end
      of the mark phase.
      
      To avoid heap fragmentation, the allocator consists of a number of
      fixed-size /sub-allocators/. Each of these sub-allocators allocators into
      its own set of /segments/, themselves allocated from the block
      allocator. Each segment is broken into a set of fixed-size allocation
      blocks (which back allocations) in addition to a bitmap (used to track
      the liveness of blocks) and some additional metadata (used also used
      to track liveness).
      
      This heap structure enables collection via mark-and-sweep, which can be
      performed concurrently via a snapshot-at-the-beginning scheme (although
      concurrent collection is not implemented in this patch).
      
      The mark queue is a fairly straightforward chunked-array structure.
      The representation is a bit more verbose than a typical mark queue to
      accomodate a combination of two features:
      
       * a mark FIFO, which improves the locality of marking, reducing one of
         the major overheads seen in mark/sweep allocators (see [1] for
         details)
      
       * the selector optimization and indirection shortcutting, which
         requires that we track where we found each reference to an object
         in case we need to update the reference at a later point (e.g. when
         we find that it is an indirection). See Note [Origin references in
         the nonmoving collector] (in `NonMovingMark.h`) for details.
      
      Beyond this the mark/sweep is fairly run-of-the-mill.
      
      [1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
          Mark-Sweep Garbage Collection." ISMM 2007.
      Co-Authored-By: Ben Gamari's avatarBen Gamari <ben@well-typed.com>
      68e0647f
  5. 18 Oct, 2019 1 commit
  6. 18 Aug, 2019 1 commit
  7. 28 Jun, 2019 1 commit
    • Travis Whitaker's avatar
      Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-or-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weam memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause us issues,
      consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but its too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari's avatarBen Gamari <ben@well-typed.com>
      11bac115
  8. 13 Feb, 2019 1 commit
  9. 10 Jan, 2019 1 commit
  10. 02 Nov, 2018 1 commit
    • Ben Gamari's avatar
      rts: Add FALLTHROUGH macro · 6bb8aaa3
      Ben Gamari authored
      Instead of using the GCC `/* fallthrough */` syntax we now use the
      `__attribute__((fallthrough))`, which Phyx says should be more portable
      than the former.
      
      Also adds a missing fallthrough annotation in the MachO linker,
      fixing #14613.
      
      Reviewers: erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #14613
      
      Differential Revision: https://phabricator.haskell.org/D5292
      6bb8aaa3
  11. 24 Sep, 2018 1 commit
  12. 12 Jul, 2018 1 commit
    • Simon Marlow's avatar
      Fix deadlock between STM and throwTo · 7fc418df
      Simon Marlow authored
      There was a lock-order reversal between lockTSO() and the TVar lock,
      see #15136 for the details.
      
      It turns out we can fix this pretty easily by just deleting all the
      locking code(!).  The principle for unblocking a `BlockedOnSTM` thread
      then becomes the same as for other kinds of blocking: if the TSO
      belongs to this capability then we do it directly, otherwise we send a
      message to the capability that owns the TSO. That is, a thread blocked
      on STM is owned by its capability, as it should be.
      
      The possible downside of this is that we might send multiple messages
      to wake up a thread when the thread is on another capability. This is
      safe, it's just not very efficient.  I'll try to do some experiments
      to see if this is a problem.
      
      Test Plan: Test case from #15136 doesn't deadlock any more.
      
      Reviewers: bgamari, osa1, erikd
      
      Reviewed By: osa1
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #15136
      
      Differential Revision: https://phabricator.haskell.org/D4956
      7fc418df
  13. 09 Jun, 2018 1 commit
  14. 05 Jun, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
      4075656e
  15. 16 May, 2018 1 commit
    • Simon Marlow's avatar
      Merge FUN_STATIC closure with its SRT · 838b6903
      Simon Marlow authored
      Summary:
      The idea here is to save a little code size and some work in the GC,
      by collapsing FUN_STATIC closures and their SRTs.
      
      This is (4) in a series; see D4632 for more details.
      
      There's a tradeoff here: more complexity in the compiler in exchange
      for a modest code size reduction (probably around 0.5%).
      
      Results:
      * GHC binary itself (statically linked) is 1% smaller
      * -0.2% binary sizes in nofib (-0.5% module sizes)
      
      Full nofib results comparing D4634 with this: P177 (ignore runtimes,
      these aren't stable on my laptop)
      
      Test Plan: validate, nofib
      
      Reviewers: bgamari, niteria, simonpj, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4637
      838b6903
  16. 14 May, 2018 2 commits
  17. 19 Apr, 2018 1 commit
  18. 14 May, 2017 1 commit
  19. 29 Apr, 2017 1 commit
  20. 23 Apr, 2017 1 commit
  21. 07 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Overhaul of Compact Regions (#12455) · 7036fde9
      Simon Marlow authored
      Summary:
      This commit makes various improvements and addresses some issues with
      Compact Regions (aka Compact Normal Forms).
      
      This was the most important thing I wanted to fix.  Compaction
      previously prevented GC from running until it was complete, which
      would be a problem in a multicore setting.  Now, we compact using a
      hand-written Cmm routine that can be interrupted at any point.  When a
      GC is triggered during a sharing-enabled compaction, the GC has to
      traverse and update the hash table, so this hash table is now stored
      in the StgCompactNFData object.
      
      Previously, compaction consisted of a deepseq using the NFData class,
      followed by a traversal in C code to copy the data.  This is now done
      in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
      longer use the NFData instances, instead the Cmm routine evaluates
      components directly as it compacts.
      
      The new compaction is about 50% faster than the old one with no
      sharing, and a little faster on average with sharing (the cost of the
      hash table dominates when we're doing sharing).
      
      Static objects that don't (transitively) refer to any CAFs don't need
      to be copied into the compact region.  In particular this means we
      often avoid copying Char values and small Int values, because these
      are static closures in the runtime.
      
      Each Compact# object can support a single compactAdd# operation at any
      given time, so the Data.Compact library now enforces mutual exclusion
      using an MVar stored in the Compact object.
      
      We now get exceptions rather than killing everything with a barf()
      when we encounter an object that cannot be compacted (a function, or a
      mutable object).  We now also detect pinned objects, which can't be
      compacted either.
      
      The Data.Compact API has been refactored and cleaned up.  A new
      compactSize operation returns the size (in bytes) of the compact
      object.
      
      Most of the documentation is in the Haddock docs for the compact
      library, which I've expanded and improved here.
      
      Various comments in the code have been improved, especially the main
      Note [Compact Normal Forms] in rts/sm/CNF.c.
      
      I've added a few tests, and expanded a few of the tests that were
      there.  We now also run the tests with GHCi, and in a new test way
      that enables sanity checking (+RTS -DS).
      
      There's a benchmark in libraries/compact/tests/compact_bench.hs for
      measuring compaction speed and comparing sharing vs. no sharing.
      
      The field totalDataW in StgCompactNFData was unnecessary.
      
      Test Plan:
      * new unit tests
      * validate
      * tested manually that we can compact Data.Aeson data
      
      Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
      
      Subscribers: thomie, simonpj
      
      Differential Revision: https://phabricator.haskell.org/D2751
      
      GHC Trac Issues: #12455
      7036fde9
  22. 29 Nov, 2016 1 commit
  23. 14 Nov, 2016 1 commit
    • Simon Marlow's avatar
      Remove CONSTR_STATIC · 55d535da
      Simon Marlow authored
      Summary:
      We currently have two info tables for a constructor
      
      * XXX_con_info: the info table for a heap-resident instance of the
        constructor, It has type CONSTR, or one of the specialised types like
        CONSTR_1_0
      
      * XXX_static_info: the info table for a static instance of this
        constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
      
      I'm getting rid of the latter, and using the `con_info` info table for
      both static and dynamic constructors.  For rationale and more details
      see Note [static constructors] in SMRep.hs.
      
      I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
      `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
      distinction, and anyway HEAP_ALLOCED() does the same job.
      
      Test Plan: validate
      
      Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2690
      
      GHC Trac Issues: #12455
      55d535da
  24. 03 Aug, 2016 1 commit
  25. 20 Jul, 2016 1 commit
    • gcampax's avatar
      Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
      cf989ffe
  26. 17 May, 2016 1 commit
    • Erik de Castro Lopo's avatar
      rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
      33c029dd
  27. 12 May, 2016 1 commit
  28. 10 May, 2016 1 commit
    • wereHamster's avatar
      Use stdint types for Stg{Word,Int}{8,16,32,64} · 260a5648
      wereHamster authored
      We can't define Stg{Int,Word} in terms of {,u}intptr_t because STG
      depends on them being the exact same size as void*, and {,u}intptr_t
      does not make that guarantee. Furthermore, we also need to define
      StgHalf{Int,Word}, so the preprocessor if needs to stay. But we can at
      least keep it in a single place instead of repeating it in various
      files.
      
      Also define STG_{INT,WORD}{8,16,32,64}_{MIN,MAX} and use it in HsFFI.h,
      further reducing the need for CPP in other files.
      
      Reviewers: austin, bgamari, simonmar, hvr, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2182
      260a5648
  29. 04 May, 2016 1 commit
  30. 12 Apr, 2016 1 commit
    • Simon Marlow's avatar
      Allocate blocks in the GC in batches · f4446c5b
      Simon Marlow authored
      Avoids contention for the block allocator lock in the GC; this can be
      seen in the gc_alloc_block_sync counter emitted by +RTS -s.
      
      I experimented with this a while ago, and there was already
      commented-out code for it in GCUtils.c, but I've now improved it so that
      it doesn't result in significantly worse memory usage.
      
      * The old method of putting spare blocks on ws->part_list was wasteful,
        the spare blocks are now shared between all generations and retained
        between GCs.
      
      * repeated allocGroup() results in fragmentation, so I switched to using
        allocLargeChunk() instead which is fragmentation-friendly; we already
        use it for the same reason in nursery allocation.
      f4446c5b
  31. 23 Jan, 2016 1 commit
    • Joachim Breitner's avatar
      Remove unused IND_PERM · f42db157
      Joachim Breitner authored
      it seems that this closure type has not been in use since 5d52d9, so all
      this is dead and untested code. This removes it. Some of the code might
      be useful for a counting indirection as described in #10613, so when
      implementing that, have a look at what this commit removes.
      
      Test Plan: validate on harbormaster
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1821
      f42db157
  32. 11 Sep, 2015 1 commit
  33. 28 Jul, 2015 1 commit
    • Simon Marlow's avatar
      Eliminate zero_static_objects_list() · f83aab95
      Simon Marlow authored
      Summary:
      [Revised version of D1076 that was committed and then backed out]
      
      In a workload with a large amount of code, zero_static_objects_list()
      takes a significant amount of time, and furthermore it is in the
      single-threaded part of the GC.
      
      This patch uses a slightly fiddly scheme for marking objects on the
      static object lists, using a flag in the low 2 bits that flips between
      two states to indicate whether an object has been visited during this
      GC or not.  We also have to take into account objects that have not
      been visited yet, which might appear at any time due to runtime linking.
      
      Test Plan: validate
      
      Reviewers: austin, ezyang, rwbarton, bgamari, thomie
      
      Reviewed By: bgamari, thomie
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1106
      f83aab95
  34. 27 Jul, 2015 1 commit
  35. 22 Jul, 2015 1 commit
    • Simon Marlow's avatar
      Eliminate zero_static_objects_list() · b949c96b
      Simon Marlow authored
      Summary:
      In a workload with a large amount of code, zero_static_objects_list()
      takes a significant amount of time, and furthermore it is in the
      single-threaded part of the GC.
      
      This patch uses a slightly fiddly scheme for marking objects on the
      static object lists, using a flag in the low 2 bits that flips between
      two states to indicate whether an object has been visited during this
      GC or not.  We also have to take into account objects that have not
      been visited yet, which might appear at any time due to runtime linking.
      
      Test Plan: validate
      
      Reviewers: austin, bgamari, ezyang, rwbarton
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1076
      b949c96b