  1. 19 Oct, 2019 1 commit
  2. 18 Oct, 2019 1 commit
    • rts: Implement concurrent collection in the nonmoving collector · d7017446
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushes
      when (a) it is filled, or (b) the nonmoving collector enters its
      post-mark synchronization phase.
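
      For concreteness, here is a minimal C sketch of such a chunked,
      capability-local remembered set; all names are hypothetical, and the
      real structures in `NonMovingMark.c` differ in detail (they reuse the
      mark-queue chunk format of `MarkEntry`s, for instance):
      ```
      #include <stdlib.h>

      #define CHUNK_ENTRIES 256

      /* One chunk of a capability-local update remembered set. */
      typedef struct Chunk_ {
          struct Chunk_ *link;                 /* next chunk on the flushed list */
          size_t         used;                 /* entries filled so far          */
          void          *entries[CHUNK_ENTRIES];
      } Chunk;

      typedef struct {
          Chunk *accum;    /* accumulator chunk owned by one capability */
          Chunk *flushed;  /* chunks already handed to the mark queue   */
      } UpdRemSet;

      /* Hand the accumulator over to the collector.  Also called
       * unconditionally at the post-mark synchronisation point.
       * (Allocation failure is ignored for brevity.) */
      static void upd_rem_set_flush(UpdRemSet *rs)
      {
          rs->accum->link = rs->flushed;
          rs->flushed = rs->accum;
          rs->accum = calloc(1, sizeof(Chunk));
      }

      /* Write-barrier entry point: remember an object whose pointer the
       * mutator is about to overwrite. */
      static void upd_rem_set_push(UpdRemSet *rs, void *obj)
      {
          if (rs->accum->used == CHUNK_ENTRIES)
              upd_rem_set_flush(rs);
          rs->accum->entries[rs->accum->used++] = obj;
      }
      ```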
      
      While the write barrier touches a significant amount of code, it is
      conceptually straightforward: the mutator must ensure that the referent
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details (a sketch of the barrier follows the
      list below):
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referent of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
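
      A rough sketch of the barrier itself, including the dirty-flag shortcut
      from the first bullet (hypothetical names; the capability argument is
      omitted and the real barrier is spread across the RTS and generated code):
      ```
      #include <stdbool.h>

      extern bool write_barrier_enabled;        /* true only while a mark is running */
      extern void upd_rem_set_push(void *obj);  /* as sketched earlier, simplified   */

      typedef struct {
          bool  dirty;    /* already mutated (and thus already remembered)?   */
          void *value;    /* the single mutable field, e.g. an MVar's contents */
      } MutObj;

      /* Only the *first* mutation of a clean object needs to push the old
       * referent; once the object is dirty, its previous contents have
       * already been handed to the collector. */
      void write_field(MutObj *o, void *new_value)
      {
          if (write_barrier_enabled && !o->dirty) {
              upd_rem_set_push(o->value);       /* remember the overwritten referent */
              o->dirty = true;
          }
          o->value = new_value;
      }
      ```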
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
      Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
  3. 28 Jun, 2019 1 commit
    • Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
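
      As a rough illustration (not GHC's actual lowering), the guarantee that
      MO_ReadBarrier provides corresponds to a load/load fence, which in
      portable C11 might look like this:
      ```
      #include <stdatomic.h>

      /* Loads issued after this call (in program order) must not be
       * satisfied before loads issued before it. */
      static inline void read_barrier(void)
      {
      #if defined(__x86_64__) || defined(__i386__)
          /* TSO already orders loads against loads; only the compiler
           * must be prevented from reordering. */
          atomic_signal_fence(memory_order_acquire);
      #else
          /* Weakly ordered targets (ARM, AArch64, PowerPC, ...) need a
           * real fence instruction here, e.g. dmb ish or lwsync. */
          atomic_thread_fence(memory_order_acquire);
      #endif
      }
      ```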
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause us issues,
      consider:
      
      - An observer is about to case on a value in the closure's info table.
      - The observer speculatively reads one or more of the closure's fields.
      - An updater writes to the closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but it's too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
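
      A minimal C11 sketch of the two patterns above; the closure layout and
      IND_info are placeholders rather than the RTS's real definitions, and
      the plain loads/stores stand in for Cmm-level memory accesses:
      ```
      #include <stdatomic.h>
      #include <stddef.h>

      typedef struct {
          const void *info;        /* info table pointer */
          void       *indirectee;  /* payload field      */
      } Closure;

      extern const void *IND_info; /* stands in for the IND info table */

      /* Updater: publish the payload before publishing the info pointer. */
      void update_to_indirection(Closure *c, void *target)
      {
          c->indirectee = target;                     /* write the fields      */
          atomic_thread_fence(memory_order_release);  /* write barrier         */
          c->info = IND_info;                         /* then the info pointer */
      }

      /* Observer: read the info pointer before trusting the payload. */
      void *follow_if_indirection(Closure *c)
      {
          const void *info = c->info;                 /* read the info pointer */
          atomic_thread_fence(memory_order_acquire);  /* read barrier          */
          if (info == IND_info)
              return c->indirectee;                   /* now safe to read      */
          return NULL;                                /* not (yet) an IND      */
      }
      ```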
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
  4. 06 Sep, 2018 1 commit
  5. 19 Mar, 2018 1 commit
    • rts: Add --internal-counters RTS flag and several counters · 2918abf7
      Douglas Wilson authored
      The existing internal counters:
      * gc_alloc_block_sync
      * whitehole_spin
      * gen[g].sync
      * gen[1].sync
      
      are now not shown in the -s report unless --internal-counters is also passed.
      
      If --internal-counters is passed we now show the counters above, reformatted, as
      well as several other counters. In particular, we now count the yieldThread()
      calls that SpinLocks do as well as their spins.
      
      The added counters are:
      * gc_spin (spin and yield)
      * mut_spin (spin and yield)
      * whitehole_threadPaused (spin only)
      * whitehole_executeMessage (spin only)
      * whitehole_lockClosure (spin only)
      * waitForGcThreads (spin and yield)
      
      As well as the following, which are not SpinLock-like things:
      * any_work
      * no_work
      * scav_find_work
      
      See the Note for descriptions of what these counters are.
      
      We add busy_wait_nops in these loops along with the counter increment where it
      was absent.
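
      To make "Spins" and "Yields" concrete, here is a simplified sketch of a
      counting spinlock acquire loop; the names, SPIN_COUNT value, and
      non-atomic (approximate) counters are illustrative, not the RTS's
      actual SpinLock implementation:
      ```
      #include <stdatomic.h>
      #include <sched.h>

      #define SPIN_COUNT 1000

      typedef struct {
          atomic_flag   lock;
          unsigned long spins;    /* failed busy-wait iterations (approximate)    */
          unsigned long yields;   /* yieldThread()-style calls made while waiting */
      } SpinLock;

      void spin_lock_acquire(SpinLock *l)
      {
          for (;;) {
              for (int i = 0; i < SPIN_COUNT; i++) {
                  if (!atomic_flag_test_and_set_explicit(&l->lock,
                                                         memory_order_acquire))
                      return;        /* acquired */
                  l->spins++;        /* one failed spin; a busy_wait_nop() goes here */
              }
              l->yields++;           /* spun long enough: give up the timeslice */
              sched_yield();
          }
      }
      ```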
      
      Old internal counters output:
      ```
      gc_alloc_block_sync: 0
      whitehole_gc_spin: 0
      gen[0].sync: 0
      gen[1].sync: 0
      ```
      
      New internal counters output:
      ```
      Internal Counters:
                                                 Spins        Yields
          gc_alloc_block_sync                      323             0
          gc_spin                              9016713           752
          mut_spin                            57360944         47716
          whitehole_gc                               0           n/a
          whitehole_threadPaused                     0           n/a
          whitehole_executeMessage                   0           n/a
          whitehole_lockClosure                      0             0
          waitForGcThreads                           2           415
          gen[0].sync                                6             0
          gen[1].sync                                1             0
      
          any_work                                2017
          no_work                                 2014
          scav_find_work                          1004
      ```
      
      Test Plan:
      ./validate
      
      Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #3553, #9221
      
      Differential Revision: https://phabricator.haskell.org/D4302
  6. 19 Dec, 2017 1 commit
  7. 03 Jul, 2017 1 commit
  8. 27 Jun, 2017 1 commit
    • rts: Clarify whitehole logic in threadPaused · 1e471265
      Ben Gamari authored
      Previously we would look at the indirectee field of a WHITEHOLE object.
      However, WHITEHOLE isn't a sort of indirection and therefore has no
      indirectee field.
      
      I encountered this while investigating #13615, although it doesn't fix
      that bug.
      
      Test Plan: Validate
      
      Reviewers: simonmar, austin, erikd
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #13615
      
      Differential Revision: https://phabricator.haskell.org/D3674
  9. 29 Apr, 2017 1 commit
  10. 29 Nov, 2016 1 commit
  11. 17 May, 2016 1 commit
    • rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
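
      In simplified form, the resulting pair of signatures looks like this
      (a sketch only; the real definitions live in the RTS headers and use
      the platform's tag mask):
      ```
      #include <stdint.h>

      typedef struct StgClosure_ StgClosure;

      #define TAG_MASK 7   /* low pointer-tag bits on a 64-bit platform */

      /* Original signature restored: non-const in, non-const out. */
      static inline StgClosure *UNTAG_CLOSURE(StgClosure *p)
      {
          return (StgClosure *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
      }

      /* New const-preserving variant, used wherever the argument is const. */
      static inline const StgClosure *UNTAG_CONST_CLOSURE(const StgClosure *p)
      {
          return (const StgClosure *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
      }
      ```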
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
  12. 04 May, 2016 1 commit
  13. 07 Jul, 2015 1 commit
  14. 29 Sep, 2014 1 commit
  15. 01 Sep, 2014 1 commit
    • Refactor stack squeezing logic · 2f343b0c
      Arash Rouhani authored
      Summary:
      This patch is only to make the code easier to read.
      
      In addition, this is the first patch I send with the arc/differential workflow.
      So I start with something very small.
      
      Test Plan: I have not even tried to compile it yet.
      
      Reviewers: simonmar, austin
      
      Reviewed By: austin
      
      Subscribers: simonmar, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D189
  16. 28 Jul, 2014 2 commits
  17. 15 Nov, 2011 2 commits
  18. 01 Jun, 2011 1 commit
  19. 31 May, 2011 1 commit
  20. 15 Dec, 2010 1 commit
    • Implement stack chunks and separate TSO/STACK objects · f30d5273
      Simon Marlow authored
      This patch makes two changes to the way stacks are managed:
      
      1. The stack is now stored in a separate object from the TSO.
      
      This means that it is easier to replace the stack object for a thread
      when the stack overflows or underflows; we don't have to leave behind
      the old TSO as an indirection any more.  Consequently, we can remove
      ThreadRelocated and deRefTSO(), which were a pain.
      
      This is obviously the right thing, but the last time I tried to do it
      it made performance worse.  This time I seem to have cracked it.
      
      2. Stacks are now represented as a chain of chunks, rather than
         a single monolithic object.
      
      The big advantage here is that individual chunks are marked clean or
      dirty according to whether they contain pointers to the young
      generation, and the GC can avoid traversing clean stack chunks during
      a young-generation collection.  This means that programs with deep
      stacks will see a big saving in GC overhead when using the default GC
      settings.
      
      A secondary advantage is that there is much less copying involved as
      the stack grows.  Programs that quickly grow a deep stack will see big
      improvements.
      
      In some ways the implementation is simpler, as nothing special needs
      to be done to reclaim stack as the stack shrinks (the GC just recovers
      the dead stack chunks).  On the other hand, we have to manage stack
      underflow between chunks, so there's a new stack frame
      (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
      The total amount of code is probably about the same as before.
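
      In roughly simplified form, the new arrangement looks like this
      (hypothetical field names; the real StgStack and TSO layouts differ):
      ```
      #include <stdint.h>
      #include <stddef.h>

      typedef uintptr_t W_;   /* one machine word */

      /* One stack chunk.  The bottom of each chunk holds an UNDERFLOW_FRAME
       * pointing at the next (older) chunk, so growing or shrinking the
       * stack links or unlinks chunks instead of copying the whole stack. */
      typedef struct StackChunk_ {
          size_t  stack_size;          /* capacity of this chunk, in words           */
          int     dirty;               /* does it contain pointers to the young gen? */
          W_     *sp;                  /* current stack pointer within this chunk    */
          struct StackChunk_ *older;   /* resumed when an UNDERFLOW_FRAME is hit     */
          W_      stack[];             /* payload                                    */
      } StackChunk;

      /* The TSO no longer embeds the stack; it just points at the top chunk. */
      typedef struct {
          StackChunk *stackobj;
      } TSO;
      ```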
      
      There are new RTS flags:
      
         -ki<size> Sets the initial thread stack size (default 1k), e.g. -ki4k -ki2m
         -kc<size> Sets the stack chunk size (default 32k)
         -kb<size> Sets the stack chunk buffer size (default 1k)
      
      -ki was previously called just -k, and the old name is still accepted
      for backwards compatibility.  These new options are documented.
  21. 01 Jul, 2010 1 commit
  22. 29 Mar, 2010 1 commit
    • New implementation of BLACKHOLEs · 5d52d9b6
      Simon Marlow authored
      This replaces the global blackhole_queue with a clever scheme that
      enables us to queue up blocked threads on the closure that they are
      blocked on, while still avoiding atomic instructions in the common
      case.
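
      Very roughly, the idea is that waiters hang off the black hole itself
      rather than off a global queue (hypothetical types; the real scheme
      uses MessageBlackHole objects and BLOCKING_QUEUE closures):
      ```
      typedef struct TSO_ TSO;

      /* A waiter record, queued on the closure the thread is blocked on. */
      typedef struct Waiter_ {
          TSO            *tso;
          struct Waiter_ *next;
      } Waiter;

      /* A black hole: indirectee eventually holds the value; until then a
       * per-closure queue of waiters is reachable from the closure itself. */
      typedef struct {
          void   *indirectee;
          Waiter *waiters;
      } BlackHole;

      /* Blocking: push ourselves onto the closure's own queue.  Because the
       * queue lives on the closure, waking a thread blocked here (e.g. the
       * target of throwTo) needs no scan of a global blackhole_queue. */
      void block_on(BlackHole *bh, Waiter *w, TSO *me)
      {
          w->tso      = me;
          w->next     = bh->waiters;
          bh->waiters = w;
      }
      ```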
      
      Advantages:
      
       - gets rid of a locked global data structure and some tricky GC code
         (replacing it with some per-thread data structures and different
         tricky GC code :)
      
       - wakeups are more prompt: parallel/concurrent performance should
         benefit.  I haven't seen anything dramatic in the parallel
         benchmarks so far, but a couple of threading benchmarks do improve
         a bit.
      
       - waking up a thread blocked on a blackhole is now O(1) (e.g. if
         it is the target of throwTo).
      
       - less sharing and better separation of Capabilities: communication
         is done with messages, the data structures are strictly owned by a
         Capability and cannot be modified except by sending messages.
      
       - this change will ultimately enable us to do more intelligent
         scheduling when threads block on each other.  This is what started
         off the whole thing, but it isn't done yet (#3838).
      
      I'll be documenting all this on the wiki in due course.
  23. 28 Jan, 2010 1 commit
    • tweak the totally-bogus arbitrary stack-squeezing heuristic to fix #2797 · 990171bf
      Simon Marlow authored
      In #2797, a program that ran in constant stack space when compiled
      needed linear stack space when interpreted.  It turned out to be
      nothing more than stack-squeezing not happening.  We have a heuristic
      to avoid stack-squeezing when it would be too expensive (shuffling a
      large amount of memory to save a few words), but in some cases even
      expensive stack-squeezing is necessary to avoid linear stack usage.
      One day we should implement stack chunks, which would make this less
      expensive.
  24. 31 Dec, 2009 1 commit
  25. 25 Nov, 2009 1 commit
    • threadStackOverflow: check whether stack squeezing released some stack (#3677) · d2c874dc
      Simon Marlow authored
      In a stack overflow situation, stack squeezing may reduce the stack
      size, but we don't know whether it has been reduced enough for the
      stack check to succeed if we try again.  Fortunately stack squeezing
      is idempotent, so all we need to do is record whether *any* squeezing
      happened.  If we are at the stack's absolute -K limit, and stack
      squeezing happened, then we try running the thread again.
      
      We also want to avoid enlarging the stack if squeezing has already
      released some of it.  However, we don't want to get into a
      pathological situation where a thread has a nearly full stack (near
      its current limit, but not near the absolute -K limit), keeps
      allocating a little bit, squeezing removes a little bit, and then it
      runs again.  So to avoid this, if we squeezed *and* there is still
      less than BLOCK_SIZE_W words free, then we enlarge the stack anyway.
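
      The decision described above can be sketched as follows (hypothetical
      names and signature; the real logic sits in threadStackOverflow):
      ```
      #include <stdbool.h>
      #include <stddef.h>

      #define BLOCK_SIZE_W 1024                /* illustrative value only */

      extern void retry_thread(void);          /* hypothetical helpers */
      extern void enlarge_stack(void);
      extern void stack_overflow_error(void);

      void handle_stack_overflow(size_t free_words, size_t stack_words,
                                 size_t max_stack_words,  /* the -K limit        */
                                 bool   squeezed)         /* did squeezing help? */
      {
          if (stack_words >= max_stack_words) {
              /* At the absolute -K limit: only retry if squeezing freed space. */
              if (squeezed) retry_thread();
              else          stack_overflow_error();
          } else if (squeezed && free_words >= BLOCK_SIZE_W) {
              /* Squeezing released a useful amount: run again, don't enlarge. */
              retry_thread();
          } else {
              /* No squeezing, or it freed less than BLOCK_SIZE_W words: enlarge
               * anyway, avoiding the squeeze-a-little / grow-a-little loop. */
              enlarge_stack();
          }
      }
      ```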
  26. 02 Aug, 2009 1 commit
    • RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposing publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
  27. 06 Jan, 2009 1 commit
  28. 19 Nov, 2008 1 commit
  29. 18 Nov, 2008 1 commit
    • Add optional eager black-holing, with new flag -feager-blackholing · d600bf7a
      Simon Marlow authored
      Eager blackholing can improve parallel performance by reducing the
      chances that two threads perform the same computation.  However, it
      has a cost: one extra memory write per thunk entry.  
      
      To get the best results, any code which may be executed in parallel
      should be compiled with eager blackholing turned on.  But since
      there's a cost for sequential code, we make it optional and turn it on
      for the parallel package only.  It might be a good idea to compile
      applications (or modules) that contain parallel code with
      -feager-blackholing.
      
      ToDo: document -feager-blackholing.
  30. 17 Nov, 2008 1 commit
    • Fix #2783: detect black-hole loops properly · 0fa59deb
      Simon Marlow authored
      At some point we regressed on detecting simple black-hole loops.  This
      happened due to the introduction of duplicate-work detection for
      parallelism: a black-hole loop looks very much like duplicate work,
      except it's duplicate work being performed by the very same thread.
      So we have to detect and handle this case.
  31. 04 Jun, 2008 1 commit
  32. 07 Feb, 2008 1 commit
    • Tweaks to stack squeezing · 53a442f1
      Simon Marlow authored
      1. We weren't squeezing two frames if one of them was a marked update
         frame.  This is easy to fix.
      
      2. The heuristic to decide whether to squeeze was a little
         conservative.  It's worth copying 3 words to save an update frame.
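
      Point 2 amounts to a cost/benefit test along these lines (a sketch with
      hypothetical names, not the actual code in the RTS):
      ```
      #include <stdbool.h>
      #include <stddef.h>

      /* Squeezing shuffles `words_to_copy` words of live stack downwards in
       * order to eliminate `frames_saved` adjacent update frames.  Per the
       * tweak above, eliminating one frame is worth copying about 3 words. */
      static bool worth_squeezing(size_t words_to_copy, size_t frames_saved)
      {
          return words_to_copy <= 3 * frames_saved;
      }
      ```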
       
  33. 06 Mar, 2007 1 commit
  34. 10 Nov, 2006 1 commit
  35. 24 Oct, 2006 1 commit
    • Split GC.c, and move storage manager into sm/ directory · ab0e778c
      Simon Marlow authored
      In preparation for parallel GC, split up the monolithic GC.c file into
      smaller parts.  Also in this patch (and difficult to separate,
      unfortunately):
        
        - Don't include Stable.h in Rts.h, instead just include it where
          necessary.
        
        - consistently use STATIC_INLINE in source files, and INLINE_HEADER
          in header files.  STATIC_INLINE is now turned off when DEBUG is on,
          to make debugging easier.
        
        - The GC no longer takes the get_roots function as an argument.
          We weren't making use of this generalisation.