  1. Mar 23, 2020
  2. Mar 20, 2020
  3. Mar 19, 2020
  4. Mar 18, 2020
  5. Mar 17, 2020
  6. Mar 16, 2020
  7. Mar 15, 2020
  8. Mar 14, 2020
    • Ben Gamari's avatar
      fs.h: Add missing declarations on Windows · 35aab0f9
      Ben Gamari authored
      35aab0f9
    • Ömer Sinan Ağacan's avatar
      Fix global_link of TSOs for threads reachable via dead weaks · 91d1f25c
      Ömer Sinan Ağacan authored
      Fixes #17785
      
      Here's how the problem occurs:
      
      - In generation 0 we have a TSO that is finished (i.e. it has no more
        work to do or it is killed).
      
      - The TSO only becomes reachable after collectDeadWeakPtrs().
      
      - After collectDeadWeakPtrs() we switch to WeakDone phase where we don't
        move TSOs to different lists anymore (like the next gen's thread list
        or the resurrected_threads list).
      
      - So the TSO will never be moved to a generation's thread list, but it
        will be promoted to generation 1.
      
      - Generation 1 is collected via mark-compact, and because the TSO is
        reachable it is marked, and its `global_link` field, which is bogus at
        this point (because the TSO is not in a list), will be threaded.
      
      - Chaos ensues.
      
      In other words, when these conditions hold:
      
      - A TSO is reachable only after collectDeadWeakPtrs()
      - It's finished (what_next is ThreadComplete or ThreadKilled)
      - It's retained by the mark-compact collector (the moving collector
        doesn't evacuate the global_link field)
      
      We end up doing random mutations on the heap because the TSO's
      global_link field is not valid, but it still looks like a heap pointer
      so we thread it during compacting GC.
      
      The fix is simple: when we traverse the old_threads lists to resurrect
      unreachable threads, the threads that won't be resurrected currently
      stay on the old_threads lists. Those threads will never be visited
      again by MarkWeak, so we now reset their global_link fields. This way
      the compacting GC does not thread pointers to nowhere.
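      
      The traversal described above can be sketched as follows. This is a
      minimal toy model, not the RTS code: the `TSO` struct, the `finished`
      flag and `resurrect_unreachable` are hypothetical names, and a static
      sentinel stands in for END_TSO_QUEUE. It assumes every thread still on
      the old list is currently unreachable.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      /* Toy thread object; only the fields the sketch needs. */
      typedef struct TSO {
          struct TSO *global_link; /* next thread on a thread list */
          bool finished;           /* models ThreadComplete / ThreadKilled */
      } TSO;
      
      /* Sentinel standing in for END_TSO_QUEUE. */
      static TSO end_tso_queue;
      #define END_TSO_QUEUE (&end_tso_queue)
      
      /*
       * Walk the old list once. Unfinished threads are pushed onto
       * *resurrected. Finished threads are left behind, but their link is
       * reset so that it never dangles if the thread becomes reachable later.
       * Returns true if at least one thread was resurrected.
       */
      static bool resurrect_unreachable(TSO **old_threads, TSO **resurrected)
      {
          assert(old_threads != NULL && resurrected != NULL);
          bool any = false;
          TSO *next;
          for (TSO *t = *old_threads; t != END_TSO_QUEUE; t = next) {
              next = t->global_link;   /* read before overwriting the link */
              if (t->finished) {
                  t->global_link = END_TSO_QUEUE;
              } else {
                  t->global_link = *resurrected;
                  *resurrected = t;
                  any = true;
              }
          }
          *old_threads = END_TSO_QUEUE;
          return any;
      }
      ```
      
      With the reset in place, a compacting pass that later reaches a finished
      thread only ever sees the sentinel in its link field.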
      
      Testing
      -------
      
      The reproducer in #17785 is quite large and hard to build, because of
      the dependencies, so I'm not adding a regression test.
      
      In my testing the reproducer would take less than 5 seconds to run,
      and roughly once in every 5 runs it would fail with a segfault or an
      assertion error. In some other runs it fails with a test failure
      instead of crashing. Because the tests never fail with the bug fix,
      assuming the code is correct, this also means that this bug can
      sometimes lead to incorrect runtime results.
      
      After the fix I was able to run the reproducer repeatedly for about an
      hour, with no runtime crashes or test failures.
      
      To run the reproducer clone the git repo:
      
          $ git clone https://github.com/osa1/streamly --branch ghc-segfault
      
      Then clone primitive and atomic-primops from their git repos and point
      to the clones in cabal.project.local. The project should then be
      buildable using GHC HEAD. Run the executable `properties` with `+RTS -c
      -DZ`.
      
      In addition to the reproducer above I ran the test suite using:
      
          $ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \
              -with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr'
      
      This always enables compacting GC, both in GHC when building the test
      programs and when running the test programs, and also enables sanity
      checking when running the test programs. This set of flags is not
      compatible with all tests, so there are some failures, but I got the
      same set of failures with this patch as with GHC HEAD.
      
      (cherry picked from commit 2e4d572e)
      91d1f25c
    • Ben Gamari's avatar
      05cc8b19
    • Ben Gamari's avatar
      Bump process submodule · e03bae3c
      Ben Gamari authored
      Avoid unreachable case alternative warning on Windows.
      e03bae3c
  9. Mar 13, 2020
  10. Mar 12, 2020
  11. Mar 11, 2020
  12. Mar 10, 2020
  13. Mar 09, 2020
  14. Mar 04, 2020
    • Ben Gamari's avatar
      nonmoving: Fix collection of sparks · 92bc3688
      Ben Gamari authored
      Previously sparks living in the non-moving heap would be promptly GC'd
      by the minor collector since pruneSparkQueue uses the BF_EVACUATED flag,
      which non-moving heap blocks do not have set.
      
      Fix this by implementing proper support in pruneSparkQueue for
      determining reachability in the non-moving heap. The story is told in
      Note [Spark management in the nonmoving heap].
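      
      The difference between the two liveness checks might look roughly like
      this. It is only a sketch under assumptions: the flag values are
      arbitrary, the `Block` and `Closure` structs are hypothetical, and the
      `nonmoving_live` field is a stand-in for whatever reachability test the
      Note actually prescribes.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Flag names mirror the RTS; the values here are arbitrary. */
      #define BF_EVACUATED 0x1u
      #define BF_NONMOVING 0x2u
      
      typedef struct {
          uint32_t flags;
      } Block;
      
      typedef struct {
          const Block *block;  /* block the closure lives in */
          bool nonmoving_live; /* hypothetical: result of a non-moving
                                  reachability check */
      } Closure;
      
      /* Old behaviour: only closures in evacuated blocks count as alive, so
       * anything in the non-moving heap is treated as dead. */
      static bool spark_alive_old(const Closure *c)
      {
          assert(c != NULL && c->block != NULL);
          return (c->block->flags & BF_EVACUATED) != 0;
      }
      
      /* Sketch of the fix: ask the non-moving heap about its own closures
       * instead of relying on a flag its blocks never carry. */
      static bool spark_alive_new(const Closure *c)
      {
          assert(c != NULL && c->block != NULL);
          if (c->block->flags & BF_NONMOVING)
              return c->nonmoving_live;
          return (c->block->flags & BF_EVACUATED) != 0;
      }
      ```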
      92bc3688
    • Ben Gamari's avatar
      nonmoving: Don't traverse filled segment list in pause · 2bf7b5b5
      Ben Gamari authored
      The non-moving collector would previously walk the entire filled segment
      list during the preparatory pause. However, this is far more work than
      is strictly necessary. We can rather get away with merely collecting the
      allocators' filled segment list heads and process the lists themselves
      during the concurrent phase. This can significantly reduce the maximum
      gen1 GC pause time in programs with high rates of long-lived allocations.
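      
      A rough single-threaded sketch of the idea: the pause only detaches
      each allocator's list head, which costs one step per allocator, and the
      list walk happens afterwards. The struct layouts and function names are
      hypothetical, and real concurrency and synchronisation are left out.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      typedef struct Segment {
          struct Segment *link; /* next filled segment */
      } Segment;
      
      typedef struct {
          Segment *filled;       /* segments filled by the mutator */
          Segment *saved_filled; /* heads captured during the pause */
      } Allocator;
      
      /* Pause: O(number of allocators), independent of list length. */
      static void snapshot_filled(Allocator *allocs, size_t n)
      {
          assert(allocs != NULL || n == 0);
          for (size_t i = 0; i < n; i++) {
              allocs[i].saved_filled = allocs[i].filled;
              allocs[i].filled = NULL;
          }
      }
      
      /* Later phase: walk the captured lists. A real collector would prepare
       * each segment for marking here; the sketch only counts them. */
      static size_t process_saved(Allocator *allocs, size_t n)
      {
          size_t count = 0;
          for (size_t i = 0; i < n; i++) {
              for (Segment *s = allocs[i].saved_filled; s != NULL; s = s->link)
                  count++;
              allocs[i].saved_filled = NULL;
          }
          return count;
      }
      ```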
      
      (cherry picked from commit 927b7a3d)
      2bf7b5b5
    • Ben Gamari's avatar
      nonmoving: Clear segment bitmaps during sweep · a9dcac04
      Ben Gamari authored
      Previously we would clear the bitmaps of segments which we are going to
      sweep during the preparatory pause. However, this is unnecessary: the
      existence of the mark epoch ensures that the sweep will correctly
      identify non-reachable objects, even if we do not clear the bitmap.
      
      We now defer clearing the bitmap to sweep, which happens concurrently
      with mutation.
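      
      To illustrate why an epoch makes the early clearing unnecessary, here is
      a small sketch. It assumes a bitmap entry stores the epoch in which the
      block was last marked and that the epoch alternates between two values;
      the names and the segment size are hypothetical.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>
      
      #define BLOCKS_PER_SEGMENT 8
      
      typedef struct {
          /* 0 = never marked, otherwise the epoch of the last mark */
          uint8_t bitmap[BLOCKS_PER_SEGMENT];
      } Segment;
      
      static uint8_t mark_epoch = 1;
      
      /* Start a new collection cycle: alternate between epochs 1 and 2. */
      static void bump_epoch(void)
      {
          mark_epoch = (mark_epoch == 1) ? 2 : 1;
      }
      
      static void mark_block(Segment *s, size_t i)
      {
          assert(s != NULL && i < BLOCKS_PER_SEGMENT);
          s->bitmap[i] = mark_epoch;
      }
      
      /* A mark from an earlier epoch does not count as live. */
      static bool is_marked(const Segment *s, size_t i)
      {
          assert(s != NULL && i < BLOCKS_PER_SEGMENT);
          return s->bitmap[i] == mark_epoch;
      }
      
      /* Sweep: count live blocks and clear stale marks here, not in the
       * pause, so they cannot be mistaken for live when the epoch repeats. */
      static size_t sweep_segment(Segment *s)
      {
          assert(s != NULL);
          size_t live = 0;
          for (size_t i = 0; i < BLOCKS_PER_SEGMENT; i++) {
              if (s->bitmap[i] == mark_epoch)
                  live++;
              else
                  s->bitmap[i] = 0;
          }
          return live;
      }
      ```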
      
      (cherry picked from commit 69001f54)
      a9dcac04
    • Ben Gamari's avatar
      nonmoving: Fix marking in compact regions · ea7ff702
      Ben Gamari authored
      Previously we were tracing the object we were asked to mark, even if it
      lives in a compact region. However, there is no need to do this; we need
      only to mark the region itself as live.
      
      I have seen a segfault caused by this: the concurrent mark saw an
      object that was in the process of being compacted by the mutator.
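      
      The change in marking behaviour can be sketched like this. It is a toy
      model under assumptions: objects have a single outgoing pointer, and
      `Region`, `Object` and `mark` are hypothetical names rather than the
      collector's own.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      typedef struct {
          bool live; /* set once anything in the region is reached */
      } Region;
      
      typedef struct Object {
          Region *compact;      /* non-NULL if the object is in a compact region */
          bool marked;
          struct Object *child; /* single outgoing pointer, for simplicity */
      } Object;
      
      /* Mark everything reachable from o. Objects inside a compact region are
       * never traced: the region as a whole is marked live and the walk
       * stops. Returns the number of ordinary objects traced. */
      static size_t mark(Object *o)
      {
          size_t traced = 0;
          while (o != NULL) {
              if (o->compact != NULL) {
                  o->compact->live = true;
                  break;
              }
              if (o->marked)
                  break;
              o->marked = true;
              traced++;
              o = o->child;
          }
          return traced;
      }
      ```
      
      Because the walk never reads the fields of an object inside a region, it
      cannot observe one that is only partly constructed.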
      
      (cherry picked from commit e4e9a7ba)
      ea7ff702
  15. Feb 25, 2020