  1. 15 Jul, 2020 2 commits
  2. 28 Jun, 2020 1 commit
  3. 25 Jun, 2020 1 commit
  4. 01 Jun, 2020 2 commits
    • nonmoving: Optimise log2_ceil · f945eea5
      Ben Gamari authored
      f945eea5
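
      The commit above carries no description, so the following is only a
      sketch of one common way to compute a ceiling log2 quickly, using a
      count-leading-zeros builtin. It assumes GCC or Clang
      (`__builtin_clzl`) and is not taken from the commit itself.

```c
#include <assert.h>
#include <limits.h>

/* Ceiling of log2(x) for x >= 1, via the GCC/Clang count-leading-zeros
 * builtin. __builtin_clzl(0) is undefined, so x == 1 is special-cased. */
static unsigned long log2_ceil(unsigned long x)
{
    assert(x >= 1);
    if (x == 1) {
        return 0;
    }
    /* Number of bits needed to represent x - 1. */
    return (sizeof(unsigned long) * CHAR_BIT)
           - (unsigned long)__builtin_clzl(x - 1);
}
```

      For example, `log2_ceil(8)` yields 3 and `log2_ceil(9)` yields 4.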
    • Cleanup OVERWRITING_CLOSURE logic · 2ee4f36c
      Daniel Gröber (dxld) authored
      The code is just more confusing than it needs to be. We don't need to mix
      the threaded check with the ldv profiling check since ldv's init already
      checks for this. Hence they can be two separate checks. Taking the sanity
      checking into account is also cleaner via DebugFlags.sanity. No need for
      checking the DEBUG define.
      
      The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the
      old code had also make things a lot more opaque IMO so I removed those.
      2ee4f36c
  5. 13 May, 2020 1 commit
  6. 06 May, 2020 3 commits
    • nonmoving: Fix handling of dirty objects · b2d72c75
      Ben Gamari authored
      Previously we (incorrectly) relied on failed_to_evac to be "precise".
      That is, we expected it to only be true if *all* of an object's fields
      lived outside of the non-moving heap. However, this does not match the
      behavior of failed_to_evac, which is true if *any* of the object's
      fields weren't promoted (meaning that some others *may* live in the
      non-moving heap).
      
      This is problematic as we skip the non-moving write barrier for dirty
      objects (which we can only safely do if *all* fields point outside of
      the non-moving heap).
      
      Clearly this arises due to a fundamental difference in the behavior
      expected of failed_to_evac in the moving and non-moving collector.
      e.g., in the moving collector it is always safe to conservatively say
      failed_to_evac=true whereas in the non-moving collector the safe value
      is false.
      
      This issue went unnoticed as I never wrote down the dirtiness
      invariant enforced by the non-moving collector. We now define this
      invariant as
      
          An object being marked as dirty implies that all of its fields are
          on the mark queue (or, equivalently, update remembered set).
      
      To maintain this invariant we teach nonmovingScavengeOne to push the
      fields of objects which we fail to evacuate to the update remembered
      set. This is a simple and reasonably cheap solution and avoids the
      complexity and fragility that other, more strict alternative invariants
      would require.
      
      All of this is described in a new Note, Note [Dirty flags in the
      non-moving collector] in NonMoving.c.
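
      The invariant can be illustrated with a toy model. This is a sketch
      only: `ToyObj`, `ToyUpdRemSet` and `toy_scavenge_one` are hypothetical
      names, and the real nonmovingScavengeOne works on heap closures rather
      than on a struct like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FIELDS 4
#define TOY_QUEUE_CAP  32

typedef struct ToyObj {
    bool promoted;                         /* lives in the toy "non-moving heap" */
    bool dirty;
    size_t n_fields;
    struct ToyObj *fields[TOY_MAX_FIELDS];
} ToyObj;

/* Toy stand-in for the update remembered set (mark queue). */
typedef struct {
    ToyObj *items[TOY_QUEUE_CAP];
    size_t len;
} ToyUpdRemSet;

static void toy_push(ToyUpdRemSet *rs, ToyObj *o)
{
    assert(rs->len < TOY_QUEUE_CAP);
    rs->items[rs->len++] = o;
}

/* Imprecise, like failed_to_evac: true if *any* field was not promoted. */
static bool toy_failed_to_evac(const ToyObj *o)
{
    for (size_t i = 0; i < o->n_fields; i++) {
        if (!o->fields[i]->promoted) {
            return true;
        }
    }
    return false;
}

/* If the object ends up dirty, push *all* of its fields, so that
 * "dirty implies every field is on the mark queue" holds. */
static void toy_scavenge_one(ToyUpdRemSet *rs, ToyObj *o)
{
    if (toy_failed_to_evac(o)) {
        for (size_t i = 0; i < o->n_fields; i++) {
            toy_push(rs, o->fields[i]);
        }
        o->dirty = true;
    } else {
        o->dirty = false;
    }
}
```

      In this model a dirty object may have some fields in the non-moving
      heap, but skipping the write barrier for it is still safe because
      every field has already been queued.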
      b2d72c75
    • nonmoving: Fix incorrect failed_to_evac value during deadlock gc · 740b3b8d
      Ben Gamari authored
      Previously we would incorrectly set the failed_to_evac flag if we
      evacuated a value due to a deadlock GC. This would cause us to mark more
      things as dirty than strictly necessary. It also turned up a nasty bug
      which I will fix next.
      740b3b8d
    • rts: Zero block flags with -DZ · 420b957d
      Ben Gamari authored
      Block flags are very useful for determining the state of a block.
      However, some block allocator users don't touch them, leading to
      misleading values. Ensure that we zero them when zero-on-gc is set. This
      is safe and makes the flags more useful during debugging.
      420b957d
  7. 03 May, 2020 1 commit
  8. 01 May, 2020 3 commits
  9. 26 Apr, 2020 1 commit
  10. 15 Apr, 2020 5 commits
    • Daniel Gröber (dxld)'s avatar
      c3c0f662
    • Daniel Gröber (dxld)'s avatar
    • Daniel Gröber (dxld)'s avatar
      15fa9bd6
    • Zero out pinned block alignment slop when profiling · 41230e26
      Daniel Gröber (dxld) authored
      The heap profiler currently cannot traverse pinned blocks because of
      alignment slop. This used to just be a minor annoyance as the whole block
      is accounted into a special cost center rather than the respective object's
      CCS, cf. #7275. However for the new root profiler we would like to be able
      to visit _every_ closure on the heap. We need to do this so we can get rid
      of the current 'flip' bit hack in the heap traversal code.
      
      Since info pointers are always non-zero we can in principle skip all the
      slop in the profiler if we can rely on it being zeroed. This assumption
      caused problems in the past though: commit a586b33f ("rts: Correct
      handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use
      the same trick for BF_LARGE objects but neglected to take into account that
      shrink*Array# functions don't ensure that slop is zeroed when not
      compiling with profiling.
      
      Later, commit 0c114c65 ("Handle large ARR_WORDS in heap census (fix
      as we will only be assuming slop is zeroed when profiling is on.
      
      This commit also reduces the amount of slop we introduce in the first
      place by calculating the needed alignment before doing the allocation for
      small objects where we know the next available address. For large objects
      we don't know how much alignment we'll have to do yet since those details
      are hidden behind the allocateMightFail function so there we continue to
      allocate the maximum additional words we'll need to do the alignment.
      
      So that we don't have to duplicate all this logic in the cmm code, we
      pull it into the RTS allocatePinned function instead.
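
      The idea of computing the alignment padding from the next free
      address and zeroing only that slop can be sketched with a toy bump
      allocator. `ToyBump`, `toy_align_padding` and `toy_alloc_pinned` are
      hypothetical names; this is not the RTS allocatePinned code, and it
      works in bytes rather than words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes of padding needed so that (addr + pad + align_off) is a multiple
 * of `alignment`, which must be a power of two. */
static size_t toy_align_padding(uintptr_t addr, size_t alignment, size_t align_off)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)(((uintptr_t)0 - addr - (uintptr_t)align_off)
                    & (uintptr_t)(alignment - 1));
}

typedef struct {
    unsigned char *base;  /* start of the toy "pinned block" */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total bytes in the block */
} ToyBump;

/* Allocate n bytes; only the padding actually needed is introduced, and
 * it is zeroed so that a heap walker could skip over it. */
static void *toy_alloc_pinned(ToyBump *b, size_t n, size_t alignment, size_t align_off)
{
    uintptr_t next = (uintptr_t)(b->base + b->used);
    size_t pad = toy_align_padding(next, alignment, align_off);
    if (b->used + pad + n > b->cap) {
        return NULL;  /* out of space in this block */
    }
    memset(b->base + b->used, 0, pad);  /* zero the alignment slop */
    void *p = b->base + b->used + pad;
    b->used += pad + n;
    return p;
}
```

      Because the next free address is known here, no worst-case padding
      has to be reserved up front, which is the situation the message
      describes for small objects.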
      
      Metric Decrease:
          T7257
          haddock.Cabal
          haddock.base
      41230e26
    • rts: Don't mark evacuate_large as inline · 27cc2e7b
      Ben Gamari authored
      This function has two callsites and is quite large. GCC consequently
      decides not to inline and warns instead. Given the situation, I can't
      blame it. Let's just remove the inline specifier.
      27cc2e7b
  11. 09 Apr, 2020 1 commit
    • Fix CNF handling in compacting GC · 39075176
      Ömer Sinan Ağacan authored
      Fixes #17937
      
      Previously compacting GC simply ignored CNFs. This is mostly fine as
      most (see "What about small compacts?" below) CNF objects don't have
      outgoing pointers, and are "large" (allocated in large blocks) and large
      objects are not moved or compacted.
      
      However if we do GC *during* sharing-preserving compaction then the CNF
      will have a hash table mapping objects that have been moved to the CNF
      to their location in the CNF, to be able to preserve sharing.
      
      This case is handled in the copying collector, in `scavenge_compact`,
      where we evacuate hash table entries and then rehash the table.
      
      Compacting GC ignored this case.
      
      We now visit CNFs in all generations when threading pointers to the
      compacted heap and thread hash table keys. A visited CNF is added to the
      list `nfdata_chain`. After compaction is done, we re-visit the CNFs in
      that list and rehash the tables.
      
      The overhead is minimal: the list is static in `Compact.c`, and a link
      field is added to the `StgCompactNFData` closure. Programs that don't use
      CNFs should not be affected.
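
      The visit-then-rehash scheme can be sketched as an intrusive singly
      linked list. `ToyCompact`, `toy_visit_compact` and `toy_rehash_chain`
      are hypothetical names, and the `needs_rehash` flag merely stands in
      for the real hash table work.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyCompact {
    struct ToyCompact *link;  /* intrusive chain link, analogous to the new link field */
    int needs_rehash;         /* stand-in for "hash table keys were threaded" */
} ToyCompact;

/* Static chain of CNFs visited during this collection. */
static ToyCompact *toy_nfdata_chain = NULL;

/* Called while threading pointers: remember the CNF for later. */
static void toy_visit_compact(ToyCompact *c)
{
    assert(c != NULL);
    c->needs_rehash = 1;
    c->link = toy_nfdata_chain;
    toy_nfdata_chain = c;
}

/* Called after compaction: revisit every remembered CNF, "rehash" it,
 * and leave the chain empty. Returns the number of CNFs processed. */
static size_t toy_rehash_chain(void)
{
    size_t n = 0;
    while (toy_nfdata_chain != NULL) {
        ToyCompact *c = toy_nfdata_chain;
        toy_nfdata_chain = c->link;
        c->link = NULL;
        c->needs_rehash = 0;
        n++;
    }
    return n;
}
```

      A program with no CNFs never pushes onto the chain, so the only cost
      in this model is one extra pointer-sized field per CNF.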
      
      To test this CNF tests are now also run in a new way 'compacting_gc',
      which just passes `-c` to the RTS, enabling compacting GC for the oldest
      generation. Before this patch the result would be:
      
          Unexpected failures:
             compact_gc.run          compact_gc [bad exit code (139)] (compacting_gc)
             compact_huge_array.run  compact_huge_array [bad exit code (1)] (compacting_gc)
      
      With this patch all tests pass. I can also pass `-c -DS` without any
      failures.
      
      What about small compacts? Small CNFs are still not handled by the
      compacting GC. However so far I'm unable to write a test that triggers a
      runtime panic ("update_fwd: unknown/strange object") by allocating a
      small CNF in a compacted heap. It's possible that I'm missing something
      and it's not possible to have a small CNF.
      
      NoFib Results:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          +0.1%      0.0%      0.0%     +0.0%     +0.0%
                  CSD          +0.1%      0.0%      0.0%      0.0%      0.0%
                   FS          +0.1%      0.0%      0.0%      0.0%      0.0%
                    S          +0.1%      0.0%      0.0%      0.0%      0.0%
                   VS          +0.1%      0.0%      0.0%      0.0%      0.0%
                  VSD          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
                  VSM          +0.1%      0.0%     +0.0%     -0.0%      0.0%
                 anna          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 atom          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               awards          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               banner          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
           bernouilli          +0.1%      0.0%      0.0%     -0.0%     +0.0%
         binary-trees          +0.1%      0.0%     -0.0%     -0.0%      0.0%
                boyer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               boyer2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 bspt          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             cichelli          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              circsim          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             clausify          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
        comp_lab_zift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             compress          +0.1%      0.0%     +0.0%     +0.0%      0.0%
            compress2          +0.1%      0.0%     -0.0%      0.0%      0.0%
          constraints          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         cryptarithm1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         cryptarithm2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  cse          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         digits-of-e1          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
         digits-of-e2          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                eliza          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                event          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
          exact-reals          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               exp3_8          +0.1%      0.0%     +0.0%     -0.0%      0.0%
               expert          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
       fannkuch-redux          +0.1%      0.0%     -0.0%      0.0%      0.0%
                fasta          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                  fem          +0.1%      0.0%     -0.0%     +0.0%      0.0%
                  fft          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                 fft2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             fibheaps          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 fish          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                fluid          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               fulsom          +0.1%      0.0%     -0.0%     +0.0%      0.0%
               gamteb          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                  gcd          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
          gen_regexps          +0.1%      0.0%     -0.0%     +0.0%      0.0%
               genfft          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                   gg          +0.1%      0.0%      0.0%     +0.0%     +0.0%
                 grep          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
               hidden          +0.1%      0.0%     +0.0%     -0.0%      0.0%
                  hpg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                infer          +0.1%      0.0%     +0.0%      0.0%     -0.0%
              integer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
            integrate          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                kahan          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              knights          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
           last-piece          +0.1%      0.0%     +0.0%      0.0%      0.0%
                 lcss          +0.1%      0.0%     +0.0%     +0.0%      0.0%
                 life          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
                 lift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               linear          +0.1%      0.0%     -0.0%     +0.0%      0.0%
            listcompr          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             listcopy          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             maillist          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
               mandel          +0.1%      0.0%     +0.0%     +0.0%      0.0%
              mandel2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 mate          +0.1%      0.0%     +0.0%      0.0%     +0.0%
              minimax          +0.1%      0.0%     -0.0%      0.0%     -0.0%
              mkhprog          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
           multiplier          +0.1%      0.0%     +0.0%      0.0%      0.0%
               n-body          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             nucleic2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 para          +0.1%      0.0%      0.0%     +0.0%     +0.0%
            paraffins          +0.1%      0.0%     +0.0%     -0.0%      0.0%
               parser          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  pic          +0.1%      0.0%     -0.0%     -0.0%      0.0%
             pidigits          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
                power          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               pretty          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
               primes          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               queens          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              reptile          +0.1%      0.0%     -0.0%     -0.0%     +0.0%
      reverse-complem          +0.1%      0.0%     +0.0%      0.0%     -0.0%
              rewrite          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  rsa          +0.1%      0.0%     -0.0%     +0.0%     -0.0%
                  scc          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
                sched          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                  scs          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               simple          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              sorting          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
               sphere          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
            transform          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
             treejoin          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
            typecheck          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
              veritas          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                 wang          +0.1%      0.0%      0.0%     +0.0%     +0.0%
            wave4main          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         wheel-sieve1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
         wheel-sieve2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
                 x2n1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
      --------------------------------------------------------------------------------
                  Min          +0.0%      0.0%     -0.0%     -0.0%     -0.1%
                  Max          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
       Geometric Mean          +0.1%     -0.0%     -0.0%     -0.0%     -0.0%
      
      Bumping numbers of nonsensical perf tests:
      
      Metric Increase:
          T12150
          T12234
          T12425
          T13035
          T5837
          T6048
      
      It's simply not possible for this patch to increase allocations, and
      I've wasted enough time on these tests in the past (see #17686). I think
      these tests should not be perf tests, but for now I'll bump the numbers.
      39075176
  12. 02 Apr, 2020 1 commit
  13. 17 Mar, 2020 1 commit
    • Update sanity checking for TSOs: · 92327e3a
      Ömer Sinan Ağacan authored
      - Remove an invalid assumption about GC checking what_next field. The GC
        doesn't care about what_next at all, if a TSO is reachable then all
        its pointers are followed (other than global_link, which is only
        followed by compacting GC).
      
      - Remove checkSTACK in checkTSO: TSO stacks will be visited in
        checkHeapChain, or checkLargeObjects etc.
      
      - Add an assertion in checkTSO to check that the global_link field is
        sane.
      
      - Did some refactoring to remove forward decls in checkGlobalTSOList and
        added braces around single-statement if statements.
      92327e3a
  14. 15 Mar, 2020 1 commit
    • Fix global_link of TSOs for threads reachable via dead weaks · cfcc3c9a
      Ömer Sinan Ağacan authored
      Fixes #17785
      
      Here's how the problem occurs:
      
      - In generation 0 we have a TSO that is finished (i.e. it has no more
        work to do or it is killed).
      
      - The TSO only becomes reachable after collectDeadWeakPtrs().
      
      - After collectDeadWeakPtrs() we switch to WeakDone phase where we don't
        move TSOs to different lists anymore (like the next gen's thread list
        or the resurrected_threads list).
      
      - So the TSO will never be moved to a generation's thread list, but it
        will be promoted to generation 1.
      
      - Generation 1 is collected via mark-compact, and because the TSO is
        reachable it is marked, and its `global_link` field, which is bogus at
        this point (because the TSO is not in a list), will be threaded.
      
      - Chaos ensues.
      
      In other words, when these conditions hold:
      
      - A TSO is reachable only after collectDeadWeakPtrs()
      - It's finished (what_next is ThreadComplete or ThreadKilled)
      - It's retained by mark-compact collector (moving collector doesn't
        evacuate the global_link field)
      
      We end up doing random mutations on the heap because the TSO's
      global_link field is not valid, but it still looks like a heap pointer
      so we thread it during compacting GC.
      
      The fix is simple: when we traverse old_threads lists to resurrect
      unreachable threads, the threads that won't be resurrected currently
      stay on the old_threads lists. Those threads will never be visited
      again by MarkWeak so we now reset their global_link fields. This way
      compacting GC does not thread pointers to nowhere.
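
      A toy version of that traversal, as a sketch only: `ToyTSO`,
      `TOY_END_TSO_QUEUE` and `toy_resurrect_unreachable` are hypothetical
      names modelled loosely on the description above, and the real code
      also evacuates the threads it resurrects.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_RUNNABLE, TOY_COMPLETE, TOY_KILLED } ToyWhatNext;

typedef struct ToyTSO {
    ToyWhatNext what_next;
    struct ToyTSO *global_link;
} ToyTSO;

/* Sentinel marking the end of a thread list. */
static ToyTSO toy_end_queue;
#define TOY_END_TSO_QUEUE (&toy_end_queue)

/* Walk an old_threads list. Finished threads are not resurrected, so
 * their global_link is reset instead of being left pointing at a stale
 * list neighbour. Other threads go onto the resurrected list. */
static void toy_resurrect_unreachable(ToyTSO *old_threads, ToyTSO **resurrected)
{
    assert(resurrected != NULL);
    ToyTSO *next;
    for (ToyTSO *t = old_threads; t != TOY_END_TSO_QUEUE; t = next) {
        next = t->global_link;
        if (t->what_next == TOY_COMPLETE || t->what_next == TOY_KILLED) {
            t->global_link = TOY_END_TSO_QUEUE;  /* the reset added by the fix */
        } else {
            t->global_link = *resurrected;
            *resurrected = t;
        }
    }
}
```

      After the walk, every finished thread has a well-defined
      `global_link`, so a later pass that follows the field never sees a
      dangling pointer.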
      
      Testing
      -------
      
      The reproducer in #17785 is quite large and hard to build, because of
      the dependencies, so I'm not adding a regression test.
      
      In my testing the reproducer would take less than 5 seconds to run,
      and once in every ~5 runs would fail with a segfault or an assertion
      error. In other cases it also fails with a test failure. Because the
      tests never fail with the bug fix, assuming the code is correct, this
      also means that this bug can sometimes lead to incorrect runtime
      results.
      
      After the fix I was able to run the reproducer repeatedly for about an
      hour, with no runtime crashes or test failures.
      
      To run the reproducer clone the git repo:
      
          $ git clone https://github.com/osa1/streamly --branch ghc-segfault
      
      Then clone primitive and atomic-primops from their git repos and point
      to the clones in cabal.project.local. The project should then be
      buildable using GHC HEAD. Run the executable `properties` with `+RTS -c
      -DZ`.
      
      In addition to the reproducer above I run the test suite using:
      
          $ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \
              -with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr'
      
      This always enables compacting GC, both in GHC when building the test
      programs and when running the test programs, and also enables sanity
      checking when running the test programs. This set of flags is not
      compatible with all tests so there are some failures, but I got the same
      set of failures with this patch as with GHC HEAD.
      cfcc3c9a
  15. 14 Mar, 2020 2 commits
    • nonmoving: Remove redundant bitmap clearing · fdfa2d01
      Ben Gamari authored
      nonmovingSweep already clears the bitmap in the sweep loop. There is no
      reason to do so a second time.
      fdfa2d01
    • nonmoving: Don't traverse filled segment list in pause · 20d4d676
      Ben Gamari authored
      The non-moving collector would previously walk the entire filled segment
      list during the preparatory pause. However, this is far more work than
      is strictly necessary. We can rather get away with merely collecting the
      allocators' filled segment list heads and process the lists themselves
      during the concurrent phase. This can significantly reduce the maximum
      gen1 GC pause time in programs with high rates of long-lived allocations.
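
      The split between the pause and the concurrent phase can be sketched
      as follows. `ToySegment`, `ToyAllocator` and both functions are
      hypothetical names; the sketch is single-threaded and uses plain
      loads and stores, whereas a concurrent collector would need
      appropriate synchronisation when detaching the list heads.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToySegment {
    struct ToySegment *link;
    int prepared;             /* stand-in for per-segment preparation work */
} ToySegment;

typedef struct {
    ToySegment *filled;       /* head of this allocator's filled segment list */
} ToyAllocator;

/* During the pause: O(number of allocators). Only the list heads are
 * detached; the lists themselves are not walked. */
static void toy_take_filled_heads(ToyAllocator *allocs, size_t n, ToySegment **heads)
{
    assert(allocs != NULL && heads != NULL);
    for (size_t i = 0; i < n; i++) {
        heads[i] = allocs[i].filled;
        allocs[i].filled = NULL;
    }
}

/* During the concurrent phase: O(number of segments). Walk each detached
 * list and do the per-segment work. Returns the number of segments seen. */
static size_t toy_process_filled(ToySegment **heads, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (ToySegment *s = heads[i]; s != NULL; s = s->link) {
            s->prepared = 1;
            count++;
        }
    }
    return count;
}
```

      The pause cost in this model no longer depends on how many filled
      segments exist, only on the number of allocators.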
      20d4d676
  16. 11 Mar, 2020 1 commit
    • Zero any slop after compaction in compacting GC · 3aa9b35f
      Ömer Sinan Ağacan authored
      In copying GC, with the relevant debug flags enabled, we release the old
      blocks after a GC, and the block allocator zeroes the space before
      releasing a block. This effectively zeros the old heap.
      
      In compacting GC we reuse the blocks and previously we didn't zero the
      unused space in a compacting generation after compaction. With this
      patch we zero the slop between the free pointer and the end of the block
      when we're done with compaction and when switching to a new block
      (because the current block doesn't have enough space for the next object
      we're shifting).
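
      Zeroing the slop between the free pointer and the end of a block can
      be sketched like this. `ToyBlock` and `toy_zero_slop` are hypothetical
      names, and the toy block is a plain byte buffer rather than an RTS
      block descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char *start;  /* first byte of the block */
    unsigned char *free;   /* first unused byte after compaction */
    size_t size;           /* block size in bytes */
} ToyBlock;

/* Zero everything from the free pointer to the end of the block and
 * return how many bytes were cleared. */
static size_t toy_zero_slop(ToyBlock *bd)
{
    assert(bd->free >= bd->start && bd->free <= bd->start + bd->size);
    size_t slop = (size_t)((bd->start + bd->size) - bd->free);
    memset(bd->free, 0, slop);
    return slop;
}
```

      In the scheme described above, this would run both when compaction
      finishes and whenever the collector moves on to a new block.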
      3aa9b35f
  17. 09 Mar, 2020 1 commit
    • nonmoving: Fix collection of sparks · 70d2b995
      Ben Gamari authored
      Previously sparks living in the non-moving heap would be promptly GC'd
      by the minor collector since pruneSparkQueue uses the BF_EVACUATED flag,
      which non-moving heap blocks do not have set.
      
      Fix this by implementing proper support in pruneSparkQueue for
      determining reachability in the non-moving heap. The story is told in
      Note [Spark management in the nonmoving heap].
      70d2b995
  18. 05 Mar, 2020 1 commit
  19. 29 Feb, 2020 1 commit
  20. 28 Feb, 2020 1 commit
    • nonmoving: Fix marking in compact regions · f4b6b594
      Ben Gamari authored
      Previously we were tracing the object we were asked to mark, even if it
      lives in a compact region. However, there is no need to do this; we need
      only to mark the region itself as live.
      
      I have seen a segfault caused by the concurrent mark seeing an
      object in the process of being compacted by the mutator.
      f4b6b594
  21. 08 Feb, 2020 1 commit
  22. 25 Jan, 2020 2 commits
    • Fix rts allocateExec() on NetBSD · 8b726534
      PHO authored
      Similar to SELinux, NetBSD "PaX mprotect" prohibits marking a page
      mapping both writable and executable at the same time. Use libffi
      which knows how to work around it.
      8b726534
    • Fix chaining tagged and untagged ptrs in compacting GC · 0e57d8a1
      Ömer Sinan Ağacan authored
      Currently compacting GC has the invariant that in a chain all fields are tagged
      the same. However this does not really hold: root pointers are not tagged, so
      when we thread a root we initialize a chain without a tag. When the pointed-to
      object is evaluated and we have more pointers to it from the heap, we then add
      *tagged* fields to the chain (because pointers to it from the heap are tagged),
      ending up chaining fields with different tags (pointers from roots are NOT
      tagged, pointers from heap are). This breaks the invariant and as a result
      compacting GC turns tagged pointers into non-tagged.
      
      This later causes problems in the generated code where we do reads assuming that
      the pointer is aligned, e.g.
      
          0x7(%rax) -- assumes that pointer is tagged 1
      
      which causes misaligned reads. This caused #17088.
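
      Why a lost tag turns into a misaligned read can be checked with a
      little address arithmetic. The sketch assumes 8-byte words and
      low-bit pointer tags; the `toy_*` helpers are hypothetical and only
      model the effective address of an access like the one above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORD_SIZE 8u
#define TOY_TAG_MASK  ((uintptr_t)(TOY_WORD_SIZE - 1))

/* Put a tag into the low bits of a word-aligned pointer. */
static uintptr_t toy_tag_ptr(uintptr_t p, uintptr_t tag)
{
    assert((p & TOY_TAG_MASK) == 0 && tag <= TOY_TAG_MASK);
    return p | tag;
}

/* Effective address of an access like 0x7(%rax): register value plus 7. */
static uintptr_t toy_effective_addr(uintptr_t rax)
{
    return rax + 7;
}

static int toy_is_word_aligned(uintptr_t addr)
{
    return (addr & TOY_TAG_MASK) == 0;
}
```

      With a pointer tagged 1, the access lands on the word at offset 8
      from the untagged address. With the tag lost, the same instruction
      reads from offset 7, which is not word-aligned.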
      
      We fix this using the "pointer tagging for large families" patch (#14373,
      !1742):
      
      - With the pointer tagging patch the GC can know what the tagged pointer to a
        CONSTR should be (previously we'd need to know the family size -- large
        families are always tagged 1, small families are tagged depending on the
        constructor).
      
      - Since we now know what the tags should be we no longer need to store the
        pointer tag in the info table pointers when forming chains in the compacting
        GC.
      
      As a result we no longer need to tag pointers in chains with 1/2 depending on
      whether the field points to an info table pointer, or to another field: an info
      table pointer is always tagged 0, everything else in the chain is tagged 1. The
      lost tags in pointers can be retrieved by looking at the info table.
      
      Finally, instead of using tag 1 for fields and tag 0 for info table pointers, we
      use two different tags for fields:
      
      - 1 for fields that have untagged pointers
      - 2 for fields that have tagged pointers
      
      When unchaining we then look at the pointer to a field, and depending on its tag
      we either leave a tagged pointer or an untagged pointer in the field.
      
      This allows chaining untagged and tagged fields together in compacting GC.
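
      The two-tag chaining scheme can be sketched on plain machine words.
      This is a toy model, not the code in `Compact.c`: `toy_thread` and
      `toy_unthread` are hypothetical names, pointer tags are restricted to
      0..3, and the pointer tag to restore is passed in as a parameter
      where the real collector would derive it from the info table.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CHAIN_TAG_MASK ((uintptr_t)3)

/* Thread one field onto the chain hanging off the object's header word.
 * Chain tag 1: the field held an untagged pointer.
 * Chain tag 2: the field held a tagged pointer.
 * Tag 0 is reserved for the info pointer at the end of the chain. */
static void toy_thread(uintptr_t *field)
{
    uintptr_t q = *field;
    uintptr_t *header = (uintptr_t *)(q & ~TOY_CHAIN_TAG_MASK);
    uintptr_t chain_tag = (q & TOY_CHAIN_TAG_MASK) != 0 ? 2 : 1;
    assert(((uintptr_t)field & TOY_CHAIN_TAG_MASK) == 0);
    *field = *header;                        /* old chain head, or info pointer */
    *header = (uintptr_t)field | chain_tag;  /* field becomes the new chain head */
}

/* Walk the chain, writing the object's new address into every field.
 * Fields chained with tag 2 get the pointer tag back; fields chained
 * with tag 1 stay untagged. Finally restore the info pointer. */
static void toy_unthread(uintptr_t *header, uintptr_t new_addr, uintptr_t ptr_tag)
{
    uintptr_t q = *header;
    while ((q & TOY_CHAIN_TAG_MASK) != 0) {
        uintptr_t *field = (uintptr_t *)(q & ~TOY_CHAIN_TAG_MASK);
        uintptr_t next = *field;
        *field = ((q & TOY_CHAIN_TAG_MASK) == 2) ? (new_addr | ptr_tag) : new_addr;
        q = next;
    }
    *header = q;
}
```

      Threading an untagged root field and a tagged heap field onto the
      same object and then unthreading leaves the root untagged and the
      heap field tagged, which is the mixed chain the message describes.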
      
      Fixes #17088
      
      Nofib results
      -------------
      
      Binaries are smaller because of smaller `Compact.c` code.
      
          $ make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" EXTRA_HC_OPTS="-with-rtsopts=-c" NoFibRuns=1
      
          --------------------------------------------------------------------------------
                  Program           Size    Allocs    Instrs     Reads    Writes
          --------------------------------------------------------------------------------
                       CS          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      CSD          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                       FS          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                        S          -0.3%      0.0%     +5.4%     +0.8%     +3.9%
                       VS          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                      VSD          -0.3%      0.0%     -0.0%     -0.0%     -0.2%
                      VSM          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     anna          -0.1%      0.0%     +0.0%     +0.0%     +0.0%
                     ansi          -0.3%      0.0%     +0.1%     +0.0%     +0.0%
                     atom          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   awards          -0.2%      0.0%     +0.0%      0.0%     -0.0%
                   banner          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
               bernouilli          -0.3%      0.0%     +0.1%     +0.0%     +0.0%
             binary-trees          -0.2%      0.0%     +0.0%      0.0%     +0.0%
                    boyer          -0.3%      0.0%     +0.2%     +0.0%     +0.0%
                   boyer2          -0.2%      0.0%     +0.2%     +0.1%     +0.0%
                     bspt          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                cacheprof          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                 calendar          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 cichelli          -0.3%      0.0%     +1.1%     +0.2%     +0.5%
                  circsim          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                 clausify          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
            comp_lab_zift          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                 compress          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                compress2          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
              constraints          -0.3%      0.0%     +0.2%     +0.1%     +0.1%
             cryptarithm1          -0.3%      0.0%     +0.0%     -0.0%      0.0%
             cryptarithm2          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      cse          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
             digits-of-e1          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
             digits-of-e2          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                   dom-lt          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                    eliza          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                    event          -0.3%      0.0%     +0.1%     +0.0%     -0.0%
              exact-reals          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   exp3_8          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                   expert          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
           fannkuch-redux          -0.3%      0.0%     -0.0%     -0.0%     -0.0%
                    fasta          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      fem          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      fft          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                     fft2          -0.2%      0.0%     +0.0%     -0.0%     +0.0%
                 fibheaps          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     fish          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    fluid          -0.2%      0.0%     +0.4%     +0.1%     +0.1%
                   fulsom          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   gamteb          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      gcd          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
              gen_regexps          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   genfft          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                       gg          -0.2%      0.0%     +0.7%     +0.3%     +0.2%
                     grep          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   hidden          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      hpg          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      ida          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    infer          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                  integer          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                integrate          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
             k-nucleotide          -0.2%      0.0%     +0.0%     +0.0%     -0.0%
                    kahan          -0.3%      0.0%     -0.0%     -0.0%     -0.0%
                  knights          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   lambda          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
               last-piece          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     lcss          -0.3%      0.0%     +0.0%     +0.0%      0.0%
                     life          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     lift          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   linear          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                listcompr          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 listcopy          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 maillist          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   mandel          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  mandel2          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     mate          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  minimax          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  mkhprog          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
               multiplier          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   n-body          -0.2%      0.0%     -0.0%     -0.0%     -0.0%
                 nucleic2          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                     para          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                paraffins          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   parser          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  parstof          -0.2%      0.0%     +0.8%     +0.2%     +0.2%
                      pic          -0.2%      0.0%     +0.1%     -0.1%     -0.1%
                 pidigits          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    power          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                   pretty          -0.3%      0.0%     -0.0%     -0.0%     -0.1%
                   primes          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                primetest          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                   prolog          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   puzzle          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                   queens          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  reptile          -0.2%      0.0%     +0.2%     +0.1%     +0.0%
          reverse-complem          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  rewrite          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     rfib          -0.2%      0.0%     +0.0%     +0.0%     -0.0%
                      rsa          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      scc          -0.3%      0.0%     -0.0%     -0.0%     -0.1%
                    sched          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      scs          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                   simple          -0.2%      0.0%     +3.4%     +1.0%     +1.8%
                    solid          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  sorting          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
            spectral-norm          -0.2%      0.0%     -0.0%     -0.0%     -0.0%
                   sphere          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   symalg          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      tak          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                transform          -0.2%      0.0%     +0.2%     +0.1%     +0.1%
                 treejoin          -0.3%      0.0%     +0.2%     -0.0%     -0.1%
                typecheck          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  veritas          -0.1%      0.0%     +0.0%     +0.0%     +0.0%
                     wang          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                wave4main          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
             wheel-sieve1          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
             wheel-sieve2          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     x2n1          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
          --------------------------------------------------------------------------------
                      Min          -0.3%      0.0%     -0.0%     -0.1%     -0.2%
                      Max          -0.1%      0.0%     +5.4%     +1.0%     +3.9%
           Geometric Mean          -0.3%     -0.0%     +0.1%     +0.0%     +0.1%
      
          --------------------------------------------------------------------------------
                  Program           Size    Allocs    Instrs     Reads    Writes
          --------------------------------------------------------------------------------
                  circsim          -0.2%      0.0%     +1.6%     +0.4%     +0.7%
              constraints          -0.3%      0.0%     +4.3%     +1.5%     +2.3%
                 fibheaps          -0.3%      0.0%     +3.5%     +1.2%     +1.3%
                   fulsom          -0.2%      0.0%     +3.6%     +1.2%     +1.8%
                 gc_bench          -0.3%      0.0%     +4.1%     +1.3%     +2.3%
                     hash          -0.3%      0.0%     +6.6%     +2.2%     +3.6%
                     lcss          -0.3%      0.0%     +0.7%     +0.2%     +0.7%
                mutstore1          -0.3%      0.0%     +4.8%     +1.4%     +2.8%
                mutstore2          -0.3%      0.0%     +3.4%     +1.0%     +1.7%
                    power          -0.2%      0.0%     +2.7%     +0.6%     +1.9%
               spellcheck          -0.3%      0.0%     +1.1%     +0.4%     +0.4%
          --------------------------------------------------------------------------------
                      Min          -0.3%      0.0%     +0.7%     +0.2%     +0.4%
                      Max          -0.2%      0.0%     +6.6%     +2.2%     +3.6%
           Geometric Mean          -0.3%     +0.0%     +3.3%     +1.0%     +1.8%
      
      Metric changes
      --------------
      
      While it sounds ridiculous, the following tests report increased
      allocations with this change. We concluded that this change can't cause
      a difference in allocations and decided to land this patch. Fluctuations
      in the "bytes allocated" metric are tracked in #17686.
      
      Metric Increase:
          Naperian
          T10547
          T12150
          T12234
          T12425
          T13035
          T5837
          T6048
      0e57d8a1
  23. 13 Jan, 2020 1 commit
  24. 09 Dec, 2019 1 commit
    • Gabor Greif's avatar
      Fix comment typos · d46a72e1
      Gabor Greif authored
      The following is only needed to correct the CI perf fluke that
      happened in 9897e8c8:
      -------------------------
      Metric Decrease:
          T5837
          T6048
          T9020
          T12425
          T12234
          T13035
          T12150
          Naperian
      -------------------------
      d46a72e1
  25. 05 Dec, 2019 2 commits
    • Ben Gamari's avatar
      rts/NonMovingSweep: Fix locking of new mutable list allocation · a7a4efbf
      Ben Gamari authored
      Previously we used allocBlockOnNode_sync in nonmovingSweepMutLists
      despite the fact that we aren't in the GC and therefore the allocation
      spinlock isn't in use. This meant that sweep would end up spinning until
      the next minor GC, when the SM lock was moved away from the SM_MUTEX to
      the spinlock. This isn't a correctness issue but it sure isn't good for
      performance.
      
      Found thanks to Ward.
      
      Fixes #17539.
      a7a4efbf
    • Ben Gamari's avatar
      nonmoving: Clear segment bitmaps during sweep · 69001f54
      Ben Gamari authored
      Previously we would clear the bitmaps of segments which we are going to
      sweep during the preparatory pause. However, this is unnecessary: the
      existence of the mark epoch ensures that the sweep will correctly
      identify non-reachable objects, even if we do not clear the bitmap.
      
      We now defer clearing the bitmap to sweep, which happens concurrently
      with mutation.
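
The epoch argument above can be illustrated with a minimal sketch in C. It is not the RTS code: `struct segment`, `mark_epoch`, `bump_epoch`, `mark` and `sweep` are hypothetical names, and the sketch assumes one byte of mark state per block and that every segment is swept once per cycle. A block counts as marked only if its bitmap entry equals the current epoch, so entries left over from the previous cycle are already read as unmarked and can be zeroed lazily by sweep.

```c
#include <stdint.h>

#define BLOCKS 8

/* Hypothetical segment: one byte of mark state per block. */
struct segment {
    uint8_t bitmap[BLOCKS];
};

/* Hypothetical global epoch; alternates between 1 and 2, 0 means "clear". */
uint8_t mark_epoch = 1;

/* Start a new collection cycle. No bitmap is touched here. */
void bump_epoch(void)
{
    mark_epoch = (mark_epoch == 1) ? 2 : 1;
}

/* Mark block i as reachable in the current cycle. */
void mark(struct segment *seg, int i)
{
    seg->bitmap[i] = mark_epoch;
}

/*
 * Sweep a segment: a block is live only if its entry carries the current
 * epoch. Anything else (never marked, or marked in an earlier cycle) is
 * dead, and its entry is cleared here rather than in the pause.
 * Returns the number of live blocks.
 */
int sweep(struct segment *seg)
{
    int live = 0;
    for (int i = 0; i < BLOCKS; i++) {
        if (seg->bitmap[i] == mark_epoch) {
            live++;
        } else {
            seg->bitmap[i] = 0;
        }
    }
    return live;
}
```

In this sketch, marking blocks 0 and 3 and sweeping finds two live blocks. After `bump_epoch()` and marking only block 3, the next sweep finds one live block and zeroes the stale entry for block 0, with no clearing done between the two cycles.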
      69001f54
  26. 02 Dec, 2019 1 commit
  27. 28 Nov, 2019 1 commit