1. 25 Jan, 2020 1 commit
    • Fix chaining tagged and untagged ptrs in compacting GC · 0e57d8a1
      Ömer Sinan Ağacan authored
      Currently compacting GC has the invariant that in a chain all fields are tagged
      the same. However this does not really hold: root pointers are not tagged, so
      when we thread a root we initialize a chain without a tag. When the pointed-to
      object is evaluated and we have more pointers to it from the heap, we then add
      *tagged* fields to the chain (because pointers to it from the heap are tagged),
      ending up chaining fields with different tags (pointers from roots are NOT
      tagged, pointers from heap are). This breaks the invariant and as a result
      compacting GC turns tagged pointers into non-tagged.
      
      This later causes problems in the generated code, where we do reads assuming that
      the pointer is aligned, e.g.
      
          0x7(%rax) -- assumes that pointer is tagged 1
      
      which causes misaligned reads. This caused #17088.
      
      We fix this using the "pointer tagging for large families" patch (#14373,
      !1742):
      
      - With the pointer tagging patch the GC can know what the tagged pointer to a
        CONSTR should be (previously we'd need to know the family size -- large
        families are always tagged 1, small families are tagged depending on the
        constructor).
      
      - Since we now know what the tags should be we no longer need to store the
        pointer tag in the info table pointers when forming chains in the compacting
        GC.
      
      As a result we no longer need to tag pointers in chains with 1/2 depending on
      whether the field points to an info table pointer, or to another field: an info
      table pointer is always tagged 0, everything else in the chain is tagged 1. The
      lost tags in pointers can be retrieved by looking at the info table.
      
      Finally, instead of using tag 1 for fields and tag 0 for info table pointers, we
      use two different tags for fields:
      
      - 1 for fields that have untagged pointers
      - 2 for fields that have tagged pointers
      
      When unchaining we then look at the pointer to a field, and depending on its tag
      we either leave a tagged pointer or an untagged pointer in the field.
      
      This allows chaining untagged and tagged fields together in compacting GC.
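
The chaining scheme described above can be sketched in a few lines of C. This is a minimal illustration, not the code in `Compact.c`: the names `thread_field`, `unthread` and `demo_mixed_chain` are hypothetical, the tag is assumed to live in the two low bits of a word, and the pointer tag is passed in as a parameter, whereas the commit message says the real GC recovers it from the info table.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t W_;   /* machine word */
typedef W_ *P_;         /* pointer to a machine word */

#define TAG_MASK ((W_)3)  /* assumption: the two low bits hold the tag */

/* Thread the field `p`, which holds a (possibly tagged) pointer to an
 * object. The object's header word moves into the field, and the header
 * now points back at the field, tagged 1 if the field held an untagged
 * pointer and 2 if it held a tagged one. */
static void thread_field(P_ p)
{
    W_ q0 = *p;
    int was_tagged = (q0 & TAG_MASK) != 0;
    P_ obj = (P_)(q0 & ~TAG_MASK);
    *p = *obj;                             /* field takes the old header */
    *obj = (W_)p + (was_tagged ? 2 : 1);   /* header now heads the chain */
}

/* Walk the chain hanging off `obj` and write the object's new address
 * into every field, restoring the pointer tag only for fields that were
 * chained with tag 2. A word with tag 0 ends the chain: it is the
 * original info pointer. */
static void unthread(P_ obj, W_ new_addr, W_ ptr_tag)
{
    W_ q = *obj;
    while ((q & TAG_MASK) != 0) {
        W_ chain_tag = q & TAG_MASK;
        P_ field = (P_)(q & ~TAG_MASK);
        W_ next = *field;
        *field = (chain_tag == 2) ? new_addr + ptr_tag : new_addr;
        q = next;
    }
    *obj = q;
}

/* One untagged root and one tagged heap field share a chain. */
static int demo_mixed_chain(void)
{
    static W_ info_table[1];              /* stand-in for an info table */
    W_ object[2] = { (W_)info_table, 0 };
    W_ new_home[2] = { 0, 0 };            /* where the object will move */
    W_ root = (W_)object;                 /* roots are untagged */
    W_ heap_field = (W_)object + 1;       /* heap pointer tagged 1 */

    thread_field(&root);
    thread_field(&heap_field);
    unthread(object, (W_)new_home, 1);

    assert(root == (W_)new_home);            /* stays untagged */
    assert(heap_field == (W_)new_home + 1);  /* keeps its tag */
    assert(object[0] == (W_)info_table);     /* header restored */
    return 1;
}
```

With a single chain tag for all fields, one of the two pointers in `demo_mixed_chain` would come out of `unthread` with the wrong tag, which is the failure mode the commit describes.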
      
      Fixes #17088
      
      Nofib results
      -------------
      
      Binaries are smaller because of smaller `Compact.c` code.
      
          make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" EXTRA_HC_OPTS="-with-rtsopts=-c" NoFibRuns=1
      
          --------------------------------------------------------------------------------
                  Program           Size    Allocs    Instrs     Reads    Writes
          --------------------------------------------------------------------------------
                       CS          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      CSD          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                       FS          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                        S          -0.3%      0.0%     +5.4%     +0.8%     +3.9%
                       VS          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                      VSD          -0.3%      0.0%     -0.0%     -0.0%     -0.2%
                      VSM          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     anna          -0.1%      0.0%     +0.0%     +0.0%     +0.0%
                     ansi          -0.3%      0.0%     +0.1%     +0.0%     +0.0%
                     atom          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   awards          -0.2%      0.0%     +0.0%      0.0%     -0.0%
                   banner          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
               bernouilli          -0.3%      0.0%     +0.1%     +0.0%     +0.0%
             binary-trees          -0.2%      0.0%     +0.0%      0.0%     +0.0%
                    boyer          -0.3%      0.0%     +0.2%     +0.0%     +0.0%
                   boyer2          -0.2%      0.0%     +0.2%     +0.1%     +0.0%
                     bspt          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                cacheprof          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                 calendar          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 cichelli          -0.3%      0.0%     +1.1%     +0.2%     +0.5%
                  circsim          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                 clausify          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
            comp_lab_zift          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                 compress          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                compress2          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
              constraints          -0.3%      0.0%     +0.2%     +0.1%     +0.1%
             cryptarithm1          -0.3%      0.0%     +0.0%     -0.0%      0.0%
             cryptarithm2          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      cse          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
             digits-of-e1          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
             digits-of-e2          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                   dom-lt          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                    eliza          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                    event          -0.3%      0.0%     +0.1%     +0.0%     -0.0%
              exact-reals          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   exp3_8          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                   expert          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
           fannkuch-redux          -0.3%      0.0%     -0.0%     -0.0%     -0.0%
                    fasta          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      fem          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      fft          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                     fft2          -0.2%      0.0%     +0.0%     -0.0%     +0.0%
                 fibheaps          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     fish          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    fluid          -0.2%      0.0%     +0.4%     +0.1%     +0.1%
                   fulsom          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   gamteb          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      gcd          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
              gen_regexps          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   genfft          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                       gg          -0.2%      0.0%     +0.7%     +0.3%     +0.2%
                     grep          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   hidden          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      hpg          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                      ida          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    infer          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                  integer          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                integrate          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
             k-nucleotide          -0.2%      0.0%     +0.0%     +0.0%     -0.0%
                    kahan          -0.3%      0.0%     -0.0%     -0.0%     -0.0%
                  knights          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   lambda          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
               last-piece          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     lcss          -0.3%      0.0%     +0.0%     +0.0%      0.0%
                     life          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     lift          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   linear          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                listcompr          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 listcopy          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                 maillist          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   mandel          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  mandel2          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                     mate          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  minimax          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  mkhprog          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
               multiplier          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   n-body          -0.2%      0.0%     -0.0%     -0.0%     -0.0%
                 nucleic2          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                     para          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                paraffins          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   parser          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  parstof          -0.2%      0.0%     +0.8%     +0.2%     +0.2%
                      pic          -0.2%      0.0%     +0.1%     -0.1%     -0.1%
                 pidigits          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                    power          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                   pretty          -0.3%      0.0%     -0.0%     -0.0%     -0.1%
                   primes          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                primetest          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                   prolog          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                   puzzle          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                   queens          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  reptile          -0.2%      0.0%     +0.2%     +0.1%     +0.0%
          reverse-complem          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  rewrite          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     rfib          -0.2%      0.0%     +0.0%     +0.0%     -0.0%
                      rsa          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      scc          -0.3%      0.0%     -0.0%     -0.0%     -0.1%
                    sched          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                      scs          -0.2%      0.0%     +0.1%     +0.0%     +0.0%
                   simple          -0.2%      0.0%     +3.4%     +1.0%     +1.8%
                    solid          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                  sorting          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
            spectral-norm          -0.2%      0.0%     -0.0%     -0.0%     -0.0%
                   sphere          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                   symalg          -0.2%      0.0%     +0.0%     +0.0%     +0.0%
                      tak          -0.3%      0.0%     +0.0%     +0.0%     -0.0%
                transform          -0.2%      0.0%     +0.2%     +0.1%     +0.1%
                 treejoin          -0.3%      0.0%     +0.2%     -0.0%     -0.1%
                typecheck          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
                  veritas          -0.1%      0.0%     +0.0%     +0.0%     +0.0%
                     wang          -0.2%      0.0%     +0.0%     -0.0%     -0.0%
                wave4main          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
             wheel-sieve1          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
             wheel-sieve2          -0.3%      0.0%     +0.0%     -0.0%     -0.0%
                     x2n1          -0.3%      0.0%     +0.0%     +0.0%     +0.0%
          --------------------------------------------------------------------------------
                      Min          -0.3%      0.0%     -0.0%     -0.1%     -0.2%
                      Max          -0.1%      0.0%     +5.4%     +1.0%     +3.9%
           Geometric Mean          -0.3%     -0.0%     +0.1%     +0.0%     +0.1%
      
          --------------------------------------------------------------------------------
                  Program           Size    Allocs    Instrs     Reads    Writes
          --------------------------------------------------------------------------------
                  circsim          -0.2%      0.0%     +1.6%     +0.4%     +0.7%
              constraints          -0.3%      0.0%     +4.3%     +1.5%     +2.3%
                 fibheaps          -0.3%      0.0%     +3.5%     +1.2%     +1.3%
                   fulsom          -0.2%      0.0%     +3.6%     +1.2%     +1.8%
                 gc_bench          -0.3%      0.0%     +4.1%     +1.3%     +2.3%
                     hash          -0.3%      0.0%     +6.6%     +2.2%     +3.6%
                     lcss          -0.3%      0.0%     +0.7%     +0.2%     +0.7%
                mutstore1          -0.3%      0.0%     +4.8%     +1.4%     +2.8%
                mutstore2          -0.3%      0.0%     +3.4%     +1.0%     +1.7%
                    power          -0.2%      0.0%     +2.7%     +0.6%     +1.9%
               spellcheck          -0.3%      0.0%     +1.1%     +0.4%     +0.4%
          --------------------------------------------------------------------------------
                      Min          -0.3%      0.0%     +0.7%     +0.2%     +0.4%
                      Max          -0.2%      0.0%     +6.6%     +2.2%     +3.6%
           Geometric Mean          -0.3%     +0.0%     +3.3%     +1.0%     +1.8%
      
      Metric changes
      --------------
      
      While it sounds ridiculous, this change causes increased allocations in
      the following tests. We concluded that this change can't cause a
      difference in allocations and decided to land this patch. Fluctuations
      in the "bytes allocated" metric are tracked in #17686.
      
      Metric Increase:
          Naperian
          T10547
          T12150
          T12234
          T12425
          T13035
          T5837
          T6048
  2. 23 Oct, 2019 1 commit
    • Refactor Compact.c: · b521e8b6
      Ömer Sinan Ağacan authored
      - Remove forward declarations
      - Introduce UNTAG_PTR and GET_PTR_TAG for dealing with pointer tags
        without having to cast arguments to StgClosure*
      - Remove dead code
      - Use W_ instead of StgWord
      - Use P_ instead of StgPtr
  3. 21 Sep, 2019 1 commit
  4. 25 Mar, 2019 1 commit
    • Update Wiki URLs to point to GitLab · 3769e3a8
      Takenobu Tani authored
      This moves all URL references to Trac Wiki to their corresponding
      GitLab counterparts.
      
      This substitution is classified as follows:
      
      1. Automated substitution using sed with Ben's mapping rule [1]
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
      
      2. Manual substitution for URLs containing `#` index
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
      
      3. Manual substitution for strings starting with `Commentary`
          Old: Commentary/XxxYyy...
          New: commentary/xxx-yyy...
      
      See also !539
      
      [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
  5. 06 Mar, 2019 2 commits
    • Fix it · a4944d8d
      Ben Gamari authored
    • rts: Unglobalize dead_weak_ptr_list and resurrected_threads · 5aab1d9c
      Ömer Sinan Ağacan authored
      In the concurrent nonmoving collector we will need the ability to call
      `traverseWeakPtrList` concurrently with minor generation collections.
      This global state stands in the way of this. However, refactoring it
      away is straightforward since this list only persists for the duration of a
      single GC.
  6. 29 Aug, 2018 1 commit
    • Finish stable split · f48e276a
      David Feuer authored
      Long ago, the stable name table and stable pointer tables were one.
      Now, they are separate, and have significantly different
      implementations. I believe the time has come to finish the split
      that began in #7674.
      
      * Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`.
      
      * Give each table its own mutex.
      
      * Add FFI functions `hs_lock_stable_ptr_table` and
      `hs_unlock_stable_ptr_table` and document them.
        These are intended to replace the previously undocumented
      `hs_lock_stable_tables` and `hs_unlock_stable_tables`,
        which are now documented as deprecated synonyms.
      
      * Make `eqStableName#` use pointer equality instead of unnecessarily
      comparing stable name table indices.
      
      Reviewers: simonmar, bgamari, erikd
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #15555
      
      Differential Revision: https://phabricator.haskell.org/D5084
  7. 05 Jun, 2018 1 commit
    • Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAN/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
  8. 16 May, 2018 1 commit
    • Merge FUN_STATIC closure with its SRT · 838b6903
      Simon Marlow authored
      Summary:
      The idea here is to save a little code size and some work in the GC,
      by collapsing FUN_STATIC closures and their SRTs.
      
      This is (4) in a series; see D4632 for more details.
      
      There's a tradeoff here: more complexity in the compiler in exchange
      for a modest code size reduction (probably around 0.5%).
      
      Results:
      * GHC binary itself (statically linked) is 1% smaller
      * -0.2% binary sizes in nofib (-0.5% module sizes)
      
      Full nofib results comparing D4634 with this: P177 (ignore runtimes,
      these aren't stable on my laptop)
      
      Test Plan: validate, nofib
      
      Reviewers: bgamari, niteria, simonpj, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4637
  9. 29 Apr, 2017 1 commit
  10. 14 Nov, 2016 1 commit
    • Remove CONSTR_STATIC · 55d535da
      Simon Marlow authored
      Summary:
      We currently have two info tables for a constructor
      
      * XXX_con_info: the info table for a heap-resident instance of the
        constructor. It has type CONSTR, or one of the specialised types like
        CONSTR_1_0
      
      * XXX_static_info: the info table for a static instance of this
        constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
      
      I'm getting rid of the latter, and using the `con_info` info table for
      both static and dynamic constructors.  For rationale and more details
      see Note [static constructors] in SMRep.hs.
      
      I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
      `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
      distinction, and anyway HEAP_ALLOCED() does the same job.
      
      Test Plan: validate
      
      Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2690
      
      GHC Trac Issues: #12455
  11. 20 Jul, 2016 1 commit
    • Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
  12. 17 May, 2016 1 commit
    • rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
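
The split between the two untagging functions can be sketched as follows. This is an illustration under stated assumptions rather than the RTS definitions: `StgClosure` is reduced to a one-word stand-in, and `TAG_MASK` is assumed to cover the low bits freed up by word alignment.

```c
#include <stdint.h>

/* Stand-in for the RTS closure type; only its alignment matters here. */
typedef struct StgClosure_ { uintptr_t header; } StgClosure;

/* Assumption: tag bits are the low bits left free by word alignment. */
#define TAG_MASK ((uintptr_t)(sizeof(uintptr_t) - 1))

/* Non-const in, non-const out. */
static inline StgClosure *UNTAG_CLOSURE(StgClosure *p)
{
    return (StgClosure *)((uintptr_t)p & ~TAG_MASK);
}

/* Const in, const out: the qualifier is never silently dropped. */
static inline const StgClosure *UNTAG_CONST_CLOSURE(const StgClosure *p)
{
    return (const StgClosure *)((uintptr_t)p & ~TAG_MASK);
}
```

Keeping two functions means a caller holding a `const StgClosure *` cannot obtain a writable pointer just by untagging it.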
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
  13. 23 Jan, 2016 1 commit
    • Remove unused IND_PERM · f42db157
      Joachim Breitner authored
      It seems that this closure type has not been in use since 5d52d9, so all
      this is dead and untested code. This removes it. Some of the code might
      be useful for a counting indirection as described in #10613, so when
      implementing that, have a look at what this commit removes.
      
      Test Plan: validate on harbormaster
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1821
  14. 11 Sep, 2015 1 commit
  15. 28 Jul, 2015 1 commit
    • Eliminate zero_static_objects_list() · f83aab95
      Simon Marlow authored
      Summary:
      [Revised version of D1076 that was committed and then backed out]
      
      In a workload with a large amount of code, zero_static_objects_list()
      takes a significant amount of time, and furthermore it is in the
      single-threaded part of the GC.
      
      This patch uses a slightly fiddly scheme for marking objects on the
      static object lists, using a flag in the low 2 bits that flips between
      two states to indicate whether an object has been visited during this
      GC or not.  We also have to take into account objects that have not
      been visited yet, which might appear at any time due to runtime linking.
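
A rough sketch of the flag-flipping idea follows. All names here (`StaticObj`, `needs_visit`, `mark_visited`, `start_gc`) are hypothetical, and the encoding of the two flag values as 1 and 2 in the low bits of the link word is an assumption made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { FLAG_A = 1, FLAG_B = 2, FLAG_BITS = 3 };

/* The flag for the current GC and the one used by the previous GC. */
static uintptr_t cur_flag = FLAG_A;
static uintptr_t prev_flag = FLAG_B;

/* A static object; the low two bits of `link` carry the flag. */
typedef struct StaticObj_ { uintptr_t link; } StaticObj;

/* An object needs visiting if its flag is the previous GC's flag, or 0
 * (never visited, for example because it was just linked at runtime). */
static int needs_visit(const StaticObj *o)
{
    return ((o->link & FLAG_BITS) | prev_flag) != FLAG_BITS;
}

/* Put the object on a static list, stamping it with the current flag. */
static void mark_visited(StaticObj *o, StaticObj *next)
{
    o->link = (uintptr_t)next | cur_flag;
}

/* Starting a GC swaps the flags, so every object stamped in the last GC
 * reads as unvisited again without a pass that zeroes the links. */
static void start_gc(void)
{
    uintptr_t t = cur_flag;
    cur_flag = prev_flag;
    prev_flag = t;
}
```

The swap in `start_gc` is what replaces the linear walk over the static object list.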
      
      Test Plan: validate
      
      Reviewers: austin, ezyang, rwbarton, bgamari, thomie
      
      Reviewed By: bgamari, thomie
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1106
  16. 27 Jul, 2015 1 commit
  17. 22 Jul, 2015 1 commit
    • Eliminate zero_static_objects_list() · b949c96b
      Simon Marlow authored
      Summary:
      In a workload with a large amount of code, zero_static_objects_list()
      takes a significant amount of time, and furthermore it is in the
      single-threaded part of the GC.
      
      This patch uses a slightly fiddly scheme for marking objects on the
      static object lists, using a flag in the low 2 bits that flips between
      two states to indicate whether an object has been visited during this
      GC or not.  We also have to take into account objects that have not
      been visited yet, which might appear at any time due to runtime linking.
      
      Test Plan: validate
      
      Reviewers: austin, bgamari, ezyang, rwbarton
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1076
  18. 21 Oct, 2014 1 commit
  19. 29 Sep, 2014 1 commit
  20. 28 Jul, 2014 1 commit
  21. 29 Apr, 2014 2 commits
    • Rts: Reuse scavenge_small_bitmap (#8742) · 05fcc333
      Arash Rouhani authored
      The function was already inlined by hand in two places, and it carries
      the STATIC_INLINE annotation, so the assembly output should be the
      same.
      
      To convince myself, I did diff the output of the object files before
      and after the patch and they matched on my 64-bit Ubuntu 13.10 machine,
      running gcc 4.8.1-10ubuntu9.
      
      Also, I had to move scavenge_small_bitmap up a bit since it's not in any
      .h-file.
      
      While I was at it, I also applied the analogous patch for Compact.c.
      Though I had to write `thread_small_bitmap` instead of just moving it.
    • Rts: Consistently use StgWord for sizes of bitmaps · 43b3bab3
      Arash Rouhani authored
      A long debate is in issue #8742, but the main motivation is that this
      allows for applying a patch to reuse the function scavenge_small_bitmap
      without changing the .o-file output.
      
      Similarly, I changed the types in rts/sm/Compact.c, so I can create
      a STATIC_INLINE function for the redundant code block:
      
              while (size > 0) {
                  if ((bitmap & 1) == 0) {
                      thread((StgClosure **)p);
                  }
                  p++;
                  bitmap = bitmap >> 1;
                  size--;
              }
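
The loop above could be factored into a helper along these lines. This is a self-contained sketch, not the function in `rts/sm/Compact.c`: `walk_small_bitmap` and `mark_word` are hypothetical names, and a callback stands in for the call to `thread` so the block compiles on its own.

```c
#include <stdint.h>

typedef uintptr_t StgWord;
typedef StgWord *StgPtr;

/* Visit every word whose bitmap bit is 0 (a pointer) and skip every word
 * whose bit is 1 (a non-pointer). Returns the address just past the last
 * word covered by the bitmap. */
static StgPtr walk_small_bitmap(StgPtr p, StgWord size, StgWord bitmap,
                                void (*visit)(StgPtr))
{
    while (size > 0) {
        if ((bitmap & 1) == 0) {
            visit(p);
        }
        p++;
        bitmap = bitmap >> 1;
        size--;
    }
    return p;
}

/* Example visitor: record that the word was reached. */
static void mark_word(StgPtr p)
{
    *p = 1;
}
```

Taking `size` and `bitmap` as `StgWord` mirrors the type change this commit describes.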
  22. 29 Mar, 2014 1 commit
    • Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or GC'ing the array.
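
The two-word saving can be checked with a little arithmetic. The layout below is an assumption made for illustration, not taken from this commit: a one-word header, a length word, an extra size word for the card-marked array, and a card table with one byte per 128 elements rounded up to whole words.

```c
#include <stddef.h>
#include <stdint.h>

enum { CARD_BITS = 7 };  /* assumption: one card covers 128 elements */

/* Words needed for a card table holding one byte per card. */
static size_t card_table_words(size_t n)
{
    size_t cards = (n + ((size_t)1 << CARD_BITS) - 1) >> CARD_BITS;
    return (cards + sizeof(uintptr_t) - 1) / sizeof(uintptr_t);
}

/* Array#-style closure: header, ptrs, size, payload, card table. */
static size_t array_words(size_t n)
{
    return 3 + n + card_table_words(n);
}

/* SmallArray#-style closure: header, ptrs, payload. */
static size_t small_array_words(size_t n)
{
    return 2 + n;
}
```

Under these assumptions a 16-element array takes 20 words with a card table and 18 without, which matches the two words quoted above.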
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
  23. 01 Oct, 2013 1 commit
  24. 04 Sep, 2013 1 commit
    • Don't move Capabilities in setNumCapabilities (#8209) · aa779e09
      Simon Marlow authored
      We have various problems with reallocating the array of Capabilities,
      due to threads in waitForReturnCapability that are already holding a
      pointer to a Capability.
      
      Rather than add more locking to make this safer, I decided it would be
      easier to ensure that we never move the Capabilities at all.  The
      capabilities array is now an array of pointers to Capability.  There
      are extra indirections, but it rarely matters - we don't often access
      Capabilities via the array, normally we already have a pointer to
      one.  I ran the parallel benchmarks and didn't see any difference.
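
Why an array of pointers avoids the problem can be shown with a small sketch. The `Capability` struct and `set_num_capabilities` below are simplified stand-ins, not the RTS definitions, and the sketch only grows the array.

```c
#include <stdlib.h>

/* Simplified stand-in for the RTS Capability struct. */
typedef struct Capability_ { unsigned no; } Capability;

static Capability **capabilities = NULL;  /* array of pointers */
static unsigned n_capabilities = 0;

/* Grow to `new_n` capabilities. Only the pointer array is reallocated;
 * each Capability lives in its own allocation and never moves, so a
 * pointer held by another thread stays valid. Returns 0 on success. */
static int set_num_capabilities(unsigned new_n)
{
    if (new_n <= n_capabilities) {
        return 0;
    }
    Capability **grown = realloc(capabilities, new_n * sizeof *grown);
    if (grown == NULL) {
        return -1;
    }
    capabilities = grown;
    for (unsigned i = n_capabilities; i < new_n; i++) {
        Capability *c = malloc(sizeof *c);
        if (c == NULL) {
            n_capabilities = i;  /* keep the ones allocated so far */
            return -1;
        }
        c->no = i;
        capabilities[i] = c;
    }
    n_capabilities = new_n;
    return 0;
}
```

Had `capabilities` been an array of structs, the `realloc` could have moved every `Capability` and left a waiting thread with a dangling pointer.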
  25. 09 Jul, 2013 1 commit
  26. 15 Jun, 2013 3 commits
  27. 14 Feb, 2013 1 commit
  28. 16 Nov, 2012 1 commit
    • Add a write barrier for TVAR closures · 6d784c43
      Simon Marlow authored
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
  29. 08 Oct, 2012 1 commit
    • Simon Marlow's avatar
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
  30. 07 Sep, 2012 1 commit
  31. 25 Aug, 2012 1 commit
    • More CPP macros -> inline functions · 0ab537c5
      ian@well-typed.com authored
      All the wibbles seem to have cancelled out, and (non-debug) object sizes
      are back to where they started.
      
      I'm not 100% sure that the types are optimal, but at least now the
      functions have types and we can fix them if necessary.
  32. 02 Mar, 2012 1 commit
    • Simon Marlow's avatar
      Drop the per-task timing stats, give a summary only (#5897) · 085c7fe5
      Simon Marlow authored
      We were keeping around the Task struct (216 bytes) for every worker we
      ever created, even though we only keep a maximum of 6 workers per
      Capability.  These Task structs accumulate and cause a space leak in
      programs that do lots of safe FFI calls; this patch frees the Task
      struct as soon as a worker exits.
      
      One reason we were keeping the Task structs around is because we print
      out per-Task timing stats in +RTS -s, but that isn't terribly useful.
      What is sometimes useful is knowing how *many* Tasks there were.  So
      now I'm printing a single-line summary, this is for the program in
      
        TASKS: 2001 (1 bound, 31 peak workers (2000 total), using -N1)
      
      So although we created 2k tasks overall, there were only 31 workers
      active at any one time (which is exactly what we expect: the program
      makes 30 safe FFI calls concurrently).
      
      This also gives an indication of how many capabilities were being
      used, which is handy if you use +RTS -N without an explicit number.
  33. 21 Nov, 2011 1 commit
  34. 11 Apr, 2011 1 commit
    • Refactoring and tidy up · 1fb38442
      Simon Marlow authored
      This is a port of some of the changes from my private local-GC branch
      (which is still in darcs, I haven't converted it to git yet).  There
      are a couple of small functional differences in the GC stats: first,
      per-thread GC timings should now be more accurate, and secondly we now
      report average and maximum pause times. e.g. from minimax +RTS -N8 -s:
      
                                          Tot time (elapsed)  Avg pause  Max pause
        Gen  0      2755 colls,  2754 par   13.16s    0.93s     0.0003s    0.0150s
        Gen  1       769 colls,   769 par    3.71s    0.26s     0.0003s    0.0059s
  35. 02 Feb, 2011 2 commits