This project is mirrored from https://gitlab.haskell.org/ghc/ghc.git. Pull mirroring failed .
Repository mirroring has been paused due to too many failed attempts, and can be resumed by a project maintainer.
Last successful update .
  1. 19 Oct, 2019 1 commit
  2. 18 Oct, 2019 1 commit
    • Ben Gamari's avatar
      rts: Implement concurrent collection in the nonmoving collector · d7017446
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushed
      when it (a) is filled, or (b) when the nonmoving collector enters its
      post-mark synchronization phase.
      
      While the write barrier touches a significant amount of code it is
      conceptually straightforward: the mutator must ensure that the referee
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details:
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referee of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
      Co-Authored-by: Ömer Sinan Ağacan's avatarÖmer Sinan Ağacan <omer@well-typed.com>
      d7017446
  3. 17 Sep, 2019 1 commit
  4. 28 Jun, 2019 1 commit
    • Travis Whitaker's avatar
      Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-or-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weam memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause us issues,
      consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but its too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari's avatarBen Gamari <ben@well-typed.com>
      11bac115
  5. 08 May, 2019 1 commit
  6. 25 Mar, 2019 1 commit
    • Takenobu Tani's avatar
      Update Wiki URLs to point to GitLab · 3769e3a8
      Takenobu Tani authored
      This moves all URL references to Trac Wiki to their corresponding
      GitLab counterparts.
      
      This substitution is classified as follows:
      
      1. Automated substitution using sed with Ben's mapping rule [1]
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
      
      2. Manual substitution for URLs containing `#` index
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
      
      3. Manual substitution for strings starting with `Commentary`
          Old: Commentary/XxxYyy...
          New: commentary/xxx-yyy...
      
      See also !539
      
      [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
      3769e3a8
  7. 21 Sep, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Fix slop zeroing for AP_STACK eager blackholes in debug build · 66c17293
      Ömer Sinan Ağacan authored
      As #15571 reports, eager blackholing breaks sanity checks as we can't
      zero the payload when eagerly blackholing (because we'll be using the
      payload after blackholing), but by the time we blackhole a previously
      eagerly blackholed object (in `threadPaused()`) we don't have the
      correct size information for the object (because the object's type
      becomes BLACKHOLE when we eagerly blackhole it) so can't properly zero
      the slop.
      
      This problem can be solved for AP_STACK eager blackholing (which unlike
      eager blackholing in general, is not optional) by zeroing the payload
      after entering the stack. This patch implements this idea.
      
      Fixes #15571.
      
      Test Plan:
      Previously concprog001 when compiled and run with sanity checks
      
          ghc-stage2 Mult.hs -debug -rtsopts
          ./Mult +RTS -DS
      
      was failing with
      
          Mult: internal error: checkClosure: stack frame
              (GHC version 8.7.20180821 for x86_64_unknown_linux)
              Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
      
      thic patch fixes this panic. The test still panics, but it runs for a while
      before panicking (instead of directly panicking as before), and the new problem
      seems unrelated:
      
          Mult: internal error: ASSERTION FAILED: file rts/sm/Sanity.c, line 296
              (GHC version 8.7.20180919 for x86_64_unknown_linux)
              Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
      
      The new problem will be fixed in another diff.
      
      I also tried slow validate (which requires D5164): this does not introduce any
      new failures.
      
      Reviewers: simonmar, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #15571
      
      Differential Revision: https://phabricator.haskell.org/D5165
      66c17293
  8. 17 Jun, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Use __FILE__ for Cmm assertion locations, fix #8619 · 008ea12d
      Ömer Sinan Ağacan authored
      It seems like we currently support string literals in Cmm, so we can use
      __LINE__ CPP macro in assertion macros. This improves error messages
      that previously looked like
      
          ASSERTION FAILED: file (null), line 1302
      
      (null) part now shows the actual file name.
      
      Also inline some single-use string literals in PrimOps.cmm.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4862
      008ea12d
  9. 02 Jun, 2018 1 commit
  10. 19 Mar, 2018 1 commit
    • Ben Gamari's avatar
      Improve accuracy of get/setAllocationCounter · 20cbb016
      Ben Gamari authored
      Summary:
      get/setAllocationCounter didn't take into account allocations in the
      current block. This was known at the time, but it turns out to be
      important to have more accuracy when using these in a fine-grained
      way.
      
      Test Plan:
      New unit test to test incrementally larger allocaitons.  Before I got
      results like this:
      
      ```
      +0
      +0
      +0
      +0
      +0
      +4096
      +0
      +0
      +0
      +0
      +0
      +4064
      +0
      +0
      +4088
      +4056
      +0
      +0
      +0
      +4088
      +4096
      +4056
      +4096
      ```
      
      Notice how the results aren't always monotonically increasing.  After
      this patch:
      
      ```
      +344
      +416
      +488
      +560
      +632
      +704
      +776
      +848
      +920
      +992
      +1064
      +1136
      +1208
      +1280
      +1352
      +1424
      +1496
      +1568
      +1640
      +1712
      +1784
      +1856
      +1928
      +2000
      +2072
      +2144
      ```
      
      Reviewers: hvr, erikd, simonmar, jrtc27, trommler
      
      Reviewed By: simonmar
      
      Subscribers: trommler, jrtc27, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4363
      20cbb016
  11. 29 Apr, 2017 1 commit
  12. 28 Apr, 2017 1 commit
    • Ben Gamari's avatar
      Use memcpy in cloneArray · 228d4670
      Ben Gamari authored
      While looking at #13615 I noticed that there was this strange open-coded
      memcpy in the definition of the cloneArray macro. I don't see why this
      should be preferable to memcpy.
      
      Test Plan: Validate, particularly focusing on array operations
      
      Reviewers: simonmar, tibbe, austin, alexbiehl
      
      Reviewed By: tibbe, alexbiehl
      
      Subscribers: alexbiehl, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3504
      228d4670
  13. 23 Apr, 2017 1 commit
  14. 15 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Fix cost-centre-stacks bug (#5654) · 394231b3
      Simon Marlow authored
      This fixes some cases of wrong stacks being generated by the profiler.
      For background and details on the fix see
      `Note [Evaluating functions with profiling]` in `rts/Apply.cmm`.
      
      This does have an impact on allocations for some programs when
      profiling.  nofib results:
      
      ```
         k-nucleotide          +0.0%     +8.8%    +11.0%    +11.0%      0.0%
               puzzle          +0.0%    +12.5%     0.244     0.246      0.0%
            typecheck           0.0%     +8.7%    +16.1%    +16.2%      0.0%
      ------------------------------------------------------------------------
      --------
                  Min          -0.0%     -0.0%    -34.4%    -35.5%    -25.0%
                  Max          +0.0%    +12.5%    +48.9%    +49.4%    +10.6%
       Geometric Mean          +0.0%     +0.6%     +2.0%     +1.8%     -0.3%
      
      ```
      
      But runtimes don't seem to be affected much, and the examples I looked
      at were completely legitimate.  For example, in puzzle we have this:
      
      ```
      position :: ItemType -> StateType ->  BankType
      position Bono = bonoPos
      position Edge = edgePos
      position Larry = larryPos
      position Adam = adamPos
      ```
      
      where the identifiers on the rhs are all record selectors.  Previously
      the profiler gave a stack that looked like
      
      ```
        position
        bonoPos
        ...
      ```
      
      i.e. `bonoPos` was at the same level of the call stack as `position`,
      but now it looks like
      
      ```
        position
         bonoPos
         ...
      ```
      
      I used the normaliser from the testsuite to diff the profiling output
      from other nofib programs and they all looked better.
      
      Test Plan:
      * the broken test passes
      * validate
      * compiled and ran all of nofib, measured perf, diff'd several .prof
      files
      
      Reviewers: niteria, erikd, austin, scpmw, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2804
      
      GHC Trac Issues: #5654, #10007
      394231b3
  15. 29 Nov, 2016 1 commit
  16. 01 Aug, 2016 1 commit
  17. 10 Jun, 2016 1 commit
    • Simon Marlow's avatar
      NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      were reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
      9e5ea67e
  18. 23 Jan, 2016 1 commit
    • Joachim Breitner's avatar
      Remove unused IND_PERM · f42db157
      Joachim Breitner authored
      it seems that this closure type has not been in use since 5d52d9, so all
      this is dead and untested code. This removes it. Some of the code might
      be useful for a counting indirection as described in #10613, so when
      implementing that, have a look at what this commit removes.
      
      Test Plan: validate on harbormaster
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1821
      f42db157
  19. 11 Sep, 2015 1 commit
  20. 16 Jun, 2015 1 commit
  21. 20 Oct, 2014 1 commit
  22. 02 Oct, 2014 1 commit
    • Edward Z. Yang's avatar
      Rename _closure to _static_closure, apply naming consistently. · 35672072
      Edward Z. Yang authored
      Summary:
      In preparation for indirecting all references to closures,
      we rename _closure to _static_closure to ensure any old code
      will get an undefined symbol error.  In order to reference
      a closure foobar_closure (which is now undefined), you should instead
      use STATIC_CLOSURE(foobar).  For convenience, a number of these
      old identifiers are macro'd.
      
      Across C-- and C (Windows and otherwise), there were differing
      conventions on whether or not foobar_closure or &foobar_closure
      was the address of the closure.  Now, all foobar_closure references
      are addresses, and no & is necessary.
      
      CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
      
      Part of remove HEAP_ALLOCED patch set (#8199)
      
      Depends on D265
      Signed-off-by: Edward Z. Yang's avatarEdward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D267
      
      GHC Trac Issues: #8199
      35672072
  23. 20 Aug, 2014 1 commit
  24. 16 Aug, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Implement {resize,shrink}MutableByteArray# primops · 246436f1
      Herbert Valerio Riedel authored
      The two new primops with the type-signatures
      
        resizeMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> (# State# s, MutableByteArray# s #)
      
        shrinkMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> State# s
      
      allow to resize MutableByteArray#s in-place (when possible), and are useful
      for algorithms where memory is temporarily over-allocated. The motivating
      use-case is for implementing integer backends, where the final target size of
      the result is either N or N+1, and only known after the operation has been
      performed.
      
      A future commit will implement a stateful variant of the
      `sizeofMutableByteArray#` operation (see #9447 for details), since now the
      size of a `MutableByteArray#` may change over its lifetime (i.e before
      it gets frozen or GCed).
      
      Test Plan: ./validate --slow
      
      Reviewers: ezyang, austin, simonmar
      
      Reviewed By: austin, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D133
      246436f1
  25. 29 Mar, 2014 1 commit
    • tibbe's avatar
      Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size with 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
      90329b6c
  26. 28 Mar, 2014 1 commit
    • tibbe's avatar
      Make copy array ops out-of-line by default · e54828bf
      tibbe authored
      This should reduce code size when there's little to gain from inlining
      these primops, while still retaining the inlining benefit when the
      size of the copy is known statically.
      e54828bf
  27. 22 Mar, 2014 1 commit
    • tibbe's avatar
      codeGen: inline allocation optimization for clone array primops · 1eece456
      tibbe authored
      The inline allocation version is 69% faster than the out-of-line
      version, when cloning an array of 16 unit elements on a 64-bit
      machine.
      
      Comparing the new and the old primop implementations isn't
      straightforward. The old version had a missing heap check that I
      discovered during the development of the new version. Comparing the
      old and the new version would requiring fixing the old version, which
      in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
      
      The inline allocation threshold is configurable via
      -fmax-inline-alloc-size which gives the maximum array size, in bytes,
      to allocate inline. The size does not include the closure header size.
      
      Allowing the same primop to be either inline or out-of-line has some
      implication for how we lay out heap checks. We always place a heap
      check around out-of-line primops, as they may allocate outside of our
      knowledge. However, for the inline primops we only allow allocation
      via the standard means (i.e. virtHp). Since the clone primops might be
      either inline or out-of-line the heap check layout code now consults
      shouldInlinePrimOp to know whether a primop will be inlined.
      1eece456
  28. 01 Oct, 2013 1 commit
    • Simon Marlow's avatar
      Remove use of R9, and fix associated bugs · 11b5ce55
      Simon Marlow authored
      We were passing the function address to stg_gc_prim_p in R9, which was
      wrong because the call was a high-level call and didn't declare R9 as
      a parameter.  Passing R9 as an argument is the right way, but
      unfortunately that exposed another bug: we were using the same macro
      in some low-level Cmm, where it is illegal to call functions with
      arguments (see Note [syntax of cmm files]).  So we now have low-level
      variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
      11b5ce55
  29. 23 Sep, 2013 2 commits
  30. 11 Apr, 2013 1 commit
  31. 29 Mar, 2013 1 commit
    • nfrisby's avatar
      ticky enhancements · 460abd75
      nfrisby authored
        * the new StgCmmArgRep module breaks a dependency cycle; I also
          untabified it, but made no real changes
      
        * updated the documentation in the wiki and change the user guide to
          point there
      
        * moved the allocation enters for ticky and CCS to after the heap check
      
          * I left LDV where it was, which was before the heap check at least
            once, since I have no idea what it is
      
        * standardized all (active?) ticky alloc totals to bytes
      
        * in order to avoid double counting StgCmmLayout.adjustHpBackwards
          no longer bumps ALLOC_HEAP_ctr
      
        * I resurrected the SLOW_CALL counters
      
          * the new module StgCmmArgRep breaks cyclic dependency between
            Layout and Ticky (which the SLOW_CALL counters cause)
      
          * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
      
        * added ALLOC_RTS_ctr and _tot ticky counters
      
          * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
      
          * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and
            ALLOC_PRIM
      
          * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
      
        * added a ticky counter for total LNE entries
      
        * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
      
          * all off by default
      
          * -ticky-allocd: tracks allocation *of* closure in addition to
             allocation *by* that closure
      
          * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
      
          * -ticky-LNE tracks LNEs as if they were functions
      
        * updated the ticky report format, including making the argument
          categories (more?) accurate again
      
        * the printed name for things in the report include the unique of
          their ticky parent as well as if they are not top-level
      460abd75
  32. 26 Feb, 2013 1 commit
  33. 01 Feb, 2013 1 commit
  34. 30 Jan, 2013 1 commit
    • Ben Gamari's avatar
      STM: Only wake up once · a23661d2
      Ben Gamari authored
      Previously, threads blocked on an STM retry would be sent a wakeup
      message each time an unpark was requested. This could result in the
      accumulation of a large number of wake-up messages, which would slow
      wake-up once the sleeping thread is finally scheduled.
      
      Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO
      which has been sent a wake-up message, allowing us to send only one
      wakeup.
      a23661d2
  35. 31 Oct, 2012 1 commit
  36. 30 Oct, 2012 2 commits
  37. 23 Oct, 2012 1 commit
  38. 19 Oct, 2012 1 commit