1. 29 Mar, 2021 1 commit
    • Allocate Adjustors and mark them readable in two steps · e754ff7f
      Moritz Angermann authored
      This drops allocateExec for darwin and replaces it with an
      alloc, write, mark-executable strategy instead. This prevents
      us from trying to allocate an executable range and then write to
      it, which W^X will prohibit on darwin.
      
      This will *only* work if we can use mmap.
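
      A minimal sketch of that alloc, write, mark-executable sequence
      (a hypothetical helper, not the RTS code; hardened Darwin JIT
      mappings may additionally need MAP_JIT):

      ```c
      #include <string.h>
      #include <sys/mman.h>

      /* Allocate RW pages, copy the code in, then flip them to RX. */
      void *alloc_exec(const void *code, size_t len)
      {
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
              return NULL;
          memcpy(p, code, len);                               /* write step */
          if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) { /* mark executable */
              munmap(p, len);
              return NULL;
          }
          return p;
      }
      ```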
  2. 14 Mar, 2021 1 commit
  3. 08 Mar, 2021 1 commit
    • rts: Use a separate free block list for allocatePinned · 47d6acd3
      Matthew Pickering authored
      The way in which allocatePinned took blocks out of the nursery was
      leading to horrible fragmentation in some workloads.
      
      The strategy now is that a separate free block list is reserved for each
      capability and blocks are taken from there. When it's empty the global
      SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.
      
      Fixes #19481
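
      A sketch of that fast/slow path, with hypothetical names and an
      illustrative PINNED_EMPTY_SIZE standing in for the RTS's
      bdescr/Capability machinery:

      ```c
      #include <stddef.h>

      #define PINNED_EMPTY_SIZE 32          /* illustrative group size */

      typedef struct Block_ { struct Block_ *link; } Block;
      typedef struct { Block *pinned_free; } Cap;  /* capability-local list */

      extern void acquire_sm_lock(void);
      extern void release_sm_lock(void);
      extern Block *alloc_block_group(unsigned blocks);

      Block *get_pinned_block(Cap *cap)
      {
          Block *b = cap->pinned_free;
          if (b != NULL) {                  /* fast path: no lock taken */
              cap->pinned_free = b->link;
              return b;
          }
          acquire_sm_lock();                /* slow path: global SM lock */
          b = alloc_block_group(PINNED_EMPTY_SIZE);
          release_sm_lock();
          return b;
      }
      ```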
  4. 18 Feb, 2021 1 commit
    • rts: Add generic block traversal function, listAllBlocks · 4dc2bcca
      Matthew Pickering authored
      This function is exposed in RtsAPI.h so that external users have a
      blessed way to traverse all the different `bdescr`s which are known
      by the RTS.
      
      The main motivation is to use this function in ghc-debug but avoid
      having to expose the internal structure of a Capability in the API.
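
      A hypothetical usage sketch, assuming a callback-style signature
      along the lines of `void listAllBlocks(ListBlocksCb cb, void *user)`:

      ```c
      #include "Rts.h"
      #include "RtsAPI.h"

      /* Count every block the RTS knows about (call with the RTS paused). */
      static void count_block(void *user, bdescr *bd)
      {
          (void)bd;
          (*(unsigned long *)user)++;
      }

      unsigned long count_all_blocks(void)
      {
          unsigned long n = 0;
          listAllBlocks(count_block, &n);
          return n;
      }
      ```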
  5. 07 Jan, 2021 2 commits
  6. 15 Nov, 2020 1 commit
    • AArch64/arm64 adjustments · 8887102f
      Moritz Angermann authored
      This adds the necessary logic to support aarch64 on ELF, as well
      as aarch64 on Mach-O, which Apple calls arm64.
      
      We change the architecture name to AArch64, which is the official
      Arm naming scheme.
  7. 11 Nov, 2020 1 commit
  8. 01 Nov, 2020 1 commit
  9. 30 Oct, 2020 1 commit
  10. 25 Oct, 2020 1 commit
  11. 25 Jun, 2020 1 commit
  12. 01 Jun, 2020 1 commit
    • Cleanup OVERWRITING_CLOSURE logic · 2ee4f36c
      Daniel Gröber (dxld) authored
      The code is just more confusing than it needs to be. We don't need
      to mix the threaded check with the LDV profiling check, since LDV's
      init already checks for this; hence they can be two separate
      checks. Taking the sanity checking into account is also cleaner via
      DebugFlags.sanity: there is no need to check the DEBUG define.
      
      The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK
      definitions in the old code also made things a lot more opaque,
      IMO, so I removed them.
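
      A paraphrased sketch of the two now-independent checks (the helper
      names are hypothetical, not the actual macro body):

      ```c
      #include <stddef.h>

      extern int ldv_profiling_on(void);    /* LDV init checks its own flags */
      extern int sanity_checking_on(void);  /* RtsFlags.DebugFlags.sanity */
      extern void ldv_record_dead(void *p, size_t size);
      extern void zero_slop(void *p, size_t size);

      static void overwriting_closure(void *p, size_t size)
      {
          if (ldv_profiling_on())
              ldv_record_dead(p, size);     /* first, independent check */
          if (sanity_checking_on())
              zero_slop(p, size);           /* runtime flag: no DEBUG #if */
      }
      ```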
  13. 06 May, 2020 1 commit
    • nonmoving: Fix handling of dirty objects · b2d72c75
      Ben Gamari authored
      Previously we (incorrectly) relied on failed_to_evac to be
      "precise". That is, we expected it to only be true if *all* of an
      object's fields lived outside of the non-moving heap. However, this
      does not match the behavior of failed_to_evac, which is true if
      *any* of the object's fields weren't promoted (meaning that some
      others *may* live in the non-moving heap).
      
      This is problematic as we skip the non-moving write barrier for dirty
      objects (which we can only safely do if *all* fields point outside of
      the non-moving heap).
      
      Clearly this arises due to a fundamental difference in the behavior
      expected of failed_to_evac in the moving and non-moving collectors:
      in the moving collector it is always safe to conservatively say
      failed_to_evac=true, whereas in the non-moving collector the safe
      value is false.
      
      This issue went unnoticed as I never wrote down the dirtiness
      invariant enforced by the non-moving collector. We now define this
      invariant as
      
          An object being marke...
  14. 15 Apr, 2020 4 commits
    • Daniel Gröber (dxld) · c3c0f662
    • Daniel Gröber (dxld)
    • Daniel Gröber (dxld) · 15fa9bd6
    • Zero out pinned block alignment slop when profiling · 41230e26
      Daniel Gröber (dxld) authored
      The heap profiler currently cannot traverse pinned blocks because of
      alignment slop. This used to just be a minor annoyance as the whole block
      is accounted into a special cost center rather than the respective object's
      CCS, cf. #7275. However for the new root profiler we would like to be able
      to visit _every_ closure on the heap. We need to do this so we can get rid
      of the current 'flip' bit hack in the heap traversal code.
      
      Since info pointers are always non-zero, we can in principle skip
      all the slop in the profiler if we can rely on it being zeroed.
      This assumption caused problems in the past, though: commit
      a586b33f ("rts: Correct handling of LARGE ARR_WORDS in LDV
      profiler"), part of !1118, tried to use the same trick for BF_LARGE
      objects but neglected to take into account that the shrink*Array#
      functions don't ensure that slop is zeroed when not compiling with
      profiling.
      
      Later, commit 0c114c65 ("Handle large ARR_WORDS in heap census (fix
      …)") […] as we will only be assuming slop is zeroed when profiling
      is on.
      
      This commit also reduces the amount of slop we introduce in the
      first place, by calculating the needed alignment before doing the
      allocation for small objects, where we know the next available
      address. For large objects we don't know how much alignment we'll
      have to do yet, since those details are hidden behind the
      allocateMightFail function, so there we continue to allocate the
      maximum additional words we'll need to do the alignment.
      
      So that we don't have to duplicate all this logic in the Cmm code,
      we pull it into the RTS allocatePinned function instead.
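
      A sketch of why zeroed slop makes pinned blocks traversable (a
      hypothetical loop, assuming profiling is on so slop is zeroed:
      info pointers are never zero, so a zero word must be slop):

      ```c
      typedef unsigned long Word;

      extern Word visit_closure(Word *p);  /* visits *p, returns size in words */

      void visit_block(Word *p, Word *end)
      {
          while (p < end) {
              if (*p == 0) {               /* zeroed alignment slop: skip it */
                  p++;
                  continue;
              }
              p += visit_closure(p);       /* a real closure starts here */
          }
      }
      ```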
      
      Metric Decrease:
          T7257
          haddock.Cabal
          haddock.base
  15. 25 Jan, 2020 1 commit
    • Fix rts allocateExec() on NetBSD · 8b726534
      PHO authored
      Similar to SELinux, NetBSD's "PaX mprotect" prohibits marking a
      page mapping both writable and executable at the same time. Use
      libffi, which knows how to work around it.
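
      For illustration, libffi's closure API separates the writable and
      executable views of the trampoline, so no page is ever writable
      and executable at once (a sketch using the standard libffi calls):

      ```c
      #include <ffi.h>

      /* ffi_closure_alloc returns a writable closure object plus a
       * separate executable code address; we only ever write through
       * the former and call through the latter. */
      void *make_trampoline(ffi_cif *cif,
                            void (*handler)(ffi_cif *, void *, void **, void *),
                            void *user_data)
      {
          void *code;
          ffi_closure *closure = ffi_closure_alloc(sizeof(ffi_closure), &code);
          if (closure == NULL)
              return NULL;
          if (ffi_prep_closure_loc(closure, cif, handler, user_data, code)
                  != FFI_OK) {
              ffi_closure_free(closure);
              return NULL;
          }
          return code;   /* call through this executable address */
      }
      ```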
  16. 09 Dec, 2019 1 commit
    • Fix comment typos · d46a72e1
      Gabor Greif authored
      The below is only necessary to fix the CI perf fluke that
      happened in 9897e8c8:
      -------------------------
      Metric Decrease:
          T5837
          T6048
          T9020
          T12425
          T12234
          T13035
          T12150
          Naperian
      -------------------------
  17. 02 Dec, 2019 1 commit
  18. 22 Oct, 2019 1 commit
  19. 21 Oct, 2019 4 commits
    • Ben Gamari
    • Don't cleanup until we've stopped the collector · 10373416
      Ben Gamari authored
      This requires that we break nonmovingExit into two pieces: we first
      need to stop the collector so that it relinquishes any
      capabilities, then we shut down the scheduler, and then we free the
      nonmoving allocators.
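
      A sketch of that ordering (the function names below are an
      assumption based on the description of splitting nonmovingExit in
      two):

      ```c
      extern void nonmovingStop(void);  /* stop collector, release capabilities */
      extern void exitScheduler(void);
      extern void nonmovingExit(void);  /* free the nonmoving allocators */

      static void shutdown_in_order(void)
      {
          nonmovingStop();    /* 1. collector no longer holds capabilities */
          exitScheduler();    /* 2. scheduler can now shut down safely */
          nonmovingExit();    /* 3. only now free the allocators */
      }
      ```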
    • rts: Implement concurrent collection in the nonmoving collector · bd8e3ff4
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushes
      (a) when it is filled, or (b) when the nonmoving collector enters
      its post-mark synchronization phase.
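
      A hypothetical sketch of that capability-local accumulator (the
      structure and function names are invented for illustration):

      ```c
      #include <stdbool.h>

      #define CHUNK_ENTRIES 254             /* illustrative chunk capacity */

      typedef struct { void *origin; } MarkEntry;

      typedef struct MarkChunk_ {
          struct MarkChunk_ *next;
          unsigned head;
          MarkEntry entries[CHUNK_ENTRIES];
      } MarkChunk;

      extern bool write_barrier_enabled;    /* set while a concurrent mark runs */
      extern MarkChunk *flush_and_get_chunk(MarkChunk *full);

      /* Write barrier: record a possibly-hidden object in the capability's
       * accumulator chunk, flushing when the chunk fills (case (a) above;
       * case (b), the post-mark flush, is driven by the collector). */
      void upd_rem_set_push(MarkChunk **acc, void *p)
      {
          if (!write_barrier_enabled)
              return;
          MarkChunk *c = *acc;
          if (c->head == CHUNK_ENTRIES)
              *acc = c = flush_and_get_chunk(c);
          c->entries[c->head++].origin = p;
      }
      ```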
      
      While the write barrier touches a significant amount of code ...
    • rts: Non-concurrent mark and sweep · 68e0647f
      Ömer Sinan Ağacan authored
      This implements the core heap structure and a serial mark/sweep
      collector which can be used to manage the oldest-generation heap.
      This is the first step towards a concurrent mark-and-sweep collector
      aimed at low-latency applications.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      The basic heap structure used in this design is heavily inspired by
      
          K. Ueno & A. Ohori. "A fully concurrent garbage collector for
          functional programs on multicore processors." /ACM SIGPLAN Notices/
          Vol. 51. No. 9 (presented at ICFP 2016)
      
      This design is intended to allow both marking and sweeping to
      proceed concurrently with the execution of a multi-core mutator.
      Unlike the Ueno design, which requires no global synchronization
      pauses, the collector introduced here requires a stop-the-world
      pause at the beginning and end of the mark phase.
      
      To avoid heap fragmenta...
  20. 18 Oct, 2019 1 commit
  21. 12 Oct, 2019 1 commit
    • Simplify Configure in a few ways · c2290596
      John Ericson authored
       - No need to distinguish between gcc-llvm and clang. First of all,
         gcc-llvm is quite old and surely unmaintained by now. Second of
         all, none of the code actually cares about that distinction!
      
         Now, it does make sense to consider multiple C frontends for
         LLVM, in the form of clang vs clang-cl (the same clang, yes, but
         with a tweaked interface). But this is better handled in terms
         of "gccish vs msvcish" and "is LLVM", yielding 4 combinations.
         Therefore, I don't think it is worth saving the existing code
         for that.
      
       - Get the remaining CC_LLVM_BACKEND, and also TABLES_NEXT_TO_CODE
         in mk/config.h, the normal way, rather than hacking them in
         post-hoc. No point keeping these special cases around for no
         reason.
      
       - Get rid of hand-rolled `die` function and just use `AC_MSG_ERROR`.
      
       - Abstract check + flag override for unregisterised and tables next to
         code.
      
      Oh, and as part of the above I also renamed/combined some variables
      where it felt appropriate.
      
       - GccIsCla...
  22. 03 Oct, 2019 1 commit
  23. 28 Jun, 2019 2 commits
    • Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
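
      An illustration (not GHC's Cmm) of the guarantee MO_ReadBarrier
      provides, using a C11 acquire fence; on x86 the fence compiles to
      nothing, matching the "simply erased" behaviour described above:

      ```c
      #include <stdatomic.h>

      struct closure { int payload; };
      extern struct closure *_Atomic slot;   /* published by another thread */

      int read_published(void)
      {
          struct closure *c = atomic_load_explicit(&slot, memory_order_relaxed);
          atomic_thread_fence(memory_order_acquire);  /* the read barrier */
          return c->payload;  /* this read cannot be hoisted above the fence */
      }
      ```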
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be upda...
    • Sylvain Henry · 4ec233ec
  24. 02 Apr, 2019 1 commit
    • Improve performance of newSmallArray# · 7cf5ba3d
      Michal Terepeta authored
      
      
      This:
      - Hoists part of the condition outside of the initialization loop in
        `stg_newSmallArrayzh`.
      - Annotates one of the unlikely branches as unlikely, also in
        `stg_newSmallArrayzh`.
      - Adds a couple of annotations to `allocateMightFail` indicating which
        branches are likely to be taken.
      
      Together this gives about 5% improvement.
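
      A plain-C stand-in for the hoist (the real change is in the Cmm of
      `stg_newSmallArrayzh`; all names here are illustrative):

      ```c
      /* Before: a loop-invariant test runs on every iteration. */
      void init_array_before(void **arr, unsigned n, void *init, int need_init)
      {
          for (unsigned i = 0; i < n; i++) {
              if (need_init)
                  arr[i] = init;
          }
      }

      /* After: the test is hoisted, leaving a tight loop. The unlikely
       * branches elsewhere would be annotated with something like
       * __builtin_expect(cond, 0) so the compiler lays out the hot path. */
      void init_array_after(void **arr, unsigned n, void *init, int need_init)
      {
          if (need_init) {
              for (unsigned i = 0; i < n; i++)
                  arr[i] = init;
          }
      }
      ```
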
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  25. 25 Mar, 2019 1 commit
    • Update Wiki URLs to point to GitLab · 3769e3a8
      Takenobu Tani authored
      This moves all URL references to the Trac Wiki to their
      corresponding GitLab counterparts.
      
      This substitution is classified as follows:
      
      1. Automated substitution using sed with Ben's mapping rule [1]
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
      
      2. Manual substitution for URLs containing `#` index
          Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
          New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
      
      3. Manual substitution for strings starting with `Commentary`
          Old: Commentary/XxxYyy...
          New: commentary/xxx-yyy...
      
      See also !539
      
      [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
  26. 15 Mar, 2019 1 commit
  27. 27 Jun, 2018 1 commit
  28. 26 Jun, 2018 1 commit
  29. 02 May, 2018 1 commit
  30. 30 Mar, 2018 1 commit
  31. 21 Jan, 2018 1 commit
    • [rts] Adjust whitehole_spin · 180ca65f
      Douglas Wilson authored
      Rename to whitehole_gc_spin, in preparation for adding stats for the
      whitehole busy-loop in SMPClosureOps.
      
      Make whitehole_gc_spin volatile, and move it to be defined and
      statically initialised in GC.c. This saves some #ifs, and I'm pretty
      sure it should be volatile.
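
      A sketch of the counter and the busy-wait it will instrument (the
      loop and the predicate are paraphrased; only the volatile
      definition in GC.c is from the message):

      ```c
      extern int is_whitehole(void *closure);   /* hypothetical predicate */
      extern void busy_wait_nop(void);

      /* Defined and statically initialised in GC.c, per the message. */
      volatile unsigned long whitehole_gc_spin = 0;

      void wait_for_closure(void *closure)
      {
          while (is_whitehole(closure)) {       /* another thread holds it */
              whitehole_gc_spin++;              /* volatile: counts every spin */
              busy_wait_nop();
          }
      }
      ```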
      
      Test Plan: ./validate
      
      Reviewers: bgamari, erikd, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4300
  32. 26 Sep, 2017 1 commit