1. 12 Dec, 2019 1 commit
  2. 11 Dec, 2019 2 commits
  3. 09 Dec, 2019 1 commit
    • Fix comment typos · d46a72e1
      Gabor Greif authored
      The below is only necessary to fix the CI perf fluke that
      happened in 9897e8c8:
      -------------------------
      Metric Decrease:
          T5837
          T6048
          T9020
          T12425
          T12234
          T13035
          T12150
          Naperian
      -------------------------
      d46a72e1
  4. 05 Dec, 2019 2 commits
    • rts/NonMovingSweep: Fix locking of new mutable list allocation · a7a4efbf
      Ben Gamari authored
      Previously we used allocBlockOnNode_sync in nonmovingSweepMutLists
      despite the fact that we aren't in the GC and therefore the allocation
      spinlock isn't in use. This meant that sweep would end up spinning until
      the next minor GC, when the SM lock was moved away from the SM_MUTEX to
      the spinlock. This isn't a correctness issue but it sure isn't good for
      performance.
      
      Found thanks to Ward.
      
      Fixes #17539.
      a7a4efbf
    • nonmoving: Clear segment bitmaps during sweep · 69001f54
      Ben Gamari authored
      Previously we would clear the bitmaps of segments which we are going to
      sweep during the preparatory pause. However, this is unnecessary: the
      existence of the mark epoch ensures that the sweep will correctly
      identify non-reachable objects, even if we do not clear the bitmap.
      
      We now defer clearing the bitmap to sweep, which happens concurrently
      with mutation.
      69001f54
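To illustrate why an epoch makes eager clearing unnecessary, here is a minimal sketch. It is not the RTS code: the names `struct segment`, `mark_epoch`, `bump_epoch`, `mark_block`, `is_marked` and `sweep_segment` are hypothetical, and the segment is shrunk to eight blocks. The assumption is that marking stores the current epoch value in the bitmap, so a block counts as marked only if its entry equals the current epoch. Entries left over from an earlier cycle are then treated as unmarked, and the sweep can reset them lazily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_SEGMENT 8  /* hypothetical size, tiny for illustration */

/* Hypothetical segment: one mark byte per block. */
struct segment {
    uint8_t bitmap[BLOCKS_PER_SEGMENT];
};

/* Hypothetical global epoch. It is never 0, so a zeroed entry always
 * means "unmarked". */
static uint8_t mark_epoch = 1;

/* Start a new mark cycle without touching any bitmap. */
static void bump_epoch(void)
{
    mark_epoch = (mark_epoch == UINT8_MAX) ? 1 : (uint8_t)(mark_epoch + 1);
}

/* Marking records the epoch in which the block was found reachable. */
static void mark_block(struct segment *seg, int i)
{
    seg->bitmap[i] = mark_epoch;
}

/* Only a mark from the current epoch counts. */
static bool is_marked(const struct segment *seg, int i)
{
    return seg->bitmap[i] == mark_epoch;
}

/* Sweep: count live blocks and reset stale entries as a side effect,
 * so no separate clearing pass is needed before marking. */
static int sweep_segment(struct segment *seg)
{
    int live = 0;
    for (int i = 0; i < BLOCKS_PER_SEGMENT; i++) {
        if (is_marked(seg, i)) {
            live++;
        } else {
            seg->bitmap[i] = 0;
        }
    }
    return live;
}
```

In this sketch a block marked in one cycle stops counting as marked as soon as `bump_epoch` runs, which is the property that lets clearing move out of the pause and into the sweep.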
  5. 02 Dec, 2019 1 commit
  6. 28 Nov, 2019 1 commit
  7. 24 Nov, 2019 1 commit
  8. 23 Nov, 2019 1 commit
  9. 20 Nov, 2019 2 commits
  10. 19 Nov, 2019 8 commits
    • nonmoving: Drop redundant write barrier on stack underflow · 098d5017
      Ben Gamari authored
      Previously we would push stack-carried return values to the new stack on
      a stack underflow. While the precise reasoning for this barrier is
      unfortunately lost to history, in hindsight I suspect it was prompted by
      a missing barrier elsewhere (that has been since fixed).
      
      Moreover, the redundant barrier is actively harmful: the stack may
      contain non-pointer values; blindly pushing these to the mark queue will
      result in a crash. This is precisely what happened in the `stack003`
      test. However, because of a (now fixed) deficiency in the test this
      crash did not trigger on amd64.
      098d5017
    • nonmoving: Fix handling on large object marking on 32-bit · eb7b233a
      Ben Gamari authored
      Previously we would reset the pointer pointing to the object to be
      marked to the beginning of the block when marking a large object. This
      did no harm on 64-bit but on 32-bit it broke, e.g. `arr020`, since we
      align pinned ByteArray allocations such that the payload is 8
      byte-aligned. This means that the object might not begin at the
      beginning of the block.
      eb7b233a
    • nonmoving: Rework mark queue representation · 097f8072
      Ben Gamari authored
      The previous representation needlessly limited the array length to
      16 bits on 32-bit platforms.
      097f8072
    • nonmoving: Fix incorrect masking in mark queue type test · deed8e31
      Ben Gamari authored
      We were using TAG_BITS instead of TAG_MASK. This happened to work on
      64-bit platforms where TAG_BITS==3 since we only use tag values 0 and
      3. However, this broke on 32-bit platforms where TAG_BITS==2.
      deed8e31
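The arithmetic behind this bug can be checked with a small sketch. The helpers `tag_mask`, `tag_buggy` and `tag_fixed` are hypothetical and take the number of tag bits as a parameter so that both platforms can be compared in one program; the sketch assumes the mask is `(1 << TAG_BITS) - 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the mask covering the low tag_bits bits. */
static uintptr_t tag_mask(unsigned tag_bits)
{
    return ((uintptr_t)1 << tag_bits) - 1;
}

/* The mistake: AND-ing with the *number* of tag bits. */
static uintptr_t tag_buggy(uintptr_t p, unsigned tag_bits)
{
    return p & tag_bits;
}

/* The fix: AND-ing with the mask. */
static uintptr_t tag_fixed(uintptr_t p, unsigned tag_bits)
{
    return p & tag_mask(tag_bits);
}
```

With three tag bits, `p & 3` and `p & 7` agree for tags 0 and 3, so the mistake is invisible. With two tag bits, `p & 2` turns tag 3 into 2, so a comparison against 3 fails.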
    • nonmoving: Use correct info table pointer accessor · c819c0e4
      Ben Gamari authored
      Previously we used INFO_PTR_TO_STRUCT instead of
      THUNK_INFO_PTR_TO_STRUCT when looking at a thunk. These two happen to be
      equivalent on 64-bit architectures due to alignment considerations;
      however, they are different on 32-bit platforms. This led to #17487.
      
      To fix this we also employ a small optimization: there is only one thunk
      of type WHITEHOLE (namely stg_WHITEHOLE_info). Consequently, we can just
      use a plain pointer comparison instead of testing against info->type.
      c819c0e4
    • rts: Add missing include of SymbolExtras.h · 0418c38d
      Ben Gamari authored
      This broke the Windows build.
      0418c38d
    • Properly account for libdw paths in make build system · 2b27cc16
      Ben Gamari authored
      Should finally fix #17255.
      2b27cc16
    • Enable USE_PTHREAD_FOR_ITIMER also on FreeBSD · ec8a463d
      vdukhovni authored
      If using a pthread instead of a timer signal is more reliable, and
      has no known drawbacks, then FreeBSD is also capable of supporting
      this mode of operation (tested on FreeBSD 12 with GHC 8.8.1, but
      no reason why it would not also work on FreeBSD 11 or GHC 8.6).
      
      Proposed by Kevin Zhang in:
      
          https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241849
      ec8a463d
  11. 08 Nov, 2019 5 commits
  12. 06 Nov, 2019 2 commits
  13. 05 Nov, 2019 1 commit
  14. 04 Nov, 2019 1 commit
    • rts/linker: Ensure that code isn't writable · 120f2e53
      Ben Gamari authored
      For many years the linker would simply map all of its memory with
      PROT_READ|PROT_WRITE|PROT_EXEC. However, operating systems have been
      becoming increasingly reluctant to accept this practice (e.g. #17353
      and #12657) and for good reason: writable code is ripe for exploitation.
      
      Consequently mmapForLinker now maps its memory with
      PROT_READ|PROT_WRITE.  After the linker has finished filling/relocating
      the mapping it must then call mmapForLinkerMarkExecutable on the
      sections of the mapping which contain executable code.
      
      Moreover, to make all of this possible it was necessary to redesign the
      m32 allocator. First, we gave (in an earlier commit) each ObjectCode its
      own m32_allocator. This was necessary since code loading and symbol
      resolution/relocation are currently interleaved, meaning that it is not
      possible to enforce W^X when symbols from different objects reside in
      the same page.
      
      We then redesigned the m32 allocator to take advantage of the fact that
      all of the pages allocated with the allocator die at the same time
      (namely, when the owning ObjectCode is unloaded). This makes a number of
      things simpler (e.g. no more page reference counting; the interface
      provided by the allocator for freeing is simpler). See
      Note [M32 Allocator] for details.
      120f2e53
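The map-writable, fill, then mark-executable sequence described in this commit can be sketched with plain POSIX calls. This is not the linker's code: `load_code_wx` is a hypothetical helper, it uses `mmap` and `mprotect` directly rather than `mmapForLinker` and `mmapForLinkerMarkExecutable`, and it assumes a platform that provides `MAP_ANONYMOUS`.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes into a fresh mapping that is
 * writable while being filled and executable (but no longer writable)
 * afterwards. Returns the mapping, or NULL on failure; the page-rounded
 * size is stored in *mapped_len. */
static void *load_code_wx(const void *code, size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0) {
        return NULL;
    }
    size_t pg = (size_t)page;
    size_t size = (len + pg - 1) / pg * pg;

    /* Step 1: map read/write only; the pages are never W and X at once. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }

    /* Step 2: fill the mapping (a real linker would also relocate here). */
    memcpy(p, code, len);

    /* Step 3: drop write permission and add execute permission. */
    if (mprotect(p, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, size);
        return NULL;
    }

    *mapped_len = size;
    return p;
}
```

Because `mprotect` works on whole pages, this only enforces W^X if everything sharing a page is finished at the same time, which is the constraint that motivates giving each ObjectCode its own allocator.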
  15. 02 Nov, 2019 1 commit
  16. 01 Nov, 2019 2 commits
    • rts: Make m32 allocator per-ObjectCode · c6759080
      Ben Gamari authored
      MacOS Catalina is finally going to force our hand by forbidding writable
      executable mappings. Unfortunately, this is quite incompatible with the
      current global m32 allocator, which mixes symbols from various objects
      in a single page. The problem here is that some of these symbols may not
      yet be resolved (e.g. had relocations performed) as this happens lazily
      (and therefore we can't yet make the section read-only and therefore
      executable).
      
      The easiest way around this is to simply create one m32 allocator per
      ObjectCode. This may slightly increase fragmentation for short-running
      programs but I suspect will actually improve fragmentation for programs
      doing lots of loading/unloading since we can always free all of the
      pages allocated to an object when it is unloaded (although this ability
      will only be implemented in a later patch).
      c6759080
    • mmap: Factor out protection flags · 70b62c97
      Ben Gamari authored
      70b62c97
  17. 30 Oct, 2019 2 commits
  18. 26 Oct, 2019 2 commits
    • rts: Fix ARM linker includes · 417f59d4
      Ben Gamari authored
       * Prefer #pragma once over guard macros
       * Drop redundant #includes
       * Fix order to ensure that necessary macros are defined when we
         condition on them
      417f59d4
    • Implement shrinkSmallMutableArray# and resizeSmallMutableArray#. · 8916e64e
      Andrew Martin authored
      This is a part of GHC Proposal #25: "Offer more array resizing primitives".
      Resources related to the proposal:
      
        - Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
        - Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst
      
      Only shrinkSmallMutableArray# is implemented as a primop since a
      library-space implementation of resizeSmallMutableArray# (in GHC.Exts)
      is no less efficient than a primop would be. This may be replaced by
      a primop in the future if someone devises a strategy for growing
      arrays in-place. The library-space implementation always copies the
      array when growing it.
      
      This commit also tweaks the documentation of the deprecated
      sizeofMutableByteArray#, removing the mention of concurrency. That
      primop is unsound even in single-threaded applications. Additionally,
      the non-negativity assertion on the existing shrinkMutableByteArray#
      primop has been removed since this predicate is trivially always true.
      8916e64e
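The shrink-in-place versus copy-on-grow split described above can be sketched as a C analogue. This is not the GHC implementation, which is written in Haskell against the primops: `struct small_array`, `small_array_new`, `small_array_shrink` and `small_array_resize` are hypothetical names, and filling the new slots with a caller-supplied value when growing is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical boxed array: a size header followed by element pointers. */
struct small_array {
    size_t size;
    void *elems[];
};

/* Allocate an array of n elements, each initialised to init. */
static struct small_array *small_array_new(size_t n, void *init)
{
    struct small_array *a = malloc(sizeof *a + n * sizeof(void *));
    if (a == NULL) {
        return NULL;
    }
    a->size = n;
    for (size_t i = 0; i < n; i++) {
        a->elems[i] = init;
    }
    return a;
}

/* Shrinking only rewrites the size header; nothing is copied. */
static void small_array_shrink(struct small_array *a, size_t n)
{
    assert(n <= a->size);
    a->size = n;
}

/* Resize: shrink in place when possible, otherwise allocate a larger
 * array and copy the old elements. When growing, the old array is left
 * untouched and stays owned by the caller. */
static struct small_array *small_array_resize(struct small_array *a,
                                              size_t n, void *init)
{
    if (n <= a->size) {
        small_array_shrink(a, n);
        return a;
    }
    struct small_array *b = small_array_new(n, init);
    if (b == NULL) {
        return NULL;
    }
    memcpy(b->elems, a->elems, a->size * sizeof(void *));
    return b;
}
```

As in the commit description, only the shrinking path avoids a copy; growing always allocates a fresh array.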
  19. 25 Oct, 2019 1 commit
    • configure: Drop GccLT46 · 519f5162
      Ben Gamari authored
      GCC 4.6 was released 7 years ago. I think we can finally assume that
      it's available. This is a simplification prompted by #15742.
      519f5162
  20. 23 Oct, 2019 3 commits
    • Full abort on validate failure merging `orElse`. · 1f40e68a
      ryates@cs.rochester.edu authored
      Previously, a partial rollback of a branch of an `orElse` was attempted
      if a validation failure was observed. Validation here, however, does
      not account for what part of the transaction observed inconsistent
      state. This commit fixes this by fully aborting and restarting the
      transaction.
      1f40e68a
    • eventlog: Dump cost centre stack on each sample · 17987a4b
      Matthew Pickering authored
      With this change it is possible to reconstruct the timing portion of a
      `.prof` file after the fact. By logging the stacks at each time point,
      a more precise execution trace of the program can be observed rather
      than all identical cost centres being identified in the report.
      
      There are two new events:
      
      1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate
      the tick interval
      2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the
      current call stack.
      
      Fixes #17322
      17987a4b
    • Refactor Compact.c: · b521e8b6
      Ömer Sinan Ağacan authored
      - Remove forward declarations
      - Introduce UNTAG_PTR and GET_PTR_TAG for dealing with pointer tags
        without having to cast arguments to StgClosure*
      - Remove dead code
      - Use W_ instead of StgWord
      - Use P_ instead of StgPtr
      b521e8b6
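As an illustration of tag helpers that operate on raw words rather than on `StgClosure*`, here is a minimal sketch. The macro names come from the commit message, but the definitions below are assumptions rather than the contents of Compact.c; `W_` and `P_` are stand-in typedefs for a machine word and a word pointer, and `TAG_BITS` is fixed at 2 purely for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t W_;  /* stand-in for the RTS machine-word type */
typedef W_ *P_;        /* stand-in for the RTS word-pointer type */

#define TAG_BITS 2                           /* assumed for illustration */
#define TAG_MASK (((W_)1 << TAG_BITS) - 1)

/* Extract the tag stored in the low bits of a pointer. */
#define GET_PTR_TAG(p) ((W_)(p) & TAG_MASK)

/* Strip the tag, yielding the aligned pointer. */
#define UNTAG_PTR(p) ((P_)((W_)(p) & ~TAG_MASK))
```

Since both macros cast their argument to `W_` themselves, callers can pass a pointer of any type without first converting it to a closure pointer.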