1. 28 Jul, 2020 1 commit
  2. 26 Jul, 2020 1 commit
  3. 15 Jul, 2020 4 commits
  4. 27 Jun, 2020 1 commit
    • Sylvain Henry's avatar
      Fix ghc-bignum exceptions · 1b3d13b6
      Sylvain Henry authored
      We must ensure that exceptions are not simplified. Previously we used:
      
         case raiseDivZero of
            _ -> 0## -- dummyValue
      
      But it was wrong because the evaluation of `raiseDivZero` was removed and
      the dummy value was directly returned. See new Note [ghc-bignum exceptions].
      
      I've also removed the exception-triggering primops, which were fragile.
      We don't need them to be primops; we can have them exported by ghc-prim.
      
      I've also added a test for #18359 which triggered this patch.
      1b3d13b6
  5. 01 Jun, 2020 2 commits
  6. 18 Apr, 2020 1 commit
    • Sylvain Henry's avatar
      Modules (#13009) · 15312bbb
      Sylvain Henry authored
      * SysTools
      * Parser
      * GHC.Builtin
      * GHC.Iface.Recomp
      * Settings
      
      Update Haddock submodule
      
      Metric Decrease:
          Naperian
          parsing001
      15312bbb
  7. 15 Apr, 2020 3 commits
    • Daniel Gröber (dxld)'s avatar
      rts: Fix nomenclature in OVERWRITING_CLOSURE macros · e149dea9
      Daniel Gröber (dxld) authored
      The additional commentary introduced by commit 8916e64e ("Implement
      shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got
      this wrong. We set 'prim' to true in overwritingClosureOfs because we
      _don't_ want to call LDV_recordDead().
      
      The reason is the "inherently used" distinction made in the LDV
      profiler, so I rename the variable to something more appropriate.
      e149dea9
    • Daniel Gröber (dxld)'s avatar
      Zero out pinned block alignment slop when profiling · 41230e26
      Daniel Gröber (dxld) authored
      The heap profiler currently cannot traverse pinned blocks because of
      alignment slop. This used to just be a minor annoyance as the whole block
      is accounted into a special cost center rather than the respective object's
      CCS, cf. #7275. However for the new root profiler we would like to be able
      to visit _every_ closure on the heap. We need to do this so we can get rid
      of the current 'flip' bit hack in the heap traversal code.
      
      Since info pointers are always non-zero we can in principle skip all the
      slop in the profiler if we can rely on it being zeroed. This assumption
      caused problems in the past though: commit a586b33f ("rts: Correct
      handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use
      the same trick for BF_LARGE objects but neglected to take into account that
      shrink*Array# functions don't ensure that slop is zeroed when not
      compiling with profiling.
      
      Later, commit 0c114c65 ("Handle large ARR_WORDS in heap census (fix
      as we will only be assuming slop is zeroed when profiling is on.
      
      This commit also reduces the amount of slop we introduce in the first
      place by calculating the needed alignment before doing the allocation for
      small objects where we know the next available address. For large objects
      we don't know how much alignment we'll have to do yet since those details
      are hidden behind the allocateMightFail function so there we continue to
      allocate the maximum additional words we'll need to do the alignment.
      
      To avoid duplicating all this logic in the Cmm code, we pull it
      into the RTS allocatePinned function instead.
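
      The up-front alignment calculation for small objects might look
      roughly like the sketch below. This is not the RTS code:
      `slop_words`, `place_aligned` and `word_t` are hypothetical names,
      and the sketch assumes a word-aligned free pointer and a power-of-two
      alignment that is a multiple of the word size.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      typedef uintptr_t word_t; /* stand-in for the RTS word type (assumption) */

      /* Number of whole words to skip so that `addr` becomes a multiple of
       * `align_bytes`. Computed before allocating, so only the slop that is
       * really needed gets reserved. */
      static size_t slop_words(uintptr_t addr, size_t align_bytes)
      {
          assert(align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
          assert(align_bytes % sizeof(word_t) == 0);
          assert(addr % sizeof(word_t) == 0);
          uintptr_t misalign = addr & (align_bytes - 1);
          size_t bytes = misalign == 0 ? 0 : align_bytes - misalign;
          return bytes / sizeof(word_t);
      }

      /* Zero the slop in front of the object and return the aligned start.
       * Zeroed slop can be skipped by a heap walker because info pointers
       * are never zero. */
      static word_t *place_aligned(word_t *free_ptr, size_t align_bytes)
      {
          size_t slop = slop_words((uintptr_t)free_ptr, align_bytes);
          memset(free_ptr, 0, slop * sizeof(word_t));
          return free_ptr + slop;
      }
      ```

      For large objects the address is not known in advance, which is why
      the message above says the maximum number of extra words is still
      reserved there.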
      
      Metric Decrease:
          T7257
          haddock.Cabal
          haddock.base
      41230e26
  8. 12 Feb, 2020 1 commit
  9. 11 Feb, 2020 1 commit
  10. 25 Jan, 2020 1 commit
  11. 13 Jan, 2020 1 commit
  12. 26 Oct, 2019 1 commit
  13. 23 Oct, 2019 1 commit
    • ryates@cs.rochester.edu's avatar
      Full abort on validate failure merging `orElse`. · 1f40e68a
      ryates@cs.rochester.edu authored
      Previously partial roll back of a branch of an `orElse` was attempted
      if validation failure was observed.  Validation here, however, does
      not account for what part of the transaction observed inconsistent
      state.  This commit fixes this by fully aborting and restarting the
      transaction.
      1f40e68a
  14. 21 Oct, 2019 2 commits
    • Ben Gamari's avatar
      rts: Implement concurrent collection in the nonmoving collector · bd8e3ff4
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushes
      when (a) the chunk is filled, or (b) the nonmoving collector enters its
      post-mark synchronization phase.
      
      While the write barrier touches a significant amount of code it is
      conceptually straightforward: the mutator must ensure that the referee
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details:
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referee of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
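
      The core rule of the write barrier (record the old referee of any
      overwritten pointer while a concurrent mark is underway) can be
      sketched as below. This is an illustration only: `UpdRemSet`,
      `write_field`, `concurrent_mark_active` and the tiny chunk size are
      hypothetical, and flushing is modelled by a counter rather than by
      handing a chunk to the collector.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define CHUNK_ENTRIES 4 /* tiny on purpose; not the real chunk size */

      typedef struct Closure { struct Closure *field; } Closure;

      /* Hypothetical capability-local accumulator chunk. */
      typedef struct {
          Closure *entries[CHUNK_ENTRIES];
          size_t   used;
          size_t   flushed; /* total entries handed over so far */
      } UpdRemSet;

      static bool concurrent_mark_active = false;

      /* Hand the accumulated entries to the collector (here: count them). */
      static void flush_upd_rem_set(UpdRemSet *rs)
      {
          rs->flushed += rs->used;
          rs->used = 0;
      }

      /* Record the old referee of a pointer that is about to be overwritten;
       * flush as soon as the chunk is full. */
      static void upd_rem_set_push(UpdRemSet *rs, Closure *old)
      {
          assert(rs->used < CHUNK_ENTRIES);
          rs->entries[rs->used++] = old;
          if (rs->used == CHUNK_ENTRIES) flush_upd_rem_set(rs);
      }

      /* Mutator-side store: the barrier is only active during marking. */
      static void write_field(UpdRemSet *rs, Closure *obj, Closure *new_value)
      {
          if (concurrent_mark_active && obj->field != NULL) {
              upd_rem_set_push(rs, obj->field);
          }
          obj->field = new_value;
      }
      ```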
      Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
      bd8e3ff4
  15. 18 Oct, 2019 1 commit
  16. 03 Oct, 2019 1 commit
    • Stefan Schulze Frielinghaus's avatar
      Extend argument of createIOThread to word size · d0924b15
      Stefan Schulze Frielinghaus authored
      Function createIOThread expects its second argument to be of word size.
      The natural size of the second parameter is 32 bits. Thus on some 64-bit
      architectures, where a write to the lower half of a register does not
      clear the upper half, the value must be zero-extended.
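
      The effect can be modelled in portable C as below. This is only a
      sketch of the register behaviour, not the Cmm change itself;
      `write_low_half` and `zero_extend_32` are hypothetical helpers.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Model a 64-bit register whose upper half keeps its stale bits when a
       * 32-bit value is written to the lower half. */
      static uint64_t write_low_half(uint64_t reg, uint32_t value)
      {
          return (reg & 0xFFFFFFFF00000000ULL) | value;
      }

      /* Explicit zero extension: keep the low 32 bits, clear the upper 32. */
      static uint64_t zero_extend_32(uint64_t reg)
      {
          return (uint64_t)(uint32_t)reg;
      }

      /* Small self-check of the two helpers above. */
      static void zero_extend_demo(void)
      {
          uint64_t reg = write_low_half(0xDEADBEEF00000000ULL, 42u);
          assert(reg != 42u);                 /* stale upper bits are visible */
          assert(zero_extend_32(reg) == 42u); /* gone after zero extension */
      }
      ```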
      d0924b15
  17. 09 Sep, 2019 1 commit
    • Sylvain Henry's avatar
      Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
      447864a9
  18. 10 Jul, 2019 1 commit
    • John Ericson's avatar
      Remove most uses of TARGET platform macros · 0472f0f6
      John Ericson authored
      These prevent multi-target builds. They were removed in four ways:
      
      1. In the compiler itself, replacing `#if` with runtime `if`. In these
      cases, we care about the target platform still, but the target platform
      is dynamic so we must delay the elimination to run time.
      
      2. In the compiler itself, replacing `TARGET` with `HOST`. There was
      just one bit of this, in some code splitting strings representing lists
      of paths. These paths are used by GHC itself, and not by the compiled
      binary. (They are compiler lookup paths, rather than RPATHS or something
      that does matter to the compiled binary, and thus would legitimately
      be target-sensitive.) As such, the path-splitting method only depends on
      where GHC runs and not where code it produces runs. This should have
      been `HOST` all along.
      
      3. Changing the RTS. The RTS doesn't care about the target platform,
      full stop.
      
      4. `includes/stg/HaskellMachRegs.h`: this file is also included in the
      genapply executable. This is tricky because the RTS's host platform
      really is that utility's target platform, so that utility really
      isn't multi-target either. But at least it isn't an installed part of
      GHC, just a one-off tool used when building the RTS. Lying with
      `HOST` to a one-off program (genapply) that isn't installed doesn't
      seem so bad. It's certainly better than the other way around, lying to
      the RTS though not to genapply. The RTS is more important, and it is
      installed, *and* this header is installed as part of the RTS.
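
      The first approach (a compile-time `#if` becoming a run-time `if`)
      can be illustrated with the sketch below. The compiler itself is
      written in Haskell, so this C version only shows the shape of the
      change; the `Platform` struct, `pointer_tag_bits` and the macro named
      in the comment are hypothetical.

      ```c
      #include <assert.h>

      /* Hypothetical run-time description of the target platform. */
      typedef struct {
          int word_size_bytes;
      } Platform;

      /* Before: the choice is fixed when the compiler itself is built, e.g.
       *   #if TARGET_WORD_SIZE == 8 ... #else ... #endif
       * (illustrative macro name). After: the same choice is made at run
       * time from a value describing the target, so one compiler binary can
       * serve several targets. */
      static int pointer_tag_bits(const Platform *target)
      {
          assert(target->word_size_bytes == 4 || target->word_size_bytes == 8);
          return target->word_size_bytes == 8 ? 3 : 2;
      }
      ```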
      0472f0f6
  19. 28 Jun, 2019 1 commit
    • Travis Whitaker's avatar
      Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause issues;
      consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but it's too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
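
      The two patterns above can be sketched with C11 fences as below.
      This is an analogy rather than the RTS code: the `Closure` layout,
      the `INFO_*` constants and both function names are hypothetical, and
      C11 release/acquire fences stand in for the write and read barriers.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stddef.h>

      enum { INFO_THUNK = 0, INFO_IND = 1 };

      typedef struct {
          _Atomic(int) info;       /* stand-in for the info pointer */
          void        *indirectee; /* a non-info-table field */
      } Closure;

      /* Updater: fields first, then write barrier, then the info pointer. */
      static void update_with_indirection(Closure *c, void *target)
      {
          assert(target != NULL);
          c->indirectee = target;
          atomic_thread_fence(memory_order_release);
          atomic_store_explicit(&c->info, INFO_IND, memory_order_relaxed);
      }

      /* Observer: info pointer first, then read barrier, then the fields. */
      static void *observe(Closure *c)
      {
          int info = atomic_load_explicit(&c->info, memory_order_relaxed);
          atomic_thread_fence(memory_order_acquire);
          if (info == INFO_IND) return c->indirectee;
          return NULL; /* not an indirection: nothing to follow */
      }
      ```

      An observer that sees `INFO_IND` is then guaranteed to also see the
      indirectee written before the release fence.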
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
      11bac115
  20. 04 May, 2019 1 commit
  21. 02 Apr, 2019 1 commit
    • Michal Terepeta's avatar
      Improve performance of newSmallArray# · 7cf5ba3d
      Michal Terepeta authored
      This:
      - Hoists part of the condition outside of the initialization loop in
        `stg_newSmallArrayzh`.
      - Annotates one of the unlikely branches as unlikely, also in
        `stg_newSmallArrayzh`.
      - Adds a couple of annotations to `allocateMightFail` indicating which
        branches are likely to be taken.
      
      Together this gives about 5% improvement.
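
      The kind of change described above can be sketched in C as below.
      The actual patch touches Cmm and `allocateMightFail`; `LIKELY`,
      `UNLIKELY` and `init_array` here are hypothetical, and
      `__builtin_expect` is a GCC/Clang builtin, so other compilers fall
      back to a plain condition.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #if defined(__GNUC__) || defined(__clang__)
      #define LIKELY(x)   __builtin_expect(!!(x), 1)
      #define UNLIKELY(x) __builtin_expect(!!(x), 0)
      #else
      #define LIKELY(x)   (x)
      #define UNLIKELY(x) (x)
      #endif

      /* Fill `n` slots with `init`. Returns -1 on the rare failure path. */
      static int init_array(void **slots, ptrdiff_t n, void *init)
      {
          if (UNLIKELY(n < 0)) return -1; /* hinted as the cold branch */
          assert(slots != NULL || n == 0);
          void **end = slots + n;         /* loop bound computed once, up front */
          for (void **p = slots; p < end; p++) {
              *p = init;
          }
          return 0;
      }
      ```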
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      7cf5ba3d
  22. 17 Mar, 2019 1 commit
    • Ben Gamari's avatar
      ghc-heap: Introduce closureSize · cb61371e
      Ben Gamari authored
      This function allows the user to compute the (non-transitive) size of a
      heap object in words. The "closure" in the name is admittedly confusing
      but we are stuck with this nomenclature at this point.
      cb61371e
  23. 21 Nov, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Fix heap corruption during stable name allocation · 691aa715
      Ömer Sinan Ağacan authored
      See #15906 for the problem. To fix it we simply call `allocate()` instead
      of `ALLOC_PRIM()`. `allocate()` does not trigger GC when the nursery is
      full; instead it extends the nursery.
      
      Test Plan:
      This validates. memo001 now passes with `-debug` compile parameter. I'll add
      another test that runs memo001 with `-debug` once I figure out how to use
      stdout files for multiple tests.
      
      Reviewers: simonmar, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #15906
      
      Differential Revision: https://phabricator.haskell.org/D5342
      691aa715
  24. 21 Aug, 2018 1 commit
  25. 15 Jul, 2018 1 commit
  26. 04 Jul, 2018 1 commit
  27. 29 Jun, 2018 1 commit
  28. 17 Jun, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Use __FILE__ for Cmm assertion locations, fix #8619 · 008ea12d
      Ömer Sinan Ağacan authored
      It seems like we currently support string literals in Cmm, so we can use
      the __FILE__ CPP macro in assertion macros. This improves error messages
      that previously looked like
      
          ASSERTION FAILED: file (null), line 1302
      
      The (null) part now shows the actual file name.
      
      Also inline some single-use string literals in PrimOps.cmm.
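
      The idea carries over directly to C, as in the sketch below. The
      macro `SKETCH_ASSERT` and the helper `report_failure` are
      hypothetical, and unlike a real assertion the sketch records the
      message instead of aborting.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      static char last_failure[256]; /* last message, kept for inspection */

      /* Format the same kind of message as shown above, with a real file name. */
      static void report_failure(const char *file, int line)
      {
          assert(file != NULL);
          snprintf(last_failure, sizeof last_failure,
                   "ASSERTION FAILED: file %s, line %d", file, line);
      }

      /* __FILE__ and __LINE__ expand at the use site, so the message names
       * the file that contains the failing assertion. */
      #define SKETCH_ASSERT(cond) \
          do { if (!(cond)) report_failure(__FILE__, __LINE__); } while (0)
      ```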
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4862
      008ea12d
  29. 05 Jun, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAN/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
      4075656e
  30. 02 Jun, 2018 1 commit
  31. 20 May, 2018 1 commit
    • patrickdoc's avatar
      Add HeapView functionality · ec22f7dd
      patrickdoc authored
      This pulls parts of Joachim Breitner's ghc-heap-view library inside GHC.
      The bits added are the C hooks into the RTS and a basic Haskell wrapper
      to these C hooks. The main reason for these to be added to GHC proper
      is that the code needs to be kept in sync with the closure types
      defined by the RTS. It is expected that the version of HeapView shipped
      with GHC will always work with that version of GHC and that extra
      functionality can be layered on top with a library like ghc-heap-view
      distributed via Hackage.
      
      Test Plan: validate
      
      Reviewers: simonmar, hvr, nomeata, austin, Phyx, bgamari, erikd
      
      Reviewed By: bgamari
      
      Subscribers: carter, patrickdoc, tmcgilchrist, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3055
      ec22f7dd
  32. 19 Mar, 2018 1 commit
    • Ben Gamari's avatar
      Improve accuracy of get/setAllocationCounter · 20cbb016
      Ben Gamari authored
      Summary:
      get/setAllocationCounter didn't take into account allocations in the
      current block. This was known at the time, but it turns out to be
      important to have more accuracy when using these in a fine-grained
      way.
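
      The difference can be pictured with the sketch below, which is a
      simplified model and not the RTS accounting: `AllocState` and both
      functions are hypothetical, and the counter is modelled as a simple
      running total of bytes.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      typedef struct {
          uint64_t    finished_bytes; /* bytes in blocks that are already full */
          const char *block_start;    /* start of the block being filled */
          const char *hp;             /* next free byte in that block */
      } AllocState;

      /* Coarse reading: only changes when a block is finished. */
      static uint64_t allocated_coarse(const AllocState *s)
      {
          return s->finished_bytes;
      }

      /* Finer reading: also counts what is used in the current block. */
      static uint64_t allocated_precise(const AllocState *s)
      {
          assert(s->hp >= s->block_start);
          return s->finished_bytes + (uint64_t)(s->hp - s->block_start);
      }
      ```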
      
      Test Plan:
      New unit test to test incrementally larger allocations.  Before I got
      results like this:
      
      ```
      +0
      +0
      +0
      +0
      +0
      +4096
      +0
      +0
      +0
      +0
      +0
      +4064
      +0
      +0
      +4088
      +4056
      +0
      +0
      +0
      +4088
      +4096
      +4056
      +4096
      ```
      
      Notice how the results aren't always monotonically increasing.  After
      this patch:
      
      ```
      +344
      +416
      +488
      +560
      +632
      +704
      +776
      +848
      +920
      +992
      +1064
      +1136
      +1208
      +1280
      +1352
      +1424
      +1496
      +1568
      +1640
      +1712
      +1784
      +1856
      +1928
      +2000
      +2072
      +2144
      ```
      
      Reviewers: hvr, erikd, simonmar, jrtc27, trommler
      
      Reviewed By: simonmar
      
      Subscribers: trommler, jrtc27, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4363
      20cbb016
  33. 09 Mar, 2018 1 commit