1. 22 Sep, 2019 4 commits
  2. 21 Sep, 2019 1 commit
  3. 17 Sep, 2019 2 commits
    • Deduplicate `HaskellMachRegs.h` and `RtsMachRegs.h` headers · c77fc3b2
      John Ericson authored
      Until 0472f0f6 there was a meaningful host vs. target distinction
      (though it wasn't used correctly in genapply). After that commit, the two
      headers no longer differed in meaningful ways, so it's best to keep only
      one.
    • eventlog: Add biographical and retainer profiling traces · ae4415b9
      Matthew Pickering authored
      This patch adds a new eventlog event which indicates the start of
      a biographical profiler sample. These differ from normal events in that
      they also include the timestamp of when the census took place. This is
      because the LDV profiler only emits samples at the end of the run.
      
      Now all the different profiling modes emit consumable events to the
      eventlog.
  4. 12 Sep, 2019 1 commit
  5. 09 Sep, 2019 1 commit
    • Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
  6. 01 Sep, 2019 1 commit
  7. 18 Aug, 2019 1 commit
  8. 10 Aug, 2019 1 commit
    • Consolidate `TablesNextToCode` and `GhcUnregisterised` in configure (#15548) · 81860281
      Joachim Breitner authored
      `TablesNextToCode` is now substituted by configure, where it has the
      correct defaults and error handling. Nowhere else needs to duplicate
      that, though we may want the compiler to guard against bogus settings
      files.
      
      I renamed it from `GhcEnableTablesNextToCode` to `TablesNextToCode` to:
      
       - Help me guard against any unfixed usages
      
       - Remove any lingering connotation that this flag needs to be combined
         with `GhcUnregisterised`.
      
      Original reviewers:
      
      Original subscribers: TerrorJack, rwbarton, carter
      
      Original Differential Revision: https://phabricator.haskell.org/D5082
  9. 03 Aug, 2019 1 commit
    • rts: Always truncate output files · 4664bafc
      Ben Gamari authored
      Previously there were numerous places in the RTS where we would fopen
      with the "w" flag string. This is wrong as it will not truncate the
      file. Consequently if we write less data than the previous length of the
      file we will leave garbage at its end.
      
      Fixes #16993.
  10. 30 Jul, 2019 1 commit
    • Expand the preallocated Int range to [-16,255] · 9c8a211a
      Andreas Klebinger authored
      Effects as I measured them:
      
      RTS Size: +0.1%
      Compile times: -0.5%
      Runtime nofib: -1.1%
      
      The nofib runtime result seems to come mostly from the `CS` benchmark,
      which is very sensitive to alignment changes, so this is likely
      overrepresented.
      
      However the compile time changes are realistic.
      
      This is related to #16961.
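
As a rough illustration of what preallocating a range of Int values buys, the sketch below shares one static box per value in [-16,255] and allocates only outside that range. The names `boxed_int`, `small_ints`, and `box_int` are hypothetical and the layout is simplified; this is not the RTS's actual closure representation.

```c
#include <assert.h>
#include <stdlib.h>

#define SMALL_INT_MIN   (-16)
#define SMALL_INT_MAX   255
#define SMALL_INT_COUNT (SMALL_INT_MAX - SMALL_INT_MIN + 1)

/* Hypothetical boxed integer; stands in for a heap-allocated Int. */
typedef struct { long value; } boxed_int;

/* One shared, statically allocated box per value in the small range. */
static boxed_int small_ints[SMALL_INT_COUNT];
static int small_ints_ready = 0;

static void init_small_ints(void)
{
    for (int i = 0; i < SMALL_INT_COUNT; i++)
        small_ints[i].value = SMALL_INT_MIN + i;
    small_ints_ready = 1;
}

/* Returns the shared box for in-range values; otherwise allocates a fresh
 * box, which the caller owns and must free. */
static boxed_int *box_int(long n)
{
    if (!small_ints_ready)
        init_small_ints();
    if (n >= SMALL_INT_MIN && n <= SMALL_INT_MAX)
        return &small_ints[n - SMALL_INT_MIN];

    boxed_int *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = n;
    return b;
}
```

With this scheme, two requests for the same in-range value return the same pointer, so no allocation happens for them; values such as 256 still get a fresh box each time.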
  11. 26 Jul, 2019 1 commit
  12. 16 Jul, 2019 2 commits
  13. 14 Jul, 2019 1 commit
    • Expunge #ifdef and #ifndef from the codebase · d7c6c471
      John Ericson authored
      These are unexploded mines as far as the linter is concerned. I don't
      want to hit them in my MRs by mistake!
      
      I did this with `sed`, and then rolled back some changes in the docs,
      config.guess, and the linter itself.
  14. 10 Jul, 2019 1 commit
    • Remove most uses of TARGET platform macros · 0472f0f6
      John Ericson authored
      These prevent multi-target builds. They were removed in the following ways:
      
      1. In the compiler itself, replacing `#if` with runtime `if`. In these
      cases, we care about the target platform still, but the target platform
      is dynamic so we must delay the elimination to run time.
      
      2. In the compiler itself, replacing `TARGET` with `HOST`. There was
      just one bit of this, in some code splitting strings representing lists
      of paths. These paths are used by GHC itself, and not by the compiled
      binary. (They are compiler lookup paths, rather than RPATHS or something
      that does matter to the compiled binary, and thus would legitimately
      be target-sensitive.) As such, the path-splitting method only depends on
      where GHC runs and not where code it produces runs. This should have
      been `HOST` all along.
      
      3. Changing the RTS. The RTS doesn't care about the target platform,
      full stop.
      
      4. `includes/stg/HaskellMachRegs.h`: this file is also included in the
      genapply executable. This is tricky because the RTS's host platform
      really is that utility's target platform, so that utility really
      isn't multi-target either. But at least it isn't an installed part of
      GHC, just a one-off tool used when building the RTS. Lying with `HOST`
      to a one-off program (genapply) that isn't installed doesn't seem so bad.
      It's certainly better than the other way around: lying to the RTS
      but not to genapply. The RTS is more important, and it is installed,
      *and* this header is installed as part of the RTS.
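
The first two approaches above can be sketched in C under some assumptions: the `platform` struct, the `os_kind` enum, and both functions are hypothetical stand-ins, not GHC code (the compiler itself is written in Haskell). The point is only that a target-dependent decision becomes an ordinary run-time branch on a platform value, while the search-path separator is chosen from the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical platform description, passed around as a run-time value
 * instead of being baked in with TARGET_* preprocessor macros. */
typedef enum { OS_LINUX, OS_WINDOWS } os_kind;
typedef struct { os_kind os; } platform;

/* Approach 1: a decision about the *target* is made with a run-time `if`,
 * so one compiler binary can serve several targets. */
static const char *target_exe_suffix(const platform *target)
{
    assert(target != NULL);
    if (target->os == OS_WINDOWS)
        return ".exe";
    return "";
}

/* Approach 2: splitting the compiler's own lookup paths depends on where
 * the compiler runs, so it consults the *host*, never the target. */
static char search_path_separator(const platform *host)
{
    assert(host != NULL);
    return host->os == OS_WINDOWS ? ';' : ':';
}
```

A cross-compiler running on Linux and targeting Windows would, in this sketch, split its lookup paths on `:` while still producing `.exe` output names.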
  15. 05 Jul, 2019 2 commits
  16. 02 Jul, 2019 4 commits
  17. 28 Jun, 2019 3 commits
    • rts: Assert that LDV profiling isn't used with parallel GC · bd660ede
      Ben Gamari authored
      I'm not entirely sure we are careful about ensuring this; this is a
      last-ditch check.
    • Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause issues.
      Consider:
      
      - An observer is about to case on a value in a closure's info table.
      - The observer speculatively reads one or more of the closure's fields.
      - An updater writes to the closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but it's too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
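
A minimal sketch of the two patterns above using C11 atomics. The `closure` struct, the `info_type` values, and both function names are hypothetical simplifications; the RTS uses its own barrier macros and closure layout, which are not shown here. A release fence stands in for the write barrier and an acquire fence for the read barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified closure: `info` stands in for the info
 * pointer and `payload` for the indirectee field. */
enum info_type { INFO_THUNK = 0, INFO_IND = 1 };

typedef struct closure {
    _Atomic int info;
    _Atomic(struct closure *) payload;
} closure;

/* Updater: write the fields, then the write barrier, then the info word. */
static void update_with_indirection(closure *c, closure *indirectee)
{
    assert(indirectee != NULL);
    atomic_store_explicit(&c->payload, indirectee, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* write barrier */
    atomic_store_explicit(&c->info, INFO_IND, memory_order_relaxed);
}

/* Observer: read the info word, then the read barrier, then the fields.
 * Returns the indirectee, or NULL if the closure is not an indirection. */
static closure *follow_if_indirection(closure *c)
{
    int info = atomic_load_explicit(&c->info, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);   /* read barrier */
    if (info != INFO_IND)
        return NULL;
    return atomic_load_explicit(&c->payload, memory_order_relaxed);
}
```

Because the observer's fence sits between the info read and the payload read, an observer that sees `INFO_IND` is guaranteed to also see the indirectee written before the updater's fence.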
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
    • Sylvain Henry's avatar
      4ec233ec
  18. 27 Jun, 2019 3 commits
    • rts: Do not traverse nursery for dead closures in LDV profile · 07cffc49
      Matthew Pickering authored
      It is important that `heapCensus` and `LdvCensusForDead` traverse the
      same areas.
      
      `heapCensus` increases the `not_used` counter which tracks how many
      closures are live but haven't been used yet.
      
      `LdvCensusForDead` increases the `void_total` counter which tracks how
      many dead closures there are.
      
      The `LAG` is then calculated by subtracting the `void_total` from
      `not_used` and so it is essential that `not_used >= void_total`. This
      fact is checked by quite a few assertions.
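
The relationship between the two counters can be sketched as follows. The struct and function names are hypothetical; only the counter names `not_used` and `void_total` and the invariant come from the description above.

```c
#include <assert.h>

/* Hypothetical per-census counters. */
typedef struct {
    long not_used;    /* closures that are live but have not been used yet */
    long void_total;  /* dead closures */
} ldv_counters;

/* LAG = not_used - void_total. The subtraction only makes sense if both
 * counters were gathered over the same heap areas, hence the assertion. */
static long ldv_lag(const ldv_counters *c)
{
    assert(c->void_total <= c->not_used);
    return c->not_used - c->void_total;
}
```

If one census walks an area the other skips, `void_total` can exceed `not_used` and the assertion fires, which is the failure mode the rest of this commit message describes.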
      
      However, if a program has low maximum residency but allocates a lot in
      the nursery then these assertions were failing (see #16753 and #15903)
      because `LdvCensusForDead` was observing dead closures from the nursery
      which totalled more than the `not_used`. The same closures were not
      counted by `heapCensus`.
      
      Therefore, it seems that the correct fix is to make `LdvCensusForDead`
      agree with `heapCensus` and not traverse the nursery for dead closures.
      
      Fixes #16100 #16753 #15903 #8982
    • rts: Correct assertion in LDV_recordDead · ed4cbd93
      Matthew Pickering authored
      It is possible for void_total to be exactly equal to not_used, and the
      other assertions for this invariant check for <= rather than <.
    • rts: Correct handling of LARGE ARR_WORDS in LDV profiler · a586b33f
      Matthew Pickering authored
      This implements the correct fix for #11627 by skipping over the slop
      (which is zeroed) rather than adding special case logic for LARGE
      ARR_WORDS which runs the risk of not performing a correct census by
      ignoring any subsequent blocks.
      
      This approach implements similar logic to that in Sanity.c
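
The idea of stepping over zeroed slop during a heap walk can be sketched like this. The encoding is an assumption made purely for illustration: each object's first word holds its size in words (at least 1), and slop words are zero. Real closures are sized via their info tables, which this sketch doesn't model.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long word;

/* Counts the objects in words[0..n). Under the illustrative encoding, a
 * zero word at an object boundary can only be slop, so it is skipped one
 * word at a time; a nonzero word is an object header giving its size. */
static size_t census_count(const word *words, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        if (words[i] == 0) {          /* zeroed slop: step over it */
            i++;
            continue;
        }
        assert(words[i] <= n - i);    /* object must fit in the block */
        count++;
        i += words[i];                /* jump to the next boundary */
    }
    return count;
}
```

Because the walk resumes at the first nonzero word after the slop, objects that follow a padded one are still counted rather than ignored.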
  19. 26 Jun, 2019 1 commit
  20. 22 Jun, 2019 1 commit
    • rts: Reset STATIC_LINK field of reverted CAFs · b0d6bf2a
      Ben Gamari authored
      When we revert a CAF we must reset its STATIC_LINK field. Otherwise the GC
      might ignore the CAF (e.g. because it carries the STATIC_FLAG_LIST flag) and
      consequently overlook references to object code that we are trying to
      unload, which would result in reachable object code being unloaded. See
      Note [CAF lists] and Note [STATIC_LINK fields].
      
      This fixes #16842.
      
      Idea-due-to: Phuong Trinh <lolotp@fb.com>
  21. 13 Jun, 2019 1 commit
  22. 12 Jun, 2019 2 commits
  23. 11 Jun, 2019 4 commits