1. 22 Sep, 2019 14 commits
• rts: retainer: Reduce DEBUG_RETAINER ifdef noise · 64ec45a7
      Daniel Gröber (dxld) authored
Keeping track of the maximum stack size seems like a good idea in all
      configurations. The associated ASSERTs only materialize in debug mode but
      having the statistic is nice.
      To make the debug code less prone to bitrotting I introduce a function
      'debug()' which doesn't actually print by default and is #define'd away
      only when the standard DEBUG define is off.
• rts: retainer: Rename heap traversal functions for extraction · b7e15d17
      Daniel Gröber (dxld) authored
      This gets all remaining functions in-line with the new 'traverse' prefix
      and module name.
• rts: retainer: Remove obsolete debug code · ec1d76e2
      Daniel Gröber (dxld) authored
      Commit dbef766c ("Profiling cleanup.") made this debug code obsolete by
      removing the 'cost' function without a replacement. As best I can tell the
      retainer profiler used to do some heap census too and this debug code was
      mainly concerned with that.
• rts: RetainerSet: Remove obsolete first/second-approach choice · f3bb7397
      Daniel Gröber (dxld) authored
In the old code when DEBUG_RETAINER was set, FIRST_APPROACH was
implied. However ProfHeap.c now depends on printRetainerSetShort which is
      only available with SECOND_APPROACH. This is because with FIRST_APPROACH
      retainerProfile() will free all retainer sets before returning so by the
      time ProfHeap calls dumpCensus the retainer set pointers are segfaulty.
      Since all of this debugging code obviously hasn't been compiled in ages
      anyways I'm taking the liberty of just removing it.
Remember guys: Dead code is a liability, not an asset :)
• rts: retainer: simplify pop() control flow · 48e816f0
      Daniel Gröber (dxld) authored
      Instead of breaking out of the switch-in-while construct using `return` this
      uses `goto out` which makes it possible to share a lot of the out-variable
      assignment code in all the cases.
      I also replaced the nasty `while(true)` business by the real loop
condition: `while(*c == NULL)`. All `break` calls inside the switch already
      have either a check for NULL or an assignment of `c` to NULL so this should
      not change any behaviour.
      Using `goto out` also allowed me to remove another minor wart: In the
      MVAR_*/WEAK cases the popOff() call used to happen before reading the
      stackElement. This looked like a use-after-free hazard to me as the stack
      is allocated in blocks and depletion of a block could mean it getting freed
      and possibly overwritten by zero or garbage, depending on the block
      allocator's behaviour.
• rts: retainer: Pull retainer specific code into a callback · b03db9da
      Daniel Gröber (dxld) authored
      This essentially turns the heap traversal code into a visitor. You add a
      bunch of roots to the work-stack and then the callback you give to
traverseWorkStack() will be called with every reachable closure at least once.
• rts: Generalise profiling heap traversal flip bit handling · 2f2f6dd5
      Daniel Gröber (dxld) authored
      This commit starts renaming some flip bit related functions for the
generalised heap traversal code and adds provisions for sharing the
      per-closure profiling header field currently used exclusively for retainer
      profiling with other heap traversal profiling modes.
• rts: retainer: Generalise per-stackElement data · f79ac2ef
      Daniel Gröber (dxld) authored
This essentially amounts to s/retainer/stackData/, s/c_child_r/data/ and
      some temporary casting of c_child_r to stackData until refactoring of this
module is completed by a subsequent commit. We also introduce a new union
'stackData' which will contain the actual extra data to be stored on the stack.
      The idea is to make the heap traversal logic of the retainer profiler ready
for extraction into its own module. So talking about "retainers" there
      doesn't really make sense anymore.
      Essentially the "retainers" we store in the stack are just data associated
with the push()ed closures which we return when pop()ing them.
• rts: retainer: Move info.next.parent to stackElement · 94ecdb4f
      Daniel Gröber (dxld) authored
      I don't see a point in having this live in 'info', just seems to make the
      code more complicated.
• rts: retainer: Turn global traversal state into a struct · ead05f80
      Daniel Gröber (dxld) authored
      Global state is ugly and hard to test. Since the profiling code isn't quite
      as performance critical as, say, GC we should prefer better code here.
      I would like to move the 'flip' bit into the struct too but that's
      complicated by the fact that the defines which use it directly are also
      called from ProfHeap where the traversalState is not easily
      available. Maybe in a future commit.
• rts: Fix outdated references to 'ldvTime' · 63023dc2
      Daniel Gröber (dxld) authored
      This got renamed to 'era' in dbef766c ("[project @ 2001-11-26 16:54:21 by
      simonmar] Profiling cleanup").
• rts: retainer: Remove cStackSize debug counter · da12da79
      Daniel Gröber (dxld) authored
      This can only ever be one since 5f1d949a ("Remove explicit recursion in
      retainer profiling"), so it's pointless.
  2. 21 Sep, 2019 1 commit
  3. 17 Sep, 2019 2 commits
• Deduplicate `HaskellMachRegs.h` and `RtsMachRegs.h` headers · c77fc3b2
      John Ericson authored
      Until 0472f0f6 there was a meaningful
      host vs target distinction (though it wasn't used right, in genapply).
      After that, they did not differ in meaningful ways, so it's best to just
      only keep one.
• eventlog: Add biographical and retainer profiling traces · ae4415b9
      Matthew Pickering authored
      This patch adds a new eventlog event which indicates the start of
      a biographical profiler sample. These are different to normal events as
      they also include the timestamp of when the census took place. This is
      because the LDV profiler only emits samples at the end of the run.
Now all the different profiling modes emit consumable events to the eventlog.
  4. 12 Sep, 2019 1 commit
  5. 09 Sep, 2019 1 commit
• Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
  6. 01 Sep, 2019 1 commit
  7. 18 Aug, 2019 1 commit
  8. 10 Aug, 2019 1 commit
• Consolidate `TablesNextToCode` and `GhcUnregisterised` in configure (#15548) · 81860281
      Joachim Breitner authored
`TablesNextToCode` is now substituted by configure, where it has the
correct defaults and error handling. Nowhere else needs to duplicate
that, though we may want the compiler to guard against bogus settings
      I renamed it from `GhcEnableTablesNextToCode` to `TablesNextToCode` to:
       - Help me guard against any unfixed usages
 - Remove any lingering connotation that this flag needs to be combined
   with `GhcUnregisterised`.
      Original reviewers:
      Original subscribers: TerrorJack, rwbarton, carter
      Original Differential Revision: https://phabricator.haskell.org/D5082
  9. 03 Aug, 2019 1 commit
• rts: Always truncate output files · 4664bafc
      Ben Gamari authored
      Previously there were numerous places in the RTS where we would fopen
      with the "w" flag string. This is wrong as it will not truncate the
      file. Consequently if we write less data than the previous length of the
      file we will leave garbage at its end.
      Fixes #16993.
  10. 30 Jul, 2019 1 commit
• Expand the preallocated Int range to [-16,255] · 9c8a211a
      Andreas Klebinger authored
      Effects as I measured them:
      RTS Size: +0.1%
      Compile times: -0.5%
Runtime nofib: -1.1%
      Nofib runtime result seems to mostly come from the `CS` benchmark
which is very sensitive to alignment changes so this is likely over
      However the compile time changes are realistic.
      This is related to #16961.
  11. 26 Jul, 2019 1 commit
  12. 16 Jul, 2019 2 commits
  13. 14 Jul, 2019 1 commit
• Expunge #ifdef and #ifndef from the codebase · d7c6c471
      John Ericson authored
These are unexploded mines as far as the linter is concerned. I don't
want to hit them in my MRs by mistake!
      I did this with `sed`, and then rolled back some changes in the docs,
      config.guess, and the linter itself.
  14. 10 Jul, 2019 1 commit
• Remove most uses of TARGET platform macros · 0472f0f6
      John Ericson authored
These prevent multi-target builds. They were gotten rid of in 4 ways:
      1. In the compiler itself, replacing `#if` with runtime `if`. In these
      cases, we care about the target platform still, but the target platform
      is dynamic so we must delay the elimination to run time.
      2. In the compiler itself, replacing `TARGET` with `HOST`. There was
      just one bit of this, in some code splitting strings representing lists
      of paths. These paths are used by GHC itself, and not by the compiled
      binary. (They are compiler lookup paths, rather than RPATHS or something
that does matter to the compiled binary, and thus would legitimately
be target-sensitive.) As such, the path-splitting method only depends on
      where GHC runs and not where code it produces runs. This should have
      been `HOST` all along.
      3. Changing the RTS. The RTS doesn't care about the target platform,
      full stop.
      4. `includes/stg/HaskellMachRegs.h` This file is also included in the
      genapply executable. This is tricky because the RTS's host platform
really is that utility's target platform. So that utility really
      isn't multi-target either. But at least it isn't an installed part of
      GHC, but just a one-off tool when building the RTS. Lying with the
      `HOST` to a one-off program (genapply) that isn't installed doesn't seem so bad.
      It's certainly better than the other way around of lying to the RTS
      though not to genapply. The RTS is more important, and it is installed,
      *and* this header is installed as part of the RTS.
  15. 05 Jul, 2019 2 commits
  16. 02 Jul, 2019 4 commits
  17. 28 Jun, 2019 3 commits
• rts: Assert that LDV profiling isn't used with parallel GC · bd660ede
      Ben Gamari authored
      I'm not entirely sure we are careful about ensuring this; this is a
      last-ditch check.
• Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
out-of-order machines with weak memory ordering.
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues:
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
- The updater writes to the closure's other fields, but it's too late.
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow this pattern:
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      Observing a closure's fields must follow the following pattern:
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
• Sylvain Henry authored
  18. 27 Jun, 2019 2 commits
• rts: Do not traverse nursery for dead closures in LDV profile · 07cffc49
      Matthew Pickering authored
      It is important that `heapCensus` and `LdvCensusForDead` traverse the
      same areas.
      `heapCensus` increases the `not_used` counter which tracks how many
      closures are live but haven't been used yet.
      `LdvCensusForDead` increases the `void_total` counter which tracks how
      many dead closures there are.
The `LAG` is then calculated by subtracting the `void_total` from
      `not_used` and so it is essential that `not_used >= void_total`. This
      fact is checked by quite a few assertions.
      However, if a program has low maximum residency but allocates a lot in
      the nursery then these assertions were failing (see #16753 and #15903)
      because `LdvCensusForDead` was observing dead closures from the nursery
      which totalled more than the `not_used`. The same closures were not
      counted by `heapCensus`.
      Therefore, it seems that the correct fix is to make `LdvCensusForDead`
      agree with `heapCensus` and not traverse the nursery for dead closures.
      Fixes #16100 #16753 #15903 #8982
• rts: Correct assertion in LDV_recordDead · ed4cbd93
      Matthew Pickering authored
It is possible that void_total is exactly equal to not_used; the other
assertions for this invariant accordingly check for <= rather than <.