1. 21 Aug, 2018 1 commit
  2. 05 Sep, 2017 2 commits
  3. 11 Jul, 2017 1 commit
    • duog's avatar
      Fix Work Balance computation in RTS stats · 7c9e356d
      duog authored
      An additional stat is tracked per gc: par_balanced_copied This is the
      the number of bytes copied by each gc thread under the balanced lmit,
      which is simply (copied_bytes / num_gc_threads).  The stat is added to
      all the appropriate GC structures, so is visible in the eventlog and in
      GHC.Stats.
      
      A note is added explaining how work balance is computed.
      
      Remove some end of line whitespace
      
      Test Plan:
      ./validate
      experiment with the program attached to the ticket
      examine code changes carefully
      
      Reviewers: simonmar, austin, hvr, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: Phyx, rwbarton, thomie
      
      GHC Trac Issues: #13830
      
      Differential Revision: https://phabricator.haskell.org/D3658
      7c9e356d
  4. 19 Jun, 2017 1 commit
  5. 29 Apr, 2017 1 commit
  6. 31 Jan, 2017 1 commit
    • alexbiehl's avatar
      Abstract over the way eventlogs are flushed · 4dfc6d1c
      alexbiehl authored
      Currently eventlog data is always written to a file `progname.eventlog`.
      This patch introduces the `flushEventLog` field in `RtsConfig` which
      allows to customize the writing of eventlog data.
      
      One possible scenario is the ongoing live-profile-monitor effort by
      @NCrashed which slurps all eventlog data through `fluchEventLog`.
      
      `flushEventLog` takes a buffer with eventlog data and its size and
      returns `false` (0) in case eventlog data could not be procesed.
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: qnikst, thomie, NCrashed
      
      Differential Revision: https://phabricator.haskell.org/D2934
      4dfc6d1c
  7. 16 Jul, 2016 1 commit
  8. 10 Jun, 2016 1 commit
    • Simon Marlow's avatar
      NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      were reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
      9e5ea67e
  9. 04 May, 2016 1 commit
  10. 01 Nov, 2015 1 commit
    • Ben Gamari's avatar
      EventLog: Loop fwrite if necessary during flush · f46f32b9
      Ben Gamari authored
      Previously the eventlog flush code would fail if `fwrite` wrote less
      than the requested amount. Like all Unix stream I/O operations, however,
      `fwrite` isn't guaranteed to write the entire buffer. Here we loop as
      long as `fwrite` succeeds in writing anything.
      
      Fixes #11041.
      
      Test Plan: Validate with eventlog
      
      Reviewers: austin, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1415
      
      GHC Trac Issues: #11041
      f46f32b9
  11. 05 Sep, 2015 2 commits
  12. 17 Feb, 2015 1 commit
    • thomie's avatar
      Don't truncate traceEvents to 512 bytes (#8309) · a82364c9
      thomie authored
      Summary:
      Don't call postLogMsg to post a user msg, because it truncates messages to
      512 bytes.
      
      Rename traceCap_stderr and trace_stderr to vtraceCap_stderr and trace_stderr,
      to signal that they take a va_list (similar to vdebugBelch vs debugBelch).
      
      See #3874 for the original reason behind traceFormatUserMsg.
      
      See the commit msg in #9395 (d360d440) for a discussion about using
      null-terminated strings vs strings with an explicit length.
      
      Test Plan:
      Run `cabal install ghc-events` and inspect the result of `ghc-events show` on
      an eventlog file created with `ghc -eventlog Test.hs` and `./Test +RTS -l`,
      where Test.hs contains:
      
      ```
      import Debug.Trace
      main = traceEvent (replicate 510 'a' ++ "bcd") $ return ()
      ```
      
      Depends on D655.
      
      Reviewers: austin
      
      Reviewed By: austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D656
      
      GHC Trac Issues: #8309
      a82364c9
  13. 18 Jan, 2015 1 commit
  14. 29 Sep, 2014 2 commits
  15. 28 Jul, 2014 2 commits
  16. 10 Jul, 2014 1 commit
  17. 27 Mar, 2014 1 commit
  18. 15 Oct, 2012 1 commit
    • Duncan Coutts's avatar
      Add a new traceMarker# primop for use in profiling output · a609027d
      Duncan Coutts authored
      In time-based profiling visualisations (e.g. heap profiles and ThreadScope)
      it would be useful to be able to mark particular points in the execution and
      have those points in time marked in the visualisation.
      
      The traceMarker# primop currently emits an event into the eventlog. In
      principle it could be extended to do something in the heap profiling too.
      a609027d
  19. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  20. 10 Jul, 2012 1 commit
    • Duncan Coutts's avatar
      Define the task-tracking events · 54c98b68
      Duncan Coutts authored
      Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
      
      These new eventlog events are to let profiling tools keep track of all
      the OS threads that belong to an RTS capability at any moment in time.
      In the RTS, OS threads correspond to the Task abstraction, so that is
      what we track. There are events for tasks being created, migrated
      between capabilities and deleted. In particular the task creation event
      also records the kernel thread id which lets us match up the OS thread
      with data collected by others tools (in the initial use case with
      Linux's perf tool, but in principle also with DTrace).
      54c98b68
  21. 24 Apr, 2012 1 commit
  22. 04 Apr, 2012 5 commits
    • Mikolaj Konarski's avatar
      Add the GC_GLOBAL_SYNC event marking that all caps are stopped for GC · c294d95d
      Mikolaj Konarski authored
      Quoting design rationale by dcoutts: The event indicates that we're doing
      a stop-the-world GC and all other HECs should be between their GC_START
      and GC_END events at that moment. We don't want to use GC_STATS_GHC
      for that, because GC_STATS_GHC is for extra GHC-specific info,
      not something we have to rely on to be able to match the GC pauses
      across HECs to a particular global GC.
      c294d95d
    • Mikolaj Konarski's avatar
      Fix the timestamps in GC_START and GC_END events on the GC-initiating cap · 598109eb
      Mikolaj Konarski authored
      There was a discrepancy between GC times reported in +RTS -s
      and the timestamps of GC_START and GC_END events on the cap,
      on which +RTS -s stats for the given GC are based.
      This is fixed by posting the events with exactly the same timestamp
      as generated for the stat calculation. The calls posting the events
      are moved too, so that the events are emitted close to the time instant
      they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring
      it's emitted before the moved GC_END on all caps, which simplifies tools code.
      598109eb
    • Duncan Coutts's avatar
      Adjust the eventlog description header for the spark counter event · a3cdefd2
      Duncan Coutts authored
      The EventLogFormat.h described the spark counter fields in a different
      order to that which ghc emits (the GC'd and fizzled fields were
      reversed). At this stage it is easier to fix the ghc-events lib and
      to have ghc continue to emit them in the current order.
      a3cdefd2
    • Duncan Coutts's avatar
      Add new eventlog events for various heap and GC statistics · 65aaa9b2
      Duncan Coutts authored
      They cover much the same info as is available via the GHC.Stats module
      or via the '+RTS -s' textual output, but via the eventlog and with a
      better sampling frequency.
      
      We have three new generic heap info events and two very GHC-specific
      ones. (The hope is the general ones are usable by other implementations
      that use the same eventlog system, or indeed not so sensitive to changes
      in GHC itself.)
      
      The general ones are:
      
       * total heap mem allocated since prog start, on a per-HEC basis
       * current size of the heap (MBlocks reserved from OS for the heap)
       * current size of live data in the heap
      
      Currently these are all emitted by GHC at GC time (live data only at
      major GC).
      
      The GHC specific ones are:
      
       * an event giving various static heap paramaters:
         * number of generations (usually 2)
         * max size if any
         * nursary size
         * MBlock and block sizes
       * a event emitted on each GC containing:
         * GC generation (usually just 0,1)
         * total bytes copied
         * bytes lost to heap slop and fragmentation
         * the number of threads in the parallel GC (1 for serial)
         * the maximum number of bytes copied by any par GC thread
         * the total number of bytes copied by all par GC threads
           (these last three can be used to calculate an estimate of the
            work balance in parallel GCs)
      65aaa9b2
    • Duncan Coutts's avatar
      Add eventlog/trace stuff for capabilities: create/delete/enable/disable · f9c2e854
      Duncan Coutts authored
      Now that we can adjust the number of capabilities on the fly, we need
      this reflected in the eventlog. Previously the eventlog had a single
      startup event that declared a static number of capabilities. Obviously
      that's no good anymore.
      
      For compatability we're keeping the EVENT_STARTUP but adding new
      EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old
      EVENT_SHUTDOWN but renamed and extended (using the existing mechanism
      to extend eventlog events in a compatible way). So we now emit both
      EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.
      
      Since reducing the number of capabilities at runtime does not really
      delete them, it just disables them, then we also have new events for
      disable/enable.
      
      The old EVENT_SHUTDOWN was in the scheduler class of events. The new
      EVENT_CAP_* events are in the unconditional class, along with the
      EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted
      is crucial to making sense of eventlogs, you always want those events.
      In any case, they're extremely low volume.
      f9c2e854
  23. 06 Dec, 2011 1 commit
  24. 25 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Time handling overhaul · 6b109851
      Simon Marlow authored
      Terminology cleanup: the type "Ticks" has been renamed "Time", which
      is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
      The terminology "tick" is now used consistently to mean the interval
      between timer signals.
      
      The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
      we have it).  Before it used CPU time in the non-threaded RTS and
      realtime in the threaded RTS, but I've discovered that the CPU timer
      has terrible resolution (at least on Linux) and isn't much use for
      profiling.  So now we always use realtime.  This should also fix
      
      The default tick interval is now 10ms, except when profiling where we
      drop it to 1ms.  This gives more accurate profiles without affecting
      runtime too much (<1%).
      
      Lots of cleanups - the resolution of Time is now in one place
      only (Rts.h) rather than having calculations that depend on the
      resolution scattered all over the RTS.  I hope I found them all.
      6b109851
  25. 22 Nov, 2011 1 commit
  26. 04 Nov, 2011 1 commit
    • Duncan Coutts's avatar
      Add eventlog event for thread labels · c739d845
      Duncan Coutts authored
      The existing GHC.Conc.labelThread will now also emit the the thread
      label into the eventlog. Profiling tools like ThreadScope could then
      use the thread labels rather than thread numbers.
      c739d845
  27. 26 Oct, 2011 2 commits
    • Duncan Coutts's avatar
      Add new eventlog EVENT_WALL_CLOCK_TIME for time matching · 4856d15a
      Duncan Coutts authored
      Eventlog timestamps are elapsed times (in nanoseconds) relative to the
      process start. To be able to merge eventlogs from multiple processes we
      need to be able to align their timelines. If they share a clock domain
      (or a user judges that their clocks are sufficiently closely
      synchronised) then it is sufficient to know how the eventlog timestamps
      match up with the clock.
      
      The EVENT_WALL_CLOCK_TIME contains the clock time with (up to)
      nanosecond precision. It is otherwise an ordinary event and so contains
      the usual timestamp for the same moment in time. It therefore enables
      us to match up all the eventlog timestamps with clock time.
      4856d15a
    • Duncan Coutts's avatar
      f9c21573
  28. 18 Jul, 2011 4 commits
    • Duncan Coutts's avatar
      Sync EventLogFormat.h with ghc-events · a5c7f52e
      Duncan Coutts authored
      a5c7f52e
    • Duncan Coutts's avatar
      Add new fully-accurate per-spark trace/eventlog events · 084b64f2
      Duncan Coutts authored
      Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events
      covering all stages of the spark lifcycle:
        create, dud, overflow, run, steal, fizzle, gc
      
      The sampled spark events are still available. There are now two event
      classes for sparks, the sampled and the fully accurate. They can be
      enabled/disabled independently. By default +RTS -l includes the sampled
      but not full detail spark events. Use +RTS -lf-p to enable the detailed
      'f' and disable the sampled 'p' spark.
      
      Includes work by Mikolaj <mikolaj.konarski@gmail.com>
      084b64f2
    • Duncan Coutts's avatar
      Move GC tracing into a separate trace class · 46b70749
      Duncan Coutts authored
      Previously GC was included in the scheduler trace class. It can be
      enabled specifically with +RTS -vg or -lg, though note that both -v
      and -l on their own now default to a sensible set of trace classes,
      currently: scheduler, gc and sparks.
      46b70749
    • Duncan Coutts's avatar
      Add spark counter tracing · d77df1ca
      Duncan Coutts authored
      A new eventlog event containing 7 spark counters/statistics: sparks
      created, dud, overflowed, converted, GC'd, fizzled and remaining.
      These are maintained and logged separately for each capability.
      We log them at startup, on each GC (minor and major) and on shutdown.
      d77df1ca