1. 22 Sep, 2019 2 commits
  2. 12 Jan, 2019 1 commit
    • Ömer Sinan Ağacan's avatar
      Fix negative mutator time in GC stats in prof builds · 19670bc3
      Ömer Sinan Ağacan authored
      Because garbage collector calls `retainerProfile()` and `heapCensus()`,
      GC times normally include some of PROF times too. To fix this we have
      these lines:
      
          // heapCensus() is called by the GC, so RP and HC time are
          // included in the GC stats.  We therefore subtract them to
          // obtain the actual GC cpu time.
          stats.gc_cpu_ns      -=  prof_cpu;
          stats.gc_elapsed_ns  -=  prof_elapsed;
      
      These variables are later used for calculating GC time excluding the
      final GC (which should be attributed to EXIT).
      
          exit_gc_elapsed      = stats.gc_elapsed_ns - start_exit_gc_elapsed;
      
      The problem is if we subtract PROF times from `gc_elapsed_ns` and then
      subtract `start_exit_gc_elapsed` from the result, we end up subtracting
      PROF times twice, because `start_exit_gc_elapsed` also includes PROF
      times.
      
      We now subtract PROF times from GC after the calculations for EXIT and
      MUT times. The existing assertion that checks
      
          INIT + MUT + GC + EXIT = TOTAL
      
      now holds. When we subtract PROF numbers from GC, and a new assertion
      
          INIT + MUT + GC + PROF + EXIT = TOTAL
      
      also holds.
      
      Fixes #15897. New assertions added in this commit also revealed #16102,
      which is also fixed by this commit.
      19670bc3
  3. 27 Mar, 2018 1 commit
  4. 26 Mar, 2018 1 commit
    • duog's avatar
      rts, base: Refactor stats.c to improve --machine-readable report · f0b258bc
      duog authored
      There should be no change in the output of the '+RTS -s' (summary)
      report, or
      the 'RTS -t' (one-line) report.
      
      All data shown in the summary report is now shown in the machine
      readable
      report.
      
      All data in RTSStats is now shown in the machine readable report.
      
      init times are added to RTSStats and added to GHC.Stats.
      
      Example of the new output:
      ```
       [("bytes allocated", "375016384")
       ,("num_GCs", "113")
       ,("average_bytes_used", "148348")
       ,("max_bytes_used", "206552")
       ,("num_byte_usage_samples", "2")
       ,("peak_megabytes_allocated", "6")
       ,("init_cpu_seconds", "0.001642")
       ,("init_wall_seconds", "0.001027")
       ,("mut_cpu_seconds", "3.020166")
       ,("mut_wall_seconds", "0.757244")
       ,("GC_cpu_seconds", "0.037750")
       ,("GC_wall_seconds", "0.009569")
       ,("exit_cpu_seconds", "0.000890")
       ,("exit_wall_seconds", "0.002551")
       ,("total_cpu_seconds", "3.060452")
       ,("total_wall_seconds", "0.770395")
       ,("major_gcs", "2")
       ,("allocated_bytes", "375016384")
       ,("max_live_bytes", "206552")
       ,("max_large_objects_bytes", "159344")
       ,("max_compact_bytes", "0")
       ,("max_slop_bytes", "59688")
       ,("max_mem_in_use_bytes", "6291456")
       ,("cumulative_live_bytes", "296696")
       ,("copied_bytes", "541024")
       ,("par_copied_bytes", "493976")
       ,("cumulative_par_max_copied_bytes", "104104")
       ,("cumulative_par_balanced_copied_bytes", "274456")
       ,("fragmentation_bytes", "2112")
       ,("alloc_rate", "124170795")
       ,("productivity_cpu_percent", "0.986838")
       ,("productivity_wall_percent", "0.982935")
       ,("bound_task_count", "1")
       ,("sparks_count", "5836258")
       ,("sparks_converted", "237")
       ,("sparks_overflowed", "1990408")
       ,("sparks_dud ", "0")
       ,("sparks_gcd", "3455553")
       ,("sparks_fizzled", "390060")
       ,("work_balance", "0.555606")
       ,("n_capabilities", "4")
       ,("task_count", "10")
       ,("peak_worker_count", "9")
       ,("worker_count", "9")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_alloc_block_sync_yield", "0")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_spin_spin", "18840855")
       ,("gc_spin_yield", "10355")
       ,("mut_spin_spin", "70331392")
       ,("mut_spin_yield", "61700")
       ,("waitForGcThreads_spin", "241")
       ,("waitForGcThreads_yield", "2797")
       ,("whitehole_gc_spin", "0")
       ,("whitehole_lockClosure_spin", "0")
       ,("whitehole_lockClosure_yield", "0")
       ,("whitehole_executeMessage_spin", "0")
       ,("whitehole_threadPaused_spin", "0")
       ,("any_work", "1667")
       ,("no_work", "1662")
       ,("scav_find_work", "1026")
       ,("gen_0_collections", "111")
       ,("gen_0_par_collections", "111")
       ,("gen_0_cpu_seconds", "0.036126")
       ,("gen_0_wall_seconds", "0.036126")
       ,("gen_0_max_pause_seconds", "0.036126")
       ,("gen_0_avg_pause_seconds", "0.000081")
       ,("gen_0_sync_spin", "21")
       ,("gen_0_sync_yield", "0")
       ,("gen_1_collections", "2")
       ,("gen_1_par_collections", "1")
       ,("gen_1_cpu_seconds", "0.001624")
       ,("gen_1_wall_seconds", "0.001624")
       ,("gen_1_max_pause_seconds", "0.001624")
       ,("gen_1_avg_pause_seconds", "0.000272")
       ,("gen_1_sync_spin", "3")
       ,("gen_1_sync_yield", "0")
       ]
      ```
      
      Test Plan: Ensure that one-line and summary reports are unchanged.
      
      Reviewers: erikd, simonmar, hvr
      
      Subscribers: duog, carter, thomie, rwbarton
      
      GHC Trac Issues: #14660
      
      Differential Revision: https://phabricator.haskell.org/D4529
      f0b258bc
  5. 20 Mar, 2018 1 commit
  6. 19 Mar, 2018 2 commits
    • duog's avatar
      rts, base: Refactor stats.c to improve --machine-readable report · 2d4bda2e
      duog authored
      There should be no change in the output of the '+RTS -s' (summary)
      report, or the 'RTS -t' (one-line) report.
      
      All data shown in the summary report is now shown in the machine
      readable report.
      
      All data in RTSStats is now shown in the machine readable report.
      
      init times are added to RTSStats and added to GHC.Stats.
      
      Example of the new output:
      ```
       [("bytes allocated", "375016384")
       ,("num_GCs", "113")
       ,("average_bytes_used", "148348")
       ,("max_bytes_used", "206552")
       ,("num_byte_usage_samples", "2")
       ,("peak_megabytes_allocated", "6")
       ,("init_cpu_seconds", "0.001642")
       ,("init_wall_seconds", "0.001027")
       ,("mut_cpu_seconds", "3.020166")
       ,("mut_wall_seconds", "0.757244")
       ,("GC_cpu_seconds", "0.037750")
       ,("GC_wall_seconds", "0.009569")
       ,("exit_cpu_seconds", "0.000890")
       ,("exit_wall_seconds", "0.002551")
       ,("total_cpu_seconds", "3.060452")
       ,("total_wall_seconds", "0.770395")
       ,("major_gcs", "2")
       ,("allocated_bytes", "375016384")
       ,("max_live_bytes", "206552")
       ,("max_large_objects_bytes", "159344")
       ,("max_compact_bytes", "0")
       ,("max_slop_bytes", "59688")
       ,("max_mem_in_use_bytes", "6291456")
       ,("cumulative_live_bytes", "296696")
       ,("copied_bytes", "541024")
       ,("par_copied_bytes", "493976")
       ,("cumulative_par_max_copied_bytes", "104104")
       ,("cumulative_par_balanced_copied_bytes", "274456")
       ,("fragmentation_bytes", "2112")
       ,("alloc_rate", "124170795")
       ,("productivity_cpu_percent", "0.986838")
       ,("productivity_wall_percent", "0.982935")
       ,("bound_task_count", "1")
       ,("sparks_count", "5836258")
       ,("sparks_converted", "237")
       ,("sparks_overflowed", "1990408")
       ,("sparks_dud ", "0")
       ,("sparks_gcd", "3455553")
       ,("sparks_fizzled", "390060")
       ,("work_balance", "0.555606")
       ,("n_capabilities", "4")
       ,("task_count", "10")
       ,("peak_worker_count", "9")
       ,("worker_count", "9")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_alloc_block_sync_yield", "0")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_spin_spin", "18840855")
       ,("gc_spin_yield", "10355")
       ,("mut_spin_spin", "70331392")
       ,("mut_spin_yield", "61700")
       ,("waitForGcThreads_spin", "241")
       ,("waitForGcThreads_yield", "2797")
       ,("whitehole_gc_spin", "0")
       ,("whitehole_lockClosure_spin", "0")
       ,("whitehole_lockClosure_yield", "0")
       ,("whitehole_executeMessage_spin", "0")
       ,("whitehole_threadPaused_spin", "0")
       ,("any_work", "1667")
       ,("no_work", "1662")
       ,("scav_find_work", "1026")
       ,("gen_0_collections", "111")
       ,("gen_0_par_collections", "111")
       ,("gen_0_cpu_seconds", "0.036126")
       ,("gen_0_wall_seconds", "0.036126")
       ,("gen_0_max_pause_seconds", "0.036126")
       ,("gen_0_avg_pause_seconds", "0.000081")
       ,("gen_0_sync_spin", "21")
       ,("gen_0_sync_yield", "0")
       ,("gen_1_collections", "2")
       ,("gen_1_par_collections", "1")
       ,("gen_1_cpu_seconds", "0.001624")
       ,("gen_1_wall_seconds", "0.001624")
       ,("gen_1_max_pause_seconds", "0.001624")
       ,("gen_1_avg_pause_seconds", "0.000272")
       ,("gen_1_sync_spin", "3")
       ,("gen_1_sync_yield", "0")
       ]
      ```
      
      Test Plan: Ensure that one-line and summary reports are unchanged.
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #14660
      
      Differential Revision: https://phabricator.haskell.org/D4303
      2d4bda2e
    • duog's avatar
      rts: Add --internal-counters RTS flag and several counters · 2918abf7
      duog authored
      The existing internal counters:
      * gc_alloc_block_sync
      * whitehole_spin
      * gen[g].sync
      * gen[1].sync
      
      are now not shown in the -s report unless --internal-counters is also passed.
      
      If --internal-counters is passed we now show the counters above, reformatted, as
      well as several other counters. In particular, we now count the yieldThread()
      calls that SpinLocks do as well as their spins.
      
      The added counters are:
      * gc_spin (spin and yield)
      * mut_spin (spin and yield)
      * whitehole_threadPaused (spin only)
      * whitehole_executeMessage (spin only)
      * whitehole_lockClosure (spin only)
      * waitForGcThreadsd (spin and yield)
      
      As well as the following, which are not SpinLock-like things:
      * any_work
      * do_work
      * scav_find_work
      
      See the Note for descriptions of what these counters are.
      
      We add busy_wait_nops in these loops along with the counter increment where it
      was absent.
      
      Old internal counters output:
      ```
      gc_alloc_block_sync: 0
      whitehole_gc_spin: 0
      gen[0].sync: 0
      gen[1].sync: 0
      ```
      
      New internal counters output:
      ```
      Internal Counters:
                                                 Spins        Yields
          gc_alloc_block_sync                      323             0
          gc_spin                              9016713           752
          mut_spin                            57360944         47716
          whitehole_gc                               0           n/a
          whitehole_threadPaused                     0           n/a
          whitehole_executeMessage                   0           n/a
          whitehole_lockClosure                      0             0
          waitForGcThreads                           2           415
          gen[0].sync                                6             0
          gen[1].sync                                1             0
      
          any_work                                2017
          no_work                                 2014
          scav_find_work                          1004
      ```
      
      Test Plan:
      ./validate
      
      Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #3553, #9221
      
      Differential Revision: https://phabricator.haskell.org/D4302
      2918abf7
  7. 11 Jul, 2017 1 commit
    • duog's avatar
      Fix Work Balance computation in RTS stats · 7c9e356d
      duog authored
      An additional stat is tracked per gc: par_balanced_copied This is the
      the number of bytes copied by each gc thread under the balanced lmit,
      which is simply (copied_bytes / num_gc_threads).  The stat is added to
      all the appropriate GC structures, so is visible in the eventlog and in
      GHC.Stats.
      
      A note is added explaining how work balance is computed.
      
      Remove some end of line whitespace
      
      Test Plan:
      ./validate
      experiment with the program attached to the ticket
      examine code changes carefully
      
      Reviewers: simonmar, austin, hvr, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: Phyx, rwbarton, thomie
      
      GHC Trac Issues: #13830
      
      Differential Revision: https://phabricator.haskell.org/D3658
      7c9e356d
  8. 29 Apr, 2017 1 commit
  9. 23 Apr, 2017 1 commit
  10. 06 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Overhaul GC stats · 24e6594c
      Simon Marlow authored
      Summary:
      Visible API changes:
      
      * The C struct `GCDetails` gives the stats about a single GC.  This is
        passed to the `gcDone()` callback if one is set via the
        RtsConfig. (previously we just passed a collection of values, so this
        is more extensible, at the expense of breaking the existing API)
      
      * `RTSStats` gives cumulative stats since the start of the program,
        and includes the `GCDetails` for the most recent GC.  This struct
        can be obtained via `getRTSStats()` (the old `getGCStats()` has been
        removed, and `getGCStatsEnabled()` has been renamed to
        `getRTSStatsEnabled()`)
      
      Improvements:
      
      * The per-GC stats and cumulative stats are now cleanly separated.
      
      * Inside the RTS we have a top-level `RTSStats` struct to keep all our
        stats in, previously this was just a collection of strangely-named
        variables.  This struct is mostly just copied in `getRTSStats()`, so
        the implementation of that function is a lot shorter.
      
      * Types are more consistent.  We use a uint64_t byte count for all
        memory values, and Time for all time values.
      
      * Names are more consistent.  We use a suffix `_bytes` for all byte
        counts and `_ns` for all time values.
      
      * We now collect information about the amount of memory in large
        objects and compact objects in `GCDetails`. (the latter was the reason
        I started doing this patch but it seems to have ballooned a bit!)
      
      * I fixed a bug in the calculation of the elapsed MUT time, and added
        an ASSERT to stop the calculations going wrong in the future.
      
      For now I kept the Haskell API in `GHC.Stats` the same, by
      impedence-matching with the new API.  We could either break that API
      and make it match the C API more closely, or we could add a new API
      and deprecate the old one.  Opinions welcome.
      
      This stuff is very easy to get wrong, and it's hard to test.  Reviews
      welcome!
      
      Test Plan:
      manual testing
      validate
      
      Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2756
      24e6594c
  11. 04 May, 2016 1 commit
  12. 07 Feb, 2016 1 commit
  13. 18 Nov, 2015 1 commit
  14. 07 Apr, 2015 1 commit
  15. 29 Sep, 2014 1 commit
  16. 28 Jul, 2014 1 commit
  17. 10 Jul, 2014 1 commit
  18. 14 Feb, 2013 1 commit
    • Simon Marlow's avatar
      Simplify the allocation stats accounting · 65a0e1eb
      Simon Marlow authored
      We were doing it in two different ways and asserting that the results
      were the same.  In most cases they were, but I found one case where
      they weren't: the GC itself allocates some memory for running
      finalizers, and this memory was accounted for one way but not the
      other.
      
      It was simpler to remove the old way of counting allocation that to
      try to fix it up, so I did that.
      65a0e1eb
  19. 17 Jan, 2013 1 commit
  20. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  21. 26 Apr, 2012 2 commits
    • Ian Lynagh's avatar
      OS X build fixes · 42760bd9
      Ian Lynagh authored
      OS X doesn't understand 'gnu_printf', so we need to onyl use it
      conditionally.
      42760bd9
    • Ian Lynagh's avatar
      Fix warnings on Win64 · 1dbe6d59
      Ian Lynagh authored
      Mostly this meant getting pointer<->int conversions to use the right
      sizes. lnat is now size_t, rather than unsigned long, as that seems a
      better match for how it's used.
      1dbe6d59
  22. 04 Apr, 2012 2 commits
    • Mikolaj Konarski's avatar
      Fix the timestamps in GC_START and GC_END events on the GC-initiating cap · 598109eb
      Mikolaj Konarski authored
      There was a discrepancy between GC times reported in +RTS -s
      and the timestamps of GC_START and GC_END events on the cap,
      on which +RTS -s stats for the given GC are based.
      This is fixed by posting the events with exactly the same timestamp
      as generated for the stat calculation. The calls posting the events
      are moved too, so that the events are emitted close to the time instant
      they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring
      it's emitted before the moved GC_END on all caps, which simplifies tools code.
      598109eb
    • Duncan Coutts's avatar
      Add new eventlog events for various heap and GC statistics · 65aaa9b2
      Duncan Coutts authored
      They cover much the same info as is available via the GHC.Stats module
      or via the '+RTS -s' textual output, but via the eventlog and with a
      better sampling frequency.
      
      We have three new generic heap info events and two very GHC-specific
      ones. (The hope is the general ones are usable by other implementations
      that use the same eventlog system, or indeed not so sensitive to changes
      in GHC itself.)
      
      The general ones are:
      
       * total heap mem allocated since prog start, on a per-HEC basis
       * current size of the heap (MBlocks reserved from OS for the heap)
       * current size of live data in the heap
      
      Currently these are all emitted by GHC at GC time (live data only at
      major GC).
      
      The GHC specific ones are:
      
       * an event giving various static heap paramaters:
         * number of generations (usually 2)
         * max size if any
         * nursary size
         * MBlock and block sizes
       * a event emitted on each GC containing:
         * GC generation (usually just 0,1)
         * total bytes copied
         * bytes lost to heap slop and fragmentation
         * the number of threads in the parallel GC (1 for serial)
         * the maximum number of bytes copied by any par GC thread
         * the total number of bytes copied by all par GC threads
           (these last three can be used to calculate an estimate of the
            work balance in parallel GCs)
      65aaa9b2
  23. 25 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Time handling overhaul · 6b109851
      Simon Marlow authored
      Terminology cleanup: the type "Ticks" has been renamed "Time", which
      is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
      The terminology "tick" is now used consistently to mean the interval
      between timer signals.
      
      The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
      we have it).  Before it used CPU time in the non-threaded RTS and
      realtime in the threaded RTS, but I've discovered that the CPU timer
      has terrible resolution (at least on Linux) and isn't much use for
      profiling.  So now we always use realtime.  This should also fix
      
      The default tick interval is now 10ms, except when profiling where we
      drop it to 1ms.  This gives more accurate profiles without affecting
      runtime too much (<1%).
      
      Lots of cleanups - the resolution of Time is now in one place
      only (Rts.h) rather than having calculations that depend on the
      resolution scattered all over the RTS.  I hope I found them all.
      6b109851
  24. 23 Jul, 2011 2 commits
    • Ian Lynagh's avatar
      Remove prototype to mut_user_time_during_GC · 5c6e293d
      Ian Lynagh authored
      The function no longer exists.
      5c6e293d
    • Ian Lynagh's avatar
      Fix heap profiling times · 4f8cfaf9
      Ian Lynagh authored
      Now that the heap census runs in the middle of garbage collections,
      the "CPU time" it was calculating included any CPU time used so far
      in the current GC. This could cause CPU time to appear to go down,
      which means hp2ps complained about "samples out of sequence".
      
      I'm not sure if this is the nicest way to solve this (maybe resurrecting
      mut_user_time_during_GC would be better?) but it gets things working again.
      4f8cfaf9
  25. 11 Apr, 2011 1 commit
    • Simon Marlow's avatar
      Refactoring and tidy up · 1fb38442
      Simon Marlow authored
      This is a port of some of the changes from my private local-GC branch
      (which is still in darcs, I haven't converted it to git yet).  There
      are a couple of small functional differences in the GC stats: first,
      per-thread GC timings should now be more accurate, and secondly we now
      report average and maximum pause times. e.g. from minimax +RTS -N8 -s:
      
                                          Tot time (elapsed)  Avg pause  Max pause
        Gen  0      2755 colls,  2754 par   13.16s    0.93s     0.0003s    0.0150s
        Gen  1       769 colls,   769 par    3.71s    0.26s     0.0003s    0.0059s
      1fb38442
  26. 17 Jun, 2010 2 commits
  27. 02 Dec, 2009 1 commit
  28. 09 Sep, 2009 1 commit
  29. 05 Aug, 2009 1 commit
  30. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  31. 16 Apr, 2008 2 commits
  32. 31 Oct, 2007 1 commit
  33. 08 Nov, 2006 1 commit
    • mrchebas@gmail.com's avatar
      Addition of PAPI to RTS · fe07f054
      mrchebas@gmail.com authored
      This patch still requires the addition of the USE_PAPI
      define to compile with PAPI. Also, programs must be
      compiled and linked with the appropriate library flags
      for papi.
      fe07f054