Commit 2918abf7 authored by duog's avatar duog Committed by Ben Gamari

rts: Add --internal-counters RTS flag and several counters

The existing internal counters:
* gc_alloc_block_sync
* whitehole_spin
* gen[g].sync
* gen[1].sync

are now not shown in the -s report unless --internal-counters is also passed.

If --internal-counters is passed we now show the counters above, reformatted, as
well as several other counters. In particular, we now count the yieldThread()
calls that SpinLocks do as well as their spins.

The added counters are:
* gc_spin (spin and yield)
* mut_spin (spin and yield)
* whitehole_threadPaused (spin only)
* whitehole_executeMessage (spin only)
* whitehole_lockClosure (spin only)
* waitForGcThreadsd (spin and yield)

As well as the following, which are not SpinLock-like things:
* any_work
* do_work
* scav_find_work

See the Note for descriptions of what these counters are.

We add busy_wait_nops in these loops along with the counter increment where it
was absent.

Old internal counters output:
```
gc_alloc_block_sync: 0
whitehole_gc_spin: 0
gen[0].sync: 0
gen[1].sync: 0
```

New internal counters output:
```
Internal Counters:
                                           Spins        Yields
    gc_alloc_block_sync                      323             0
    gc_spin                              9016713           752
    mut_spin                            57360944         47716
    whitehole_gc                               0           n/a
    whitehole_threadPaused                     0           n/a
    whitehole_executeMessage                   0           n/a
    whitehole_lockClosure                      0             0
    waitForGcThreads                           2           415
    gen[0].sync                                6             0
    gen[1].sync                                1             0

    any_work                                2017
    no_work                                 2014
    scav_find_work                          1004
```

Test Plan:
./validate

Check it builds with #define PROF_SPIN removed from includes/rts/Config.h

Reviewers: bgamari, erikd, simonmar, hvr

Reviewed By: simonmar

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #3553, #9221

Differential Revision: https://phabricator.haskell.org/D4302
parent f9a6d420
......@@ -776,6 +776,7 @@ RTS options to produce runtime statistics
-s [⟨file⟩]
-S [⟨file⟩]
--machine-readable
--internal-counters
These options produce runtime-system statistics, such as the amount
of time spent executing the program and in the garbage collector,
......@@ -785,7 +786,10 @@ RTS options to produce runtime statistics
line of output in the same format as GHC's ``-Rghc-timing`` option,
``-s`` produces a more detailed summary at the end of the program,
and ``-S`` additionally produces information about each and every
garbage collection.
garbage collection. Passing ``--internal-counters`` to a threaded
runtime will cause a detailed summary to include various internal
counts accumulated during the run; note that these are unspecified
and may change between releases.
The output is placed in ⟨file⟩. If ⟨file⟩ is omitted, then the
output is sent to ``stderr``.
......
......@@ -210,6 +210,30 @@ typedef struct _RTSStats {
GCDetails gc;
// -----------------------------------
// Internal Counters
// The number of times a GC thread spun on its 'gc_spin' lock.
// Will be zero if the rts was not built with PROF_SPIN
uint64_t gc_spin_spin;
// The number of times a GC thread yielded on its 'gc_spin' lock.
// Will be zero if the rts was not built with PROF_SPIN
uint64_t gc_spin_yield;
// The number of times a GC thread spun on its 'mut_spin' lock.
// Will be zero if the rts was not built with PROF_SPIN
uint64_t mut_spin_spin;
// The number of times a GC thread yielded on its 'mut_spin' lock.
// Will be zero if the rts was not built with PROF_SPIN
uint64_t mut_spin_yield;
// The number of times a GC thread has checked for work across all parallel
// GCs
uint64_t any_work;
// The number of times a GC thread has checked for work and found none across
// all parallel GCs
uint64_t no_work;
// The number of times a GC thread has iterated it's outer loop across all
// parallel GCs
uint64_t scav_find_work;
} RTSStats;
void getRTSStats (RTSStats *s);
......
......@@ -195,6 +195,7 @@ typedef struct _MISC_FLAGS {
bool generate_dump_file;
bool generate_stack_trace;
bool machineReadable;
bool internalCounters; /* See Note [Internal Counter Stats] */
StgWord linkerMemBase; /* address to ask the OS for memory
* for the linker, NULL ==> off */
} MISC_FLAGS;
......
......@@ -27,7 +27,8 @@
typedef struct SpinLock_
{
StgWord lock;
StgWord64 spin; // DEBUG version counts how much it spins
StgWord64 spin; // incremented every time we spin in ACQUIRE_SPIN_LOCK
StgWord64 yield; // incremented every time we yield in ACQUIRE_SPIN_LOCK
} SpinLock;
#else
typedef StgWord SpinLock;
......@@ -49,6 +50,7 @@ INLINE_HEADER void ACQUIRE_SPIN_LOCK(SpinLock * p)
p->spin++;
busy_wait_nop();
}
p->yield++;
yieldThread();
} while (1);
}
......@@ -66,6 +68,7 @@ INLINE_HEADER void initSpinLock(SpinLock * p)
write_barrier();
p->lock = 1;
p->spin = 0;
p->yield = 0;
}
#else
......
......@@ -120,7 +120,7 @@ typedef struct generation_ {
// stats information
uint32_t collections;
uint32_t par_collections;
uint32_t failed_promotions;
uint32_t failed_promotions; // Currently unused
// ------------------------------------
// Fields below are used during GC only
......
......@@ -138,6 +138,7 @@ data MiscFlags = MiscFlags
, generateCrashDumpFile :: Bool
, generateStackTrace :: Bool
, machineReadable :: Bool
, internalCounters :: Bool
, linkerMemBase :: Word
-- ^ address to ask the OS for memory for the linker, 0 ==> off
} deriving ( Show -- ^ @since 4.8.0.0
......@@ -441,6 +442,8 @@ getMiscFlags = do
(#{peek MISC_FLAGS, generate_stack_trace} ptr :: IO CBool))
<*> (toBool <$>
(#{peek MISC_FLAGS, machineReadable} ptr :: IO CBool))
<*> (toBool <$>
(#{peek MISC_FLAGS, internalCounters} ptr :: IO CBool))
<*> #{peek MISC_FLAGS, linkerMemBase} ptr
getDebugFlags :: IO DebugFlags
......
......@@ -129,6 +129,9 @@ loop:
}
else if (i == &stg_WHITEHOLE_info)
{
#if defined(PROF_SPIN)
++whitehole_executeMessage_spin;
#endif
goto loop;
}
else
......
......@@ -31,3 +31,7 @@ doneWithMsgThrowTo (MessageThrowTo *m)
}
#include "EndPrivate.h"
#if defined(THREADED_RTS) && defined(PROF_SPIN)
extern volatile StgWord64 whitehole_executeMessage_spin;
#endif
......@@ -232,6 +232,7 @@ void initRtsFlagsDefaults(void)
RtsFlags.MiscFlags.generate_stack_trace = true;
RtsFlags.MiscFlags.generate_dump_file = false;
RtsFlags.MiscFlags.machineReadable = false;
RtsFlags.MiscFlags.internalCounters = false;
RtsFlags.MiscFlags.linkerMemBase = 0;
#if defined(THREADED_RTS)
......@@ -888,6 +889,11 @@ error = true;
OPTION_UNSAFE;
RtsFlags.MiscFlags.machineReadable = true;
}
else if (strequal("internal-counters",
&rts_argv[arg][2])) {
OPTION_SAFE;
RtsFlags.MiscFlags.internalCounters = true;
}
else if (strequal("info",
&rts_argv[arg][2])) {
OPTION_SAFE;
......
......@@ -38,6 +38,11 @@ EXTERN_INLINE void unlockClosure(StgClosure *p, const StgInfoTable *info);
#if defined(THREADED_RTS)
#if defined(PROF_SPIN)
extern volatile StgWord64 whitehole_lockClosure_spin;
extern volatile StgWord64 whitehole_lockClosure_yield;
#endif
/* -----------------------------------------------------------------------------
* Locking/unlocking closures
*
......@@ -56,7 +61,14 @@ EXTERN_INLINE StgInfoTable *reallyLockClosure(StgClosure *p)
do {
info = xchg((P_)(void *)&p->header.info, (W_)&stg_WHITEHOLE_info);
if (info != (W_)&stg_WHITEHOLE_info) return (StgInfoTable *)info;
#if defined(PROF_SPIN)
++whitehole_lockClosure_spin;
#endif
busy_wait_nop();
} while (++i < SPIN_COUNT);
#if defined(PROF_SPIN)
++whitehole_lockClosure_yield;
#endif
yieldThread();
} while (1);
}
......
This diff is collapsed.
......@@ -30,7 +30,10 @@ void stat_startGCSync(struct gc_thread_ *_gct);
void stat_startGC(Capability *cap, struct gc_thread_ *_gct);
void stat_endGC (Capability *cap, struct gc_thread_ *_gct, W_ live,
W_ copied, W_ slop, uint32_t gen, uint32_t n_gc_threads,
W_ par_max_copied, W_ par_balanced_copied);
W_ par_max_copied, W_ par_balanced_copied,
W_ gc_spin_spin, W_ gc_spin_yield, W_ mut_spin_spin,
W_ mut_spin_yield, W_ any_work, W_ no_work,
W_ scav_find_work);
#if defined(PROFILING)
void stat_startRP(void);
......
......@@ -375,11 +375,21 @@ loop:
// spin until the WHITEHOLE is updated
info = StgHeader_info(node);
if (info == stg_WHITEHOLE_info) {
#if defined(PROF_SPIN)
W_[whitehole_lockClosure_spin] =
W_[whitehole_lockClosure_spin] + 1;
#endif
i = i + 1;
if (i == SPIN_COUNT) {
i = 0;
#if defined(PROF_SPIN)
W_[whitehole_lockClosure_yield] =
W_[whitehole_lockClosure_yield] + 1;
#endif
ccall yieldThread();
}
// TODO: We should busy_wait_nop() here, but that's not currently
// defined in CMM.
goto loop;
}
jump %ENTRY_CODE(info) (node);
......
......@@ -329,6 +329,10 @@ threadPaused(Capability *cap, StgTSO *tso)
if (cur_bh_info != bh_info) {
bh_info = cur_bh_info;
#if defined(PROF_SPIN)
++whitehole_threadPaused_spin;
#endif
busy_wait_nop();
goto retry;
}
#endif
......
......@@ -8,4 +8,12 @@
#pragma once
#include "BeginPrivate.h"
RTS_PRIVATE void threadPaused ( Capability *cap, StgTSO * );
#include "EndPrivate.h"
#if defined(THREADED_RTS) && defined(PROF_SPIN)
extern volatile StgWord64 whitehole_threadPaused_spin;
#endif
......@@ -1058,9 +1058,14 @@ selector_chain:
// In threaded mode, we'll use WHITEHOLE to lock the selector
// thunk while we evaluate it.
{
do {
while(true) {
info_ptr = xchg((StgPtr)&p->header.info, (W_)&stg_WHITEHOLE_info);
} while (info_ptr == (W_)&stg_WHITEHOLE_info);
if (info_ptr != (W_)&stg_WHITEHOLE_info) { break; }
#if defined(PROF_SPIN)
++whitehole_gc_spin;
#endif
busy_wait_nop();
}
// make sure someone else didn't get here first...
if (IS_FORWARDING_PTR(info_ptr) ||
......
......@@ -139,6 +139,9 @@ uint32_t n_gc_threads;
static long copied; // *words* copied & scavenged during this GC
#if defined(PROF_SPIN) && defined(THREADED_RTS)
// spin and yield counts for the quasi-SpinLock in waitForGcThreads
volatile StgWord64 waitForGcThreads_spin = 0;
volatile StgWord64 waitForGcThreads_yield = 0;
volatile StgWord64 whitehole_gc_spin = 0;
#endif
......@@ -198,7 +201,9 @@ GarbageCollect (uint32_t collect_gen,
{
bdescr *bd;
generation *gen;
StgWord live_blocks, live_words, par_max_copied, par_balanced_copied;
StgWord live_blocks, live_words, par_max_copied, par_balanced_copied,
gc_spin_spin, gc_spin_yield, mut_spin_spin, mut_spin_yield,
any_work, no_work, scav_find_work;
#if defined(THREADED_RTS)
gc_thread *saved_gct;
#endif
......@@ -471,32 +476,53 @@ GarbageCollect (uint32_t collect_gen,
copied = 0;
par_max_copied = 0;
par_balanced_copied = 0;
gc_spin_spin = 0;
gc_spin_yield = 0;
mut_spin_spin = 0;
mut_spin_yield = 0;
any_work = 0;
no_work = 0;
scav_find_work = 0;
{
uint32_t i;
uint64_t par_balanced_copied_acc = 0;
const gc_thread* thread;
for (i=0; i < n_gc_threads; i++) {
copied += gc_threads[i]->copied;
}
for (i=0; i < n_gc_threads; i++) {
thread = gc_threads[i];
if (n_gc_threads > 1) {
debugTrace(DEBUG_gc,"thread %d:", i);
debugTrace(DEBUG_gc," copied %ld", gc_threads[i]->copied * sizeof(W_));
debugTrace(DEBUG_gc," scanned %ld", gc_threads[i]->scanned * sizeof(W_));
debugTrace(DEBUG_gc," any_work %ld", gc_threads[i]->any_work);
debugTrace(DEBUG_gc," no_work %ld", gc_threads[i]->no_work);
debugTrace(DEBUG_gc," scav_find_work %ld", gc_threads[i]->scav_find_work);
debugTrace(DEBUG_gc," copied %ld",
thread->copied * sizeof(W_));
debugTrace(DEBUG_gc," scanned %ld",
thread->scanned * sizeof(W_));
debugTrace(DEBUG_gc," any_work %ld",
thread->any_work);
debugTrace(DEBUG_gc," no_work %ld",
thread->no_work);
debugTrace(DEBUG_gc," scav_find_work %ld",
thread->scav_find_work);
#if defined(THREADED_RTS) && defined(PROF_SPIN)
gc_spin_spin += thread->gc_spin.spin;
gc_spin_yield += thread->gc_spin.yield;
mut_spin_spin += thread->mut_spin.spin;
mut_spin_yield += thread->mut_spin.yield;
#endif
any_work += thread->any_work;
no_work += thread->no_work;
scav_find_work += thread->scav_find_work;
par_max_copied = stg_max(gc_threads[i]->copied, par_max_copied);
par_balanced_copied_acc +=
stg_min(n_gc_threads * gc_threads[i]->copied, copied);
}
par_max_copied = stg_max(gc_threads[i]->copied, par_max_copied);
par_balanced_copied_acc +=
stg_min(n_gc_threads * gc_threads[i]->copied, copied);
}
if (n_gc_threads == 1) {
par_max_copied = 0;
par_balanced_copied = 0;
}
else
{
if (n_gc_threads > 1) {
// See Note [Work Balance] for an explanation of this computation
par_balanced_copied =
(par_balanced_copied_acc - copied + (n_gc_threads - 1) / 2) /
......@@ -834,7 +860,9 @@ GarbageCollect (uint32_t collect_gen,
// ok, GC over: tell the stats department what happened.
stat_endGC(cap, gct, live_words, copied,
live_blocks * BLOCK_SIZE_W - live_words /* slop */,
N, n_gc_threads, par_max_copied, par_balanced_copied);
N, n_gc_threads, par_max_copied, par_balanced_copied,
gc_spin_spin, gc_spin_yield, mut_spin_spin, mut_spin_yield,
any_work, no_work, scav_find_work);
#if defined(RTS_USER_SIGNALS)
if (RtsFlags.MiscFlags.install_signal_handlers) {
......@@ -1186,6 +1214,9 @@ waitForGcThreads (Capability *cap USED_IF_THREADS, bool idle_cap[])
}
}
if (!retry) break;
#if defined(PROF_SPIN)
waitForGcThreads_yield++;
#endif
yieldThread();
}
......@@ -1196,6 +1227,14 @@ waitForGcThreads (Capability *cap USED_IF_THREADS, bool idle_cap[])
rtsConfig.longGCSync(cap->no, t2 - t0);
t1 = t2;
}
if (retry) {
#if defined(PROF_SPIN)
// This is a bit strange, we'll get more yields than spins.
// I guess that means it's not a spin-lock at all, but these
// numbers are still useful (I think).
waitForGcThreads_spin++;
#endif
}
}
if (RtsFlags.GcFlags.longGCSync != 0 &&
......
......@@ -47,6 +47,8 @@ extern uint32_t mutlist_MUTVARS, mutlist_MUTARRS, mutlist_MVARS, mutlist_OTHERS,
#if defined(PROF_SPIN) && defined(THREADED_RTS)
extern volatile StgWord64 whitehole_gc_spin;
extern volatile StgWord64 waitForGcThreads_spin;
extern volatile StgWord64 waitForGcThreads_yield;
#endif
void gcWorkerThread (Capability *cap);
......
Internal Counters:
\ No newline at end of file
......@@ -48,13 +48,13 @@ T5423:
.PHONY: T9405
T9405:
@'$(TEST_HC)' $(TEST_HC_OPTS) -ticky -rtsopts T9405.hs; \
./T9405 +RTS -rT9405.ticky & \
sleep 0.2; \
kill -2 $$!; \
wait $$!; \
[ -e T9405.ticky ] || echo "Error: Ticky profile doesn't exist"; \
[ -s T9405.ticky ] || echo "Error: Ticky profile is empty"; \
echo Ticky-Ticky;
./T9405 +RTS -rT9405.ticky & \
sleep 0.2; \
kill -2 $$!; \
wait $$!; \
[ -e T9405.ticky ] || echo "Error: Ticky profile doesn't exist"; \
[ -s T9405.ticky ] || echo "Error: Ticky profile is empty"; \
echo Ticky-Ticky;
# Naming convention: 'T5423_' obj-way '_' obj-src
# obj-way ::= v | dyn
......@@ -178,3 +178,8 @@ T12497:
.PHONY: T14695
T14695:
echo ":quit" | LD_LIBRARY_PATH="foo:" "$(TEST_HC)" $(filter-out -rtsopts, $(TEST_HC_OPTS_INTERACTIVE))
.PHONY: InternalCounters
InternalCounters:
"$(TEST_HC)" +RTS -s --internal-counters -RTS 2>&1 | grep "Internal Counters"
-"$(TEST_HC)" +RTS -s -RTS 2>&1 | grep "Internal Counters"
......@@ -390,3 +390,5 @@ test('T14702', [ ignore_stdout
, compile_and_run, [''])
test('T14900', normal, compile_and_run, ['-package ghc-compact'])
test('InternalCounters', normal, run_command,
['$MAKE -s --no-print-directory InternalCounters'])
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment