- 21 Oct, 2014 1 commit
-
-
Austin Seipp authored
Signed-off-by:
Austin Seipp <austin@well-typed.com>
-
- 29 Sep, 2014 1 commit
-
-
Simon Marlow authored
This reverts commit 39b5c1cb.
-
- 28 Jul, 2014 2 commits
-
-
Austin Seipp authored
This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by:
Austin Seipp <austin@well-typed.com>
-
Herbert Valerio Riedel authored
Summary: Today's hardware is much faster, so it makes sense to report timings with more precision, and possibly help reduce rounding-induced fluctuations in the nofib statistics. This commit increases the precision of all timings previously reported with a granularity of 10ms to 1ms. For instance, the `+RTS -S` output is now rendered as:

        Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
        bytes     bytes    bytes   user   elap     user     elap
       641936     59944   158120  0.000  0.000    0.013    0.001    0    0  (Gen:  0)
       517672     60840   158464  0.000  0.000    0.013    0.002    0    0  (Gen:  0)
       517256     58800   156424  0.005  0.005    0.019    0.007    0    0  (Gen:  1)
       670208      9520   158728  0.000  0.000    0.019    0.008    0    0  (Gen:  0)
    ...
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0        24 colls,     0 par    0.002s   0.002s     0.0001s    0.0002s
      Gen  1         3 colls,     0 par    0.011s   0.011s     0.0038s    0.0055s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    0.005s  (  0.006s elapsed)
      GC      time    0.014s  (  0.014s elapsed)
      EXIT    time    0.001s  (  0.001s elapsed)
      Total   time    0.032s  (  0.020s elapsed)

Note that this change also requires associated changes in the nofib submodule.

Test Plan: tested with modified nofib
Reviewers: simonmar, nomeata, austin
Subscribers: simonmar, relrod, carter
Differential Revision: https://phabricator.haskell.org/D97
-
- 10 Jul, 2014 1 commit
-
-
brbr authored
Summary: Avoid unnecessary clock_gettime() syscalls in GC stats. Test Plan: Use strace. Reviewers: simonmar, austin Reviewed By: simonmar, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D39
-
- 05 Dec, 2013 1 commit
-
-
Christopher Rodrigues authored
Signed-off-by:
Austin Seipp <austin@well-typed.com>
-
- 28 Oct, 2013 1 commit
-
-
Erik de Castro Lopo authored
-
- 04 Sep, 2013 1 commit
-
-
Simon Marlow authored
We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capability. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
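The effect of switching to an array of pointers can be sketched in a few lines of C. This is an illustration only, not the RTS code: `Cap`, `caps`, `n_caps` and `grow_caps` are hypothetical names, and the real Capability struct holds far more than an index.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RTS Capability struct. */
typedef struct {
    unsigned no;   /* index of this capability */
} Cap;

static Cap    **caps   = NULL;  /* array of pointers, not of structs */
static unsigned n_caps = 0;

/* Grow the array to new_n entries. Only the pointer array is
 * reallocated; existing Cap structs stay at their old addresses, so a
 * thread that already holds a Cap* keeps a valid pointer.
 * Returns 0 on success, -1 on allocation failure. */
static int grow_caps(unsigned new_n)
{
    assert(new_n >= n_caps);   /* shrinking is not handled in this sketch */
    if (new_n == n_caps) return 0;

    Cap **bigger = realloc(caps, new_n * sizeof *bigger);
    if (bigger == NULL) return -1;
    caps = bigger;

    for (unsigned i = n_caps; i < new_n; i++) {
        caps[i] = malloc(sizeof *caps[i]);
        if (caps[i] == NULL) {
            n_caps = i;        /* keep track of what was allocated */
            return -1;
        }
        caps[i]->no = i;
    }
    n_caps = new_n;
    return 0;
}
```

A caller that saved `caps[0]` before a call to `grow_caps` can keep using that pointer afterwards, at the cost of one extra indirection whenever a capability is looked up through the array.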
-
- 14 Feb, 2013 1 commit
-
-
Simon Marlow authored
We were doing it in two different ways and asserting that the results were the same. In most cases they were, but I found one case where they weren't: the GC itself allocates some memory for running finalizers, and this memory was accounted for one way but not the other. It was simpler to remove the old way of counting allocation than to try to fix it up, so I did that.
-
- 14 Sep, 2012 1 commit
-
-
Ian Lynagh authored
-
- 07 Sep, 2012 1 commit
-
-
Simon Marlow authored
lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
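The size mismatch can be made concrete with a small C sketch. `word_t` is a hypothetical name, and defining it as `uintptr_t` is an assumption made for illustration; it is not how the RTS defines StgWord.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical word type: N bits on an N-bit machine. */
typedef uintptr_t word_t;

/* Compile-time check that the type really is pointer-sized. */
static_assert(sizeof(word_t) == sizeof(void *),
              "word_t must be pointer-sized");

/* Width of unsigned long in bits. This is 64 on LP64 platforms such as
 * 64-bit Linux, but 32 on LLP64 platforms such as Windows x64. */
static unsigned ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width of the word type in bits; tracks the pointer width. */
static unsigned word_bits(void)
{
    return (unsigned)(sizeof(word_t) * CHAR_BIT);
}
```

On a platform where `ulong_bits()` and `word_bits()` disagree, any code that stored a word-sized value in an `unsigned long` would truncate it.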
-
- 19 Jun, 2012 1 commit
-
-
pcapriotti authored
-
- 26 Apr, 2012 2 commits
-
-
Ian Lynagh authored
-
Ian Lynagh authored
Mostly this meant getting pointer<->int conversions to use the right sizes. lnat is now size_t, rather than unsigned long, as that seems a better match for how it's used.
-
- 04 Apr, 2012 6 commits
-
-
Mikolaj Konarski authored
Quoting design rationale by dcoutts: The event indicates that we're doing a stop-the-world GC and all other HECs should be between their GC_START and GC_END events at that moment. We don't want to use GC_STATS_GHC for that, because GC_STATS_GHC is for extra GHC-specific info, not something we have to rely on to be able to match the GC pauses across HECs to a particular global GC.
-
Mikolaj Konarski authored
There was a discrepancy between GC times reported in +RTS -s and the timestamps of GC_START and GC_END events on the cap, on which +RTS -s stats for the given GC are based. This is fixed by posting the events with exactly the same timestamp as generated for the stat calculation. The calls posting the events are moved too, so that the events are emitted close to the time instant they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring it's emitted before the moved GC_END on all caps, which simplifies tools code.
-
Duncan Coutts authored
In stat_exit we want to emit a final EVENT_HEAP_ALLOCATED for each cap so that we get the same total allocation count as reported via +RTS -s. To do so we need to update the per-cap total_allocated counts. Previously we had a single calcAllocated(rtsBool) function that counted the large allocations and optionally the nurseries for all caps. The GC would always call it with false, and the stat_exit always with true. The reason for these two modes is that the GC counts the nurseries via clearNurseries() (which also updates the per-cap total_allocated counts), so it's only the stat_exit() path that needs to count them. We now split the calcAllocated() function into two: countLargeAllocated and updateNurseriesStats. As the name suggests, the latter now updates the per-cap total_allocated counts, in addition to returning a total.
-
Duncan Coutts authored
They cover much the same info as is available via the GHC.Stats module or via the '+RTS -s' textual output, but via the eventlog and with a better sampling frequency. We have three new generic heap info events and two very GHC-specific ones. (The hope is the general ones are usable by other implementations that use the same eventlog system, or indeed not so sensitive to changes in GHC itself.)

The general ones are:
 * total heap mem allocated since prog start, on a per-HEC basis
 * current size of the heap (MBlocks reserved from OS for the heap)
 * current size of live data in the heap

Currently these are all emitted by GHC at GC time (live data only at major GC).

The GHC specific ones are:
 * an event giving various static heap parameters:
   * number of generations (usually 2)
   * max size if any
   * nursery size
   * MBlock and block sizes
 * an event emitted on each GC containing:
   * GC generation (usually just 0,1)
   * total bytes copied
   * bytes lost to heap slop and fragmentation
   * the number of threads in the parallel GC (1 for serial)
   * the maximum number of bytes copied by any par GC thread
   * the total number of bytes copied by all par GC threads
   (these last three can be used to calculate an estimate of the work balance in parallel GCs)
-
Duncan Coutts authored
Also rename internal variables to make the names match what they hold. The parallel GC work balance is calculated using the total amount of memory copied by all GC threads, and the maximum copied by any individual thread. You have serial GC when the max is the same as the total copied, and perfectly balanced GC when total/max == n_caps. Previously we presented this as the ratio total/max and told users that the serial value was 1 and the ideal value N, for N caps, e.g.

    Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2)

The downside of this is that the user always has to keep in mind the number of cores being used. Our new presentation uses a normalised scale 0--1 as a percentage. 0% means completely serial and 100% is perfect balance, e.g.

    Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
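One way to read the normalised scale is as a linear map that sends total/max == 1 to 0% and total/max == n_caps to 100%. The sketch below assumes that linear form; the function and parameter names are hypothetical, and the commit message itself only states the two endpoints.

```c
#include <assert.h>

/* Normalised parallel GC work balance, in percent.
 *   total_copied: bytes copied by all GC threads together
 *   max_copied:   bytes copied by the busiest GC thread
 *   n_threads:    number of GC threads
 * Assumes a linear scale between the two documented endpoints:
 * total/max == 1 gives 0%, total/max == n_threads gives 100%. */
static double work_balance_percent(double total_copied, double max_copied,
                                   unsigned n_threads)
{
    assert(max_copied > 0.0 && total_copied >= max_copied);

    /* With a single thread there is nothing to balance. */
    if (n_threads < 2) return 0.0;

    double ratio = total_copied / max_copied;
    return 100.0 * (ratio - 1.0) / (double)(n_threads - 1);
}
```

With this form the reader no longer needs to know the thread count to interpret the figure, which is the point the commit message makes about the old ratio.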
-
Duncan Coutts authored
In addition to the existing global method. For now we just do it both ways and assert they give the same grand total. At some stage we can simplify the global method to just take the sum of the per-cap counters.
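The "count it both ways and assert they agree" approach might look roughly like the following. All names here (`N_CAPS`, `per_cap_allocated`, `global_allocated`, `note_allocation`, `total_allocated`) are hypothetical and the capability count is fixed for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define N_CAPS 4   /* hypothetical fixed capability count */

static uint64_t per_cap_allocated[N_CAPS];  /* new: one counter per cap */
static uint64_t global_allocated;           /* existing global counter */

/* Record an allocation of `words` on capability `cap` in both schemes. */
static void note_allocation(unsigned cap, uint64_t words)
{
    assert(cap < N_CAPS);
    per_cap_allocated[cap] += words;
    global_allocated       += words;
}

/* Grand total: sum the per-cap counters and check them against the
 * global counter, so that a divergence is caught immediately. */
static uint64_t total_allocated(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < N_CAPS; i++) {
        sum += per_cap_allocated[i];
    }
    assert(sum == global_allocated);
    return sum;
}
```

Once the two totals are known to agree, the global counter becomes redundant and the total can be computed from the per-cap counters alone.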
-
- 02 Mar, 2012 1 commit
-
-
Simon Marlow authored
We were keeping around the Task struct (216 bytes) for every worker we ever created, even though we only keep a maximum of 6 workers per Capability. These Task structs accumulate and cause a space leak in programs that do lots of safe FFI calls; this patch frees the Task struct as soon as a worker exits. One reason we were keeping the Task structs around is because we print out per-Task timing stats in +RTS -s, but that isn't terribly useful. What is sometimes useful is knowing how *many* Tasks there were. So now I'm printing a single-line summary, this is for the program in TASKS: 2001 (1 bound, 31 peak workers (2000 total), using -N1) So although we created 2k tasks overall, there were only 31 workers active at any one time (which is exactly what we expect: the program makes 30 safe FFI calls concurrently). This also gives an indication of how many capabilities were being used, which is handy if you use +RTS -N without an explicit number.
-
- 15 Jan, 2012 2 commits
-
-
Ian Lynagh authored
-
Ian Lynagh authored
-
- 06 Jan, 2012 1 commit
-
-
Simon Marlow authored
-
- 06 Dec, 2011 1 commit
-
-
Simon Marlow authored
At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
-
- 25 Nov, 2011 1 commit
-
-
Simon Marlow authored
Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals. The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%). Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
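Keeping the resolution in one place means every conversion is written in terms of a single constant. A minimal sketch, assuming nanosecond resolution as stated above; the helper names are hypothetical and are not the macros defined in Rts.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Time;   /* stand-in for the StgWord64-based Time */

/* Units per second. Nanoseconds, as the commit message states. */
#define TIME_RESOLUTION 1000000000ULL

/* Hypothetical helpers: every conversion refers to TIME_RESOLUTION
 * rather than hard-coding a factor at the call site. */
static Time ms_to_time(uint64_t ms)
{
    return ms * (TIME_RESOLUTION / 1000ULL);
}

static uint64_t time_to_ms(Time t)
{
    return t / (TIME_RESOLUTION / 1000ULL);
}

static double time_to_seconds(Time t)
{
    return (double)t / (double)TIME_RESOLUTION;
}

/* Default tick interval described above: 10ms normally, 1ms when
 * profiling. */
static Time default_tick_interval(int profiling)
{
    return ms_to_time(profiling ? 1 : 10);
}
```

Changing the resolution would then mean editing one definition instead of hunting for scattered multiplications and divisions.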
-
- 02 Nov, 2011 1 commit
-
-
Simon Marlow authored
-
- 06 Aug, 2011 1 commit
-
-
Edward Z. Yang authored
Signed-off-by:
Edward Z. Yang <ezyang@mit.edu>
-
- 31 Jul, 2011 1 commit
-
-
Edward Z. Yang authored
We add a new RTS flag -T for collecting statistics without producing any new output. There is one new struct in rts/storage/GC.h: GCStats. We add two new global counters current_residency and current_slop, which are useful for in-program GC statistics. See GHC.Stats in base for a Haskell interface to this functionality. Signed-off-by:
Edward Z. Yang <ezyang@mit.edu>
-
- 25 Jul, 2011 1 commit
-
-
Edward Z. Yang authored
Signed-off-by:
Edward Z. Yang <ezyang@mit.edu>
-
- 24 Jul, 2011 1 commit
-
-
Ian Lynagh authored
Heap census now happens during GC, so that time is already accounted for in GC_tot_cpu.
-
- 23 Jul, 2011 1 commit
-
-
Ian Lynagh authored
Now that the heap census runs in the middle of garbage collections, the "CPU time" it was calculating included any CPU time used so far in the current GC. This could cause CPU time to appear to go down, which means hp2ps complained about "samples out of sequence". I'm not sure if this is the nicest way to solve this (maybe resurrecting mut_user_time_during_GC would be better?) but it gets things working again.
-
- 18 Jul, 2011 2 commits
-
-
Duncan Coutts authored
When you use `par` to make a spark, if the spark pool on the current capability is full then the spark is discarded. This represents a loss of potential parallelism and it also means there are simply a lot of sparks around. Both are things that might be of concern to a programmer when tuning a parallel program that uses par. The "+RTS -s" stats command now reports overflowed sparks, e.g. SPARKS: 100001 (15521 converted, 84480 overflowed, 0 dud, 0 GC'd, 0 fizzled)
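The overflow counter can be illustrated with a fixed-size pool that discards new entries when full. This is a deliberately simplified sketch: `SparkPool`, `POOL_CAPACITY` and `push_spark` are hypothetical, and a plain array stands in for whatever queue structure the RTS actually uses.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 4   /* hypothetical; tiny so overflow is easy to see */

typedef struct {
    void  *slots[POOL_CAPACITY];
    size_t used;        /* sparks currently held in the pool */
    size_t overflowed;  /* sparks discarded because the pool was full */
} SparkPool;

/* Try to add a spark. Returns 1 if it was stored, 0 if the pool was
 * full and the spark was discarded (and counted as overflowed). */
static int push_spark(SparkPool *pool, void *closure)
{
    assert(pool != NULL);

    if (pool->used == POOL_CAPACITY) {
        pool->overflowed++;
        return 0;
    }
    pool->slots[pool->used++] = closure;
    return 1;
}
```

Reporting `overflowed` alongside the other spark counts lets a programmer see how much potential parallelism was dropped purely because of pool size.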
-
Duncan Coutts authored
-
- 06 Jun, 2011 1 commit
-
-
Simon Marlow authored
-
- 25 May, 2011 1 commit
-
-
Simon Marlow authored
in the future.
-
- 10 May, 2011 1 commit
-
-
dmp authored
The code that prints the "one-line" stats (i.e. the RTS -t flag) was incorrectly printing zeros for some time values. These time values were computed inside a conditional that was only true when printing detailed stats (i.e. the RTS -s or -S flags). This commit simply moves the computation out of the conditional so they are available for the one-line stats output.
-
- 15 Apr, 2011 1 commit
-
-
Simon Marlow authored
-
- 14 Apr, 2011 1 commit
-
-
Simon Marlow authored
The pinned_object_block is where we allocate small pinned ByteArray# objects. At a GC the pinned_object_block was being treated like other large objects and promoted to the next step/generation, even if it was only partly full. Under some ByteString-heavy workloads this would accumulate on average 2k of slop per GC, and this memory is never released until the ByteArray# objects in the block are freed. So now, we keep allocating into the pinned_object_block until it is completely full, at which point it is handed over to the GC as before. The pinned_object_block might therefore contain objects with a large range of ages, but I don't think this is any worse than the situation before. We still have the fragmentation issue in general, but the new scheme can improve the memory overhead for some workloads dramatically.
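The difference between the two policies can be modelled with a toy bump allocator that only tracks word counts. Everything here is hypothetical (`PinnedAllocator`, `BLOCK_WORDS`, and the three functions); it models the accounting described above, not the RTS block allocator.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_WORDS 512   /* hypothetical block size in words */

typedef struct {
    size_t used;     /* words allocated in the current block */
    size_t retired;  /* blocks handed over to the GC so far */
} PinnedAllocator;

/* Allocate n words, returning the offset within the current block.
 * The current block is retired only when the request does not fit. */
static size_t alloc_pinned(PinnedAllocator *a, size_t n)
{
    assert(a != NULL && n > 0 && n <= BLOCK_WORDS);

    if (a->used + n > BLOCK_WORDS) {
        a->retired++;
        a->used = 0;
    }
    size_t offset = a->used;
    a->used += n;
    return offset;
}

/* Old policy: at every GC the current block is retired even if it is
 * only partly full. Returns the slop (unused words) left behind. */
static size_t gc_retire_always(PinnedAllocator *a)
{
    assert(a != NULL);
    if (a->used == 0) return 0;

    size_t slop = BLOCK_WORDS - a->used;
    a->retired++;
    a->used = 0;
    return slop;
}

/* New policy: the current block survives the GC, so later allocations
 * keep filling it and this GC leaves no slop behind. */
static size_t gc_keep_current(PinnedAllocator *a)
{
    assert(a != NULL);
    return 0;
}
```

Under the old policy a workload that allocates a few small pinned objects between collections leaves most of each block unused; under the new one the same block keeps filling across collections.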
-
- 11 Apr, 2011 1 commit
-
-
Simon Marlow authored
This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s:

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0      2755 colls,  2754 par   13.16s    0.93s     0.0003s    0.0150s
      Gen  1       769 colls,   769 par    3.71s    0.26s     0.0003s    0.0059s
-