- 16 Apr, 2008 11 commits
-
-
simonmarhaskell@gmail.com authored
  - GCAux.c contains code not compiled with the gct register enabled; it is callable from outside the GC
  - marking functions are moved to their relevant subsystems, outside the GC
  - mark_root needs to save the gct register, as it is called from outside the GC
-
simonmarhaskell@gmail.com authored
-
simonmarhaskell@gmail.com authored
  - count and report number of parallel collections
  - calculate bytes scanned in addition to bytes copied per thread
  - calculate "work balance factor"
  - tidy up the formatting a bit
-
simonmarhaskell@gmail.com authored
This means we can calculate slop easily, and also improve predictability of GC.
-
simonmarhaskell@gmail.com authored
-
simonmarhaskell@gmail.com authored
-
simonmarhaskell@gmail.com authored
So we can parallelise minor collections too. Sometimes it's worth it.
-
simonmarhaskell@gmail.com authored
-
simonmarhaskell@gmail.com authored
-
simonmarhaskell@gmail.com authored
DEBUG imposes a significant performance hit in the GC, yet we often want some of the debugging output, so -vg gives us the cheap trace messages without the sanity checking of DEBUG, just like -vs for the scheduler.
-
simonmarhaskell@gmail.com authored
-
- 11 Jan, 2008 1 commit
-
-
simonmar@microsoft.com authored
-
- 21 Nov, 2007 1 commit
-
-
Simon Marlow authored
avoids cache contention: bd->todo_bd->free may clash with any cache line, so we localise it.
-
- 31 Oct, 2007 5 commits
-
-
Simon Marlow authored
-
Simon Marlow authored
Some objects don't need to be scavenged, in particular if they have no pointers. This seems like an obvious optimisation, but in fact it only accounts for about 1% of objects (in GHC, for example), and the extra complication means it probably isn't worth doing.
-
Simon Marlow authored
By establishing an ordering on step pointers, we can simplify the test (stp->gen_no < evac_gen) to (stp < evac_step) which is common in evacuate().
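
The equivalence of the two tests can be illustrated with a minimal sketch. It assumes that all steps live in one contiguous array sorted by generation, and that evac_step points at the first step of generation evac_gen. The struct layout, the array, and every function name below are hypothetical and are not taken from the RTS sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout: not the RTS's real step structure. */
#define N_GENS        3u
#define STEPS_PER_GEN 2u

typedef struct {
    unsigned gen_no;   /* generation this step belongs to */
    unsigned step_no;  /* index of the step within its generation */
} step;

/* Assumption: every step lives in this one array, sorted by generation,
 * so comparing step addresses is well defined and agrees with comparing
 * generation numbers. */
static step all_steps[N_GENS * STEPS_PER_GEN];

void init_steps(void)
{
    for (unsigned g = 0; g < N_GENS; g++) {
        for (unsigned s = 0; s < STEPS_PER_GEN; s++) {
            step *stp = &all_steps[g * STEPS_PER_GEN + s];
            stp->gen_no = g;
            stp->step_no = s;
        }
    }
}

/* The step that plays the role of evac_step for a given evac_gen. */
step *first_step_of_gen(unsigned g)
{
    return &all_steps[g * STEPS_PER_GEN];
}

/* Original form of the test: one memory load, then a compare. */
bool younger_by_gen(const step *stp, unsigned evac_gen)
{
    return stp->gen_no < evac_gen;
}

/* Simplified form: a plain pointer compare, no load from *stp. */
bool younger_by_ptr(const step *stp, const step *evac_step)
{
    return stp < evac_step;
}
```

Under these assumptions the two predicates agree for every step and every generation, and the pointer form avoids dereferencing stp on a hot path.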
-
Simon Marlow authored
e.g. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised; minor GCs are still sequential. Don't use more threads than you have CPUs. It works most of the time, although you won't see much speedup yet. Tuning and more work on stability are still required.
-
Simon Marlow authored
This patch localises the state of the GC into a gc_thread structure, and reorganises the inner loop of the GC to scavenge one block at a time from global work lists in each "step". The gc_thread structure has a "workspace" for each step, in which it collects evacuated objects until it has a full block to push out to the step's global list. Details of the algorithm will be on the wiki in due course. At the moment, THREADED_RTS does not compile, but the single-threaded GC works (and is 10-20% slower than before).
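
The workspace idea described above can be sketched roughly as follows. This is a single-threaded illustration only: the block size, the struct fields, and the function names (ws_add, grab_block) are hypothetical, and a real parallel collector would need to synchronise access to the step's global list.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 4u   /* tiny block size, chosen only for illustration */

typedef struct block {
    struct block *link;               /* next block on a work list */
    size_t used;                      /* number of words filled so far */
    unsigned long words[BLOCK_WORDS]; /* stand-in for evacuated objects */
} block;

/* Hypothetical global work list of one step: full blocks to scavenge. */
typedef struct {
    block *todo;
    size_t n_todo;
} step_list;

/* Hypothetical per-thread workspace for one step. */
typedef struct {
    step_list *stp;   /* the step this workspace feeds */
    block *todo_bd;   /* private, partially filled block (may be NULL) */
} workspace;

/* Record one evacuated word in the workspace. When the private block
 * fills up, push it onto the step's global list. Returns 0 on success
 * and -1 if a new block could not be allocated. */
int ws_add(workspace *ws, unsigned long w)
{
    if (ws->todo_bd == NULL) {
        ws->todo_bd = calloc(1, sizeof(block));
        if (ws->todo_bd == NULL) {
            return -1;
        }
    }
    block *bd = ws->todo_bd;
    bd->words[bd->used++] = w;
    if (bd->used == BLOCK_WORDS) {
        /* A threaded collector would take a lock around this push. */
        bd->link = ws->stp->todo;
        ws->stp->todo = bd;
        ws->stp->n_todo++;
        ws->todo_bd = NULL;
    }
    return 0;
}

/* Take one full block from the global list to scavenge, or NULL. */
block *grab_block(step_list *stp)
{
    block *bd = stp->todo;
    if (bd != NULL) {
        stp->todo = bd->link;
        stp->n_todo--;
        bd->link = NULL;
    }
    return bd;
}
```

The point of the design is that a thread only touches shared state once per full block rather than once per object; the caller owns any block returned by grab_block and frees it when done.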
-
- 11 Oct, 2007 1 commit
-
-
Simon Marlow authored
Previously MVars were always on the mutable list of the old generation, which meant every MVar was visited during every minor GC. With lots of MVars hanging around, this gets expensive. We addressed this problem for MUT_VARs (aka IORefs) a while ago; the solution is to use a traditional GC write-barrier when the object is modified. This patch does the same thing for MVars. TVars are still done the old way; they could probably benefit from the same treatment too.
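
A traditional write barrier of this kind can be sketched as below. This is a generic illustration, not the RTS code: the cell and mut_list types and the functions write_cell and minor_gc_visit are hypothetical, and the sketch assumes every cell already lives in the old generation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MUT_LIST_CAP 64u   /* fixed capacity, for illustration only */

/* Hypothetical mutable object, assumed to live in the old generation. */
typedef struct cell {
    void *value;
    bool dirty;   /* true once the cell is on the mutable list */
} cell;

/* Hypothetical mutable list: the only old-generation objects that a
 * minor GC has to visit. */
typedef struct {
    cell *entries[MUT_LIST_CAP];
    size_t len;
} mut_list;

/* Write barrier: record the cell on the mutable list the first time it
 * is modified after a GC, then perform the write. Returns false if the
 * list is full (a real collector would grow the list instead). */
bool write_cell(mut_list *ml, cell *c, void *v)
{
    if (!c->dirty) {
        if (ml->len == MUT_LIST_CAP) {
            return false;
        }
        c->dirty = true;
        ml->entries[ml->len++] = c;
    }
    c->value = v;
    return true;
}

/* Minor GC: visit only the recorded cells, mark them clean again, and
 * empty the list. Returns how many cells were visited. */
size_t minor_gc_visit(mut_list *ml)
{
    size_t visited = ml->len;
    for (size_t i = 0; i < ml->len; i++) {
        ml->entries[i]->dirty = false;
    }
    ml->len = 0;
    return visited;
}
```

With this scheme a cell that is never written between collections costs nothing during a minor GC, and a cell written many times is still recorded only once.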
-
- 26 Oct, 2006 1 commit
-
-
Simon Marlow authored
-
- 24 Oct, 2006 1 commit
-
-
Simon Marlow authored
In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunately):
  - Don't include Stable.h in Rts.h; instead, just include it where necessary.
  - Consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier.
  - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.
-