- 08 Jun, 2013 1 commit
-
-
ian@well-typed.com authored
Based on a patch from Stephen Blackheath.
-
- 14 Feb, 2013 1 commit
-
-
Simon Marlow authored
We were doing it in two different ways and asserting that the results were the same. In most cases they were, but I found one case where they weren't: the GC itself allocates some memory for running finalizers, and this memory was accounted for one way but not the other. It was simpler to remove the old way of counting allocation than to try to fix it up, so I did that.
-
- 21 Sep, 2012 1 commit
-
-
Simon Marlow authored
The program in #7257 was spending 90% of its time counting the live data in gen->large_objects. We already avoid doing this for small objects, but in this example the old generation was full of large objects (actually pinned ByteStrings).
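The message does not say how the traversal was avoided. One common way to avoid walking a long list on every collection is to keep a running total that is updated whenever an object is linked in. The sketch below illustrates that idea only; the struct layouts and the names `n_large_words`, `add_large_object` and `count_large_words_slow` are hypothetical and are not claimed to match the RTS source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for RTS block and generation structs. */
typedef struct block {
    struct block *next;
    size_t        live_words;   /* words in use by this large object */
} block;

typedef struct {
    block  *large_objects;      /* singly linked list of large-object blocks */
    size_t  n_large_words;      /* cached total, kept in step with the list */
} generation;

/* O(n) walk over the list: cheap for a few objects, costly for many. */
static size_t count_large_words_slow(const generation *gen)
{
    size_t total = 0;
    for (const block *bd = gen->large_objects; bd != NULL; bd = bd->next) {
        total += bd->live_words;
    }
    return total;
}

/* O(1) bookkeeping on insertion keeps the cached total correct. */
static void add_large_object(generation *gen, block *bd)
{
    assert(bd != NULL);
    bd->next = gen->large_objects;
    gen->large_objects = bd;
    gen->n_large_words += bd->live_words;
}
```

With this arrangement the statistics code can read `n_large_words` directly, and the slow walk is only needed as a sanity check.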
-
- 07 Sep, 2012 2 commits
-
-
Simon Marlow authored
-
Simon Marlow authored
lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
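The size mismatch described above can be checked mechanically. The sketch below uses `uintptr_t` as a stand-in for a word-sized type; it is not the real definition of StgWord, and `word_t`, `word_bits` and `ulong_bits` are names made up for this illustration. It assumes a C11 compiler and a platform where `uintptr_t` has the same width as a pointer, which holds on mainstream targets.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for an "N bits on an N-bit machine" type; NOT the real StgWord. */
typedef uintptr_t word_t;

/* Compile-time check that the stand-in really is pointer-sized. */
static_assert(sizeof(word_t) == sizeof(void *), "word_t must be pointer-sized");

/* Width of the word-sized type in bits. */
static unsigned word_bits(void)
{
    return (unsigned)(sizeof(word_t) * CHAR_BIT);
}

/* Width of unsigned long in bits: 32 on Windows x64, 64 on most 64-bit Unix ABIs. */
static unsigned ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}
```

On a platform where `ulong_bits()` is smaller than `word_bits()`, any code that stores a word-sized quantity in `unsigned long` will truncate it, which is the failure mode the message describes.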
-
- 19 Jun, 2012 1 commit
-
-
pcapriotti authored
-
- 04 Apr, 2012 1 commit
-
-
Duncan Coutts authored
Also rename internal variables to make the names match what they hold.

The parallel GC work balance is calculated using the total amount of memory copied by all GC threads, and the maximum copied by any individual thread. You have serial GC when the max is the same as the total, and perfectly balanced GC when total/max == n_caps.

Previously we presented this as the ratio total/max and told users that the serial value was 1 and the ideal value N, for N caps, e.g.

    Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2)

The downside of this is that the user always has to keep in mind the number of cores being used. Our new presentation uses a normalised scale 0--1 as a percentage. 0% means completely serial and 100% is perfect balance, e.g.

    Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
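The message gives the endpoints of the new scale but not the formula. A linear mapping that sends total/max == 1 to 0% and total/max == n_caps to 100% is sketched below; the function name `work_balance_percent` and the handling of a single capability are assumptions made for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: map the old ratio total/max (range 1 .. n_caps)
 * linearly onto a percentage (range 0 .. 100).
 */
static double work_balance_percent(double total_copied,
                                   double max_copied,
                                   unsigned n_caps)
{
    assert(max_copied > 0.0 && total_copied >= max_copied);

    /* With one capability there is nothing to balance; report 0 (assumption). */
    if (n_caps < 2) {
        return 0.0;
    }

    double ratio = total_copied / max_copied;   /* the old presentation */
    return 100.0 * (ratio - 1.0) / (double)(n_caps - 1);
}
```

Under this mapping, the figures from the old-style example (4045071 / 3846774 with 2 capabilities) come out at roughly 5.15%, so a reader no longer needs to know the core count to see that the collection was close to serial.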
-
- 09 Jan, 2012 1 commit
-
-
Gabor Greif authored
-
- 17 Oct, 2011 1 commit
-
-
Simon Marlow authored
See Note [atomic CAFs] in rts/sm/Storage.c
-
- 06 Aug, 2011 1 commit
-
-
Edward Z. Yang authored
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
-
- 31 Jul, 2011 1 commit
-
-
Edward Z. Yang authored
We add a new RTS flag -T for collecting statistics but not giving any new inputs. There is one new struct in rts/storage/GC.h: GCStats. We add two new global counters current_residency and current_slop, which are useful for in-program GC statistics. See GHC.Stats in base for a Haskell interface to this functionality.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
-
- 25 May, 2011 1 commit
-
-
Simon Marlow authored
in the future.
-
- 02 Feb, 2011 3 commits
-
-
Simon Marlow authored
Now we keep any partially-full blocks in the gc_thread[] structs after each GC, rather than moving them to the generation. This should give us slightly better locality (though I wasn't able to measure any difference). Also in this patch: better sanity checking with THREADED.
-
Simon Marlow authored
Store the *number* of the destination generation in the Bdescr struct, so that in evacuate() we don't have to deref gen to get it. This is another improvement ported over from my GC branch.
-
Simon Marlow authored
Now that we use the per-capability mutable lists exclusively.
-
- 21 Dec, 2010 1 commit
-
-
Simon Marlow authored
The allocation stats (+RTS -s etc.) used to count the slop at the end of each nursery block (except the last) as allocated space; now we count the allocated words accurately. This should make allocation figures more predictable, too. This has the side effect of reducing the apparent allocations by a small amount (~1%), so remember to take this into account when looking at nofib results.
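The difference between the two ways of counting can be shown with a toy model. In the sketch below, `BLOCK_WORDS`, `nursery_block` and both function names are hypothetical; the real RTS derives the used words from its block descriptors rather than from a stored field.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_WORDS 512   /* hypothetical block capacity in words */

/* Simplified nursery block: only records how many words were handed out. */
typedef struct {
    size_t used_words;
} nursery_block;

/* Old style: every block except the last is treated as completely full,
 * so the unused tail (slop) of each earlier block counts as allocation. */
static size_t allocated_with_slop(const nursery_block *blocks, size_t n)
{
    assert(n > 0);
    return (n - 1) * BLOCK_WORDS + blocks[n - 1].used_words;
}

/* New style: sum the words actually used in each block. */
static size_t allocated_exact(const nursery_block *blocks, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(blocks[i].used_words <= BLOCK_WORDS);
        total += blocks[i].used_words;
    }
    return total;
}
```

For three blocks holding 500, 510 and 100 words, the old count is 1124 and the exact count is 1110; the gap is the slop in the first two blocks.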
-
- 17 Jun, 2010 1 commit
-
-
Simon Marlow authored
-
- 26 May, 2010 1 commit
-
-
Marco Túlio Gontijo e Silva authored
-
- 29 Mar, 2010 1 commit
-
-
Simon Marlow authored
-
- 31 Dec, 2009 1 commit
-
-
Simon Marlow authored
-
- 04 Dec, 2009 1 commit
-
-
Simon Marlow authored
-
- 03 Dec, 2009 1 commit
-
-
Simon Marlow authored
The GC had a two-level structure, G generations each of T steps. Steps are for aging within a generation, mostly to avoid premature promotion. Measurements show that more than 2 steps is almost never worthwhile, and 1 step is usually worse than 2. In theory fractional steps are possible, so the ideal number of steps is somewhere between 1 and 3. GHC's default has always been 2. We can implement 2 steps quite straightforwardly by having each block point to the generation to which objects in that block should be promoted, so blocks in the nursery point to generation 0, and blocks in gen 0 point to gen 1, and so on. This commit removes the explicit step structures, merging generations with steps, thus simplifying a lot of code. Performance is unaffected. The tunable number of steps is now gone, although it may be replaced in the future by a way to tune the aging in generation 0.
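The block-to-destination scheme described above can be sketched in a few lines. The struct layouts and the names `link_generations`, `new_nursery_block`, `new_block_in` and `survivor_destination` are hypothetical simplifications, not the RTS definitions; in particular, having the oldest generation promote to itself is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified generation: knows its number and where its survivors go. */
typedef struct generation {
    unsigned           no;
    struct generation *to;    /* promotion target for blocks of this generation */
} generation;

/* Simplified block: records only where live objects in it are copied to. */
typedef struct {
    generation *dest;
} block;

/* Chain each generation to the next; the oldest one promotes to itself. */
static void link_generations(generation *gens, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        gens[i].no = (unsigned)i;
        gens[i].to = (i + 1 < n) ? &gens[i + 1] : &gens[i];
    }
}

/* Nursery blocks send survivors to generation 0. */
static block new_nursery_block(generation *gens)
{
    block b = { &gens[0] };
    return b;
}

/* Blocks inside generation g send survivors to g's promotion target. */
static block new_block_in(generation *g)
{
    block b = { g->to };
    return b;
}

/* The collector only needs the block to decide where an object goes. */
static generation *survivor_destination(const block *bd)
{
    return bd->dest;
}
```

An object allocated in the nursery therefore survives one collection into generation 0 and a second one into generation 1, which gives the two-step aging without a separate step structure.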
-
- 01 Dec, 2009 1 commit
-
-
Simon Marlow authored
This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.
  - allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.
  - allocatePinned() was still allocating from global storage and taking a lock each time; now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded.)
  - We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.
  - I removed the global allocate() function, and renamed allocateLocal() to allocate().
  - The alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
-
- 29 Nov, 2009 1 commit
-
-
Simon Marlow authored
At the moment, this just saves a memory reference in the GC inner loop (worth a percent or two of GC time). Later, it will hopefully let me experiment with partial steps, and simplifying the generation/step infrastructure.
-
- 29 Aug, 2009 1 commit
-
-
Simon Marlow authored
-
- 02 Aug, 2009 1 commit
-
-
Simon Marlow authored
The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposing publicly exactly what we need to, and no more.
  - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it.
  - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h.
  - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory.
  - I left MachDeps.h where it is, because it is so widely used in Haskell code.
  - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h.
  - Various internal APIs are no longer exposed by public header files.
  - Various bits of dead code and declarations have been removed.
  - More gcc warnings are turned on, and the RTS code is more warning-clean.
  - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do; this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g. use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
-