1. 08 Jun, 2013 1 commit
  2. 14 Feb, 2013 1 commit
    • Simon Marlow's avatar
      Simplify the allocation stats accounting · 65a0e1eb
      Simon Marlow authored
      We were doing it in two different ways and asserting that the results
      were the same.  In most cases they were, but I found one case where
      they weren't: the GC itself allocates some memory for running
      finalizers, and this memory was accounted for one way but not the
      other.
      
      It was simpler to remove the old way of counting allocation that to
      try to fix it up, so I did that.
      65a0e1eb
  3. 21 Sep, 2012 1 commit
  4. 07 Sep, 2012 2 commits
    • Simon Marlow's avatar
      Lots of nat -> StgWord changes · bf2d58c2
      Simon Marlow authored
      bf2d58c2
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  5. 19 Jun, 2012 1 commit
  6. 04 Apr, 2012 1 commit
    • Duncan Coutts's avatar
      Change the presentation of parallel GC work balance in +RTS -s · cd930da1
      Duncan Coutts authored
      Also rename internal variables to make the names match what they hold.
      The parallel GC work balance is calculated using the total amount of
      memory copied by all GC threads, and the maximum copied by any
      individual thread. You have serial GC when the max is the same as
      copied, and perfectly balanced GC when total/max == n_caps.
      
      Previously we presented this as the ratio total/max and told users
      that the serial value was 1 and the ideal value N, for N caps, e.g.
      
        Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2)
      
      The downside of this is that the user always has to keep in mind the
      number of cores being used. Our new presentation uses a normalised
      scale 0--1 as a percentage. The 0% means completely serial and 100%
      is perfect balance, e.g.
      
        Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
      cd930da1
  7. 09 Jan, 2012 1 commit
  8. 17 Oct, 2011 1 commit
  9. 06 Aug, 2011 1 commit
  10. 31 Jul, 2011 1 commit
    • Edward Z. Yang's avatar
      Implement public interface for GC statistics. · 2088abaf
      Edward Z. Yang authored
      We add a new RTS flag -T for collecting statistics but not giving any
      new inputs.  There is one new struct in rts/storage/GC.h: GCStats.  We
      add two new global counters current_residency and current_slop, which
      are useful for in-program GC statistics.
      
      See GHC.Stats in base for a Haskell interface to this functionality.
      Signed-off-by: Edward Z. Yang's avatarEdward Z. Yang <ezyang@mit.edu>
      2088abaf
  11. 25 May, 2011 1 commit
  12. 02 Feb, 2011 3 commits
    • Simon Marlow's avatar
      GC refactoring and cleanup · 18896fa2
      Simon Marlow authored
      Now we keep any partially-full blocks in the gc_thread[] structs after
      each GC, rather than moving them to the generation.  This should give
      us slightly better locality (though I wasn't able to measure any
      difference).
      
      Also in this patch: better sanity checking with THREADED.
      18896fa2
    • Simon Marlow's avatar
      A small GC optimisation · bef3da1e
      Simon Marlow authored
      Store the *number* of the destination generation in the Bdescr struct,
      so that in evacuate() we don't have to deref gen to get it.
      This is another improvement ported over from my GC branch.
      bef3da1e
    • Simon Marlow's avatar
      Remove the per-generation mutable lists · 32907722
      Simon Marlow authored
      Now that we use the per-capability mutable lists exclusively.
      32907722
  13. 21 Dec, 2010 1 commit
    • Simon Marlow's avatar
      Count allocations more accurately · db0c13a4
      Simon Marlow authored
      The allocation stats (+RTS -s etc.) used to count the slop at the end
      of each nursery block (except the last) as allocated space, now we
      count the allocated words accurately.  This should make allocation
      figures more predictable, too.
      
      This has the side effect of reducing the apparent allocations by a
      small amount (~1%), so remember to take this into account when looking
      at nofib results.
      db0c13a4
  14. 17 Jun, 2010 1 commit
  15. 26 May, 2010 1 commit
  16. 29 Mar, 2010 1 commit
  17. 31 Dec, 2009 1 commit
  18. 04 Dec, 2009 1 commit
  19. 03 Dec, 2009 1 commit
    • Simon Marlow's avatar
      GC refactoring, remove "steps" · 214b3663
      Simon Marlow authored
      The GC had a two-level structure, G generations each of T steps.
      Steps are for aging within a generation, mostly to avoid premature
      promotion.  
      
      Measurements show that more than 2 steps is almost never worthwhile,
      and 1 step is usually worse than 2.  In theory fractional steps are
      possible, so the ideal number of steps is somewhere between 1 and 3.
      GHC's default has always been 2.
      
      We can implement 2 steps quite straightforwardly by having each block
      point to the generation to which objects in that block should be
      promoted, so blocks in the nursery point to generation 0, and blocks
      in gen 0 point to gen 1, and so on.
      
      This commit removes the explicit step structures, merging generations
      with steps, thus simplifying a lot of code.  Performance is
      unaffected.  The tunable number of steps is now gone, although it may
      be replaced in the future by a way to tune the aging in generation 0.
      214b3663
  20. 01 Dec, 2009 1 commit
    • Simon Marlow's avatar
      Make allocatePinned use local storage, and other refactorings · 5270423a
      Simon Marlow authored
      This is a batch of refactoring to remove some of the GC's global
      state, as we move towards CPU-local GC.  
      
        - allocateLocal() now allocates large objects into the local
          nursery, rather than taking a global lock and allocating
          then in gen 0 step 0.
      
        - allocatePinned() was still allocating from global storage and
          taking a lock each time, now it uses local storage. 
          (mallocForeignPtrBytes should be faster with -threaded).
          
        - We had a gen 0 step 0, distinct from the nurseries, which are
          stored in a separate nurseries[] array.  This is slightly strange.
          I removed the g0s0 global that pointed to gen 0 step 0, and
          removed all uses of it.  I think now we don't use gen 0 step 0 at
          all, except possibly when there is only one generation.  Possibly
          more tidying up is needed here.
      
        - I removed the global allocate() function, and renamed
          allocateLocal() to allocate().
      
        - the alloc_blocks global is gone.  MAYBE_GC() and
          doYouWantToGC() now check the local nursery only.
      5270423a
  21. 29 Nov, 2009 1 commit
    • Simon Marlow's avatar
      Store a destination step in the block descriptor · f9d15f9f
      Simon Marlow authored
      At the moment, this just saves a memory reference in the GC inner loop
      (worth a percent or two of GC time).  Later, it will hopefully let me
      experiment with partial steps, and simplifying the generation/step
      infrastructure.
      f9d15f9f
  22. 29 Aug, 2009 1 commit
  23. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5