1. 11 Apr, 2011 1 commit
    • Refactoring and tidy up · 1fb38442
      Simon Marlow authored
      This is a port of some of the changes from my private local-GC branch
      (which is still in darcs; I haven't converted it to git yet).  There
      are a couple of small functional differences in the GC stats: first,
      per-thread GC timings should now be more accurate, and secondly we now
      report average and maximum pause times. e.g. from minimax +RTS -N8 -s:
      
                                          Tot time (elapsed)  Avg pause  Max pause
        Gen  0      2755 colls,  2754 par   13.16s    0.93s     0.0003s    0.0150s
        Gen  1       769 colls,   769 par    3.71s    0.26s     0.0003s    0.0059s
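
In the sample output above, the average pause equals elapsed time divided by the number of collections (0.93s / 2755 ≈ 0.0003s for Gen 0). A minimal sketch of that bookkeeping follows. The `PauseStats` type and its functions are hypothetical names for illustration, not the RTS's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-generation pause statistics (not the real RTS struct). */
typedef struct {
    size_t collections;    /* number of collections recorded so far */
    double total_elapsed;  /* sum of elapsed pause times, in seconds */
    double max_pause;      /* longest single pause seen, in seconds */
} PauseStats;

static void pause_stats_init(PauseStats *s) {
    s->collections = 0;
    s->total_elapsed = 0.0;
    s->max_pause = 0.0;
}

/* Record one collection's elapsed pause time. */
static void pause_stats_record(PauseStats *s, double pause) {
    assert(pause >= 0.0);
    s->collections++;
    s->total_elapsed += pause;
    if (pause > s->max_pause) {
        s->max_pause = pause;
    }
}

/* Average pause = total elapsed time / number of collections. */
static double pause_stats_average(const PauseStats *s) {
    if (s->collections == 0) {
        return 0.0;
    }
    return s->total_elapsed / (double)s->collections;
}
```

Calling `pause_stats_record` once per collection keeps both figures current without storing individual pause times.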
  2. 02 Feb, 2011 1 commit
  3. 03 Dec, 2009 1 commit
    • GC refactoring, remove "steps" · 214b3663
      Simon Marlow authored
      The GC had a two-level structure, G generations each of T steps.
      Steps are for aging within a generation, mostly to avoid premature
      promotion.  
      
      Measurements show that more than 2 steps is almost never worthwhile,
      and 1 step is usually worse than 2.  In theory fractional steps are
      possible, so the ideal number of steps is somewhere between 1 and 3.
      GHC's default has always been 2.
      
      We can implement 2 steps quite straightforwardly by having each block
      point to the generation to which objects in that block should be
      promoted, so blocks in the nursery point to generation 0, and blocks
      in gen 0 point to gen 1, and so on.
      
      This commit removes the explicit step structures, merging generations
      with steps, thus simplifying a lot of code.  Performance is
      unaffected.  The tunable number of steps is now gone, although it may
      be replaced in the future by a way to tune the aging in generation 0.
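
The idea of having each block point to its promotion target might be sketched as below. All type, field, and function names here are hypothetical and chosen for illustration. The choice that the oldest generation promotes into itself is also an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_GENS = 2 };  /* illustrative; the real number is configurable */

typedef struct Generation {
    unsigned no;             /* generation number */
    struct Generation *to;   /* where survivors of this generation go */
} Generation;

typedef struct Block {
    Generation *gen;   /* owning generation; NULL stands for the nursery here */
    Generation *dest;  /* generation that live objects in this block move to */
} Block;

static Generation gens[NUM_GENS];

static void init_generations(void) {
    for (unsigned i = 0; i < NUM_GENS; i++) {
        gens[i].no = i;
        /* Assumption: the oldest generation promotes into itself. */
        gens[i].to = (i + 1 < NUM_GENS) ? &gens[i + 1] : &gens[i];
    }
}

/* Nursery blocks promote into generation 0. */
static Block make_nursery_block(void) {
    Block b = { NULL, &gens[0] };
    return b;
}

/* Blocks owned by generation g promote into g's successor. */
static Block make_gen_block(Generation *g) {
    Block b = { g, g->to };
    return b;
}

/* The collector reads the target directly from the block descriptor. */
static Generation *evacuation_target(const Block *b) {
    return b->dest;
}
```

Under this layout an object allocated in the nursery must survive one collection to reach generation 0 and a second to reach generation 1, which gives the two-step aging behaviour without a separate step structure.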
  4. 02 Dec, 2009 1 commit
  5. 29 Nov, 2009 1 commit
    • Store a destination step in the block descriptor · f9d15f9f
      Simon Marlow authored
      At the moment, this just saves a memory reference in the GC inner loop
      (worth a percent or two of GC time).  Later, it will hopefully let me
      experiment with partial steps, and simplifying the generation/step
      infrastructure.
  6. 20 Aug, 2009 1 commit
    • Relax the assumption that all objects fit in a single block (#3424) · b99af863
      Simon Marlow authored
      It is possible for the program to allocate a single object larger than a
      block, without going through the normal large-object mechanisms that
      we have for arrays and threads and so on.  
      
      The GC was assuming that no object was larger than a block, but #3424
      contains a program that breaks the assumption.  This patch removes the
      assumption.  The objects in question will still be copied, that is
      they don't get the normal large-object treatment, but this case is
      unlikely to occur often in practice.
      
      In the future we may improve things by generating code to allocate
      them as large objects in the first place.
  7. 02 Aug, 2009 1 commit
    • RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposing publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed.
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do; this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g. use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
  8. 03 Apr, 2009 1 commit
  9. 17 Mar, 2009 1 commit
    • Add fast event logging · 8b18faef
      Simon Marlow authored
      Generate binary log files from the RTS containing a log of runtime
      events with timestamps.  The log file can be visualised in various
      ways, for investigating runtime behaviour and debugging performance
      problems.  See for example the forthcoming ThreadScope viewer.
      
      New GHC option:
      
        -eventlog   (link-time option) Enables event logging.
      
        +RTS -l     (runtime option) Generates <prog>.eventlog with
                    the binary event information.
      
      This replaces some of the tracing machinery we already had in the RTS:
      e.g. +RTS -vg  for GC tracing (we should do this using the new event
      logging instead).
      
      Event logging has almost no runtime cost when it isn't enabled, though
      in the future we might add more fine-grained events and this might
      change; hence having a link-time option and compiling a separate
      version of the RTS for event logging.  There's a small runtime cost
      for enabling event logging, but for most programs it shouldn't make
      much difference.
      
      (Todo: docs)
  10. 13 Mar, 2009 1 commit
    • Use work-stealing for load-balancing in the GC · 4e354226
      Simon Marlow authored
        
      New flag: "+RTS -qb" disables load-balancing in the parallel GC
      (though this is subject to change, I think we will probably want to do
      something more automatic before releasing this).
      
      To get the "PARGC3" configuration described in the "Runtime support
      for Multicore Haskell" paper, use "+RTS -qg0 -qb -RTS".
      
      The main advantage of this is that it allows us to easily disable
      load-balancing altogether, which turns out to be important in parallel
      programs.  Maintaining locality is sometimes more important than
      spreading the work out in parallel GC.  There is a side benefit in
      that the parallel GC should have improved locality even when
      load-balancing, because each processor prefers to take work from its
      own queue before stealing from others.
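
The policy described above (take from your own queue first, steal only when it is empty, and allow stealing to be switched off) can be sketched as follows. This is a single-threaded simulation of the policy only. A real parallel GC would need a concurrent deque with atomic operations, and every name here, including the `load_balance` parameter standing in for the effect of `-qb`, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { QUEUE_CAP = 64, NUM_WORKERS = 4 };

/* Fixed-capacity queue with no wraparound: enough for a sketch. */
typedef struct {
    int items[QUEUE_CAP];
    size_t top;     /* index of the oldest item; thieves take from here */
    size_t bottom;  /* one past the newest item; the owner works here */
} WorkQueue;

static WorkQueue queues[NUM_WORKERS];  /* zero-initialised: all empty */

/* Owner adds work at the bottom of its own queue. */
static bool push_local(WorkQueue *q, int item) {
    if (q->bottom == QUEUE_CAP) return false;
    q->items[q->bottom++] = item;
    return true;
}

/* Owner takes its newest item, which is most likely still in cache. */
static bool pop_local(WorkQueue *q, int *out) {
    if (q->bottom == q->top) return false;
    *out = q->items[--q->bottom];
    return true;
}

/* A thief takes the victim's oldest item from the other end. */
static bool steal(WorkQueue *q, int *out) {
    if (q->bottom == q->top) return false;
    *out = q->items[q->top++];
    return true;
}

/* Prefer local work; steal from the others only if balancing is enabled. */
static bool find_work(size_t self, bool load_balance, int *out) {
    if (pop_local(&queues[self], out)) return true;
    if (!load_balance) return false;
    for (size_t i = 1; i < NUM_WORKERS; i++) {
        size_t victim = (self + i) % NUM_WORKERS;
        if (steal(&queues[victim], out)) return true;
    }
    return false;
}
```

With `load_balance` set to false, a worker that runs out of local work simply stops, which trades idle time for locality.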
  11. 18 Jun, 2008 1 commit
  12. 17 Jun, 2008 1 commit
  13. 16 Apr, 2008 12 commits
  14. 13 Dec, 2007 1 commit
  15. 29 Nov, 2007 1 commit
  16. 27 Nov, 2007 1 commit
  17. 21 Nov, 2007 1 commit
  18. 31 Oct, 2007 2 commits
    • Remove the optimisation of avoiding scavenging for certain objects · cacd714c
      Simon Marlow authored
      Some objects don't need to be scavenged, in particular if they have no
      pointers.  This seems like an obvious optimisation, but in fact it
      only accounts for about 1% of objects (in GHC, for example), and the
      extra complication means it probably isn't worth doing.
    • Refactoring of the GC in preparation for parallel GC · d5bd3e82
      Simon Marlow authored
        
      This patch localises the state of the GC into a gc_thread structure,
      and reorganises the inner loop of the GC to scavenge one block at a
      time from global work lists in each "step".  The gc_thread structure
      has a "workspace" for each step, in which it collects evacuated
      objects until it has a full block to push out to the step's global
      list.  Details of the algorithm will be on the wiki in due course.
      
      At the moment, THREADED_RTS does not compile, but the single-threaded
      GC works (and is 10-20% slower than before).
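
The workspace idea (collect evacuated objects locally and publish a block to the step's global list only once it is full) might look roughly like this. It is a single-threaded sketch with hypothetical names and a tiny block size. It is not the actual `gc_thread` layout, and a parallel version would need to synchronise the push onto the global list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { BLOCK_WORDS = 4 };  /* deliberately tiny so the sketch is easy to follow */

typedef struct Block {
    size_t used;             /* number of words filled so far */
    int words[BLOCK_WORDS];  /* stand-in for evacuated object data */
    struct Block *link;      /* next block on the step's global list */
} Block;

/* Hypothetical per-step global work list. */
typedef struct {
    Block *todo;      /* full blocks waiting to be scavenged */
    size_t n_blocks;  /* length of the todo list */
} Step;

/* Hypothetical per-thread, per-step workspace. */
typedef struct {
    Step *step;      /* the step this workspace feeds */
    Block *current;  /* partially filled block, or NULL */
} Workspace;

static Block *alloc_block(void) {
    Block *b = calloc(1, sizeof *b);
    assert(b != NULL);
    return b;
}

/* Publish the full block on the step's global list. */
static void push_to_step(Workspace *ws) {
    ws->current->link = ws->step->todo;
    ws->step->todo = ws->current;
    ws->step->n_blocks++;
    ws->current = NULL;
}

/* Add one evacuated word; push the block out once it fills up. */
static void workspace_add(Workspace *ws, int word) {
    if (ws->current == NULL) {
        ws->current = alloc_block();
    }
    ws->current->words[ws->current->used++] = word;
    if (ws->current->used == BLOCK_WORDS) {
        push_to_step(ws);
    }
}
```

Batching at block granularity means the global list is touched once per block rather than once per object.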
  19. 26 Oct, 2006 1 commit
  20. 24 Oct, 2006 1 commit
    • Split GC.c, and move storage manager into sm/ directory · ab0e778c
      Simon Marlow authored
      In preparation for parallel GC, split up the monolithic GC.c file into
      smaller parts.  Also in this patch (and difficult to separate,
      unfortunately):
        
        - Don't include Stable.h in Rts.h, instead just include it where
          necessary.
        
        - Consistently use STATIC_INLINE in source files, and INLINE_HEADER
          in header files.  STATIC_INLINE is now turned off when DEBUG is on,
          to make debugging easier.
        
        - The GC no longer takes the get_roots function as an argument.
          We weren't making use of this generalisation.