1. 26 May, 2010 1 commit
  2. 03 Dec, 2009 1 commit
    • Simon Marlow's avatar
      GC refactoring, remove "steps" · 214b3663
      Simon Marlow authored
      The GC had a two-level structure, G generations each of T steps.
      Steps are for aging within a generation, mostly to avoid premature
      promotion.  
      
      Measurements show that more than 2 steps is almost never worthwhile,
      and 1 step is usually worse than 2.  In theory fractional steps are
      possible, so the ideal number of steps is somewhere between 1 and 3.
      GHC's default has always been 2.
      
      We can implement 2 steps quite straightforwardly by having each block
      point to the generation to which objects in that block should be
      promoted, so blocks in the nursery point to generation 0, and blocks
      in gen 0 point to gen 1, and so on.
      
      This commit removes the explicit step structures, merging generations
      with steps, thus simplifying a lot of code.  Performance is
      unaffected.  The tunable number of steps is now gone, although it may
      be replaced in the future by a way to tune the aging in generation 0.
      214b3663
  3. 02 Dec, 2009 1 commit
  4. 29 Nov, 2009 1 commit
    • Simon Marlow's avatar
      Store a destination step in the block descriptor · f9d15f9f
      Simon Marlow authored
      At the moment, this just saves a memory reference in the GC inner loop
      (worth a percent or two of GC time).  Later, it will hopefully let me
      experiment with partial steps, and simplifying the generation/step
      infrastructure.
      f9d15f9f
  5. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  6. 12 Sep, 2008 1 commit
  7. 09 Sep, 2008 1 commit
  8. 19 Jun, 2008 1 commit
  9. 08 Jun, 2008 1 commit
  10. 16 Apr, 2008 4 commits
  11. 28 Feb, 2008 1 commit
  12. 31 Oct, 2007 1 commit
    • Simon Marlow's avatar
      Initial parallel GC support · f2ca6dee
      Simon Marlow authored
      eg. use +RTS -g2 -RTS for 2 threads.  Only major GCs are parallelised,
      minor GCs are still sequential. Don't use more threads than you
      have CPUs.
      
      It works most of the time, although you won't see much speedup yet.
      Tuning and more work on stability still required.
      f2ca6dee
  13. 14 Dec, 2006 1 commit
    • Simon Marlow's avatar
      Rework the block allocator · 485b8d1a
      Simon Marlow authored
      The main goal here is to reduce fragmentation, which turns out to be
      the case of #743.  While I was here I found some opportunities to
      improve performance too.  The code is rather more complex, but it also
      contains a long comment describing the strategy, so please take a look
      at that for the details.
      485b8d1a
  14. 21 Nov, 2006 2 commits
    • Simon Marlow's avatar
      d25ceab6
    • Simon Marlow's avatar
      optimisation to freeGroup() to avoid an O(N^2) pathalogical case · fec956d1
      Simon Marlow authored
      In the free list, we don't strictly speaking need to have every block
      in a coalesced group point to the head block, although this is an
      invariant for non-free blocks.  Dropping this invariant for the free
      list means that coalesce() is O(1) rather than O(N), and freeGroup()
      is therefore O(N) not O(N^2).
      
      The bad case probably didn't happen most of the time, indeed it has
      never shown up in a profile that I've seen.  I had a report from a
      while back that this was a problem with really large heaps, though.
      Fortunately the fix is easy.
      fec956d1
  15. 24 Oct, 2006 1 commit
    • Simon Marlow's avatar
      Split GC.c, and move storage manager into sm/ directory · ab0e778c
      Simon Marlow authored
      In preparation for parallel GC, split up the monolithic GC.c file into
      smaller parts.  Also in this patch (and difficult to separate,
      unfortunatley):
        
        - Don't include Stable.h in Rts.h, instead just include it where
          necessary.
        
        - consistently use STATIC_INLINE in source files, and INLINE_HEADER
          in header files.  STATIC_INLINE is now turned off when DEBUG is on,
          to make debugging easier.
        
        - The GC no longer takes the get_roots function as an argument.
          We weren't making use of this generalisation.
      ab0e778c
  16. 10 Aug, 2006 1 commit
  17. 07 Apr, 2006 1 commit
    • Simon Marlow's avatar
      Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system now makes no
      pretense at being multi-project, it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
      0065d5ab
  18. 09 Feb, 2006 1 commit
    • Simon Marlow's avatar
      Merge the smp and threaded RTS ways · eba7b660
      Simon Marlow authored
      Now, the threaded RTS also includes SMP support.  The -smp flag is a
      synonym for -threaded.  The performance implications of this are small
      to negligible, and it results in a code cleanup and reduces the number
      of combinations we have to test.
      eba7b660
  19. 21 Oct, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
      03a9ff01
  20. 27 Jul, 2005 1 commit
  21. 13 Jun, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-06-13 12:29:48 by simonmar] · b07f3876
      simonmar authored
      Block allocator performance fix: instead of keeping the free list
      ordered, keep it doubly-linked, and introduce a new flag BF_FREE so we
      can tell when a block is free.  We can still coalesce blocks on the
      free list because block descriptors are kept consecutively in memory,
      so we can tell based on the BF_FREE flag whether to coalesce with the
      next higher/lower blocks when freeing a block.
      
      This (almost) make freeChain O(n) rather than O(n^2), and has been
      reported to help a lot when dealing with very large heaps.
      b07f3876
  22. 10 Feb, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading old-generation mutable lists
      through objects in the heap, keep it in a separate flat array.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which lead to more simplifications
          in the GC.
      e7c3f957
  23. 12 Sep, 2004 1 commit
  24. 06 Sep, 2004 1 commit
  25. 03 Sep, 2004 1 commit
    • simonmar's avatar
      [project @ 2004-09-03 15:28:18 by simonmar] · 95ca6bff
      simonmar authored
      Cleanup: all (well, most) messages from the RTS now go through the
      functions in RtsUtils: barf(), debugBelch() and errorBelch().  The
      latter two were previously called belch() and prog_belch()
      respectively.  See the comments for the right usage of these message
      functions.
      
      One reason for doing this is so that we can avoid spurious uses of
      stdout/stderr by Haskell apps on platforms where we shouldn't be using
      them (eg. non-console apps on Windows).
      95ca6bff
  26. 12 Nov, 2003 1 commit
    • sof's avatar
      [project @ 2003-11-12 17:49:05 by sof] · 20593d1d
      sof authored
      Tweaks to have RTS (C) sources compile with MSVC. Apart from wibbles
      related to the handling of 'inline', changed Schedule.h:POP_RUN_QUEUE()
      not to use expression-level statement blocks.
      20593d1d
  27. 18 Feb, 2003 1 commit
  28. 28 Jan, 2003 1 commit
  29. 17 Jul, 2002 1 commit
    • simonmar's avatar
      [project @ 2002-07-17 09:21:48 by simonmar] · 7457757f
      simonmar authored
      Remove most #includes of system headers from Stg.h, and instead
      #include any required headers directly in each RTS source file.
      
      The idea is to (a) reduce namespace pollution from system headers that
      we don't need, (c) be clearer about dependencies on system things in
      the RTS, and (c) improve via-C compilation times (maybe).
      
      In practice though, HsBase.h #includes everything anyway, so the
      difference from the point of view of .hc source is minimal.  However,
      this makes it easier to move to zero-includes if we wanted to (see
      discussion on the FFI list; I'm still not sure that's possible but
      at least this is a step in the right direction).
      7457757f
  30. 08 Nov, 2001 3 commits
  31. 14 Aug, 2001 1 commit
    • sewardj's avatar
      [project @ 2001-08-14 13:40:07 by sewardj] · bc5c8021
      sewardj authored
      Change the story about POSIX headers in C compilation.
      
      Until now, all C code in the RTS and library cbits has by default been
      compiled with settings for POSIXness enabled, that is:
         #define _POSIX_SOURCE   1
         #define _POSIX_C_SOURCE 199309L
         #define _ISOC9X_SOURCE
      If you wanted to negate this, you'd have to define NON_POSIX_SOURCE
      before including headers.
      
      This scheme has some bad effects:
      
      * It means that ccall-unfoldings exported via interfaces from a
        module compiled with -DNON_POSIX_SOURCE may not compile when
        imported into a module which does not -DNON_POSIX_SOURCE.
      
      * It overlaps with the feature tests we do with autoconf.
      
      * It seems to have caused borkage in the Solaris builds for some
        considerable period of time.
      
      The New Way is:
      
      * The default changes to not-being-in-Posix mode.
      
      * If you want to force a C file into Posix mode, #include as
        the **first** include the new file ghc/includes/PosixSource.h.
        Most of the RTS C sources have this include now.
      
      * NON_POSIX_SOURCE is almost totally expunged.  Unfortunately
        we have to retain some vestiges of it in ghc/compiler so that
        modules compiled via C on Solaris using older compilers don't
        break.
      bc5c8021
  32. 23 Jul, 2001 2 commits
    • simonmar's avatar
      [project @ 2001-07-23 17:23:19 by simonmar] · dfd7d6d0
      simonmar authored
      Add a compacting garbage collector.
      
      It isn't enabled by default, as there are still a couple of problems:
      there's a fallback case I haven't implemented yet which means it will
      occasionally bomb out, and speed-wise it's quite a bit slower than the
      copying collector (about 1.8x slower).
      
      Until I can make it go faster, it'll only be useful when you're
      actually running low on real memory.
      
      '+RTS -c' to enable it.
      
      Oh, and I cleaned up a few things in the RTS while I was there, and
      fixed one or two possibly real bugs in the existing GC.
      dfd7d6d0
    • simonmar's avatar
      [project @ 2001-07-23 10:47:16 by simonmar] · 6f83fbc0
      simonmar authored
      Small changes to improve GC performance slightly:
      
        - store the generation *number* in the block descriptor rather
          than a pointer to the generation structure, since the most
          common operation is to pull out the generation number, and
          it's one less indirection this way.
      
        - cache the generation number in the step structure too, which
          avoids an extra indirection in several places.
      6f83fbc0
  33. 30 Jan, 2000 1 commit