1. 25 Aug, 2012 5 commits
  2. 20 Jul, 2011 1 commit
  3. 16 Dec, 2010 1 commit
  4. 15 Dec, 2010 1 commit
    • Simon Marlow's avatar
      Implement stack chunks and separate TSO/STACK objects · f30d5273
      Simon Marlow authored
      This patch makes two changes to the way stacks are managed:
      
      1. The stack is now stored in a separate object from the TSO.
      
      This means that it is easier to replace the stack object for a thread
      when the stack overflows or underflows; we don't have to leave behind
      the old TSO as an indirection any more.  Consequently, we can remove
      ThreadRelocated and deRefTSO(), which were a pain.
      
      This is obviously the right thing, but the last time I tried to do it
      it made performance worse.  This time I seem to have cracked it.
      
      2. Stacks are now represented as a chain of chunks, rather than
         a single monolithic object.
      
      The big advantage here is that individual chunks are marked clean or
      dirty according to whether they contain pointers to the young
      generation, and the GC can avoid traversing clean stack chunks during
      a young-generation collection.  This means that programs with deep
      stacks will see a big saving in GC overhead when using the default GC
      settings.
      
      A secondary advantage is that there is much less copying involved as
      the stack grows.  Programs that quickly grow a deep stack will see big
      improvements.
      
      In some ways the implementation is simpler, as nothing special needs
      to be done to reclaim stack as the stack shrinks (the GC just recovers
      the dead stack chunks).  On the other hand, we have to manage stack
      underflow between chunks, so there's a new stack frame
      (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
      The total amount of code is probably about the same as before.
      
      There are new RTS flags:
      
         -ki<size> Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
         -kc<size> Sets the stack chunk size (default 32k)
         -kb<size> Sets the stack chunk buffer size (default 1k)
      
      -ki was previously called just -k, and the old name is still accepted
      for backwards compatibility.  These new options are documented.
      f30d5273
  5. 19 Jun, 2010 1 commit
  6. 01 Jan, 2010 1 commit
  7. 01 Apr, 2010 1 commit
    • Simon Marlow's avatar
      Remove the IND_OLDGEN and IND_OLDGEN_PERM closure types · 70a2431f
      Simon Marlow authored
      These are no longer used: once upon a time they used to have different
      layout from IND and IND_PERM respectively, but that is no longer the
      case since we changed the remembered set to be an array of addresses
      instead of a linked list of closures.
      70a2431f
  8. 29 Mar, 2010 1 commit
    • Simon Marlow's avatar
      New implementation of BLACKHOLEs · 5d52d9b6
      Simon Marlow authored
      This replaces the global blackhole_queue with a clever scheme that
      enables us to queue up blocked threads on the closure that they are
      blocked on, while still avoiding atomic instructions in the common
      case.
      
      Advantages:
      
       - gets rid of a locked global data structure and some tricky GC code
         (replacing it with some per-thread data structures and different
         tricky GC code :)
      
       - wakeups are more prompt: parallel/concurrent performance should
         benefit.  I haven't seen anything dramatic in the parallel
         benchmarks so far, but a couple of threading benchmarks do improve
         a bit.
      
       - waking up a thread blocked on a blackhole is now O(1) (e.g. if
         it is the target of throwTo).
      
       - less sharing and better separation of Capabilities: communication
         is done with messages, the data structures are strictly owned by a
         Capability and cannot be modified except by sending messages.
      
       - this change will utlimately enable us to do more intelligent
         scheduling when threads block on each other.  This is what started
         off the whole thing, but it isn't done yet (#3838).
      
      I'll be documenting all this on the wiki in due course.
      5d52d9b6
  9. 11 Mar, 2010 1 commit
    • Simon Marlow's avatar
      Use message-passing to implement throwTo in the RTS · 7408b392
      Simon Marlow authored
      This replaces some complicated locking schemes with message-passing
      in the implementation of throwTo. The benefits are
      
       - previously it was impossible to guarantee that a throwTo from
         a thread running on one CPU to a thread running on another CPU
         would be noticed, and we had to rely on the GC to pick up these
         forgotten exceptions. This no longer happens.
      
       - the locking regime is simpler (though the code is about the same
         size)
      
       - threads can be unblocked from a blocked_exceptions queue without
         having to traverse the whole queue now.  It's a rare case, but
         replaces an O(n) operation with an O(1).
      
       - generally we move in the direction of sharing less between
         Capabilities (aka HECs), which will become important with other
         changes we have planned.
      
      Also in this patch I replaced several STM-specific closure types with
      a generic MUT_PRIM closure type, which allowed a lot of code in the GC
      and other places to go away, hence the line-count reduction.  The
      message-passing changes resulted in about a net zero line-count
      difference.
      7408b392
  10. 17 Dec, 2009 1 commit
    • Simon Marlow's avatar
      Fix #650: use a card table to mark dirty sections of mutable arrays · 0417404f
      Simon Marlow authored
      The card table is an array of bytes, placed directly following the
      actual array data.  This means that array reading is unaffected, but
      array writing needs to read the array size from the header in order to
      find the card table.
      
      We use a bytemap rather than a bitmap, because updating the card table
      must be multi-thread safe.  Each byte refers to 128 entries of the
      array, but this is tunable by changing the constant
      MUT_ARR_PTRS_CARD_BITS in includes/Constants.h.
      0417404f
  11. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5