1. 07 Oct, 2006 1 commit
  2. 31 Aug, 2006 1 commit
    • don't closeMutex() the Capability lock · afdbaf48
      Simon Marlow authored
      There might be threads in foreign calls that will attempt to return
      via resumeThread() and grab this lock, so we can't safely destroy it.
      
      Fixes one cause of
      
         internal error: ASSERTION FAILED: file Capability.c, line 90
      
      although I haven't repeated that assertion failure in the wild, only
      with a specially crafted test case, so I can't be sure I really got
      it.
  3. 25 Aug, 2006 1 commit
  4. 23 Aug, 2006 1 commit
  5. 16 Jun, 2006 1 commit
    • Asynchronous exception support for SMP · b1953bbb
      Simon Marlow authored
      This patch makes throwTo work with -threaded, and also refactors large
      parts of the concurrency support in the RTS to clean things up.  We
      have some new files:
      
        RaiseAsync.{c,h}      asynchronous exception support
        Threads.{c,h}         general threading-related utils
      
      Some of the contents of these new files used to be in Schedule.c,
      which is smaller and cleaner as a result of the split.
      
      Asynchronous exception support in the presence of multiple running
      Haskell threads is rather tricky.  In fact, to my annoyance there are
      still one or two bugs to track down, but the majority of the tests run
      now.
  6. 08 Jun, 2006 1 commit
    • New tracing interface · 5a2769f0
      Simon Marlow authored
      A simple interface for generating trace messages with timestamps and
      thread IDs attached to them.  Most debugging output goes through this
      interface now, so it is straightforward to get timestamped debugging
      traces with +RTS -vt.  Also, we plan to use this to generate
      parallelism profiles from the trace output.
  7. 07 Apr, 2006 1 commit
    • Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system no longer makes
      any pretense of being multi-project; it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
  8. 24 Mar, 2006 1 commit
    • Add some more flexibility to the multiproc scheduler · 4368121d
      Simon Marlow authored
      There are two new options in the -threaded RTS:
       
        -qm       Don't automatically migrate threads between CPUs
        -qw       Migrate a thread to the current CPU when it is woken up
      
      Previously both of these were effectively off, i.e. threads were
      migrated between CPUs willy-nilly, and threads were always migrated to
      the current CPU when woken up.  This is the first step in tweaking the
      scheduling for more effective work balancing; there will no doubt be
      more to come.
  9. 15 Mar, 2006 1 commit
    • Improvements to shutting down of the runtime · 5638488b
      Simon Marlow authored
      Yet another attempt at shutdown & interruption.  This one appears to
      work better; ^C is more responsive in multi-threaded / SMP, and I
      fixed one case where the runtime wasn't responding to ^C at all.
  10. 13 Mar, 2006 2 commits
  11. 10 Feb, 2006 1 commit
  12. 09 Feb, 2006 2 commits
    • Merge the smp and threaded RTS ways · eba7b660
      Simon Marlow authored
      Now, the threaded RTS also includes SMP support.  The -smp flag is a
      synonym for -threaded.  The performance implications of this are small
      to negligible, and it results in a code cleanup and reduces the number
      of combinations we have to test.
    • fix for the unregisterised way · 7a605453
      Simon Marlow authored
      We always assign to BaseReg on return from resumeThread(), but in
      cases where BaseReg is not an lvalue (e.g. unreg) we need to disable
      this assignment.  See comments for more details.
  13. 18 Jan, 2006 1 commit
  14. 25 Nov, 2005 1 commit
  15. 21 Nov, 2005 1 commit
  16. 18 Nov, 2005 1 commit
    • [project @ 2005-11-18 15:24:12 by simonmar] · c5cd2343
      simonmar authored
      Two improvements to the SMP runtime:
      
        - support for 'par', aka sparks.  Load balancing is very primitive
          right now, but I have seen programs that go faster using par.
      
        - support for backing off when a thread is found to be duplicating
          a computation currently underway in another thread.  This also
          fixes some instability in SMP, because it turned out that when
          an update frame points to an indirection, which can happen if
          a thunk is under evaluation in multiple threads, then after GC
          has shorted out the indirection the update will trash the value.
          Now we suspend the duplicate computation to the heap before this
          can happen.
      
      Additionally:
      
        - stack squeezing is separate from lazy blackholing, and now only
          happens if there's a reasonable amount of squeezing to be done
          in relation to the number of words of stack that have to be moved.
          This means we won't try to shift 10Mb of stack just to save 2
          words at the bottom (it probably never happened, but still).
      
        - update frames are now marked when they have been visited by lazy
          blackholing, as per the SMP paper.
      
        - cleaned up raiseAsync() a bit.
  17. 03 Nov, 2005 1 commit
  18. 29 Oct, 2005 1 commit
  19. 27 Oct, 2005 1 commit
    • [project @ 2005-10-27 15:26:06 by simonmar] · 677c6345
      simonmar authored
      - Very simple work-sharing amongst Capabilities: whenever a Capability
        detects that it has more than 1 thread in its run queue, it runs
        around looking for empty Capabilities, and shares the threads on its
        run queue equally with the free Capabilities it finds.
      
      - unlock the garbage collector's mutable lists, by having private
        mutable lists per capability (and per generation).  The private
        mutable lists are moved onto the main mutable lists at each GC.
        This pulls the old-generation update code out of the storage manager
        mutex, which is one of the last remaining causes of (alleged) contention.
      
      - Fix some problems with synchronising when a GC is required.  We should
        synchronise quicker now.
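The work-sharing step in the first bullet above can be pictured with a minimal C sketch. It is not the RTS code: `RunQueue`, `push_thread`, `share_work` and the fixed capacities are hypothetical names and sizes chosen for illustration, and dealing threads round-robin is only one plausible reading of "shares the threads equally".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 64   /* illustrative capacity, not an RTS constant */
#define MAX_CAPS  16   /* illustrative limit on capabilities */

/* Hypothetical stand-in for a Capability's run queue. */
typedef struct {
    int    threads[MAX_QUEUE];  /* thread IDs, front of the queue at index 0 */
    size_t len;
} RunQueue;

static void push_thread(RunQueue *q, int tid)
{
    assert(q->len < MAX_QUEUE);
    q->threads[q->len++] = tid;
}

/* If capability `self` has more than one runnable thread, deal its threads
 * round-robin across itself and every capability whose queue is empty.
 * Returns the number of threads handed to other capabilities. */
static size_t share_work(RunQueue caps[], size_t ncaps, size_t self)
{
    size_t idle[MAX_CAPS];
    size_t nidle = 0, moved = 0, i;
    RunQueue pending;

    assert(ncaps <= MAX_CAPS && self < ncaps);
    if (caps[self].len <= 1) return 0;          /* nothing worth sharing */

    for (i = 0; i < ncaps; i++)                 /* look for empty capabilities */
        if (i != self && caps[i].len == 0) idle[nidle++] = i;
    if (nidle == 0) return 0;

    pending = caps[self];                       /* snapshot, then redistribute */
    caps[self].len = 0;
    for (i = 0; i < pending.len; i++) {
        size_t slot = i % (nidle + 1);          /* slot 0 is `self` */
        if (slot == 0) {
            push_thread(&caps[self], pending.threads[i]);
        } else {
            push_thread(&caps[idle[slot - 1]], pending.threads[i]);
            moved++;
        }
    }
    return moved;
}
```

A capability that already has work (a non-empty queue) is never given more threads in this sketch, which matches the "empty Capabilities" wording of the commit message.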
  20. 26 Oct, 2005 2 commits
  21. 21 Oct, 2005 1 commit
    • [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
  22. 23 May, 2005 1 commit
    • [project @ 2005-05-23 15:44:10 by simonmar] · 6d16c476
      simonmar authored
      Simplify and improve the Capability-passing machinery for bound
      threads.
      
      The old story was quite complicated: if you find a thread on the run
      queue which the current task can't run, you had to call
      passCapability(), which set a flag saying where the next Capability
      was to go, and then release the Capability.  When multiple
      Capabilities are flying around, it's not clear how this story should
      extend.
      
      The new story is much simpler: each time around the scheduler loop,
      the task looks to see whether it can make any progress, and if not, it
      releases its Capability and wakes up a task which *can* make some
      progress.  The predicate for whether we can make any progress is
      encapsulated in the (inline) function ANY_WORK_FOR_ME(Condition).
      Waking up an appropriate task is encapsulated in the function
      threadRunnable() (previously it was in two places).
      
      The logic in Capability.c is simpler, but unfortunately it is now more
      closely connected with the Scheduler, because it inspects the run
      queue.  However, performance when communicating between bound and
      unbound threads might be better.
      
      The concurrency tests still work, so hopefully this hasn't broken
      anything.
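The "new story" above amounts to a per-iteration predicate plus a choice of which task to wake. A rough C sketch follows. The types and the names `any_work_for_me` and `task_to_wake` are hypothetical, and the rule that a task owning a bound thread does not run unbound threads is an assumption made for illustration, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified versions of the RTS structures. */
typedef struct Task {
    int  id;
    bool has_bound_thread;      /* this task exists to run one bound thread */
} Task;

typedef struct Thread {
    int         id;
    const Task *bound_to;       /* NULL for an unbound Haskell thread */
} Thread;

/* Can task `me` make progress, given the thread at the head of the run
 * queue?  Assumption: a bound thread may only run on its own task, and a
 * task that owns a bound thread does not pick up unbound threads. */
static bool any_work_for_me(const Task *me, const Thread *head)
{
    if (head == NULL) return false;                 /* empty run queue */
    if (head->bound_to != NULL) return head->bound_to == me;
    return !me->has_bound_thread;
}

/* Which task should be woken to run `head`?  `worker` stands for any
 * available worker task. */
static const Task *task_to_wake(const Thread *head, const Task *worker)
{
    assert(head != NULL);
    return head->bound_to != NULL ? head->bound_to : worker;
}
```

In a scheduler loop built on this sketch, a task for which `any_work_for_me` is false would release its Capability and wake the task returned by `task_to_wake`.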
  23. 11 May, 2005 1 commit
  24. 10 May, 2005 1 commit
    • [project @ 2005-05-10 13:25:41 by simonmar] · bf821981
      simonmar authored
      Two SMP-related changes:
      
        - New storage manager interface:
      
          bdescr *allocateLocal(StgRegTable *reg, nat words)
      
          which allocates from the current thread's nursery (being careful
          not to clash with the heap pointer).  It can do this without
          taking any locks; the lock only has to be taken if a block needs
          to be allocated.  allocateLocal() is now used instead of allocate()
          in a few PrimOps.
      
          This removes locks from most Integer operations, cutting down
          the overhead for SMP a bit more.
      
          To make this work, we have to be able to grab the current thread's
          Capability out of thin air (i.e. when called from GMP), so the
          Capability subsystem needs to keep a hash from thread IDs to
          Capabilities.
      
        - Small MVar optimisation: instead of taking the global
          storage-manager lock, do our own locking of MVars with a bit of
          inline assembly (x86 only for now).
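The idea behind allocateLocal(), bumping a pointer in a nursery private to the capability and taking a lock only when a fresh block is needed, can be sketched in C as follows. This is an illustration only: `Block`, `Nursery`, `allocate_local`, `new_block_locked` and the block size are hypothetical, `malloc` stands in for the block allocator, and the real function has the signature quoted in the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 1024   /* illustrative block size, not GHC's */

typedef struct Block {
    struct Block *next;
    size_t        free;                /* index of the next free word */
    void         *words[BLOCK_WORDS];
} Block;

typedef struct {
    Block *current;                    /* private to one capability */
} Nursery;

/* Stand-in for the storage-manager lock. */
static pthread_mutex_t sm_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long   blocks_taken_under_lock = 0;

/* Slow path: fetch a fresh block from the shared pool while holding the lock. */
static Block *new_block_locked(void)
{
    Block *b;
    pthread_mutex_lock(&sm_mutex);
    b = malloc(sizeof *b);
    if (b != NULL) {
        b->next = NULL;
        b->free = 0;
        blocks_taken_under_lock++;
    }
    pthread_mutex_unlock(&sm_mutex);
    return b;
}

/* Fast path: bump the free index in the capability's own block, lock-free.
 * Returns NULL only if a new block was needed and could not be obtained. */
static void **allocate_local(Nursery *n, size_t words)
{
    void **p;
    assert(words > 0 && words <= BLOCK_WORDS);
    if (n->current == NULL || n->current->free + words > BLOCK_WORDS) {
        Block *b = new_block_locked();
        if (b == NULL) return NULL;
        b->next = n->current;          /* chain the old block behind the new one */
        n->current = b;
    }
    p = &n->current->words[n->current->free];
    n->current->free += words;
    return p;
}
```

Because each `Nursery` is touched by only one capability, consecutive small allocations never contend; the mutex is reached once per block rather than once per allocation.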
  25. 27 Apr, 2005 1 commit
  26. 12 Apr, 2005 2 commits
    • [project @ 2005-04-12 12:24:44 by simonmar] · 3949e528
      simonmar authored
      cleanup
    • [project @ 2005-04-12 09:04:23 by simonmar] · 693550d9
      simonmar authored
      Per-task nurseries for SMP.  This was kind-of implemented before, but
      it's much cleaner now.  There is now one *step* per capability, so we
      have somewhere to hang the block count.  So for SMP, there are simply
      multiple instances of generation 0 step 0.  The rNursery entry in the
      register table now points to the step rather than the head block of
      the nursery.
  27. 07 Apr, 2005 1 commit
    • [project @ 2005-04-07 14:33:30 by simonmar] · 5a148f04
      simonmar authored
      Support handling signals in the threaded RTS by passing the signal
      number down the pipe to the IO manager.  This avoids needing
      synchronisation in the signal handler.
      
      Signals should now work with -threaded.  Since this is a bugfix, I'll
      merge the changes into the 6.4 branch.
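Passing the signal number down a pipe is an instance of the well-known self-pipe technique. The following POSIX C sketch shows the general shape. The names `io_manager_pipe`, `forward_signal`, `install_forwarder` and `read_forwarded_signal` are hypothetical, and the one-byte encoding is a simplification; the actual RTS and IO manager code may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical pipe shared by the signal handler (write end) and the
 * IO-manager side (read end). */
static int io_manager_pipe[2] = { -1, -1 };

/* Signal handler: does nothing but write the signal number to the pipe.
 * write() is async-signal-safe, so no locks or other synchronisation are
 * needed here. */
static void forward_signal(int sig)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)sig;   /* sketch: one byte per signal */
    ssize_t r = write(io_manager_pipe[1], &byte, 1);
    (void)r;                                   /* nothing safe to do on failure */
    errno = saved_errno;
}

/* Create the pipe on first use and route `sig` through it.
 * Returns 0 on success, -1 on failure. */
static int install_forwarder(int sig)
{
    struct sigaction sa;
    assert(sig > 0 && sig < 256);
    if (io_manager_pipe[0] < 0 && pipe(io_manager_pipe) != 0) return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, NULL);
}

/* IO-manager side: block until a forwarded signal number arrives.
 * Returns the signal number, or -1 on error or end of file. */
static int read_forwarded_signal(void)
{
    unsigned char byte;
    ssize_t n;
    do {
        n = read(io_manager_pipe[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? (int)byte : -1;
}
```

The reader runs in ordinary thread context, so whatever it does in response to the signal can use normal locking, which is the point of keeping the handler itself trivial.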
  28. 06 Apr, 2005 1 commit
    • [project @ 2005-04-06 15:27:06 by simonmar] · 9a92cb1c
      simonmar authored
      Revamp the Task API: now we use the same implementation for threaded
      and SMP.  We also keep per-task timing stats in the threaded RTS now,
      which makes the output of +RTS -sstderr more useful.
  29. 05 Apr, 2005 1 commit
    • [project @ 2005-04-05 12:19:54 by simonmar] · 16214216
      simonmar authored
      Some multi-processor hackery, including
      
        - Don't hang blocked threads off BLACKHOLEs any more, instead keep
          them all on a separate queue which is checked periodically for
          threads to wake up.
      
          This is good because (a) we don't have to worry about locking the
          closure in SMP mode when we want to block on it, and (b) it means
          the standard update code doesn't need to wake up any threads or
          check for a BLACKHOLE_BQ, simplifying the update code.
      
          The downside is that if there are lots of threads blocked on
          BLACKHOLEs, we might have to do a lot of repeated list traversal.
          We don't expect this to be common, though.  conc023 goes slower
          with this change, but we expect most programs to benefit from the
          shorter update code.
      
        - Fixing up the Capability code to handle multiple capabilities (SMP
          mode), and related changes to get the SMP mode at least building.
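The separate queue of threads blocked on BLACKHOLEs, and the periodic check that wakes them, can be illustrated with a small C sketch. All names here (`Closure`, `BlockedThread`, `check_blackhole_queue`) are hypothetical and the closure representation is reduced to a single state field; it shows only the list traversal the commit message refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified closure: either still under evaluation
 * (a blackhole) or already updated with its value. */
typedef enum { CLOSURE_BLACKHOLE, CLOSURE_VALUE } ClosureState;

typedef struct {
    ClosureState state;
} Closure;

typedef struct BlockedThread {
    int                   id;
    const Closure        *blocked_on;
    struct BlockedThread *next;
} BlockedThread;

/* Walk the whole blocked queue once.  Every thread whose closure is no
 * longer a blackhole is unlinked and pushed onto `runnable`.  Returns the
 * number of threads woken.  The cost is linear in the queue length, which
 * is the "repeated list traversal" downside. */
static size_t check_blackhole_queue(BlockedThread **queue,
                                    BlockedThread **runnable)
{
    size_t woken = 0;
    BlockedThread **link = queue;

    assert(queue != NULL && runnable != NULL);
    while (*link != NULL) {
        BlockedThread *t = *link;
        if (t->blocked_on->state != CLOSURE_BLACKHOLE) {
            *link = t->next;           /* unlink from the blocked queue */
            t->next = *runnable;       /* push onto the runnable list */
            *runnable = t;
            woken++;
        } else {
            link = &t->next;           /* still blocked, keep it queued */
        }
    }
    return woken;
}
```

Note that the code updating a closure never touches this queue in the sketch; only the periodic check does, which mirrors the simplification of the update code described above.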
  30. 22 Nov, 2004 1 commit
  31. 14 Oct, 2004 1 commit
    • [project @ 2004-10-14 14:58:37 by simonmar] · bb01a96b
      simonmar authored
      Threaded RTS improvements:
      
       - Unix only: implement waitRead#, waitWrite# and delay# in Haskell,
         by having a single Haskell thread (the IO manager) performing a blocking
         select() operation.  Threads communicate with the IO manager
         via channels.  This is faster than doing the select() in the RTS,
         because we only restart the select() when a new request arrives,
         rather than each time around the scheduler.
      
         On Windows we just make blocking IO calls, we don't have a fancy IO
         manager (yet).
      
       - Simplify the scheduler for the threaded RTS, now that we don't have
         to wait for IO in the scheduler loop.
      
       - Remove detectBlackHoles(), which isn't used now (not sure how long
         this has been unused for... perhaps it was needed back when main threads
         used to be GC roots, so we had to check for blackholes manually rather
         than relying on the GC.)
      
      Signals aren't quite right in the threaded RTS.  In fact, they're
      slightly worse than before, because the thread receiving signals might
      be blocked in a C call - previously there would always be another thread
      stuck in awaitEvent() that would notice the signal, but that's not
      true now.  I can't see an easy fix yet.
  32. 01 Mar, 2004 1 commit
  33. 26 Feb, 2004 1 commit
  34. 19 Dec, 2003 1 commit
  35. 16 Dec, 2003 1 commit
    • [project @ 2003-12-16 13:27:31 by simonmar] · de02e02a
      simonmar authored
      Clean up Capability API
      ~~~~~~~~~~~~~~~~~~~~~~~
      
      - yieldToReturningWorker() is now yieldCapability(), and performs all
        kinds of yielding (both to returning workers and passing to other
        OS threads).  yieldCapability() does *not* re-acquire a capability.
      
      - waitForWorkCapability() is now waitForCapability().
      
      - releaseCapability() also releases the capability when passing to
        another OS thread.  It is the only way to release a capability (apart
        from yieldCapability(), which calls releaseCapability() internally).
      
      - passCapability() and passCapabilityToWorker() now do not release the
        capability.  They just set a flag to indicate where the capability
        should go when it is next released.
      
      
      Other cleanups:
      
        - Removed all the SMP stuff from Schedule.c.  It had extensive bitrot,
          and was just obfuscating the code.  If it is ever needed again,
          it can be resurrected from CVS.
      
        - Removed some other dead code in Schedule.c, in an attempt to make
          this file more manageable.
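The split between "set a flag" and "release" described for passCapability() and releaseCapability() can be shown with a toy C model. Task identifiers are plain integers, the names `pass_capability` and `release_capability` are hypothetical, and handing ownership straight to the flagged task is a simplification: in a real runtime the target OS thread would be signalled and would then acquire the capability itself.

```c
#include <assert.h>

#define NO_TASK (-1)

/* Hypothetical, minimal capability record. */
typedef struct {
    int owner;     /* task currently holding the capability, or NO_TASK */
    int pass_to;   /* task that should receive it on release, or NO_TASK */
} Capability;

/* Only records the destination; the caller keeps the capability. */
static void pass_capability(Capability *cap, int task)
{
    assert(cap->owner != NO_TASK);
    cap->pass_to = task;
}

/* The single release point: consult the flag, clear it, and hand the
 * capability on.  Returns the new owner, or NO_TASK if it is now free. */
static int release_capability(Capability *cap)
{
    int next = cap->pass_to;
    assert(cap->owner != NO_TASK);
    cap->pass_to = NO_TASK;
    cap->owner = next;
    return next;
}
```

Keeping a single release function means every hand-off decision is made in one place, which is the cleanup the commit message describes.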
  36. 15 Dec, 2003 1 commit