1. 09 Feb, 2006 1 commit
    • Simon Marlow's avatar
      fix for the unregisterised way · 7a605453
      Simon Marlow authored
      We always assign to BaseReg on return from resumeThread(), but in
      cases where BaseReg is not an lvalue (eg. unreg) we need to disable
      this assigment.  See comments for more details.
      7a605453
  2. 18 Jan, 2006 1 commit
  3. 25 Nov, 2005 1 commit
  4. 21 Nov, 2005 1 commit
  5. 18 Nov, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-11-18 15:24:12 by simonmar] · c5cd2343
      simonmar authored
      Two improvements to the SMP runtime:
      
        - support for 'par', aka sparks.  Load balancing is very primitive
          right now, but I have seen programs that go faster using par.
      
        - support for backing off when a thread is found to be duplicating
          a computation currently underway in another thread.  This also
          fixes some instability in SMP, because it turned out that when
          an update frame points to an indirection, which can happen if
          a thunk is under evaluation in multiple threads, then after GC
          has shorted out the indirection the update will trash the value.
          Now we suspend the duplicate computation to the heap before this
          can happen.
      
      Additionally:
      
        - stack squeezing is separate from lazy blackholing, and now only
          happens if there's a reasonable amount of squeezing to be done
          in relation to the number of words of stack that have to be moved.
          This means we won't try to shift 10Mb of stack just to save 2
          words at the bottom (it probably never happened, but still).
      
        - update frames are now marked when they have been visited by lazy
          blackholing, as per the SMP paper.
      
        - cleaned up raiseAsync() a bit.
      c5cd2343
  6. 03 Nov, 2005 1 commit
  7. 29 Oct, 2005 1 commit
  8. 27 Oct, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-10-27 15:26:06 by simonmar] · 677c6345
      simonmar authored
      - Very simple work-sharing amongst Capabilities: whenever a Capability
        detects that it has more than 1 thread in its run queue, it runs
        around looking for empty Capabilities, and shares the threads on its
        run queue equally with the free Capabilities it finds.
      
      - unlock the garbage collector's mutable lists, by having private
        mutable lists per capability (and per generation).  The private
        mutable lists are moved onto the main mutable lists at each GC.
        This pulls the old-generation update code out of the storage manager
        mutex, which is one of the last remaining causes of (alleged) contention.
      
      - Fix some problems with synchronising when a GC is required.  We should
        synchronise quicker now.
      677c6345
  9. 26 Oct, 2005 2 commits
  10. 21 Oct, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
      03a9ff01
  11. 23 May, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-05-23 15:44:10 by simonmar] · 6d16c476
      simonmar authored
      Simplify and improve the Capability-passing machinery for bound
      threads.
      
      The old story was quite complicated: if you find a thread on the run
      queue which the current task can't run, you had to call
      passCapability(), which set a flag saying where the next Capability
      was to go, and then release the Capability.  When multiple
      Capabilities are flying around, it's not clear how this story should
      extend.
      
      The new story is much simpler: each time around the scheduler loop,
      the task looks to see whether it can make any progress, and if not, it
      releases its Capability and wakes up a task which *can* make some
      progress.  The predicate for whether we can make any progress is
      encapsulated in the (inline) function ANY_WORK_FOR_ME(Condition).
      Waking up an appropriate task is encapsulated in the function
      threadRunnable() (previously it was in two places).
      
      The logic in Capability.c is simpler, but unfortunately it is now more
      closely connected with the Scheduler, because it inspects the run
      queue.  However, performance when communicating between bound and
      unbound threads might be better.
      
      The concurrency tests still work, so hopefully this hasn't broken
      anything.
      6d16c476
  12. 11 May, 2005 1 commit
  13. 10 May, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-05-10 13:25:41 by simonmar] · bf821981
      simonmar authored
      Two SMP-related changes:
      
        - New storage manager interface:
      
          bdescr *allocateLocal(StgRegTable *reg, nat words)
      
          which allocates from the current thread's nursery (being careful
          not to clash with the heap pointer).  It can do this without
          taking any locks; the lock only has to be taken if a block needs
          to be allocated.  allocateLocal() is now used instead of allocate()
          in a few PrimOps.
      
          This removes locks from most Integer operations, cutting down
          the overhead for SMP a bit more.
      
          To make this work, we have to be able to grab the current thread's
          Capability out of thin air (i.e. when called from GMP), so the
          Capability subsystem needs to keep a hash from thread IDs to
          Capabilities.
      
        - Small MVar optimisation: instead of taking the global
          storage-manager lock, do our own locking of MVars with a bit of
          inline assembly (x86 only for now).
      bf821981
  14. 27 Apr, 2005 1 commit
  15. 12 Apr, 2005 2 commits
    • simonmar's avatar
      [project @ 2005-04-12 12:24:44 by simonmar] · 3949e528
      simonmar authored
      cleanup
      3949e528
    • simonmar's avatar
      [project @ 2005-04-12 09:04:23 by simonmar] · 693550d9
      simonmar authored
      Per-task nurseries for SMP.  This was kind-of implemented before, but
      it's much cleaner now.  There is now one *step* per capability, so we
      have somewhere to hang the block count.  So for SMP, there are simply
      multiple instances of generation 0 step 0.  The rNursery entry in the
      register table now points to the step rather than the head block of
      the nurersy.
      693550d9
  16. 07 Apr, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-04-07 14:33:30 by simonmar] · 5a148f04
      simonmar authored
      Support handling signals in the threaded RTS by passing the signal
      number down the pipe to the IO manager.  This avoids needing
      synchronisation in the signal handler.
      
      Signals should now work with -threaded.  Since this is a bugfix, I'll
      merge the changes into the 6.4 branch.
      5a148f04
  17. 06 Apr, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-04-06 15:27:06 by simonmar] · 9a92cb1c
      simonmar authored
      Revamp the Task API: now we use the same implementation for threaded
      and SMP.  We also keep per-task timing stats in the threaded RTS now,
      which makes the output of +RTS -sstderr more useful.
      9a92cb1c
  18. 05 Apr, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-04-05 12:19:54 by simonmar] · 16214216
      simonmar authored
      Some multi-processor hackery, including
      
        - Don't hang blocked threads off BLACKHOLEs any more, instead keep
          them all on a separate queue which is checked periodically for
          threads to wake up.
      
          This is good because (a) we don't have to worry about locking the
          closure in SMP mode when we want to block on it, and (b) it means
          the standard update code doesn't need to wake up any threads or
          check for a BLACKHOLE_BQ, simplifying the update code.
      
          The downside is that if there are lots of threads blocked on
          BLACKHOLEs, we might have to do a lot of repeated list traversal.
          We don't expect this to be common, though.  conc023 goes slower
          with this change, but we expect most programs to benefit from the
          shorter update code.
      
        - Fixing up the Capability code to handle multiple capabilities (SMP
          mode), and related changes to get the SMP mode at least building.
      16214216
  19. 22 Nov, 2004 1 commit
  20. 14 Oct, 2004 1 commit
    • simonmar's avatar
      [project @ 2004-10-14 14:58:37 by simonmar] · bb01a96b
      simonmar authored
      Threaded RTS improvements:
      
       - Unix only: implement waitRead#, waitWrite# and delay# in Haskell,
         by having a single Haskell thread (the IO manager) performing a blocking
         select() operation.  Threads communicate with the IO manager
         via channels.  This is faster than doing the select() in the RTS,
         because we only restart the select() when a new request arrives,
         rather than each time around the scheduler.
      
         On Windows we just make blocking IO calls, we don't have a fancy IO
         manager (yet).
      
       - Simplify the scheduler for the threaded RTS, now that we don't have
         to wait for IO in the scheduler loop.
      
       - Remove detectBlackHoles(), which isn't used now (not sure how long
         this has been unused for... perhaps it was needed back when main threads
         used to be GC roots, so we had to check for blackholes manually rather
         than relying on the GC.)
      
      Signals aren't quite right in the threaded RTS.  In fact, they're
      slightly worse than before, because the thread receiving signals might
      be blocked in a C call - previously there always be another thread
      stuck in awaitEvent() that would notice the signal, but that's not
      true now.  I can't see an easy fix yet.
      bb01a96b
  21. 01 Mar, 2004 1 commit
  22. 26 Feb, 2004 1 commit
  23. 19 Dec, 2003 1 commit
  24. 16 Dec, 2003 1 commit
    • simonmar's avatar
      [project @ 2003-12-16 13:27:31 by simonmar] · de02e02a
      simonmar authored
      Clean up Capability API
      ~~~~~~~~~~~~~~~~~~~~~~~
      
      - yieldToReturningWorker() is now yieldCapability(), and performs all
        kinds of yielding (both to returning workers and passing to other
        OS threads).  yieldCapabiltiy() does *not* re-acquire a capability.
      
      - waitForWorkCapabilty() is now waitForCapability().
      
      - releaseCapbility() also releases the capability when passing to
        another OS thread.  It is the only way to release a capability (apart
        from yieldCapability(), which calls releaseCapability() internally).
      
      - passCapability() and passCapabilityToWorker() now do not release the
        capability.  They just set a flag to indicate where the capabiliy
        should go when it it next released.
      
      
      Other cleanups:
      
        - Removed all the SMP stuff from Schedule.c.  It had extensive bitrot,
          and was just obfuscating the code.  If it is ever needed again,
          it can be resurrected from CVS.
      
        - Removed some other dead code in Schedule.c, in an attempt to make
          this file more manageable.
      de02e02a
  25. 15 Dec, 2003 3 commits
    • simonmar's avatar
      [project @ 2003-12-15 16:45:23 by simonmar] · 2ca13796
      simonmar authored
      Add assertion.
      2ca13796
    • simonmar's avatar
      [project @ 2003-12-15 16:43:45 by simonmar] · 410a99e4
      simonmar authored
      Debugging output wibble
      410a99e4
    • simonmar's avatar
      [project @ 2003-12-15 16:23:54 by simonmar] · 56a125f2
      simonmar authored
      Fix a deadlock: an OS thread returning from a C call could enter
      grabReturnCapability, grabbing the capability that was in the process
      of being passed to another thread via passCapability.  This leads to a
      deadlock shortly afterward, because the passCapability flag is still
      set, so a normal worker won't pick up the capability when it is
      released.
      
      Fix (not sure if this is the best fix, though): don't grab the
      capability in grabReturnCapability() if passCapabilty is set.
      56a125f2
  26. 12 Dec, 2003 1 commit
  27. 12 Nov, 2003 1 commit
    • sof's avatar
      [project @ 2003-11-12 17:49:05 by sof] · 20593d1d
      sof authored
      Tweaks to have RTS (C) sources compile with MSVC. Apart from wibbles
      related to the handling of 'inline', changed Schedule.h:POP_RUN_QUEUE()
      not to use expression-level statement blocks.
      20593d1d
  28. 01 Oct, 2003 1 commit
    • wolfgang's avatar
      [project @ 2003-10-01 10:49:07 by wolfgang] · 324e96d2
      wolfgang authored
      Threaded RTS:
      Don't start new worker threads earlier than necessary.
      After this commit, a Haskell program that uses neither forkOS nor forkIO is
      really single-threaded (rather than using two OS threads internally).
      
      Some details:
      Worker threads are now only created when a capability is released, and
      only when
      (there are no worker threads)
      	&& (there are runnable Haskell threads ||
      	    there are Haskell threads blocked on IO or threadDelay)
      awaitEvent can now be called from bound thread scheduling loops
      (so that we don't have to create a worker thread just to run awaitEvent)
      324e96d2
  29. 23 Sep, 2003 1 commit
  30. 21 Sep, 2003 1 commit
    • wolfgang's avatar
      [project @ 2003-09-21 22:20:51 by wolfgang] · 85aa72b9
      wolfgang authored
      Bound Threads
      =============
      
      Introduce a way to use foreign libraries that rely on thread local state
      from multiple threads (mainly affects the threaded RTS).
      
      See the file threads.tex in CVS at haskell-report/ffi/threads.tex
      (not entirely finished yet) for a definition of this extension. A less formal
      description is also found in the documentation of Control.Concurrent.
      
      The changes mostly affect the THREADED_RTS (./configure --enable-threaded-rts),
      except for saving & restoring errno on a per-TSO basis, which is also necessary
      for the non-threaded RTS (a bugfix).
      
      Detailed list of changes
      ------------------------
      
      - errno is saved in the TSO object and restored when necessary:
      ghc/includes/TSO.h, ghc/rts/Interpreter.c, ghc/rts/Schedule.c
      
      - rts_mainLazyIO is no longer needed, main is no special case anymore
      ghc/includes/RtsAPI.h, ghc/rts/RtsAPI.c, ghc/rts/Main.c, ghc/rts/Weak.c
      
      - passCapability: a new function that releases the capability and "passes"
        it to a specific OS thread:
      ghc/rts/Capability.h ghc/rts/Capability.c
      
      - waitThread(), scheduleWaitThread() and schedule() get an optional
        Capability *initialCapability passed as an argument:
      ghc/includes/SchedAPI.h, ghc/rts/Schedule.c, ghc/rts/RtsAPI.c
      
      - Bound Thread scheduling (that's what this is all about):
      ghc/rts/Schedule.h, ghc/rts/Schedule.c
      
      - new Primop isCurrentThreadBound#:
      ghc/compiler/prelude/primops.txt.pp, ghc/includes/PrimOps.h, ghc/rts/PrimOps.hc,
      ghc/rts/Schedule.h, ghc/rts/Schedule.c
      
      - a simple function, rtsSupportsBoundThreads, that returns true if THREADED_RTS
        is defined:
      ghc/rts/Schedule.h, ghc/rts/Schedule.c
      
      - a new implementation of forkProcess (the old implementation stays in place
        for the non-threaded case). Partially broken; works for the standard
        fork-and-exec case, but not for much else. A proper forkProcess is
        really next to impossible to implement:
      ghc/rts/Schedule.c
      
      - Library support for bound threads:
          Control.Concurrent.
            rtsSupportsBoundThreads, isCurrentThreadBound, forkOS,
            runInBoundThread, runInUnboundThread
      libraries/base/Control/Concurrent.hs, libraries/base/Makefile,
      libraries/base/include/HsBase.h, libraries/base/cbits/forkOS.c (new file)
      85aa72b9
  31. 27 Jan, 2003 1 commit
  32. 25 Jan, 2003 1 commit
    • wolfgang's avatar
      [project @ 2003-01-25 15:54:48 by wolfgang] · af136096
      wolfgang authored
      This commit fixes many bugs and limitations in the threaded RTS.
      There are still some issues remaining, though.
      
      The following bugs should have been fixed:
      
      - [+] "safe" calls could cause crashes
      - [+] yieldToReturningWorker/grabReturnCapability
          -     It used to deadlock.
      - [+] couldn't wake blocked workers
          -     Calls into the RTS could go unanswered for a long time, and
                that includes ordinary callbacks in some circumstances.
      - [+] couldn't block on an MVar and expect to be woken up by a signal
            handler
          -     Depending on the exact situation, the RTS shut down or
                blocked forever and ignored the signal.
      - [+] The locking scheme in RtsAPI.c didn't work
      - [+] run_thread label in wrong place (schedule())
      - [+] Deadlock in GHC.Handle
          -     if a signal arrived at the wrong time, an mvar was never
                filled again
      - [+] Signals delivered to the "wrong" thread were ignored or handled
            too late.
      
      Issues:
      *) If GC can move TSO objects (I don't know - can it?), then ghci
      will occasionally crash when calling foreign functions, because the
      parameters are stored on the TSO stack.
      
      *) There is still a race condition lurking in the code
      (both threaded and non-threaded RTS are affected):
      If a signal arrives after the check for pending signals in
      schedule(), but before the call to select() in awaitEvent(),
      select() will be called anyway. The signal handler will be
      executed much later than expected.
      
      *) For Win32, GHC doesn't yet support non-blocking IO, so while a
      thread is waiting for IO, no call-ins can happen. If the RTS is
      blocked in awaitEvent, it uses a polling loop on Win32, so call-ins
      should work (although the polling loop looks ugly).
      
      *) Deadlock detection is disabled for the threaded rts, because I
      don't know how to do it properly in the presence of foreign call-ins
      from foreign threads.
      This causes the tests conc031, conc033 and conc034 to fail.
      
      *) "safe" is currently treated as "threadsafe". Implementing "safe" in
      a way that blocks other Haskell threads is more difficult than was
      thought at first. I think it could be done with a few additional lines
      of code, but personally, I'm strongly in favour of abolishing the
      distinction.
      
      *) Running finalizers at program termination is inefficient - there
      are two OS threads passing messages back and forth for every finalizer
      that is run. Also (just as in the non-threaded case) the finalizers
      are run in parallel to any remaining haskell threads and to any
      foreign call-ins that might still happen.
      af136096
  33. 11 Dec, 2002 1 commit
    • simonmar's avatar
      [project @ 2002-12-11 15:36:20 by simonmar] · 0bffc410
      simonmar authored
      Merge the eval-apply-branch on to the HEAD
      ------------------------------------------
      
      This is a change to GHC's evaluation model in order to ultimately make
      GHC more portable and to reduce complexity in some areas.
      
      At some point we'll update the commentary to describe the new state of
      the RTS.  Pending that, the highlights of this change are:
      
        - No more Su.  The Su register is gone, update frames are one
          word smaller.
      
        - Slow-entry points and arg checks are gone.  Unknown function calls
          are handled by automatically-generated RTS entry points (AutoApply.hc,
          generated by the program in utils/genapply).
      
        - The stack layout is stricter: there are no "pending arguments" on
          the stack any more, the stack is always strictly a sequence of
          stack frames.
      
          This means that there's no need for LOOKS_LIKE_GHC_INFO() or
          LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know
          how to find the boundary between the text and data segments (BIG WIN!).
      
        - A couple of nasty hacks in the mangler caused by the neet to
          identify closure ptrs vs. info tables have gone away.
      
        - Info tables are a bit more complicated.  See InfoTables.h for the
          details.
      
        - As a side effect, GHCi can now deal with polymorphic seq.  Some bugs
          in GHCi which affected primitives and unboxed tuples are now
          fixed.
      
        - Binary sizes are reduced by about 7% on x86.  Performance is roughly
          similar, some programs get faster while some get slower.  I've seen
          GHCi perform worse on some examples, but haven't investigated
          further yet (GHCi performance *should* be about the same or better
          in theory).
      
        - Internally the code generator is rather better organised.  I've moved
          info-table generation from the NCG into the main codeGen where it is
          shared with the C back-end; info tables are now emitted as arrays
          of words in both back-ends.  The NCG is one step closer to being able
          to support profiling.
      
      This has all been fairly thoroughly tested, but no doubt I've messed
      up the commit in some way.
      0bffc410
  34. 19 Aug, 2002 1 commit
  35. 16 Aug, 2002 1 commit
  36. 23 Apr, 2002 1 commit