1. 18 Nov, 2005 2 commits
    • [project @ 2005-11-18 15:38:26 by simonmar] · 942818de
      simonmar authored
      fix threaded build
    • [project @ 2005-11-18 15:24:12 by simonmar] · c5cd2343
      simonmar authored
      Two improvements to the SMP runtime:
      
        - support for 'par', aka sparks.  Load balancing is very primitive
          right now, but I have seen programs that go faster using par.
      
        - support for backing off when a thread is found to be duplicating
          a computation currently underway in another thread.  This also
          fixes some instability in SMP, because it turned out that when
          an update frame points to an indirection, which can happen if
          a thunk is under evaluation in multiple threads, then after GC
          has shorted out the indirection the update will trash the value.
          Now we suspend the duplicate computation to the heap before this
          can happen.
      
      Additionally:
      
        - stack squeezing is separate from lazy blackholing, and now only
          happens if there's a reasonable amount of squeezing to be done
          in relation to the number of words of stack that have to be moved.
          This means we won't try to shift 10MB of stack just to save 2
          words at the bottom (it probably never happened, but still).
      
        - update frames are now marked when they have been visited by lazy
          blackholing, as per the SMP paper.
      
        - cleaned up raiseAsync() a bit.
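
The stack-squeezing rule above (squeeze only when the saving is reasonable relative to the words that must be moved) can be sketched as a simple cost check. This is a minimal illustration, not the RTS code: the function name, its parameters, and the threshold value are all assumptions, since the commit message does not state the actual ratio used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: the largest number of stack words we are willing
 * to move per word reclaimed. The commit does not state the real value. */
#define MAX_WORDS_MOVED_PER_WORD_SAVED 8

/* Decide whether squeezing is worthwhile (illustrative helper).
 *   words_saved:   words reclaimed by removing redundant update frames
 *   words_to_move: words of stack that must be shifted to close the gaps */
static bool worth_squeezing(size_t words_saved, size_t words_to_move)
{
    if (words_saved == 0) {
        return false;   /* nothing to gain, so never move anything */
    }
    /* Compare by integer division rather than multiplication so the
     * check cannot overflow for very large stacks. */
    return words_to_move / words_saved <= MAX_WORDS_MOVED_PER_WORD_SAVED;
}
```

With a check of this shape, the case from the message (moving roughly 10MB of stack to save 2 words) is rejected, while a squeeze that reclaims a meaningful fraction of what it moves goes ahead.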
  2. 21 Oct, 2005 1 commit
    • [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal, threaded, SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
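
One way to get the lock-debugging behaviour described in the last item is a POSIX error-checking mutex, where relocking from the owning thread returns EDEADLK instead of deadlocking. The sketch below assumes that approach; the helper names are hypothetical and are not the definitions behind ACQUIRE_LOCK() or ASSERT_LOCK_HELD().

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Initialise a mutex with error checking enabled, so that a second lock
 * attempt by the owning thread returns EDEADLK rather than hanging. */
static int init_checked_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: acquire the lock, returning false if the calling
 * thread already holds it (a "multiple acquire" bug). */
static bool checked_acquire(pthread_mutex_t *m)
{
    return pthread_mutex_lock(m) == 0;
}

/* Hypothetical helper: true if the calling thread holds m.
 * If the lock was free, the probe acquires it, so we release it again.
 * Note: this blocks if a different thread currently holds the lock. */
static bool held_by_caller(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc == EDEADLK) return true;
    if (rc == 0) pthread_mutex_unlock(m);
    return false;
}
```

A debug build could wrap calls like these in macros that abort on failure, and compile them down to plain lock operations when DEBUG is off.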
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
  3. 05 Apr, 2005 1 commit
    • [project @ 2005-04-05 12:19:54 by simonmar] · 16214216
      simonmar authored
      Some multi-processor hackery, including
      
        - Don't hang blocked threads off BLACKHOLEs any more, instead keep
          them all on a separate queue which is checked periodically for
          threads to wake up.
      
          This is good because (a) we don't have to worry about locking the
          closure in SMP mode when we want to block on it, and (b) it means
          the standard update code doesn't need to wake up any threads or
          check for a BLACKHOLE_BQ, simplifying the update code.
      
          The downside is that if there are lots of threads blocked on
          BLACKHOLEs, we might have to do a lot of repeated list traversal.
          We don't expect this to be common, though.  conc023 goes slower
          with this change, but we expect most programs to benefit from the
          shorter update code.
      
        - Fixing up the Capability code to handle multiple capabilities (SMP
          mode), and related changes to get the SMP mode at least building.
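
The separate blocked-thread queue described in the first item can be sketched as a periodic list traversal. This is a simplified model under stated assumptions: the Closure and Thread types and the function name are hypothetical stand-ins, not the RTS's closure or TSO layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified closure: only records whether it is still
 * under evaluation (a BLACKHOLE) or has been updated with a value. */
typedef struct {
    bool is_blackhole;
} Closure;

/* Hypothetical, simplified thread record. */
typedef struct Thread {
    const Closure *blocked_on;  /* closure this thread is waiting for */
    bool           runnable;    /* set once the thread has been woken */
    struct Thread *next;        /* link in the blocked queue */
} Thread;

/* Walk the blocked queue once. Any thread whose closure is no longer a
 * BLACKHOLE is unlinked and marked runnable; the rest stay queued.
 * Returns the number of threads woken. */
static size_t check_blackhole_queue(Thread **queue)
{
    size_t woken = 0;
    Thread **link = queue;      /* pointer to the link we may rewrite */
    while (*link != NULL) {
        Thread *t = *link;
        if (!t->blocked_on->is_blackhole) {
            *link = t->next;    /* unlink t from the queue */
            t->next = NULL;
            t->runnable = true;
            woken++;
        } else {
            link = &t->next;    /* still blocked: move on */
        }
    }
    return woken;
}
```

The update code never touches this queue, which is the simplification the message describes; the cost is that every check walks the whole list, matching the stated downside when many threads are blocked.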
  4. 12 Sep, 2004 1 commit
  5. 24 Jan, 2002 1 commit
  6. 22 Mar, 2001 1 commit
    • [project @ 2001-03-22 03:51:08 by hwloidl] · 20fc2f0c
      hwloidl authored
      -*- outline -*-
      Time-stamp: <Thu Mar 22 2001 03:50:16 Stardate: [-30]6365.79 hwloidl>
      
      This commit covers changes in GHC to get GUM (way=mp) and GUM/GdH (way=md)
      working. It is a merge of my working version of GUM, based on GHC 4.06,
      with GHC 4.11. Almost all changes are in the RTS (see below).
      
      GUM is reasonably stable; we used the 4.06 version in large-ish programs for
      recent papers. There are a couple of things I want to change, but nothing urgent.
      GUM/GdH has just been merged and needs more testing, which I hope to do in the
      next few weeks. It works in our working build but needs tweaking to run.
      GranSim doesn't work yet (*sigh*). Most of the code should be in, but it needs
      more debugging.
      
      ToDo: I still want to make the following minor modifications before the release
      - Better wrapper script for parallel execution [ghc/compiler/main]
      - Update parallel documentation: started on it but it's minimal [ghc/docs/users_guide]
      - Clean up [nofib/parallel]: it's a real mess right now (*sigh*)
      - Update visualisation tools (minor things only IIRC) [ghc/utils/parallel]
      - Add a Klingon-English glossary
      
      * RTS:
      
      Almost all changes are restricted to ghc/rts/parallel and should not
      interfere with the rest. I only comment on changes outside the parallel
      dir:
      
      - Several changes in Schedule.c (scheduling loop; createThreads etc);
        should only affect parallel code
      - Added ghc/rts/hooks/ShutdownEachPEHook.c
      - ghc/rts/Linker.[ch]: GUM doesn't know about Stable Names (ifdefs)!!
      - StgMiscClosures.h: END_TSO_QUEUE etc now defined here (from StgMiscClosures.hc)
                           END_ECAF_LIST was missing a leading stg_
      - SchedAPI.h: taskStart now defined in here; it's only a wrapper around
                    scheduleThread now, but might use some init, shutdown later
      - RtsAPI.h: I have nuked the def of rts_evalNothing
      
      * Compiler:
      
      - ghc/compiler/main/DriverState.hs
        added PVM-ish flags to the parallel way
        added new ways for parallel ticky profiling and distributed exec
      
      - ghc/compiler/main/DriverPipeline.hs
        added a function run_phase_MoveBinary which is called with way=mp after linking;
        it moves the binary into a PVM dir and produces a wrapper script for
        parallel execution
        it might be cleaner to add a MoveBinary phase in DriverPhases.hs, but this way
        it's less intrusive, and MoveBinary probably only makes sense for mp anyway
      
      * Nofib:
      
      - nofib/spectral/Makefile, nofib/real/Makefile, ghc/tests/programs/Makefile:
        modified to skip some tests if HWL_NOFIB_HACK is set; this is only temporary, to
        record which test programs cause problems in my working build right now
  7. 31 Mar, 2000 1 commit
    • [project @ 2000-03-31 03:09:35 by hwloidl] · dd4c28a9
      hwloidl authored
      Numerous changes in the RTS to get GUM-4.06 working (currently works with
      parfib-ish programs). Most changes are isolated in the rts/parallel dir.
      
      rts/parallel/:
        The most important changes are a rewrite of the (un-)packing code (Pack.c)
        and changes in LAGA, GALA table operations (Global.c), especially in
        rebuilding the tables during GC.
      
      rts/:
        Minor changes in Schedule.c, GC.c (interface to par specific root marking
        and evacuation), and lots of additions to Sanity.c (surprise ;-)
        Main.c change for startup: I use a new function rts_evalNothing to
        start non-main-PEs in a PAR || SMP setup (RtsAPI.c)
      
      includes/:
        Updated GranSim macros in PrimOps.h.
      
      lib/std:
        Few changes in PrelHandle.c etc replacing ForeignObj by Addr in a PAR
        setup (we still don't support ForeignObjs or WeakPtrs in GUM).
        Typically use
          #define FILE_OBJECT	    Addr
        when dealing with files.
      
      hslibs/lang/:
        Same as above (in Foreign(Obj).lhs, Weak.lhs, IOExts.lhs etc).
      
      -- HWL
  8. 12 Jan, 2000 1 commit