1. 22 Feb, 2006 1 commit
  2. 21 Feb, 2006 2 commits
    • fix a deadlock in atomicModifyMutVar# · 25cc1d1f
      Simon Marlow authored
      atomicModifyMutVar# was re-using the storage manager mutex (sm_mutex)
      to get its atomicity guarantee in SMP mode. But recently the addition
      of a call to dirty_MUT_VAR() to implement the write barrier led to a
      rare deadlock case, because dirty_MUT_VAR() very occasionally needs to
      allocate a new block to chain on the mutable list, which requires
      sm_mutex.
    • warning fix · 5d9f7faf
      Simon Marlow authored
  3. 10 Feb, 2006 1 commit
  4. 09 Feb, 2006 2 commits
  5. 17 Jan, 2006 1 commit
    • [project @ 2006-01-17 16:13:18 by simonmar] · 91b07216
      simonmar authored
      Improve the GC behaviour of IORefs (see Ticket #650).
      
      This is a small change to the way IORefs interact with the GC, which
      should improve GC performance for programs with plenty of IORefs.
      
      Previously we had a single closure type for mutable variables,
      MUT_VAR.  Mutable variables were *always* on the mutable list in older
      generations, and always traversed on every GC.
      
      Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY.  The
      latter is on the mutable list, but the former is not.  (NB. this
      differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which
      are on the mutable list).  writeMutVar# now implements a write
      barrier, by calling dirty_MUT_VAR() in the runtime, that does the
      necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRTY, and adding
      to the mutable list if necessary.
      
      This results in some pretty dramatic speedups for GHC itself.  I've
      just measured a 30% overall speedup compiling a 31-module program
      (anna) with the default heap settings :-D
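
The clean/dirty scheme described in this commit can be sketched roughly as follows. This is a minimal illustration, not the RTS code: the MutVar and MutList types, the fixed-size list, and the lower-case function names are all hypothetical stand-ins, and the real dirty_MUT_VAR() works on RTS closures and per-generation mutable lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTS closure types (not the real layout). */
typedef enum { MUT_VAR_CLEAN, MUT_VAR_DIRTY } MutVarState;

typedef struct {
    MutVarState state;
    int         generation;  /* 0 = young generation, > 0 = older generation */
    void       *value;
} MutVar;

#define MUT_LIST_MAX 64

/* Simplified mutable list: the objects the GC must visit as extra roots. */
typedef struct {
    MutVar *entries[MUT_LIST_MAX];
    size_t  count;
} MutList;

/* Write barrier: the first write to a clean variable marks it dirty and,
 * if it lives in an older generation, records it on the mutable list.
 * Later writes to an already-dirty variable do no extra work. */
static void dirty_mut_var(MutList *ml, MutVar *mv)
{
    if (mv->state == MUT_VAR_CLEAN) {
        mv->state = MUT_VAR_DIRTY;
        if (mv->generation > 0) {
            assert(ml->count < MUT_LIST_MAX);
            ml->entries[ml->count++] = mv;
        }
    }
}

/* Assumed shape of a write: run the barrier, then store the new value. */
static void write_mut_var(MutList *ml, MutVar *mv, void *new_value)
{
    dirty_mut_var(ml, mv);
    mv->value = new_value;
}
```

In this sketch, a variable that is never written after promotion stays MUT_VAR_CLEAN and off the list, so the collector has no reason to traverse it.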
  6. 04 Nov, 2005 1 commit
  7. 27 Oct, 2005 1 commit
    • [project @ 2005-10-27 15:26:06 by simonmar] · 677c6345
      simonmar authored
      - Very simple work-sharing amongst Capabilities: whenever a Capability
        detects that it has more than 1 thread in its run queue, it runs
        around looking for empty Capabilities, and shares the threads on its
        run queue equally with the free Capabilities it finds.
      
      - unlock the garbage collector's mutable lists, by having private
        mutable lists per capability (and per generation).  The private
        mutable lists are moved onto the main mutable lists at each GC.
        This pulls the old-generation update code out of the storage manager
        mutex, which is one of the last remaining causes of (alleged) contention.
      
      - Fix some problems with synchronising when a GC is required.  We should
        synchronise quicker now.
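
The work-sharing idea in the first item above might look something like the sketch below. It is an assumption-laden illustration: the Capability struct, the array-based run queue, and the share_work() and push_thread() names are invented here, and the round-robin split is only one way to read "shares equally".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CAPS    16
#define MAX_THREADS 32

/* Hypothetical capability: just a run queue of thread ids. */
typedef struct {
    int    run_queue[MAX_THREADS];
    size_t n_threads;
} Capability;

static void push_thread(Capability *cap, int tid)
{
    assert(cap->n_threads < MAX_THREADS);
    cap->run_queue[cap->n_threads++] = tid;
}

/* If caps[self] has more than one runnable thread, deal its threads out
 * round-robin between itself and every capability whose run queue is empty.
 * Returns the number of threads handed to other capabilities. */
static size_t share_work(Capability *caps, size_t n_caps, size_t self)
{
    assert(n_caps <= MAX_CAPS && self < n_caps);

    Capability *me = &caps[self];
    size_t idle[MAX_CAPS];
    size_t n_idle = 0;

    if (me->n_threads <= 1)
        return 0;

    /* Record the idle capabilities up front; their queues fill as we share. */
    for (size_t i = 0; i < n_caps; i++)
        if (i != self && caps[i].n_threads == 0)
            idle[n_idle++] = i;
    if (n_idle == 0)
        return 0;

    size_t total = me->n_threads, kept = 0, given = 0;
    for (size_t t = 0; t < total; t++) {
        size_t slot = t % (n_idle + 1);   /* 0 = keep, k = k-th idle capability */
        int tid = me->run_queue[t];
        if (slot == 0) {
            me->run_queue[kept++] = tid;  /* compact in place; kept <= t */
        } else {
            push_thread(&caps[idle[slot - 1]], tid);
            given++;
        }
    }
    me->n_threads = kept;
    return given;
}
```

The sketch ignores locking entirely; in a real SMP runtime the hand-off to another capability would need synchronisation.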
  8. 25 Oct, 2005 1 commit
  9. 21 Oct, 2005 1 commit
    • [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
  10. 12 Oct, 2005 1 commit
  11. 02 Aug, 2005 1 commit
  12. 25 Jul, 2005 1 commit
  13. 12 May, 2005 2 commits
  14. 11 May, 2005 2 commits
  15. 10 May, 2005 1 commit
    • [project @ 2005-05-10 13:25:41 by simonmar] · bf821981
      simonmar authored
      Two SMP-related changes:
      
        - New storage manager interface:
      
          bdescr *allocateLocal(StgRegTable *reg, nat words)
      
          which allocates from the current thread's nursery (being careful
          not to clash with the heap pointer).  It can do this without
          taking any locks; the lock only has to be taken if a block needs
          to be allocated.  allocateLocal() is now used instead of allocate()
          in a few PrimOps.
      
          This removes locks from most Integer operations, cutting down
          the overhead for SMP a bit more.
      
          To make this work, we have to be able to grab the current thread's
          Capability out of thin air (i.e. when called from GMP), so the
          Capability subsystem needs to keep a hash from thread IDs to
          Capabilities.
      
        - Small MVar optimisation: instead of taking the global
          storage-manager lock, do our own locking of MVars with a bit of
          inline assembly (x86 only for now).
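
The lock-free fast path described for allocateLocal() in the first item above can be illustrated with a small bump allocator. This is only a sketch under assumptions: Block, Nursery, allocate_local(), and the lock_acquisitions counter are hypothetical names, and a pthread mutex stands in for the storage manager lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 1024

/* Hypothetical nursery block: a bump-allocated array of words. */
typedef struct Block {
    struct Block *next;
    size_t        used;               /* words already handed out */
    void         *words[BLOCK_WORDS];
} Block;

/* Per-thread nursery; only its owner allocates from it. */
typedef struct {
    Block *current;
} Nursery;

/* Stand-in for the storage manager lock, plus a counter for illustration. */
static pthread_mutex_t sm_mutex = PTHREAD_MUTEX_INITIALIZER;
static size_t lock_acquisitions = 0;

/* Slow path: getting a fresh block touches shared state, so take the lock. */
static Block *alloc_block_locked(void)
{
    pthread_mutex_lock(&sm_mutex);
    lock_acquisitions++;
    Block *b = calloc(1, sizeof *b);
    pthread_mutex_unlock(&sm_mutex);
    return b;
}

/* Fast path: bump the owner's current block without any lock. The lock is
 * taken only when the current block cannot satisfy the request. */
static void *allocate_local(Nursery *n, size_t words)
{
    assert(words <= BLOCK_WORDS);
    Block *b = n->current;
    if (b == NULL || b->used + words > BLOCK_WORDS) {
        Block *fresh = alloc_block_locked();
        if (fresh == NULL)
            return NULL;
        fresh->next = b;
        n->current = fresh;
        b = fresh;
    }
    void *p = &b->words[b->used];
    b->used += words;
    return p;
}
```

Two small allocations in a row from the same nursery take the lock once, for the first block; the second is a plain pointer bump.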
  16. 28 Apr, 2005 1 commit
  17. 27 Apr, 2005 2 commits
  18. 22 Apr, 2005 2 commits
    • [project @ 2005-04-22 13:12:41 by simonmar] · 5d0394e2
      simonmar authored
      checkSanity: fix bug in nursery checking
    • [project @ 2005-04-22 12:28:00 by simonmar] · ec0984a9
      simonmar authored
      - Now that labels are always prefixed with '&' in .hc code, we have to
        fix some sloppiness in the RTS .cmm code.  Fortunately it's not too
        painful.
      
      - SMP: acquire/release the storage manager lock around
        atomicModifyMutVar#.  This is a hack: atomicModifyMutVar# isn't
        atomic under SMP otherwise, but the SM lock is a large sledgehammer.
        I think I'll apply the sledgehammer to the MVar primitives too, for
        the time being.
  19. 12 Apr, 2005 1 commit
    • [project @ 2005-04-12 09:04:23 by simonmar] · 693550d9
      simonmar authored
      Per-task nurseries for SMP.  This was kind-of implemented before, but
      it's much cleaner now.  There is now one *step* per capability, so we
      have somewhere to hang the block count.  So for SMP, there are simply
      multiple instances of generation 0 step 0.  The rNursery entry in the
      register table now points to the step rather than the head block of
      the nursery.
  20. 10 Apr, 2005 1 commit
  21. 07 Apr, 2005 1 commit
  22. 05 Apr, 2005 1 commit
    • [project @ 2005-04-05 12:19:54 by simonmar] · 16214216
      simonmar authored
      Some multi-processor hackery, including
      
        - Don't hang blocked threads off BLACKHOLEs any more, instead keep
          them all on a separate queue which is checked periodically for
          threads to wake up.
      
          This is good because (a) we don't have to worry about locking the
          closure in SMP mode when we want to block on it, and (b) it means
          the standard update code doesn't need to wake up any threads or
          check for a BLACKHOLE_BQ, simplifying the update code.
      
          The downside is that if there are lots of threads blocked on
          BLACKHOLEs, we might have to do a lot of repeated list traversal.
          We don't expect this to be common, though.  conc023 goes slower
          with this change, but we expect most programs to benefit from the
          shorter update code.
      
        - Fixing up the Capability code to handle multiple capabilities (SMP
          mode), and related changes to get the SMP mode at least building.
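
The separate blocked-thread queue from the first item above could be modelled along these lines. The Closure and Thread structs and all function names are hypothetical simplifications; the point is only that the update path does no waking and a periodic scan does it instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical closure: evaluated == 0 models "still a BLACKHOLE". */
typedef struct {
    int evaluated;
} Closure;

typedef struct Thread {
    int            id;
    const Closure *blocked_on;   /* NULL when not blocked */
    struct Thread *next;         /* link in the blocked queue */
    int            runnable;
} Thread;

/* Blocking only touches the queue, never the closure itself. */
static void block_on_blackhole(Thread **queue, Thread *t, const Closure *c)
{
    t->blocked_on = c;
    t->runnable = 0;
    t->next = *queue;
    *queue = t;
}

/* The update path stays short: it does not look for waiting threads. */
static void update_closure(Closure *c)
{
    c->evaluated = 1;
}

/* Periodic check: traverse the whole queue and wake every thread whose
 * closure has been updated. Returns the number of threads woken. */
static size_t check_blackholes(Thread **queue)
{
    size_t woken = 0;
    Thread **link = queue;
    while (*link != NULL) {
        Thread *t = *link;
        if (t->blocked_on->evaluated) {
            *link = t->next;     /* unlink from the queue */
            t->next = NULL;
            t->blocked_on = NULL;
            t->runnable = 1;
            woken++;
        } else {
            link = &t->next;
        }
    }
    return woken;
}
```

The full traversal in check_blackholes() is the cost the commit message mentions: with many blocked threads, each periodic check walks the entire list.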
  23. 09 Mar, 2005 1 commit
    • [project @ 2005-03-09 08:51:31 by wolfgang] · abde5fdf
      wolfgang authored
      Retain all CAFs when dynamic Haskell libraries are used from GHCi.
      The Linker usually replaces references to newCAF with references to newDynCAF,
      but the system dynamic linker won't do that for us.
      
      Also, the situation is slightly different - we never want CAFs from dylibs
      to be reverted, because the dylibs might be used both by the interpreted
      program and by GHCi itself.
      
      So instead of just caf_list, there's now both caf_list and revertible_caf_list.
      newDynCAF adds a CAF to revertible_caf_list, and newCAF either adds the CAF
      to caf_list or to the mutable list, depending on whether we are in GHCi.
      
      This hack is only active when Linker.c has loaded libHSbase_dyn.[so|dylib],
      but for now, it applies to all CAFs, not just dynamically-linked ones.
      If that is worth fixing, we could do that by checking whether the CAF
      closure or its info pointer is in the main executable's address range.
      
      MERGE TO STABLE
  24. 10 Feb, 2005 1 commit
    • [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading old-generation mutable lists
      through objects in the heap, keep it in a separate flat array.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which led to more simplifications
          in the GC.
  25. 03 Sep, 2004 1 commit
    • [project @ 2004-09-03 15:28:18 by simonmar] · 95ca6bff
      simonmar authored
      Cleanup: all (well, most) messages from the RTS now go through the
      functions in RtsUtils: barf(), debugBelch() and errorBelch().  The
      latter two were previously called belch() and prog_belch()
      respectively.  See the comments for the right usage of these message
      functions.
      
      One reason for doing this is so that we can avoid spurious uses of
      stdout/stderr by Haskell apps on platforms where we shouldn't be using
      them (eg. non-console apps on Windows).
  26. 13 Aug, 2004 1 commit
  27. 21 Jul, 2004 1 commit
  28. 24 Oct, 2003 2 commits
  29. 23 Sep, 2003 1 commit
    • [project @ 2003-09-23 15:38:35 by simonmar] · 76ebf3dc
      simonmar authored
      Add a BF_PINNED block flag, and attach it to blocks containing pinned
      objects (in addition to the usual BF_LARGE).
      
      In heapCensus, we now ignore blocks containing pinned objects, because
      they might contain gaps, and in any case it isn't clear that we want
      to include the whole block in a heap census, because much of it might
      well be dead.  Ignoring it isn't right either, though, so this patch
      just fixes the crash and leaves a ToDo.
  30. 26 Mar, 2003 2 commits
  31. 21 Mar, 2003 1 commit
    • [project @ 2003-03-21 16:18:37 by sof] · 557bca73
      sof authored
      Friday morning code-wibbling:
      - made RetainerProfile.c:firstStack a 'static'
      - added RetainerProfile.c:retainerStackBlocks()
  32. 01 Feb, 2003 1 commit