1. 29 Jun, 2006 2 commits
  2. 20 Jun, 2006 1 commit
  3. 07 Apr, 2006 1 commit
    • Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system now makes no
      pretense at being multi-project, it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
      0065d5ab
  4. 27 Mar, 2006 1 commit
    • Add a new primitive forkOn#, for forking a thread on a specific Capability · c520a3a2
      Simon Marlow authored
      This gives some control over affinity, while we figure out the best
      way to automatically schedule threads to make best use of the
      available parallelism.
      
      In addition to the primitive, there is also:
       
        GHC.Conc.forkOnIO :: Int -> IO () -> IO ThreadId
      
      where 'forkOnIO i m' creates a thread on Capability (i `rem` N), where
      N is the number of available Capabilities set by +RTS -N.
      
      Threads forked by forkOnIO do not automatically migrate when there are
      free Capabilities, like normal threads do.  Still, if you're using
      forkOnIO exclusively, it's a good idea to do +RTS -qm to disable work
      pushing anyway (work pushing takes too much time when the run queues
      are large; this is something we need to fix).
      c520a3a2
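
      As a quick illustration of the API above (a minimal sketch of ours, not
      part of the commit): this assumes a -threaded build of roughly this era,
      run with something like +RTS -N4, where GHC.Conc exports forkOnIO and
      numCapabilities (in later GHCs the same function became
      Control.Concurrent.forkOn).

        import Control.Concurrent (myThreadId)
        import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
        import Control.Monad (forM)
        import GHC.Conc (forkOnIO, numCapabilities)

        -- Pin one worker per Capability; forkOnIO i places the thread on
        -- Capability (i `rem` N), and it will not migrate to another one.
        main :: IO ()
        main = do
          dones <- forM [0 .. numCapabilities - 1] $ \i -> do
            done <- newEmptyMVar
            _ <- forkOnIO i $ do
              tid <- myThreadId
              putStrLn ("worker " ++ show i ++ " running as " ++ show tid)
              putMVar done ()
            return done
          mapM_ takeMVar dones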
  5. 28 Feb, 2006 3 commits
    • takeMVar/putMVar were missing some write barriers when modifying a TSO · 080c9600
      Simon Marlow authored
      This relates to the recent introduction of clean/dirty TSOs, and the
      consequent write barriers required.  We were missing some write
      barriers in the takeMVar/putMVar family of primops, when performing
      the take/put directly on another TSO.
      
      Fixes #705, and probably some test failures.
      080c9600
    • pass arguments to unknown function calls in registers · 04db0e9f
      Simon Marlow authored
      We now have more stg_ap entry points: stg_ap_*_fast, which take
      arguments in registers according to the platform calling convention.
      This is faster if the function being called is evaluated and has the
      right arity, which is the common case (see the eval/apply paper for
      measurements).  
      
      We still need the stg_ap_*_info entry points for stack-based
      application, such as an overflow when a function is applied to too
      many arguments.  The stg_ap_*_fast functions actually just check for
      an evaluated function, and if they don't find one, push the args on
      the stack and invoke stg_ap_*_info.  (This might be slightly slower
      in some cases, but not in the common case.)
      04db0e9f
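
      A toy Haskell model of the dispatch described above (a sketch of ours,
      not RTS code): the fast path calls an already-evaluated function of
      exactly the right arity directly, and everything else falls back to a
      generic path standing in for the stack-based stg_ap_*_info entry.

        data Value   = VInt Int deriving Show

        data Closure = Fun Int ([Value] -> Value)   -- evaluated function with known arity
                     | Thunk (() -> Closure)        -- not yet evaluated

        -- Fast path: direct call when the callee is evaluated and the arity matches.
        applyFast :: Closure -> [Value] -> Value
        applyFast (Fun arity f) args
          | arity == length args = f args
        applyFast c args         = applyGeneric c args

        -- Stand-in for the generic, stack-based path: evaluate, then retry.
        -- (Over- and under-saturation are not modelled in this toy.)
        applyGeneric :: Closure -> [Value] -> Value
        applyGeneric (Thunk force) args = applyFast (force ()) args
        applyGeneric (Fun _ f)     args = f args

        main :: IO ()
        main = print (applyFast (Thunk (\() -> Fun 2 add2)) [VInt 1, VInt 2])
          where add2 [VInt a, VInt b] = VInt (a + b)
                add2 _                = error "unexpected arguments"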
    • fix live register annotations on foreign calls · 174c7f29
      Simon Marlow authored
      fixed one incorrect case, and made several others more accurate
      174c7f29
  6. 21 Feb, 2006 1 commit
    • fix a deadlock in atomicModifyMutVar# · 25cc1d1f
      Simon Marlow authored
      atomicModifyMutVar# was re-using the storage manager mutex (sm_mutex)
      to get its atomicity guarantee in SMP mode. But recently the addition
      of a call to dirty_MUT_VAR() to implement the read barrier led to a
      rare deadlock, because dirty_MUT_VAR() very occasionally needs to
      allocate a new block to chain on the mutable list, which requires
      sm_mutex.
      25cc1d1f
  7. 10 Feb, 2006 1 commit
  8. 09 Feb, 2006 1 commit
  9. 07 Feb, 2006 1 commit
  10. 19 Jan, 2006 1 commit
  11. 17 Jan, 2006 2 commits
    • [project @ 2006-01-17 16:13:18 by simonmar] · 91b07216
      simonmar authored
      Improve the GC behaviour of IORefs (see Ticket #650).
      
      This is a small change to the way IORefs interact with the GC, which
      should improve GC performance for programs with plenty of IORefs.
      
      Previously we had a single closure type for mutable variables,
      MUT_VAR.  Mutable variables were *always* on the mutable list in older
      generations, and always traversed on every GC.
      
      Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY.  The
      latter is on the mutable list, but the former is not.  (NB. this
      differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which
      are on the mutable list).  writeMutVar# now implements a write
      barrier, by calling dirty_MUT_VAR() in the runtime, which changes
      MUT_VAR_CLEAN into MUT_VAR_DIRTY and adds the variable to the
      mutable list if necessary.
      
      This results in some pretty dramatic speedups for GHC itself.  I've
      just measured a 30% overall speedup compiling a 31-module program
      (anna) with the default heap settings :-D
      91b07216
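
      A small Haskell model of the scheme just described (an illustrative
      sketch; the real logic is C in the RTS): a Clean variable is not on the
      mutable list, and the write barrier dirties it and chains it on the
      first time it is written.

        import Data.IORef

        data DirtyBit = Clean | Dirty deriving Show

        -- A modelled old-generation mutable variable: payload plus dirty bit.
        data MutVarM a = MutVarM { mvValue :: IORef a, mvDirty :: IORef DirtyBit }

        -- The modelled mutable list: variables the next GC must scan.
        type MutListM a = IORef [MutVarM a]

        -- A write with a write barrier: a Clean variable becomes Dirty and is
        -- added to the mutable list (the job dirty_MUT_VAR() does in the RTS).
        writeMutVarM :: MutListM a -> MutVarM a -> a -> IO ()
        writeMutVarM mutList var x = do
          writeIORef (mvValue var) x
          d <- readIORef (mvDirty var)
          case d of
            Dirty -> return ()                        -- already on the mutable list
            Clean -> do writeIORef (mvDirty var) Dirty
                        modifyIORef mutList (var :)   -- chain onto the mutable list

        main :: IO ()
        main = do
          mutList <- newIORef []
          val     <- newIORef (0 :: Int)
          dirty   <- newIORef Clean
          let v = MutVarM val dirty
          writeMutVarM mutList v 1        -- barrier fires: v goes on the list
          writeMutVarM mutList v 2        -- already Dirty: nothing extra to do
          len <- fmap length (readIORef mutList)
          print len                       -- prints 1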
    • [project @ 2006-01-17 16:03:47 by simonmar] · da69fa9c
      simonmar authored
      Improve the GC behaviour of IOArrays/STArrays
      
      See Ticket #650
      
      This is a small change to the way mutable arrays interact with the GC;
      it can have a dramatic effect on performance, and it makes tricks with
      unsafeThaw/unsafeFreeze redundant.  Data.HashTable should be faster
      now (I haven't measured it yet).
      
      We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and
      MUT_ARR_PTRS_DIRTY.  Both are on the mutable list if the array is in
      an old generation.  writeArray# sets the type to MUT_ARR_PTRS_DIRTY.
      The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it
      finds that no element of the array points into a younger generation
      (discovering this required a small addition to evacuate(), but rough
      tests indicate that it doesn't measurably affect performance).
      
      NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only
      boxed arrays (IOArray/STArray).
      
      We could go further and extend the DIRTY bit to be per-block rather
      than for the whole array, but for now this is an easy improvement.
      da69fa9c
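
      For the source-level view (an example of ours, not from the commit): a
      long-lived boxed IOArray that is written only occasionally no longer
      costs a full scan on every GC, so the unsafeFreeze/unsafeThaw tricks
      mentioned above should not be needed for this pattern.

        import Data.Array.IO (IOArray, newArray, readArray, writeArray)

        main :: IO ()
        main = do
          arr <- newArray (0, 9) "" :: IO (IOArray Int String)
          writeArray arr 3 "hello"      -- writeArray# marks the array MUT_ARR_PTRS_DIRTY
          readArray arr 3 >>= putStrLn  -- a later GC may flip it back to ..._CLEAN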
  12. 13 Dec, 2005 1 commit
  13. 28 Nov, 2005 1 commit
    • [project @ 2005-11-28 14:39:47 by simonmar] · d3d69395
      simonmar authored
      Small performance improvement to STM: reduce the size of an atomically
      frame from 3 words to 2 words by combining the "waiting" boolean field
      with the info pointer, i.e. having two separate info tables/return
      addresses for an atomically frame, one for the normal case and one for
      the waiting case.
      d3d69395
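
      At the source level the two cases correspond to a transaction that
      commits normally and one that hits retry and waits.  A small example of
      ours (using the stm package's Control.Concurrent.STM; at the time the
      same operations lived in GHC.Conc):

        import Control.Concurrent (forkIO, threadDelay)
        import Control.Concurrent.STM

        main :: IO ()
        main = do
          box <- newTVarIO Nothing
          _ <- forkIO $ do
                 threadDelay 100000
                 atomically (writeTVar box (Just "filled"))  -- the normal, committing case
          msg <- atomically $ do                             -- the waiting case: retries until filled
                   v <- readTVar box
                   maybe retry return v
          putStrLn msg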
  14. 21 Nov, 2005 1 commit
  15. 10 Nov, 2005 1 commit
    • [project @ 2005-11-10 16:14:01 by simonmar] · 84b434c5
      simonmar authored
      Fix a crash in STM; we were releasing ownership of the transaction too
      early in stmWait(), so a TSO could be woken up before we had finished
      putting it to sleep properly.
      84b434c5
  16. 07 Nov, 2005 1 commit
  17. 21 Oct, 2005 2 commits
    • [project @ 2005-10-21 15:23:59 by simonmar] · b587ed53
      simonmar authored
      more Win32 fixes
      b587ed53
    • [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
      03a9ff01
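
      A toy Haskell model of the per-Capability run queues from the first
      bullet above (a sketch with our own names; the real structures are C,
      in Capability.h and Task.h): each Capability enqueues and dequeues work
      on its own queue, which is what keeps a Haskell thread on the same
      worker Task and hence, usually, the same CPU.

        import qualified Data.Map.Strict as M

        type CapId    = Int
        type ThreadNo = Int

        -- One run queue per Capability instead of a single global run queue.
        type RunQueues = M.Map CapId [ThreadNo]

        -- A thread created on a Capability is enqueued there and, absent
        -- migration (not modelled here), stays there.
        enqueue :: CapId -> ThreadNo -> RunQueues -> RunQueues
        enqueue cap tid = M.insertWith (\new old -> old ++ new) cap [tid]

        -- Each Capability takes work only from its own queue, so the common
        -- case needs no global scheduler lock.
        popLocal :: CapId -> RunQueues -> (Maybe ThreadNo, RunQueues)
        popLocal cap qs = case M.findWithDefault [] cap qs of
          []       -> (Nothing, qs)
          (t : ts) -> (Just t, M.insert cap ts qs)

        main :: IO ()
        main = do
          let qs = enqueue 0 11 (enqueue 1 20 (enqueue 0 10 M.empty))
          print (fst (popLocal 0 qs))   -- Just 10: FIFO order on Capability 0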
  18. 17 Oct, 2005 1 commit
  19. 16 Sep, 2005 1 commit
  20. 02 Aug, 2005 1 commit
  21. 25 Jul, 2005 1 commit
    • [project @ 2005-07-25 14:12:48 by simonmar] · e792bb84
      simonmar authored
      Remove the ForeignObj# type, and all its PrimOps.  The new efficient
      representation of ForeignPtr doesn't use ForeignObj# underneath, and
      there seems no need to keep it.
      e792bb84
  22. 14 Jun, 2005 1 commit
  23. 06 Jun, 2005 1 commit
  24. 27 May, 2005 1 commit
  25. 19 May, 2005 2 commits
    • [project @ 2005-05-19 13:46:24 by simonmar] · 88825a2e
      simonmar authored
      Fix locking when unblocking a thread in take/putMVar.  Due to CPP
      nonsense the previous locking wasn't actually working, which led to
      deadlock problems.  It now turns out that I can call unblockOne
      directly rather than needing unblockOneLocked (the lock on the MVar
      means I have exclusive access to the threads on its queue).
      88825a2e
    • [project @ 2005-05-19 13:21:55 by simonmar] · 3595da95
      simonmar authored
      - Move the call to threadPaused() from the scheduler into STG land,
        and put it in a new code fragment (stg_returnToSched) that we pass
        through every time we return from STG to the scheduler.  Also, the
        SAVE_THREAD_STATE() is now in stg_returnToSched, which might save a
        little code space (at the expense of an extra jump for every return
        to the scheduler).
      
      - SMP: when blocking on an MVar, we now wait until the thread has been
        made fully safe and placed on the blocked queue of the MVar before
        we unlock the MVar.  This closes a race whereby another OS thread could
        begin waking us up before the current TSO had been properly tidied up.
      
      Fixes one cause of crashes when using MVars with SMP.  I still have a
      deadlock problem to track down.
      3595da95
  26. 10 May, 2005 1 commit
    • [project @ 2005-05-10 13:25:41 by simonmar] · bf821981
      simonmar authored
      Two SMP-related changes:
      
        - New storage manager interface:
      
          bdescr *allocateLocal(StgRegTable *reg, nat words)
      
          which allocates from the current thread's nursery (being careful
          not to clash with the heap pointer).  It can do this without
          taking any locks; the lock only has to be taken if a block needs
          to be allocated.  allocateLocal() is now used instead of allocate()
          in a few PrimOps.
      
          This removes locks from most Integer operations, cutting down
          the overhead for SMP a bit more.
      
          To make this work, we have to be able to grab the current thread's
          Capability out of thin air (i.e. when called from GMP), so the
          Capability subsystem needs to keep a hash from thread IDs to
          Capabilities.
      
        - Small MVar optimisation: instead of taking the global
          storage-manager lock, do our own locking of MVars with a bit of
          inline assembly (x86 only for now).
      bf821981
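
      A rough Haskell model of the allocateLocal() idea (ours; the real code
      is C in the storage manager): bump-allocate from the thread's own
      nursery block with no locking at all, and take the storage-manager lock
      only when a fresh block is needed.

        import Control.Concurrent.MVar (MVar, newMVar, withMVar)
        import Data.IORef

        blockSizeW :: Int
        blockSizeW = 4096               -- words per nursery block (illustrative)

        -- A modelled per-thread nursery: words remaining in the current block.
        newtype Nursery = Nursery (IORef Int)

        allocateLocalM :: MVar () -> Nursery -> Int -> IO ()
        allocateLocalM smLock (Nursery free) wanted = do
          room <- readIORef free
          if wanted <= room
            then writeIORef free (room - wanted)          -- common case: no lock
            else withMVar smLock $ \_ ->                  -- slow path: lock, take a new block
                   writeIORef free (blockSizeW - wanted)

        main :: IO ()
        main = do
          smLock <- newMVar ()
          free   <- newIORef blockSizeW
          let nursery = Nursery free
          mapM_ (allocateLocalM smLock nursery) [100, 200, 4000]  -- last one needs the lock
          readIORef free >>= print                                -- 96 words left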
  27. 27 Apr, 2005 2 commits
    • [project @ 2005-04-27 14:25:17 by simonmar] · 0cc00995
      simonmar authored
      Hold the sm_mutex around access to the mutable list.
      
      The SMP RTS now seems quite stable, I've run my simple test program
      with 64 threads without crashes.
      0cc00995
    • [project @ 2005-04-27 09:48:34 by simonmar] · 03d63424
      simonmar authored
      Remove uses of stderr from .cmm code.
      
      We can't reliably refer to stdin/stdout/stderr from .cmm, because the
      C standard doesn't specify whether stderr is a variable or a macro.
      03d63424
  28. 25 Apr, 2005 1 commit
  29. 24 Apr, 2005 1 commit
  30. 22 Apr, 2005 2 commits
    • [project @ 2005-04-22 21:16:27 by simonmar] · f33bd72e
      simonmar authored
      fix uses of stderr
      f33bd72e
    • [project @ 2005-04-22 12:28:00 by simonmar] · ec0984a9
      simonmar authored
      - Now that labels are always prefixed with '&' in .hc code, we have to
        fix some sloppiness in the RTS .cmm code.  Fortunately it's not too
        painful.
      
      - SMP: acquire/release the storage manager lock around
        atomicModifyMutVar#.  This is a hack: atomicModifyMutVar# isn't
        atomic under SMP otherwise, but the SM lock is a large sledgehammer.
        I think I'll apply the sledgehammer to the MVar primitives too, for
        the time being.
      ec0984a9
  31. 17 Mar, 2005 1 commit
  32. 10 Feb, 2005 1 commit
    • [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading the old-generation mutable lists
      through objects in the heap, keep them in separate flat arrays.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which led to more simplifications
          in the GC.
      e7c3f957
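
      A sketch of the representational change (our names, and a list rather
      than a flat array, purely for illustration): before, the mutable list
      was threaded through the heap objects themselves, so each one carried a
      mut_link field; afterwards the objects carry no link and the generation
      keeps a separate list of pointers to them.

        -- "Before": intrusive chaining, every object pays for a link field.
        data ObjBefore = ObjBefore { payloadB :: Int, mutLink :: Maybe ObjBefore }

        -- "After": objects are just their payload; the generation owns the list.
        newtype ObjAfter = ObjAfter { payloadA :: Int }

        type MutableList = [ObjAfter]   -- kept per generation, outside the objects

        main :: IO ()
        main = do
          let mutList = map ObjAfter [1, 2, 3] :: MutableList
          print (map payloadA mutList)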