1. 08 Sep, 2013 2 commits
  2. 04 Sep, 2013 1 commit
    • Simon Marlow's avatar
      Don't move Capabilities in setNumCapabilities (#8209) · aa779e09
      Simon Marlow authored
      We have various problems with reallocating the array of Capabilities,
      due to threads in waitForReturnCapability that are already holding a
      pointer to a Capability.
      
      Rather than add more locking to make this safer, I decided it would be
      easier to ensure that we never move the Capabilities at all.  The
      capabilities array is now an array of pointers to Capabaility.  There
      are extra indirections, but it rarely matters - we don't often access
      Capabilities via the array, normally we already have a pointer to
      one.  I ran the parallel benchmarks and didn't see any difference.
      aa779e09
  3. 09 Jul, 2013 1 commit
  4. 07 Jul, 2013 1 commit
  5. 21 Jun, 2013 1 commit
  6. 20 Feb, 2013 1 commit
    • Simon Marlow's avatar
      Fix bug in setNumCapabilities · f469eff8
      Simon Marlow authored
      We were changing n_capabilities after we had released the
      Capabilities, which lead to a range of interesting crashes.  This
      should fix test failures in setnumcapabilities001.
      f469eff8
  7. 17 Feb, 2013 1 commit
  8. 16 Feb, 2013 1 commit
  9. 12 Feb, 2013 2 commits
  10. 07 Feb, 2013 1 commit
  11. 16 Jan, 2013 1 commit
  12. 16 Nov, 2012 1 commit
    • Simon Marlow's avatar
      Add a write barrier for TVAR closures · 6d784c43
      Simon Marlow authored
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
      6d784c43
  13. 25 Oct, 2012 2 commits
  14. 23 Oct, 2012 1 commit
  15. 22 Oct, 2012 1 commit
  16. 24 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Another overhaul of the recent_activity / idle GC handling (#5991) · 0b79d5cd
      Simon Marlow authored
      Improvements:
      
       - we now turn off the timer signal in the non-threaded RTS after
         idleGCDelay.  This should make the xmonad users on #5991 happy.
      
       - we now turn off the timer signal after idleGCDelay even if the
         idle GC is disabled with +RTS -I0.
      
       - we now do *not* turn off the timer when profiling.
      
       - more comments to explain the meaning of the various ACTIVITY_*
         values
      0b79d5cd
  17. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  18. 21 Aug, 2012 1 commit
  19. 07 Aug, 2012 1 commit
    • Simon Marlow's avatar
      Fix a bug in the handling of recent_activity · 396f0903
      Simon Marlow authored
      The problem occurred when the idle GC was turned off with +RTS -I0.
      Then the scheduler would go into the state ACTIVITY_DONE_GC directly
      without doing a GC, and a subsequent GC would put it back to
      ACTIVITY_YES but without turning the timer back on.  Instead if the GC
      finds the state is ACTIVITY_DONE_GC it should leave it there.
      396f0903
  20. 10 Jul, 2012 2 commits
    • Duncan Coutts's avatar
      Emit the task-tracking events · 38397354
      Duncan Coutts authored
      Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
      
      Use the new task tracing functions traceTaskCreate/Migrate/Delete.
      There are two key places. One is for worker tasks which have a
      relatively simple life cycle. Worker tasks are created and deleted by
      the RTS. The other case is bound tasks which are either created by the
      RTS, or appear as foreign C threads making calls into the RTS. For bound
      threads we do the tracing in rts_lock/unlock, which actually covers both
      threads coming in from outside, and also bound threads made by the RTS.
      38397354
    • Simon Marlow's avatar
      The final GC should be a major one · 2f3a41d9
      Simon Marlow authored
      We do a final GC before shutting down the system, to clean up.
      However, we were doing an ordinary GC rather than forcing a major GC,
      so especially when the allocation area is large, this final GC could
      be expensive.  This is really just a bug - the final GC should have
      virtually nothing to do, because there is nothing live.
      2f3a41d9
  21. 07 Jun, 2012 1 commit
  22. 27 May, 2012 1 commit
  23. 04 Apr, 2012 2 commits
    • Mikolaj Konarski's avatar
      Fix the timestamps in GC_START and GC_END events on the GC-initiating cap · 598109eb
      Mikolaj Konarski authored
      There was a discrepancy between GC times reported in +RTS -s
      and the timestamps of GC_START and GC_END events on the cap,
      on which +RTS -s stats for the given GC are based.
      This is fixed by posting the events with exactly the same timestamp
      as generated for the stat calculation. The calls posting the events
      are moved too, so that the events are emitted close to the time instant
      they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring
      it's emitted before the moved GC_END on all caps, which simplifies tools code.
      598109eb
    • Duncan Coutts's avatar
      Add eventlog/trace stuff for capabilities: create/delete/enable/disable · f9c2e854
      Duncan Coutts authored
      Now that we can adjust the number of capabilities on the fly, we need
      this reflected in the eventlog. Previously the eventlog had a single
      startup event that declared a static number of capabilities. Obviously
      that's no good anymore.
      
      For compatability we're keeping the EVENT_STARTUP but adding new
      EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old
      EVENT_SHUTDOWN but renamed and extended (using the existing mechanism
      to extend eventlog events in a compatible way). So we now emit both
      EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.
      
      Since reducing the number of capabilities at runtime does not really
      delete them, it just disables them, then we also have new events for
      disable/enable.
      
      The old EVENT_SHUTDOWN was in the scheduler class of events. The new
      EVENT_CAP_* events are in the unconditional class, along with the
      EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted
      is crucial to making sense of eventlogs, you always want those events.
      In any case, they're extremely low volume.
      f9c2e854
  24. 19 Mar, 2012 1 commit
  25. 18 Mar, 2012 1 commit
  26. 16 Mar, 2012 1 commit
  27. 27 Feb, 2012 1 commit
  28. 06 Jan, 2012 1 commit
  29. 15 Dec, 2011 1 commit
    • Simon Marlow's avatar
      Support for reducing the number of Capabilities with setNumCapabilities · 9bae7915
      Simon Marlow authored
      This patch allows setNumCapabilities to /reduce/ the number of active
      capabilities as well as increase it.  This is particularly tricky to
      do, because a Capability is a large data structure and ties into the
      rest of the system in many ways.  Trying to clean it all up would be
      extremely error prone.
      
      So instead, the solution is to mark the extra capabilities as
      "disabled".  This has the following consequences:
      
        - threads on a disabled capability are migrated away by the
          scheduler loop
      
        - disabled capabilities do not participate in GC
          (see scheduleDoGC())
      
        - No spark threads are created on this capability
          (see scheduleActivateSpark())
      
        - We do not attempt to migrate threads *to* a disabled
          capability (see schedulePushWork()).
      
      So a disabled capability should do no work, and does not participate
      in GC, although it remains alive in other respects.  For example, a
      blocked thread might wake up on a disabled capability, and it will get
      quickly migrated to a live capability.  A disabled capability can
      still initiate GC if necessary.  Indeed, it turns out to be hard to
      migrate bound threads, so we wait until the next GC to do this (see
      comments for details).
      9bae7915
  30. 13 Dec, 2011 1 commit
    • Simon Marlow's avatar
      New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GC · a02eb298
      Simon Marlow authored
      This is an experimental tweak to the parallel GC that avoids waking up
      a Capability to do parallel GC if we know that the capability has been
      idle for a (tunable) number of GC cycles.  The idea is that if you're
      only using a few Capabilities, there's no point waking up the ones
      that aren't busy.
      
      e.g. +RTS -qi3
      
      says "A Capability will participate in parallel GC if it was running
      at all since the last 3 GC cycles."
      
      Results are a bit hit and miss, and I don't completely understand why
      yet.  Hence, for now it is turned off by default, and also not
      documented except in the +RTS -? output.
      a02eb298
  31. 06 Dec, 2011 2 commits
    • Simon Marlow's avatar
      Allow the number of capabilities to be increased at runtime (#3729) · 92e7d6c9
      Simon Marlow authored
      At present the number of capabilities can only be *increased*, not
      decreased.  The latter presents a few more challenges!
      92e7d6c9
    • Simon Marlow's avatar
      Make forkProcess work with +RTS -N · 8b75acd3
      Simon Marlow authored
      Consider this experimental for the time being.  There are a lot of
      things that could go wrong, but I've verified that at least it works
      on the test cases we have.
      
      I also did some API cleanups while I was here.  Previously we had:
      
      Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
      
      but this API is particularly error-prone: if you forget to discard the
      Capability * you passed in and use the return value instead, then
      you're in for subtle bugs with +RTS -N later on.  So I changed all
      these functions to this form:
      
      void rts_eval (/* inout */ Capability **cap,
                     /* in    */ HaskellObj p,
                     /* out */   HaskellObj *ret)
      
      It's much harder to use this version incorrectly, because you have to
      pass the Capability in by reference.
      8b75acd3
  32. 01 Dec, 2011 1 commit
    • Simon Marlow's avatar
      Fix a scheduling bug in the threaded RTS · 6d18141d
      Simon Marlow authored
      The parallel GC was using setContextSwitches() to stop all the other
      threads, which sets the context_switch flag on every Capability.  That
      had the side effect of causing every Capability to also switch
      threads, and since GCs can be much more frequent than context
      switches, this increased the context switch frequency.  When context
      switches are expensive (because the switch is between two bound
      threads or a bound and unbound thread), the difference is quite
      noticeable.
      
      The fix is to have a separate flag to indicate that a Capability
      should stop and return to the scheduler, but not switch threads.  I've
      called this the "interrupt" flag.
      6d18141d
  33. 29 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Make profiling work with multiple capabilities (+RTS -N) · 50de6034
      Simon Marlow authored
      This means that both time and heap profiling work for parallel
      programs.  Main internal changes:
      
        - CCCS is no longer a global variable; it is now another
          pseudo-register in the StgRegTable struct.  Thus every
          Capability has its own CCCS.
      
        - There is a new built-in CCS called "IDLE", which records ticks for
          Capabilities in the idle state.  If you profile a single-threaded
          program with +RTS -N2, you'll see about 50% of time in "IDLE".
      
        - There is appropriate locking in rts/Profiling.c to protect the
          shared cost-centre-stack data structures.
      
      This patch does enough to get it working, I have cut one big corner:
      the cost-centre-stack data structure is still shared amongst all
      Capabilities, which means that multiple Capabilities will race when
      updating the "allocations" and "entries" fields of a CCS.  Not only
      does this give unpredictable results, but it runs very slowly due to
      cache line bouncing.
      
      It is strongly recommended that you use -fno-prof-count-entries to
      disable the "entries" count when profiling parallel programs. (I shall
      add a note to this effect to the docs).
      50de6034
  34. 25 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Time handling overhaul · 6b109851
      Simon Marlow authored
      Terminology cleanup: the type "Ticks" has been renamed "Time", which
      is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
      The terminology "tick" is now used consistently to mean the interval
      between timer signals.
      
      The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
      we have it).  Before it used CPU time in the non-threaded RTS and
      realtime in the threaded RTS, but I've discovered that the CPU timer
      has terrible resolution (at least on Linux) and isn't much use for
      profiling.  So now we always use realtime.  This should also fix
      
      The default tick interval is now 10ms, except when profiling where we
      drop it to 1ms.  This gives more accurate profiles without affecting
      runtime too much (<1%).
      
      Lots of cleanups - the resolution of Time is now in one place
      only (Rts.h) rather than having calculations that depend on the
      resolution scattered all over the RTS.  I hope I found them all.
      6b109851