  1. 30 Aug, 2014 2 commits
  2. 22 Aug, 2014 1 commit
  3. 20 Aug, 2014 2 commits
  4. 19 Aug, 2014 1 commit
    • rts/base: Fix #9423 · f9f89b78
      AndreasVoellmy authored
      Summary:
      Fix #9423.
      
      The problem in #9423 is caused when code invoked by `hs_exit()` waits
      on all foreign calls to return, but some IO managers are in `safe` foreign
      calls and do not return. The previous design signaled to the timer manager
      (via its control pipe) that it should "die" and when the timer manager
      returned to Haskell-land, the Haskell code in the timer manager then signalled
      to the IO manager threads that they should return from foreign calls and
      `die`. Unfortunately, in the shutdown sequence the timer manager is unable
      to return to Haskell-land fast enough and so the code that signals to the
      IO manager threads (via their control pipes) is never executed and the IO
      manager threads remain out in the foreign calls.
      
      This patch solves the problem by having the RTS signal to all the IO
      manager threads (via their control pipes, in addition to signalling
      the timer manager thread) that they should shut down; this happens in
      `ioManagerDie()` in `rts/Signals.c`. To do this, we arrange for each
      IO manager thread to register its control pipe with the RTS (in
      `GHC.Thread.startIOManagerThread`). In addition,
      `GHC.Thread.startTimerManagerThread` registers its control pipe.
      These are registered via the C functions `setTimerManagerControlFd`
      (in `rts/Signals.c`) and `setIOManagerControlFd` (in
      `rts/Capability.c`). The IO manager control pipe file descriptors are
      stored in a new field of the `Capability_` struct.
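As a rough illustration of the register-and-broadcast pattern described above — the function names echo the ones in the patch, but the bodies, the fixed-size table, and the "die" byte value are assumptions for illustration, not the actual RTS code:

```c
/* Illustrative sketch of the control-pipe shutdown pattern. Each IO
 * manager registers its control pipe; on shutdown the RTS writes to
 * every pipe directly instead of relying on the timer manager to
 * relay the message from Haskell-land. */
#include <unistd.h>

#define MAX_IO_MANAGERS 256            /* illustrative limit */

static int io_manager_control_fds[MAX_IO_MANAGERS];
static int n_io_managers = 0;

/* Called by each IO manager thread at startup to register the write
 * end of its control pipe with the RTS. */
void setIOManagerControlFd(int fd)
{
    if (n_io_managers < MAX_IO_MANAGERS) {
        io_manager_control_fds[n_io_managers++] = fd;
    }
}

/* On shutdown, signal every registered IO manager directly. */
void ioManagerDie(void)
{
    const unsigned char die_msg = 0xFE;  /* illustrative command byte */
    for (int i = 0; i < n_io_managers; i++) {
        (void)write(io_manager_control_fds[i], &die_msg, 1);
    }
}
```

Broadcasting from C avoids the race in the old design, where the signal had to pass through Haskell code that may never get to run during shutdown.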
      
      Test Plan: See the notes on #9423 to recreate the problem and to verify that it no longer occurs with the fix.
      
      Auditors: simonmar
      
      Reviewers: simonmar, edsko, ezyang, austin
      
      Reviewed By: austin
      
      Subscribers: phaskell, simonmar, ezyang, carter, relrod
      
      Differential Revision: https://phabricator.haskell.org/D129
      
      GHC Trac Issues: #9423, #9284
  5. 28 Jul, 2014 1 commit
  6. 30 May, 2014 1 commit
  7. 17 Feb, 2014 1 commit
  8. 04 Sep, 2013 1 commit
    • Don't move Capabilities in setNumCapabilities (#8209) · aa779e09
      Simon Marlow authored
      We have various problems with reallocating the array of Capabilities,
      due to threads in waitForReturnCapability that are already holding a
      pointer to a Capability.
      
      Rather than add more locking to make this safer, I decided it would be
      easier to ensure that we never move the Capabilities at all.  The
      capabilities array is now an array of pointers to Capability.  There
      are extra indirections, but it rarely matters - we don't often access
      Capabilities via the array, normally we already have a pointer to
      one.  I ran the parallel benchmarks and didn't see any difference.
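The stability argument can be seen in a small sketch — illustrative types and a hypothetical `moreCapabilities` helper, not the RTS definitions. Growing a table of pointers reallocates only the table, so Capability pointers already handed out stay valid:

```c
/* Sketch: an array of pointers to Capability can grow without moving
 * the Capability objects themselves. */
#include <stdlib.h>

typedef struct { int no; } Capability;

static Capability **capabilities = NULL;
static int n_capabilities = 0;

/* Grow the table to new_n entries. Only the pointer table is
 * reallocated; existing Capability objects never move. */
void moreCapabilities(int new_n)
{
    capabilities = realloc(capabilities, new_n * sizeof(Capability *));
    for (int i = n_capabilities; i < new_n; i++) {
        capabilities[i] = malloc(sizeof(Capability));
        capabilities[i]->no = i;
    }
    n_capabilities = new_n;
}
```

With an array of structs, the same `realloc` could move every Capability and invalidate pointers held by threads blocked in waitForReturnCapability; with pointers, only the table moves.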
  9. 16 Jan, 2013 1 commit
  10. 25 Oct, 2012 1 commit
  11. 25 Jul, 2012 1 commit
  12. 07 Jun, 2012 2 commits
  13. 26 Apr, 2012 1 commit
    • Fix warnings on Win64 · 1dbe6d59
      Ian Lynagh authored
      Mostly this meant getting pointer<->int conversions to use the right
      sizes. lnat is now size_t, rather than unsigned long, as that seems a
      better match for how it's used.
  14. 04 Apr, 2012 3 commits
    • Move trace of cap delete from shutdownCapability to freeCapability · e88f1625
      Duncan Coutts authored
      Will let us do final per-cap trace events from stat_exit().
      Otherwise we would end up with eventlogs with events for caps
      that have already been deleted.
    • Calculate the total memory allocated on a per-capability basis · 8536f09c
      Duncan Coutts authored
      In addition to the existing global method. For now we just do
      it both ways and assert they give the same grand total. At some
      stage we can simplify the global method to just take the sum of
      the per-cap counters.
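The "both ways plus assertion" approach can be sketched as follows; the names and types are illustrative, not the RTS counters:

```c
/* Sketch: count allocation both globally and per capability, and at a
 * stable point assert the two methods agree, as the commit describes. */
#define N_CAPS 4

static unsigned long global_alloc = 0;
static unsigned long cap_alloc[N_CAPS] = { 0 };

void countAllocation(int cap_no, unsigned long words)
{
    global_alloc += words;       /* existing global method */
    cap_alloc[cap_no] += words;  /* new per-capability counter */
}

/* The per-cap counters must sum to the global grand total. */
int allocationCountersAgree(void)
{
    unsigned long total = 0;
    for (int i = 0; i < N_CAPS; i++) {
        total += cap_alloc[i];
    }
    return total == global_alloc;
}
```

Once the assertion has held for long enough, the global counter becomes redundant and can be replaced by the sum of per-cap counters.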
    • Add eventlog/trace stuff for capabilities: create/delete/enable/disable · f9c2e854
      Duncan Coutts authored
      Now that we can adjust the number of capabilities on the fly, we need
      this reflected in the eventlog. Previously the eventlog had a single
      startup event that declared a static number of capabilities. Obviously
      that's no good anymore.
      
      For compatibility we're keeping the EVENT_STARTUP but adding new
      EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old
      EVENT_SHUTDOWN but renamed and extended (using the existing mechanism
      to extend eventlog events in a compatible way). So we now emit both
      EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.
      
      Since reducing the number of capabilities at runtime does not really
      delete them but just disables them, we also have new events for
      disable/enable.
      
      The old EVENT_SHUTDOWN was in the scheduler class of events. The new
      EVENT_CAP_* events are in the unconditional class, along with the
      EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted
      is crucial to making sense of eventlogs; you always want those events.
      In any case, they're extremely low volume.
  15. 27 Feb, 2012 1 commit
  16. 13 Feb, 2012 1 commit
    • Allocate pinned object blocks from the nursery, not the global allocator · 67f4ab7e
      Simon Marlow authored
      
      Prompted by a benchmark posted to parallel-haskell@haskell.org by
      Andreas Voellmy <andreas.voellmy@gmail.com>.  This program exhibits
      contention for the block allocator when run with -N2 and greater
      without the fix:
      
      {-# LANGUAGE MagicHash, UnboxedTuples, BangPatterns #-}
      module Main where
      
      import Control.Monad
      import Control.Concurrent
      import System.Environment
      import GHC.IO
      import GHC.Exts
      import GHC.Conc
      
      main = do
       [m] <- fmap (fmap read) getArgs
       n <- getNumCapabilities
       ms <- replicateM n newEmptyMVar
       sequence [ forkIO $ busyWorkerB (m `quot` n) >> putMVar mv () | mv <- ms ]
       mapM takeMVar ms
      
      busyWorkerB :: Int -> IO ()
      busyWorkerB n_loops = go 0
        where go !n | n >= n_loops = return ()
                    | otherwise    =
                do p <- (IO $ \s ->
                          case newPinnedByteArray# 1024# s      of
                            { (# s', mbarr# #) ->
                                 (# s', () #)
                            }
                        )
                   go (n+1)
  17. 16 Jan, 2012 1 commit
  18. 09 Jan, 2012 1 commit
  19. 15 Dec, 2011 1 commit
    • Support for reducing the number of Capabilities with setNumCapabilities · 9bae7915
      Simon Marlow authored
      This patch allows setNumCapabilities to /reduce/ the number of active
      capabilities as well as increase it.  This is particularly tricky to
      do, because a Capability is a large data structure and ties into the
      rest of the system in many ways.  Trying to clean it all up would be
      extremely error prone.
      
      So instead, the solution is to mark the extra capabilities as
      "disabled".  This has the following consequences:
      
        - threads on a disabled capability are migrated away by the
          scheduler loop
      
        - disabled capabilities do not participate in GC
          (see scheduleDoGC())
      
        - No spark threads are created on this capability
          (see scheduleActivateSpark())
      
        - We do not attempt to migrate threads *to* a disabled
          capability (see schedulePushWork()).
      
      So a disabled capability should do no work, and does not participate
      in GC, although it remains alive in other respects.  For example, a
      blocked thread might wake up on a disabled capability, and it will get
      quickly migrated to a live capability.  A disabled capability can
      still initiate GC if necessary.  Indeed, it turns out to be hard to
      migrate bound threads, so we wait until the next GC to do this (see
      comments for details).
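The "disable rather than delete" idea reduces to a flag plus a predicate that the scheduler checks at each of the points listed above. A minimal sketch, with illustrative types and names:

```c
/* Sketch: capabilities beyond the new count are flagged as disabled,
 * never deallocated. The scheduler's checks (migrate threads away,
 * skip for GC, no spark threads, no pushed work) all consult the
 * flag. */
#define MAX_CAPS 8

typedef struct { int no; int disabled; } Cap;

static Cap caps[MAX_CAPS];
static int enabled_capabilities = MAX_CAPS;

/* Reduce the active count by flagging the extra capabilities. */
void setNumCapabilitiesSketch(int new_n)
{
    for (int i = 0; i < MAX_CAPS; i++) {
        caps[i].no = i;
        caps[i].disabled = (i >= new_n);
    }
    enabled_capabilities = new_n;
}

/* The checks the commit lists all reduce to this predicate. */
int capMayDoWork(const Cap *cap)
{
    return !cap->disabled;
}
```

Because the Cap structures stay alive, a thread that wakes up on a disabled capability can still run long enough to be migrated to a live one.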
  20. 13 Dec, 2011 1 commit
    • New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GC · a02eb298
      Simon Marlow authored
      This is an experimental tweak to the parallel GC that avoids waking up
      a Capability to do parallel GC if we know that the capability has been
      idle for a (tunable) number of GC cycles.  The idea is that if you're
      only using a few Capabilities, there's no point waking up the ones
      that aren't busy.
      
      e.g. +RTS -qi3
      
      says "A Capability will participate in parallel GC if it was running
      at all within the last 3 GC cycles."
      
      Results are a bit hit and miss, and I don't completely understand why
      yet.  Hence, for now it is turned off by default, and also not
      documented except in the +RTS -? output.
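The policy itself is a one-line predicate. A sketch, with illustrative field and function names (the real RTS bookkeeping is more involved):

```c
/* Sketch of the +RTS -qi<n> heuristic: a capability joins parallel GC
 * only if it has run within the last n GC cycles. */
typedef struct {
    int idle_gcs;  /* consecutive GC cycles this cap has been idle */
} CapIdle;

/* threshold is the <n> from +RTS -qi<n>; 0 disables the heuristic,
 * i.e. every capability is always woken for parallel GC. */
int shouldWakeForGC(const CapIdle *cap, int threshold)
{
    return threshold == 0 || cap->idle_gcs < threshold;
}
```

The tunable threshold is what makes the experiment cheap to evaluate: set it to 0 and the behaviour is identical to the old scheme.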
  21. 06 Dec, 2011 2 commits
    • Allow the number of capabilities to be increased at runtime (#3729) · 92e7d6c9
      Simon Marlow authored
      At present the number of capabilities can only be *increased*, not
      decreased.  The latter presents a few more challenges!
    • Make forkProcess work with +RTS -N · 8b75acd3
      Simon Marlow authored
      Consider this experimental for the time being.  There are a lot of
      things that could go wrong, but I've verified that at least it works
      on the test cases we have.
      
      I also did some API cleanups while I was here.  Previously we had:
      
      Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
      
      but this API is particularly error-prone: if you forget to discard the
      Capability * you passed in and use the return value instead, then
      you're in for subtle bugs with +RTS -N later on.  So I changed all
      these functions to this form:
      
      void rts_eval (/* inout */ Capability **cap,
                     /* in    */ HaskellObj p,
                     /* out */   HaskellObj *ret)
      
      It's much harder to use this version incorrectly, because you have to
      pass the Capability in by reference.
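The safety property can be demonstrated with a toy stand-in; the real rts_eval evaluates a Haskell closure, while this sketch just doubles an int, and all types here are illustrative:

```c
/* Toy model of the in/out Capability-argument style described above.
 * Because the caller's own pointer variable is updated through the
 * inout parameter, there is no stale return value to misuse. */
typedef struct { int no; } CapTok;

static CapTok cap0 = { 0 };
static CapTok cap1 = { 1 };

void rts_eval_sketch(/* inout */ CapTok **cap,
                     /* in    */ int p,
                     /* out   */ int *ret)
{
    *ret = p * 2;                            /* pretend evaluation  */
    *cap = (*cap == &cap0) ? &cap1 : &cap0;  /* pretend the task
                                                resumed on another
                                                capability          */
}
```

In the old style the caller had two Capability pointers in scope (the argument and the return value) and nothing stopped them using the wrong one; here there is only ever one.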
  22. 01 Dec, 2011 1 commit
    • Fix a scheduling bug in the threaded RTS · 6d18141d
      Simon Marlow authored
      The parallel GC was using setContextSwitches() to stop all the other
      threads, which sets the context_switch flag on every Capability.  That
      had the side effect of causing every Capability to also switch
      threads, and since GCs can be much more frequent than context
      switches, this increased the context switch frequency.  When context
      switches are expensive (because the switch is between two bound
      threads or a bound and unbound thread), the difference is quite
      noticeable.
      
      The fix is to have a separate flag to indicate that a Capability
      should stop and return to the scheduler, but not switch threads.  I've
      called this the "interrupt" flag.
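The separation of the two requests can be sketched as a pair of flags; the function names echo the idea in the commit, but the bodies and types are illustrative assumptions:

```c
/* Sketch: "return to the scheduler" and "switch threads" become
 * independent requests, so the parallel GC can stop a capability
 * without forcing an expensive thread switch. */
typedef struct {
    volatile int context_switch;  /* stop AND pick another thread   */
    volatile int interrupt;       /* stop; may resume same thread   */
} CapFlags;

/* Used by e.g. the parallel GC to stop a capability. */
void interruptCapability(CapFlags *cap)
{
    cap->interrupt = 1;
}

/* A timer-driven context switch requests both. */
void contextSwitchCapability(CapFlags *cap)
{
    cap->context_switch = 1;
    cap->interrupt = 1;
}

/* In the scheduler loop: switch threads only if a real context
 * switch was requested; clear both flags either way. */
int shouldSwitchThread(CapFlags *cap)
{
    int sw = cap->context_switch;
    cap->context_switch = 0;
    cap->interrupt = 0;
    return sw;
}
```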
  23. 29 Nov, 2011 2 commits
    • Add a new primop: getCCCS# :: State# s -> (# State# s, Addr# #) · 1f7433b7
      Simon Marlow authored
      Returns a pointer to the current cost-centre stack when profiling,
      NULL otherwise.
    • Make profiling work with multiple capabilities (+RTS -N) · 50de6034
      Simon Marlow authored
      This means that both time and heap profiling work for parallel
      programs.  Main internal changes:
      
        - CCCS is no longer a global variable; it is now another
          pseudo-register in the StgRegTable struct.  Thus every
          Capability has its own CCCS.
      
        - There is a new built-in CCS called "IDLE", which records ticks for
          Capabilities in the idle state.  If you profile a single-threaded
          program with +RTS -N2, you'll see about 50% of time in "IDLE".
      
        - There is appropriate locking in rts/Profiling.c to protect the
          shared cost-centre-stack data structures.
      
      This patch does enough to get it working, but I have cut one big corner:
      the cost-centre-stack data structure is still shared amongst all
      Capabilities, which means that multiple Capabilities will race when
      updating the "allocations" and "entries" fields of a CCS.  Not only
      does this give unpredictable results, but it runs very slowly due to
      cache line bouncing.
      
      It is strongly recommended that you use -fno-prof-count-entries to
      disable the "entries" count when profiling parallel programs. (I shall
      add a note to this effect to the docs).
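The first internal change can be pictured with stand-in types; none of these are the real StgRegTable or CCS definitions, just a sketch of the shape of the change:

```c
/* Sketch: CCCS moves from a single global into a per-capability
 * register table, and a built-in "IDLE" CCS absorbs ticks on
 * capabilities that have no work. */
typedef struct { const char *label; } CostCentreStack;

static CostCentreStack CCS_IDLE = { "IDLE" };

typedef struct {
    CostCentreStack *rCCCS;  /* formerly a single global variable */
} RegTableSketch;

typedef struct { RegTableSketch r; } CapProf;

/* A profiling timer tick attributes time to the capability's current
 * cost-centre stack, or to IDLE if the capability is idle. */
const char *tickLabel(const CapProf *cap, int is_idle)
{
    return is_idle ? CCS_IDLE.label : cap->r.rCCCS->label;
}
```

This is why a single-threaded program under +RTS -N2 shows roughly half its time in "IDLE": one capability's ticks all land on the IDLE stack.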
  24. 26 Oct, 2011 1 commit
  25. 24 Oct, 2011 1 commit
  26. 05 Aug, 2011 1 commit
  27. 18 Jul, 2011 7 commits
    • Add new fully-accurate per-spark trace/eventlog events · 084b64f2
      Duncan Coutts authored
      Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events
      covering all stages of the spark lifecycle:
        create, dud, overflow, run, steal, fizzle, gc
      
      The sampled spark events are still available. There are now two event
      classes for sparks, the sampled and the fully accurate. They can be
      enabled/disabled independently. By default +RTS -l includes the sampled
      but not the full-detail spark events. Use +RTS -lf-p to enable the
      detailed 'f' class and disable the sampled 'p' class.
      
      Includes work by Mikolaj <mikolaj.konarski@gmail.com>
    • Mikolaj Konarski authored · 5cc2670c
    • Add spark counter tracing · d77df1ca
      Duncan Coutts authored
      A new eventlog event containing 7 spark counters/statistics: sparks
      created, dud, overflowed, converted, GC'd, fizzled and remaining.
      These are maintained and logged separately for each capability.
      We log them at startup, on each GC (minor and major) and on shutdown.
    • Move allocation of spark pools into initCapability · 5d091088
      Duncan Coutts authored
      Rather than in a separate initSparkPools phase. This means all the
      spark state for a capability is initialised at the same time, which
      then becomes a good place to emit an initial spark trace event.
    • Add assertion of the invariant for the spark counters · ddb47a91
      Duncan Coutts authored
      The invariant is: created = converted + remaining + gcd + fizzled
      Since sparks move between capabilities, we have to aggregate the
      counters over all capabilities. This in turn means we can only check
      the invariant at stable points where all but one capability is
      stopped. We can do this at shutdown time, and before and after a
      global synchronised GC.
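The aggregated check can be sketched directly from the stated invariant; the struct layout and function name are illustrative, not the RTS definitions:

```c
/* Sketch: sum the per-capability spark counters and check
 *   created = converted + remaining + gcd + fizzled.
 * A single capability need not satisfy this on its own, because
 * sparks migrate between capabilities; only the aggregate must. */
typedef struct {
    unsigned long created, dud, overflowed,
                  converted, gcd, fizzled, remaining;
} SparkCounters;

int sparkInvariantHolds(const SparkCounters *caps, int n)
{
    unsigned long created = 0, converted = 0, remaining = 0,
                  gcd = 0, fizzled = 0;
    for (int i = 0; i < n; i++) {
        created   += caps[i].created;
        converted += caps[i].converted;
        remaining += caps[i].remaining;
        gcd       += caps[i].gcd;
        fizzled   += caps[i].fizzled;
    }
    return created == converted + remaining + gcd + fizzled;
}
```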
    • Classify overflowed sparks separately · fa8d20e6
      Duncan Coutts authored
      When you use `par` to make a spark, if the spark pool on the current
      capability is full then the spark is discarded. This represents a
      loss of potential parallelism, and it also means there are simply a
      lot of sparks around. Both are things that might concern a
      programmer when tuning a parallel program that uses `par`.
      
      The "+RTS -s" stats command now reports overflowed sparks, e.g.
      SPARKS: 100001 (15521 converted, 84480 overflowed, 0 dud, 0 GC'd, 0 fizzled)
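The classification itself is a small change at the point where a spark enters the pool. A sketch, with an illustrative pool size and names:

```c
/* Sketch: when the bounded spark pool is full, `par` discards the
 * new spark and counts it as overflowed rather than silently
 * dropping it. */
#define SPARK_POOL_SIZE 4  /* illustrative; real pools are larger */

typedef struct {
    int pool[SPARK_POOL_SIZE];
    int n;
    unsigned long created, overflowed;
} SparkPool;

void newSpark(SparkPool *p, int spark)
{
    p->created++;
    if (p->n == SPARK_POOL_SIZE) {
        p->overflowed++;  /* discarded: lost potential parallelism */
    } else {
        p->pool[p->n++] = spark;
    }
}
```

A large overflowed count in "+RTS -s" output tells the programmer that `par` is being called far faster than sparks can be converted.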
    • Use a struct for the set of spark counters · 556557eb
      Duncan Coutts authored