1. 02 Dec, 2016 1 commit
    • Install toplevel handler inside fork. · 895a131f
      Alexander Vershilov authored
      When the RTS is forked, it doesn't update the top-level handler, so the
      UserInterrupt exception is sent to Thread1, which doesn't exist in the
      forked process.
      
      We now install the top-level handler on fork, so that the signal is
      delivered to the new main thread.
      
      Fixes #12903
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2770
      
      GHC Trac Issues: #12903
  2. 29 Nov, 2016 1 commit
  3. 18 Nov, 2016 1 commit
  4. 29 Oct, 2016 1 commit
    • Fix a bug in parallel GC synchronisation · 4e088b49
      Simon Marlow authored
      Summary:
      The problem boils down to global variables: in particular gc_threads[],
      which was being modified by a subsequent GC before the previous GC had
      finished with it.  The fix is to not use global variables.
      
      This was causing setnumcapabilities001 to fail (again!).  It's an old
      bug though.
      
      Test Plan:
      Ran setnumcapabilities001 in a loop for a couple of hours.  Before this
      patch it had been failing after a few minutes.  Not a very scientific
      test, but it's the best I have.
      
      Reviewers: bgamari, austin, fryguybob, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2654
  5. 28 Oct, 2016 1 commit
    • Make it possible to use +RTS -qn without -N · aae2b3d5
      Simon Marlow authored
      It's entirely reasonable to set +RTS -qn without setting -N, because the
      program might later call setNumCapabilities.  If we disallow it, there's
      no way to use -qn on programs that use setNumCapabilities.
  6. 22 Oct, 2016 1 commit
    • Fix failure in setnumcapabilities001 (#12728) · acc98510
      Simon Marlow authored
      The value of enabled_capabilities can change across a call to
      requestSync(), and we were erroneously using an old value, causing
      things to go wrong later.  It manifested as an assertion failure; I'm
      not sure whether there are worse consequences or not, but we should
      get this fix into 8.0.2 anyway.
      
      The failure didn't happen for me because it only shows up on machines
      with fewer than 4 processors, due to the new logic to enable -qn
      automatically.  I've bumped the test parameter to 8 to make it more
      likely to exercise that code.
      
      Test Plan: Ran setnumcapabilities001 many times
      
      Reviewers: niteria, austin, erikd, rwbarton, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2617
      
      GHC Trac Issues: #12728
  7. 09 Oct, 2016 1 commit
    • Default +RTS -qn to the number of cores · 6c47f2ef
      Simon Marlow authored
      Setting a -N value that is too large has a dramatic negative effect on
      performance, but the new -qn flag can mitigate the worst of the effects
      by limiting the number of GC threads.
      
      So now, if you don't explicitly set +RTS -qn, and you set -N larger than
      the number of cores (or use setNumCapabilities to do the same), we'll
      default -qn to the number of cores.
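      
      The defaulting rule above can be sketched roughly as follows. This is
      an illustration only, not the RTS source: the function name, the
      parameter names, and the use of 0 to mean "-qn not set" are all
      assumptions made for the example.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Hypothetical helper, not the RTS implementation.
       *   n_caps:  the -N value (or the value passed to setNumCapabilities)
       *   n_cores: number of cores reported by the OS
       *   qn_flag: the -qn value, or 0 if the user did not set it
       *            (assumed encoding for this sketch) */
      static uint32_t effective_gc_threads(uint32_t n_caps, uint32_t n_cores,
                                           uint32_t qn_flag)
      {
          assert(n_caps > 0 && n_cores > 0);
          if (qn_flag != 0) {
              return qn_flag;  /* an explicit -qn is used as given */
          }
          /* No -qn: limit GC threads to the core count only if -N exceeds it. */
          return n_caps > n_cores ? n_cores : n_caps;
      }
      ```
      
      With -N8 on a 4-core machine and no -qn, this sketch yields 4 GC
      threads, which is the configuration measured below.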
      
      These are the results from nofib/parallel on my 4-core (2 cores x 2
      threads) i7 laptop, comparing -N8 before and after this change.
      
      ```
      ------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      ------------------------------------------------------------------------
         blackscholes          +0.0%     +0.0%    -72.5%    -72.0%     +9.5%
                coins          +0.0%     -0.0%    -73.7%    -72.2%     -0.8%
               mandel          +0.0%     +0.0%    -76.4%    -75.4%     +3.3%
              matmult          +0.0%    +15.5%    -26.8%    -33.4%     +1.0%
                nbody          +0.0%     +2.4%     +0.7%     0.076      0.0%
               parfib          +0.0%     -8.5%    -33.2%    -31.5%     +2.0%
              partree          +0.0%     -0.0%    -60.4%    -56.8%     +5.7%
                 prsa          +0.0%     -0.0%    -65.4%    -60.4%      0.0%
               queens          +0.0%     +0.2%    -58.8%    -58.8%     -1.5%
                  ray          +0.0%     -1.5%    -88.7%    -85.6%     -3.6%
             sumeuler          +0.0%     -0.0%    -47.8%    -46.9%      0.0%
      ------------------------------------------------------------------------
                  Min          +0.0%     -8.5%    -88.7%    -85.6%     -3.6%
                  Max          +0.0%    +15.5%     +0.7%    -31.5%     +9.5%
       Geometric Mean          +0.0%     +0.6%    -61.4%    -63.1%     +1.4%
      ```
      
      Test Plan: validate, nofib/parallel benchmarks
      
      Reviewers: niteria, ezyang, nh2, austin, erikd, trofi, bgamari
      
      Reviewed By: trofi, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2580
      
      GHC Trac Issues: #9221
  8. 12 Sep, 2016 1 commit
    • Add hs_try_putmvar() · 454033b5
      Simon Marlow authored
      Summary:
      This is a fast, non-blocking, asynchronous interface to tryPutMVar that
      can be called from C/C++.
      
      It's useful for callback-based C/C++ APIs: the idea is that the callback
      invokes hs_try_putmvar(), and the Haskell code waits for the callback to
      run by blocking in takeMVar.
      
      The callback doesn't block - this is often a requirement of
      callback-based APIs.  The callback wakes up the Haskell thread with
      minimal overhead and no unnecessary context-switches.
      
      There are a couple of benchmarks in
      testsuite/tests/concurrent/should_run.  Some example results comparing
      hs_try_putmvar() with using a standard foreign export:
      
          ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4     0.49s
          ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4     2.30s
      
      hs_try_putmvar() is 4x faster for this workload (see the source for
      hs_try_putmvar003.hs for details of the workload).
      
      An alternative solution is to use the IO Manager for this.  We've tried
      it, but there are problems with that approach:
      * Need to create a new file descriptor for each callback
      * The IO Manager thread(s) become a bottleneck
      * More potential for things to go wrong, e.g. throwing an exception in
        an IO Manager callback kills the IO Manager thread.
      
      Test Plan: validate; new unit tests
      
      Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2501
  9. 05 Aug, 2016 1 commit
    • Another try to get thread migration right · 89fa4e96
      Simon Marlow authored
      Summary:
      This is surprisingly tricky.  There were linked list bugs in the
      previous version (D2430) that showed up as a test failure in
      setnumcapabilities001 (that's a great stress test!).
      
      This new version uses a different strategy that doesn't suffer from
      the problem that @ezyang pointed out in D2430.  We now pre-calculate
      how many threads to keep for this capability, and then migrate any
      surplus threads off the front of the queue, taking care to account for
      threads that can't be migrated.
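      
      A minimal sketch of that strategy, under stated assumptions: the
      `Thread` type, the `pinned` flag (standing in for "bound to the current
      Task or otherwise not migratable"), both function names, and the exact
      rounding in `threads_to_keep` are hypothetical and chosen for
      illustration. The real scheduler works on linked run queues rather than
      arrays.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Hypothetical stand-in for a Haskell thread on the run queue. */
      typedef struct {
          int  id;
          bool pinned;  /* cannot be migrated off this capability */
      } Thread;
      
      /* Share the queue evenly between this capability and the free ones,
       * rounding up so that this capability keeps any remainder. */
      static uint32_t threads_to_keep(uint32_t n_run_queue, uint32_t n_free_caps)
      {
          return (n_run_queue + n_free_caps) / (n_free_caps + 1);
      }
      
      /* Walk the queue from the front and mark surplus threads for migration.
       * Pinned threads are skipped and stay where they are.
       * Returns the number of threads marked. */
      static uint32_t select_surplus(const Thread *queue, uint32_t n,
                                     uint32_t n_free_caps, bool *migrate)
      {
          uint32_t keep = threads_to_keep(n, n_free_caps);
          uint32_t remaining = n;  /* threads still on this capability */
          uint32_t moved = 0;
      
          for (uint32_t i = 0; i < n; i++) {
              migrate[i] = false;
              if (remaining > keep && !queue[i].pinned) {
                  migrate[i] = true;
                  remaining--;
                  moved++;
              }
          }
          assert(remaining + moved == n);
          return moved;
      }
      ```
      
      For a queue [A,B] where B is pinned and one capability is free, the
      sketch keeps one thread and marks A for migration.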
      
      Test Plan:
      1. setnumcapabilities001 stress test with sanity checking (+RTS -DS) turned on:
      
      ```
      cd testsuite/tests/concurrent/should_run
      make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0
      while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 || break; done
      ```
      
      2. The test case from #12419
      
      Reviewers: niteria, ezyang, rwbarton, austin, bgamari, erikd
      
      Subscribers: thomie, ezyang
      
      Differential Revision: https://phabricator.haskell.org/D2441
      
      GHC Trac Issues: #12419
  10. 03 Aug, 2016 2 commits
    • Fix to thread migration · 988ad8ba
      Simon Marlow authored
      Summary:
      If we had 2 threads on the run queue, say [A,B], and B is bound to the
      current Task, then we would fail to migrate any threads.  This fixes it
      so that we would migrate A in that case.
      
      This will help parallelism a bit in programs that have lots of bound
      threads.
      
      Test Plan:
      Test program in #12419, which is actually not a great program but it
      does behave a bit better after this change.
      
      Reviewers: ezyang, niteria, bgamari, austin, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2430
      
      GHC Trac Issues: #12419
    • Track the lengths of the thread queues · 55f5aed7
      Simon Marlow authored
      Summary:
      Knowing the length of the run queue in O(1) time is useful: for example
      we don't have to traverse the run queue to know how many threads we have
      to migrate in schedulePushWork().
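      
      The idea can be illustrated with a small queue that carries its own
      length. This is a sketch only: `RunQueue`, `Thread`, `rq_push_back` and
      `rq_pop_front` are hypothetical names, not the RTS data structures.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      
      /* Hypothetical thread node and run queue for illustration. */
      typedef struct Thread_ {
          struct Thread_ *next;
          int id;
      } Thread;
      
      typedef struct {
          Thread  *head;
          Thread  *tail;
          uint32_t n;  /* length, updated on every push and pop */
      } RunQueue;
      
      static void rq_push_back(RunQueue *q, Thread *t)
      {
          t->next = NULL;
          if (q->tail != NULL) {
              q->tail->next = t;
          } else {
              q->head = t;
          }
          q->tail = t;
          q->n++;
      }
      
      static Thread *rq_pop_front(RunQueue *q)
      {
          Thread *t = q->head;
          if (t == NULL) {
              return NULL;
          }
          q->head = t->next;
          if (q->head == NULL) {
              q->tail = NULL;
          }
          t->next = NULL;
          q->n--;
          /* The counter and the list must agree about emptiness. */
          assert((q->n == 0) == (q->head == NULL));
          return t;
      }
      ```
      
      Reading `q->n` is then a constant-time operation, at the cost of
      keeping the counter in sync in every function that modifies the queue.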
      
      Test Plan: validate
      
      Reviewers: ezyang, erikd, bgamari, austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2437
  11. 27 Jul, 2016 1 commit
  12. 20 Jun, 2016 1 commit
  13. 10 Jun, 2016 1 commit
    • NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
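      
      The capability-to-node assignment in the list above (cap C on node C%N)
      can be sketched as below. The function names are hypothetical, and the
      sketch deliberately leaves out the OS queries and the thread binding.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Round-robin assignment: capability C lives on node C % N. */
      static uint32_t cap_to_numa_node(uint32_t cap_no, uint32_t n_nodes)
      {
          assert(n_nodes > 0);
          return cap_no % n_nodes;
      }
      
      /* How many of n_caps capabilities end up on a given node. */
      static uint32_t caps_on_node(uint32_t n_caps, uint32_t n_nodes,
                                   uint32_t node)
      {
          uint32_t count = 0;
          for (uint32_t c = 0; c < n_caps; c++) {
              if (cap_to_numa_node(c, n_nodes) == node) {
                  count++;
              }
          }
          return count;
      }
      ```
      
      With -N24 on a 2-node machine, this places 12 capabilities on each
      node, with even-numbered capabilities on node 0.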
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so th...
  14. 17 May, 2016 1 commit
    • rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
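      
      The shape of the two functions might look like the sketch below. It is
      not the RTS header: the `StgClosure` stand-in, the 3-bit `TAG_MASK`, and
      the `tag_closure` helper are assumptions made so that the example is
      self-contained.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Stand-in closure type, 8-byte aligned so that the low 3 bits of its
       * address are free to carry a tag (assumption for this sketch). */
      typedef struct {
          _Alignas(8) uint64_t header;
      } StgClosure;
      
      #define TAG_MASK ((uintptr_t)7)  /* assumed: 3 tag bits */
      
      /* Non-const in, non-const out. */
      static inline StgClosure *UNTAG_CLOSURE(StgClosure *p)
      {
          return (StgClosure *)((uintptr_t)p & ~TAG_MASK);
      }
      
      /* Const in, const out: constness is never silently dropped. */
      static inline const StgClosure *UNTAG_CONST_CLOSURE(const StgClosure *p)
      {
          return (const StgClosure *)((uintptr_t)p & ~TAG_MASK);
      }
      
      /* Hypothetical helper used only to build a tagged pointer for testing. */
      static inline const StgClosure *tag_closure(const StgClosure *p,
                                                  uintptr_t tag)
      {
          assert(tag <= TAG_MASK && ((uintptr_t)p & TAG_MASK) == 0);
          return (const StgClosure *)((uintptr_t)p | tag);
      }
      ```
      
      Keeping two functions lets callers that only read a closure stay
      const-correct, while callers that mutate it keep the original signature.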
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
  15. 11 May, 2016 1 commit
    • Fix ASSERT failure and re-enable setnumcapabilities001 · cfc5df43
      Simon Marlow authored
      The assertion failure was fairly benign, I think, but this fixes it.
      I've been running the test repeatedly for the last 30 mins and it hasn't
      triggered.
      
      There are other problems exposed by this test (see #12038), but I've
      worked around those in the test itself for now.
      
      I also copied the relevant bits of the parallel library here so that we
      don't need parallel for the test to run.
  16. 10 May, 2016 1 commit
    • Fix a crash in requestSync() · ea3d1efb
      Simon Marlow authored
      It was possible for a thread to read invalid memory after a conflict
      when multiple threads were synchronising.
      
      I haven't been successful in constructing a test case that triggers
      this, but we have some internal code that ran into it.
  17. 04 May, 2016 3 commits
    • rts: Replace `nat` with `uint32_t` · db9de7eb
      Erik de Castro Lopo authored
      The `nat` type was an alias for `unsigned int` with a comment saying
      it was at least 32 bits. We keep the typedef in case client code is
      using it but mark it as deprecated.
      
      Test Plan: Validated on Linux, OS X and Windows
      
      Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
      
      Differential Revision: https://phabricator.haskell.org/D2166
    • schedulePushWork: avoid unnecessary wakeups · 1fa92ca9
      Simon Marlow authored
      This function had some pathologically bad behaviour: if we had 2 threads
      on the current capability and 23 other idle capabilities, we would
      
      * grab all 23 capabilities
      * migrate one Haskell thread to one of them
      * wake up a worker on *all* 23 other capabilities.
      
      This led to a lot of unnecessary wakeups when using large -N values.
      
      Now, we
      
      * Count how many capabilities we need to wake up
      * Start from cap->no+1, so that we don't overload low-numbered capabilities
      * Only wake up capabilities that we migrated a thread to (unless we have
        sparks to steal)
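      
      The new selection logic can be sketched as follows. This is an
      illustration under assumptions, not the scheduler code: the function
      name, the array-based representation, and the wrap-around at the end of
      the capability array are hypothetical, and the spark-stealing exception
      is not modelled.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Pick at most n_surplus idle capabilities to receive a thread,
       * scanning from self+1 (wrapping around, an assumption of this sketch)
       * so that low-numbered capabilities are not favoured.
       * Writes the chosen capability numbers to out[] and returns how many
       * were chosen; only those capabilities would be woken up. */
      static uint32_t pick_caps_to_wake(const bool *idle, uint32_t n_caps,
                                        uint32_t self, uint32_t n_surplus,
                                        uint32_t *out)
      {
          assert(self < n_caps);
          uint32_t n_picked = 0;
          for (uint32_t i = 1; i < n_caps && n_picked < n_surplus; i++) {
              uint32_t c = (self + i) % n_caps;
              if (idle[c]) {
                  out[n_picked++] = c;
              }
          }
          return n_picked;
      }
      ```
      
      In the scenario described above (2 threads on the current capability,
      23 idle capabilities), there is one surplus thread, so the sketch picks
      exactly one capability instead of waking all 23.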
      
      This results in a pretty dramatic improvement in our production system.
    • Allow limiting the number of GC threads (+RTS -qn<n>) · 76ee2607
      Simon Marlow authored
      This allows the GC to use fewer threads than the number of capabilities.
      At each GC, we choose some of the capabilities to be "idle", which means
      that the thread running on that capability (if any) will sleep for the
      duration of the GC, and the other threads will do its work.  We choose
      capabilities that are already idle (if any) to be the idle capabilities.
      
      The idea is that this helps in the following situation:
      
      * We want to use a large -N value so as to make use of hyperthreaded
        cores
      * We use a large heap size, so GC is infrequent
      * But we don't want to use all -N threads in the GC, because that
        thrashes the memory too much.
      
      See docs for usage.
  18. 10 Apr, 2016 1 commit
  19. 07 Feb, 2016 1 commit
  20. 04 Dec, 2015 1 commit
  21. 22 Oct, 2015 1 commit
  22. 15 Jul, 2015 1 commit
  23. 10 Jul, 2015 1 commit
  24. 26 Jun, 2015 1 commit
    • Fix deadlock (#10545) · 111ba4be
      Simon Marlow authored
      yieldCapability() was not prepared to be called by a Task that is not
      either a worker or a bound Task.  This could happen if we ended up in
      yieldCapability via this call stack:
      
      performGC()
      scheduleDoGC()
      requestSync()
      yieldCapability()
      
      and there were a few other ways this could happen via requestSync.
      The fix is to handle this case in yieldCapability(): when the Task is
      not a worker or a bound Task, we put it on the returning_workers
      queue, where it will be woken up again.
      
      Summary of changes:
      
      * `yieldCapability`: factored out subroutine `waitForWorkerCapability`
      * `waitForReturnCapability` renamed to `waitForCapability`, and
        factored out subroutine `waitForReturnCapability`
      * `releaseCapabilityAndQueueWorker` renamed to `enqueueWorker`, does
        not take a lock and no longer tests if `!isBoundTask()`
      * `yieldCapability` adjusted for refactorings, only change in behavior
        is when it is not a worker or bound task.
      
      Test Plan:
      * new test concurrent/should_run/performGC
      * validate
      
      Reviewers: niteria, austin, ezyang, bgamari
      
      Subscribers: thomie, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D997
      
      GHC Trac Issues: #10545
  25. 01 Jun, 2015 1 commit
    • Don't call DEAD_WEAK finalizer again on shutdown (#7170) · dfdc50d6
      Simon Marlow authored
      Summary:
      There's a race condition like this:
      
        # A foreign pointer gets promoted to the last generation
        # It has its finalizer called manually
        # We start shutting down the runtime in `hs_exit_` from the main
          thread
        # A minor GC starts running (`scheduleDoGC`) on one of the threads
        # The minor GC notices that we're in `SCHED_INTERRUPTING` state and
          advances to `SCHED_SHUTTING_DOWN`
        # The main thread tries to do major GC (with `scheduleDoGC`), but it
          exits early because we're in `SCHED_SHUTTING_DOWN` state
        # We end up with a `DEAD_WEAK` left on the list of weak pointers of
          the last generation, because it relied on major GC removing it from
          that list
      
      This change:
        * Ignores DEAD_WEAK finalizers when shutting down
        * Makes the major GC on shutdown more likely
        * Fixes a bogus assert
      
      Test Plan:
      Before this diff, the failure described in
      https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5 reproduced; after
      it, the failure no longer reproduces.
      
      Reviewers: ezyang, austin, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: bgamari, thomie
      
      Differential Revision: https://phabricator.haskell.org/D921
      
      GHC Trac Issues: #7170
  26. 23 Feb, 2015 1 commit
  27. 25 Nov, 2014 2 commits
    • Add +RTS -n<size>: divide the nursery into chunks · 452eb80f
      Simon Marlow authored
      See the documentation for details.
    • Make clearNursery free · e22bc0de
      Simon Marlow authored
      Summary:
      clearNursery resets all the bd->free pointers of nursery blocks to
      make the blocks empty.  In profiles we've seen clearNursery taking
      significant amounts of time particularly with large -N and -A values.
      
      This patch moves the work of clearNursery to the point at which we
      actually need the new block, thereby introducing an invariant that
      blocks to the right of the CurrentNursery pointer still need their
      bd->free pointer reset.  This should make things faster overall,
      because we don't need to clear blocks that we don't use.
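      
      A minimal sketch of the lazy reset, with hypothetical types and names
      (`Block`, `Nursery`, `clear_nursery`, `next_block`) and offsets in place
      of real pointers. It is meant only to show where the work moves to, not
      how the RTS lays out its block descriptors.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      
      /* Hypothetical block descriptor: free == start means the block is empty. */
      typedef struct {
          uint32_t start;
          uint32_t free;
      } Block;
      
      typedef struct {
          Block   *blocks;
          uint32_t n_blocks;
          uint32_t current;  /* index of the block being allocated into */
      } Nursery;
      
      /* O(1): rewind to the first block and empty only that one.
       * Blocks to the right of `current` keep stale `free` values. */
      static void clear_nursery(Nursery *n)
      {
          assert(n->n_blocks > 0);
          n->current = 0;
          n->blocks[0].free = n->blocks[0].start;
      }
      
      /* Advance to the next block, resetting it only now that it is needed.
       * Returns NULL when the nursery is exhausted. */
      static Block *next_block(Nursery *n)
      {
          if (n->current + 1 >= n->n_blocks) {
              return NULL;
          }
          Block *b = &n->blocks[++n->current];
          b->free = b->start;
          return b;
      }
      ```
      
      Blocks that are never reached before the next collection are never
      touched, which is where the saving comes from.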
      
      Test Plan: validate
      
      Reviewers: AndreasVoellmy, ezyang, austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D318
  28. 18 Nov, 2014 1 commit
  29. 17 Nov, 2014 1 commit
  30. 12 Nov, 2014 1 commit
  31. 21 Oct, 2014 1 commit
  32. 29 Sep, 2014 1 commit
  33. 04 Aug, 2014 1 commit
  34. 28 Jul, 2014 1 commit
  35. 13 Jul, 2014 1 commit
  36. 04 May, 2014 1 commit