1. 12 Jan, 2019 1 commit
    • Fix negative mutator time in GC stats in prof builds · 19670bc3
      Ömer Sinan Ağacan authored
      Because the garbage collector calls `retainerProfile()` and `heapCensus()`,
      GC times normally include some of PROF times too. To fix this we have
      these lines:
      
          // heapCensus() is called by the GC, so RP and HC time are
          // included in the GC stats.  We therefore subtract them to
          // obtain the actual GC cpu time.
          stats.gc_cpu_ns      -=  prof_cpu;
          stats.gc_elapsed_ns  -=  prof_elapsed;
      
      These variables are later used for calculating GC time excluding the
      final GC (which should be attributed to EXIT).
      
          exit_gc_elapsed      = stats.gc_elapsed_ns - start_exit_gc_elapsed;
      
      The problem is that if we subtract PROF times from `gc_elapsed_ns` and then
      subtract `start_exit_gc_elapsed` from the result, we end up subtracting
      PROF times twice, because `start_exit_gc_elapsed` also includes PROF
      times.
      
      We now subtract PROF times from GC after the calculations for EXIT and
      MUT times. The existing assertion that checks
      
          INIT + MUT + GC + EXIT = TOTAL
      
      now holds. After we subtract PROF numbers from GC, a new assertion
      
          INIT + MUT + GC + PROF + EXIT = TOTAL
      
      also holds.
      
      Fixes #15897. New assertions added in this commit also revealed #16102,
      which is also fixed by this commit.
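      
      The double subtraction is easy to see in a small model. The sketch below
      is not the code in the RTS: the `Raw` and `Breakdown` structs, the
      function names, and the formulas used for MUT and EXIT are simplifying
      assumptions made for illustration. Only the ordering of the PROF
      subtraction follows the description above.
      
      ```c
      #include <assert.h>
      #include <stdint.h>

      typedef int64_t Time; /* nanoseconds */

      /* Hypothetical raw measurements; GC figures still include PROF time. */
      typedef struct {
          Time end_init;      /* timestamp at which INIT finished */
          Time start_exit;    /* timestamp at which EXIT started */
          Time end_exit;      /* timestamp at which the program finished (TOTAL) */
          Time gc;            /* cumulative GC time at the end of the run */
          Time start_exit_gc; /* cumulative GC time when EXIT started */
          Time prof;          /* profiling time spent inside GC */
      } Raw;

      typedef struct { Time init, mut, gc, prof, exit; } Breakdown;

      Time sum_of(Breakdown b) {
          return b.init + b.mut + b.gc + b.prof + b.exit;
      }

      /* Old order: PROF is removed from GC before the EXIT calculation. */
      Breakdown breakdown_buggy(Raw r) {
          Time gc = r.gc - r.prof;
          /* start_exit_gc still contains PROF, so PROF is removed a second time. */
          Time exit_gc = gc - r.start_exit_gc;
          Breakdown b = {
              .init = r.end_init,
              .mut  = r.start_exit - r.end_init - (gc - exit_gc),
              .gc   = gc,
              .prof = r.prof,
              .exit = r.end_exit - r.start_exit - exit_gc,
          };
          return b;
      }

      /* New order: EXIT and MUT are computed first, PROF is split out last. */
      Breakdown breakdown_fixed(Raw r) {
          Time exit_gc = r.gc - r.start_exit_gc;
          Breakdown b = {
              .init = r.end_init,
              .mut  = r.start_exit - r.end_init - (r.gc - exit_gc),
              .gc   = r.gc,
              .prof = 0,
              .exit = r.end_exit - r.start_exit - exit_gc,
          };
          assert(b.init + b.mut + b.gc + b.exit == r.end_exit);
          b.gc  -= r.prof; /* only now move PROF out of GC */
          b.prof = r.prof;
          assert(sum_of(b) == r.end_exit);
          return b;
      }
      ```
      
      With `end_init = 10`, `start_exit = 100`, `end_exit = 120`, `gc = 40`,
      `start_exit_gc = 30` and `prof = 8`, the fixed order sums to 120 while
      the old order sums to 128, off by exactly `prof`.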
  2. 19 Nov, 2018 2 commits
  3. 17 Nov, 2018 1 commit
  4. 29 Aug, 2018 1 commit
    • Finish stable split · f48e276a
      David Feuer authored
      Long ago, the stable name table and stable pointer tables were one.
      Now, they are separate, and have significantly different
      implementations. I believe the time has come to finish the split
      that began in #7674.
      
      * Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`.
      
      * Give each table its own mutex.
      
      * Add FFI functions `hs_lock_stable_ptr_table` and
        `hs_unlock_stable_ptr_table` and document them.
        These are intended to replace the previously undocumented
        `hs_lock_stable_tables` and `hs_unlock_stable_tables`,
        which are now documented as deprecated synonyms.
      
      * Make `eqStableName#` use pointer equality instead of unnecessarily
      comparing stable name table indices.
      
      Reviewers: simonmar, bgamari, erikd
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #15555
      
      Differential Revision: https://phabricator.haskell.org/D5084
  5. 12 Jun, 2018 1 commit
  6. 11 Apr, 2018 2 commits
  7. 10 Apr, 2018 1 commit
  8. 25 Mar, 2018 1 commit
    • Run C finalizers incrementally during mutation · f7bbc343
      Simon Marlow authored
      With a large heap it's possible to build up a lot of finalizers
      between GCs.  We've observed GC spending up to 50% of its time running
      finalizers.  But there's no reason we have to run finalizers during
      GC, and especially no reason we have to block *all* the mutator
      threads while *one* GC thread runs finalizers one by one.
      
      I thought about a bunch of alternative ways to handle this, which are
      documented along with runSomeFinalizers() in Weak.c.  The approach I
      settled on is to have a capability run finalizers if it is idle.  So
      running finalizers is like a low-priority background thread. This
      requires some minor scheduler changes, but not much.  In the future we
      might be able to move more GC work into here (I have my eye on freeing
      large blocks, for example).
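      
      The idle-capability idea can be sketched as follows. This is a minimal
      model, not the code in Weak.c: `FinalizerQueue`, `defer_finalizer`,
      `run_some`, `scheduler_step`, the fixed queue size, and the batch size of
      2 are all hypothetical names and values chosen for illustration, and the
      assumption that the GC only records finalizers is part of the sketch.
      
      ```c
      #include <stdbool.h>
      #include <stddef.h>

      #define MAX_PENDING 64

      typedef void (*Finalizer)(void *);

      /* Hypothetical fixed-size FIFO of pending C finalizers. */
      typedef struct {
          Finalizer fn[MAX_PENDING];
          void     *arg[MAX_PENDING];
          size_t    head;  /* index of the next finalizer to run */
          size_t    count; /* number of finalizers still pending */
      } FinalizerQueue;

      /* Assumed GC-side hook: record the finalizer instead of running it. */
      bool defer_finalizer(FinalizerQueue *q, Finalizer fn, void *arg) {
          if (q->count == MAX_PENDING) return false;
          size_t slot = (q->head + q->count) % MAX_PENDING;
          q->fn[slot] = fn;
          q->arg[slot] = arg;
          q->count++;
          return true;
      }

      /* Run at most `batch` pending finalizers; returns how many ran. */
      size_t run_some(FinalizerQueue *q, size_t batch) {
          size_t ran = 0;
          while (ran < batch && q->count > 0) {
              q->fn[q->head](q->arg[q->head]);
              q->head = (q->head + 1) % MAX_PENDING;
              q->count--;
              ran++;
          }
          return ran;
      }

      /* One scheduler iteration: finalizers run only when nothing else is runnable. */
      size_t scheduler_step(FinalizerQueue *q, bool has_runnable_threads) {
          if (has_runnable_threads) return 0; /* mutator work takes priority */
          return run_some(q, 2);              /* small batch keeps the capability responsive */
      }

      /* Example finalizer: counts how many times it has been called. */
      void count_call(void *counter) { ++*(int *)counter; }
      ```
      
      Running a bounded batch per idle step, rather than draining the queue,
      is what makes the work behave like a low-priority background thread: the
      capability gets back to the scheduler loop quickly if a thread becomes
      runnable.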
      
      Test Plan:
      * validate
      * tested on our system and saw reductions in GC pauses of 40-50%.
      
      Reviewers: bgamari, niteria, osa1, erikd
      
      Reviewed By: bgamari, osa1
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4521
  9. 07 Mar, 2018 1 commit
  10. 06 Mar, 2018 1 commit
  11. 02 Mar, 2018 2 commits
  12. 06 Feb, 2018 2 commits
  13. 18 Jul, 2017 1 commit
    • Fix a missing getNewNursery(), and related cleanup · 12ae1fa5
      Simon Marlow authored
      Summary:
      When we use nursery chunks with +RTS -n<size> and the current nursery
      runs out, we have to check whether there's another chunk available with
      getNewNursery().  There was one place we weren't doing this: the ad-hoc
      heap check in scheduleProcessInbox().
      
      The impact of the bug was that we would GC too early when using nursery
      chunks, especially in programs that used messages (throwTo between
      capabilities could do this, also hs_try_putmvar()).
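      
      A toy model of the two heap checks shows why the missing call leads to
      early GCs. Everything here is an assumption made for illustration: the
      `Nursery` struct, its fields, and `get_new_chunk` (a stand-in for
      getNewNursery(), whose real signature is not shown in this message).
      
      ```c
      #include <stdbool.h>

      /* Hypothetical model of a capability's nursery when chunks are enabled. */
      typedef struct {
          unsigned free_blocks;      /* blocks left in the current chunk */
          unsigned spare_chunks;     /* unclaimed chunks still available */
          unsigned blocks_per_chunk;
      } Nursery;

      /* Stand-in for getNewNursery(): claim another chunk if one is left. */
      bool get_new_chunk(Nursery *n) {
          if (n->spare_chunks == 0) return false;
          n->spare_chunks--;
          n->free_blocks = n->blocks_per_chunk;
          return true;
      }

      /* Heap check without the chunk lookup: requests a GC too early. */
      bool needs_gc_buggy(const Nursery *n) {
          return n->free_blocks == 0;
      }

      /* Heap check with the lookup: a GC is needed only when no chunk is left. */
      bool needs_gc_fixed(Nursery *n) {
          if (n->free_blocks > 0) return false;
          return !get_new_chunk(n);
      }
      ```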
      
      Test Plan: validate, also local testing in our application
      
      Reviewers: bgamari, niteria, austin, erikd
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3749
  14. 18 Jun, 2017 1 commit
  15. 08 Jun, 2017 1 commit
    • Fix a lost-wakeup bug in BLACKHOLE handling (#13751) · 59847290
      Simon Marlow authored
      Summary:
      The problem occurred when:
      * Threads A & B evaluate the same thunk
      * Thread A context-switches, so the thunk gets blackholed
      * Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to
        the blackhole and thread A's `tso->bq` queue
      * Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE
      * We GC, replacing A's update frame with stg_enter_checkbh
      * Throw an exception in A, which ignores the stg_enter_checkbh frame
      
      Now we have C blocked on A's tso->bq queue, but we forgot to check the
      queue because the stg_enter_checkbh frame has been thrown away by the
      exception.
      
      The solution and alternative designs are discussed in Note [upd-black-hole].
      
      This also exposed a bug in the interpreter, whereby we were sometimes
      context-switching without calling `threadPaused()`.  I've fixed this
      and added some Notes.
      
      Test Plan:
      * `cd testsuite/tests/concurrent && make slow`
      * validate
      
      Reviewers: niteria, bgamari, austin, erikd
      
      Reviewed By: erikd
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #13751
      
      Differential Revision: https://phabricator.haskell.org/D3630
  16. 10 May, 2017 1 commit
  17. 29 Apr, 2017 2 commits
  18. 05 Apr, 2017 1 commit
  19. 04 Apr, 2017 1 commit
  20. 01 Apr, 2017 1 commit
  21. 04 Feb, 2017 1 commit
    • Fix comment (old file names) in rts/ · 31bb85ff
      Takenobu Tani authored
      [skip ci]
      
      There were some old file names (.lhs, ...) in comments.
      
      * rts/win32/ThrIOManager.c
        - Conc.lhs -> Conc.hs
      
      * rts/PrimOps.cmm
        - ByteCodeLink.lhs -> ByteCodeLink.hs
        - StgMiscClosures.hc -> StgMiscClosures.cmm
      
      * rts/AutoApply.h
        - AutoApply.hc -> AutoApply.cmm
      
      * rts/HeapStackCheck.cmm
        - PrimOps.hc -> PrimOps.cmm
      
      * rts/LdvProfile.h
        - Updates.hc -> Updates.cmm
      
      * rts/Schedule.c
        - StgStartup.hc -> StgStartup.cmm
      
      * rts/Weak.c
        - StgMiscClosures.hc -> StgMiscClosures.cmm
      
      Reviewers: bgamari, austin, erikd, simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3075
  22. 10 Jan, 2017 1 commit
  23. 02 Dec, 2016 1 commit
    • Install toplevel handler inside fork. · 895a131f
      Alexander Vershilov authored
      When the RTS is forked, it doesn't update the toplevel handler, so the
      UserInterrupt exception is sent to Thread1, which doesn't exist in the
      forked process.
      
      We now install the toplevel handler on fork, so the signal will be
      delivered to the new main thread.
      
      Fixes #12903
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2770
      
      GHC Trac Issues: #12903
  24. 29 Nov, 2016 1 commit
  25. 18 Nov, 2016 1 commit
  26. 29 Oct, 2016 1 commit
    • Fix a bug in parallel GC synchronisation · 4e088b49
      Simon Marlow authored
      Summary:
      The problem boils down to global variables: in particular gc_threads[],
      which was being modified by a subsequent GC before the previous GC had
      finished with it.  The fix is to not use global variables.
      
      This was causing setnumcapabilities001 to fail (again!).  It's an old
      bug though.
      
      Test Plan:
      Ran setnumcapabilities001 in a loop for a couple of hours.  Before this
      patch it had been failing after a few minutes.  Not a very scientific
      test, but it's the best I have.
      
      Reviewers: bgamari, austin, fryguybob, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2654
  27. 28 Oct, 2016 1 commit
    • Make it possible to use +RTS -qn without -N · aae2b3d5
      Simon Marlow authored
      It's entirely reasonable to set +RTS -qn without setting -N, because the
      program might later call setNumCapabilities.  If we disallow it, there's
      no way to use -qn on programs that use setNumCapabilities.
  28. 22 Oct, 2016 1 commit
    • Fix failure in setnumcapabilities001 (#12728) · acc98510
      Simon Marlow authored
      The value of enabled_capabilities can change across a call to
      requestSync(), and we were erroneously using an old value, causing
      things to go wrong later.  It manifested as an assertion failure; I'm
      not sure whether there are worse consequences or not, but we should
      get this fix into 8.0.2 anyway.
      
      The failure didn't happen for me because it only shows up on machines
      with fewer than 4 processors, due to the new logic to enable -qn
      automatically.  I've bumped the test parameter to 8 to make it more
      likely to exercise that code.
      
      Test Plan: Ran setnumcapabilities001 many times
      
      Reviewers: niteria, austin, erikd, rwbarton, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2617
      
      GHC Trac Issues: #12728
  29. 09 Oct, 2016 1 commit
    • Default +RTS -qn to the number of cores · 6c47f2ef
      Simon Marlow authored
      Setting a -N value that is too large has a dramatic negative effect on
      performance, but the new -qn flag can mitigate the worst of the effects
      by limiting the number of GC threads.
      
      So now, if you don't explicitly set +RTS -qn, and you set -N larger than
      the number of cores (or use setNumCapabilities to do the same), we'll
      default -qn to the number of cores.
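      
      The defaulting rule can be written as a small pure function. This is a
      sketch of the rule as stated above, not the RTS code: the function name,
      the parameter names, and the use of 0 to mean "-qn not given" are
      assumptions, as is the final branch, which assumes that every capability
      takes part in GC when no limit applies.
      
      ```c
      /* qn_flag == 0 means -qn was not given on the command line (assumed encoding). */
      unsigned gc_thread_limit(unsigned n_caps, unsigned n_cores, unsigned qn_flag) {
          if (qn_flag != 0) return qn_flag;     /* an explicit -qn always wins */
          if (n_caps > n_cores) return n_cores; /* -N exceeds the cores: limit GC threads */
          return n_caps;                        /* assumed: all capabilities take part in GC */
      }
      ```
      
      For the -N8 runs on a 4-core machine described here,
      `gc_thread_limit(8, 4, 0)` yields 4.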
      
      These are the results from nofib/parallel on my 4-core (2 cores x 2
      threads) i7 laptop, comparing -N8 before and after this change.
      
      ```
      ------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      ------------------------------------------------------------------------
         blackscholes          +0.0%     +0.0%    -72.5%    -72.0%     +9.5%
                coins          +0.0%     -0.0%    -73.7%    -72.2%     -0.8%
               mandel          +0.0%     +0.0%    -76.4%    -75.4%     +3.3%
              matmult          +0.0%    +15.5%    -26.8%    -33.4%     +1.0%
                nbody          +0.0%     +2.4%     +0.7%     0.076      0.0%
               parfib          +0.0%     -8.5%    -33.2%    -31.5%     +2.0%
              partree          +0.0%     -0.0%    -60.4%    -56.8%     +5.7%
                 prsa          +0.0%     -0.0%    -65.4%    -60.4%      0.0%
               queens          +0.0%     +0.2%    -58.8%    -58.8%     -1.5%
                  ray          +0.0%     -1.5%    -88.7%    -85.6%     -3.6%
             sumeuler          +0.0%     -0.0%    -47.8%    -46.9%      0.0%
      ------------------------------------------------------------------------
                  Min          +0.0%     -8.5%    -88.7%    -85.6%     -3.6%
                  Max          +0.0%    +15.5%     +0.7%    -31.5%     +9.5%
       Geometric Mean          +0.0%     +0.6%    -61.4%    -63.1%     +1.4%
      ```
      
      Test Plan: validate, nofib/parallel benchmarks
      
      Reviewers: niteria, ezyang, nh2, austin, erikd, trofi, bgamari
      
      Reviewed By: trofi, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2580
      
      GHC Trac Issues: #9221
  30. 12 Sep, 2016 1 commit
    • Add hs_try_putmvar() · 454033b5
      Simon Marlow authored
      Summary:
      This is a fast, non-blocking, asynchronous interface to tryPutMVar that
      can be called from C/C++.
      
      It's useful for callback-based C/C++ APIs: the idea is that the callback
      invokes hs_try_putmvar(), and the Haskell code waits for the callback to
      run by blocking in takeMVar.
      
      The callback doesn't block - this is often a requirement of
      callback-based APIs.  The callback wakes up the Haskell thread with
      minimal overhead and no unnecessary context-switches.
      
      There are a couple of benchmarks in
      testsuite/tests/concurrent/should_run.  Some example results comparing
      hs_try_putmvar() with using a standard foreign export:
      
          ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4     0.49s
          ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4     2.30s
      
      hs_try_putmvar() is 4x faster for this workload (see the source for
      hs_try_putmvar003.hs for details of the workload).
      
      An alternative solution is to use the IO Manager for this.  We've tried
      it, but there are problems with that approach:
      * Need to create a new file descriptor for each callback
      * The IO Manager thread(s) become a bottleneck
      * More potential for things to go wrong, e.g. throwing an exception in
        an IO Manager callback kills the IO Manager thread.
      
      Test Plan: validate; new unit tests
      
      Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2501
  31. 05 Aug, 2016 1 commit
    • Another try to get thread migration right · 89fa4e96
      Simon Marlow authored
      Summary:
      This is surprisingly tricky.  There were linked list bugs in the
      previous version (D2430) that showed up as a test failure in
      setnumcapabilities001 (that's a great stress test!).
      
      This new version uses a different strategy that doesn't suffer from
      the problem that @ezyang pointed out in D2430.  We now pre-calculate
      how many threads to keep for this capability, and then migrate any
      surplus threads off the front of the queue, taking care to account for
      threads that can't be migrated.
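      
      The strategy can be sketched on an array-backed queue. This is an
      illustrative model only: the `Thread` struct, the `pinned` flag (standing
      for any thread that can't be migrated, such as one bound to the current
      Task), the function name, and the rounded-up fair-share formula for
      `keep` are assumptions. The real run queue is a linked list, and the
      sketch does not decide which capability receives each thread.
      
      ```c
      #include <stdbool.h>
      #include <stddef.h>

      typedef struct {
          int  id;
          bool pinned; /* hypothetical flag: this thread can't be migrated */
      } Thread;

      /*
       * Move surplus threads from the front of `queue` into `out` and compact
       * the remaining threads in place. Returns the number of threads moved.
       */
      size_t migrate_surplus(Thread *queue, size_t *len, size_t n_free_caps, Thread *out) {
          /* Pre-calculate this capability's share, rounded up (assumed formula). */
          size_t keep    = (*len + n_free_caps) / (n_free_caps + 1);
          size_t surplus = *len > keep ? *len - keep : 0;
          size_t kept = 0, moved = 0;

          for (size_t i = 0; i < *len; i++) {
              if (moved < surplus && !queue[i].pinned) {
                  out[moved++] = queue[i];  /* migratable surplus leaves from the front */
              } else {
                  queue[kept++] = queue[i]; /* pinned, or within our share: stays here */
              }
          }
          *len = kept;
          return moved;
      }
      ```
      
      For a queue of two threads where the second is pinned and one other
      capability is free, the sketch keeps the pinned thread and migrates the
      first.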
      
      Test Plan:
      1. setnumcapabilities001 stress test with sanity checking (+RTS -DS) turned on:
      
      ```
      cd testsuite/tests/concurrent/should_run
      make TEST=setnumcapabilities001 WAY=threaded1 EXTRA_HC_OPTS=-with-rtsopts=-DS CLEANUP=0
      while true; do ./setnumcapabilities001.run/setnumcapabilities001 4 9 2000 || break; done
      ```
      
      2. The test case from #12419
      
      Reviewers: niteria, ezyang, rwbarton, austin, bgamari, erikd
      
      Subscribers: thomie, ezyang
      
      Differential Revision: https://phabricator.haskell.org/D2441
      
      GHC Trac Issues: #12419
  32. 03 Aug, 2016 2 commits
    • Fix to thread migration · 988ad8ba
      Simon Marlow authored
      Summary:
      If we had 2 threads on the run queue, say [A,B], and B was bound to the
      current Task, then we would fail to migrate any threads.  This fixes it
      so that we migrate A in that case.
      
      This will help parallelism a bit in programs that have lots of bound
      threads.
      
      Test Plan:
      Test program in #12419, which is actually not a great program but it
      does behave a bit better after this change.
      
      Reviewers: ezyang, niteria, bgamari, austin, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2430
      
      GHC Trac Issues: #12419
    • Track the lengths of the thread queues · 55f5aed7
      Simon Marlow authored
      Summary:
      Knowing the length of the run queue in O(1) time is useful: for example
      we don't have to traverse the run queue to know how many threads we have
      to migrate in schedulePushWork().
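      
      A minimal sketch of the technique: keep a counter next to the list and
      update it on every insertion and removal, so the length is a field read
      instead of a traversal. The `Node` and `RunQueue` types and the `rq_*`
      function names are hypothetical and are not the RTS's own definitions.
      
      ```c
      #include <stddef.h>

      typedef struct Node {
          int          id;
          struct Node *next;
      } Node;

      typedef struct {
          Node  *head, *tail;
          size_t n; /* queue length, maintained on every push and pop */
      } RunQueue;

      /* Append a thread to the back of the queue. */
      void rq_push_back(RunQueue *q, Node *t) {
          t->next = NULL;
          if (q->tail) q->tail->next = t; else q->head = t;
          q->tail = t;
          q->n++;
      }

      /* Remove and return the thread at the front, or NULL if the queue is empty. */
      Node *rq_pop_front(RunQueue *q) {
          Node *t = q->head;
          if (!t) return NULL;
          q->head = t->next;
          if (!q->head) q->tail = NULL;
          t->next = NULL;
          q->n--;
          return t;
      }
      ```
      
      The cost is one extra increment or decrement per queue operation, and
      every code path that edits the list must keep the counter in step.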
      
      Test Plan: validate
      
      Reviewers: ezyang, erikd, bgamari, austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2437
  33. 27 Jul, 2016 1 commit
  34. 20 Jun, 2016 1 commit