1. 08 Jun, 2017 1 commit
    • Fix a lost-wakeup bug in BLACKHOLE handling (#13751) · 59847290
      Simon Marlow authored
      Summary:
      The problem occurred when
      * Threads A & B evaluate the same thunk
      * Thread A context-switches, so the thunk gets blackholed
      * Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to
        the blackhole and thread A's `tso->bq` queue
      * Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE
      * We GC, replacing A's update frame with stg_enter_checkbh
      * Throw an exception in A, which ignores the stg_enter_checkbh frame
      
      Now we have C blocked on A's tso->bq queue, but we forgot to check the
      queue because the stg_enter_checkbh frame has been thrown away by the
      exception.
      
      The solution and alternative designs are discussed in Note [upd-black-hole].
      
      This also exposed a bug in the interpreter, whereby we were sometimes
      context-switching without calling `threadPaused()`.  I've fixed this
      and added some Notes.
      
      Test Plan:
      * `cd testsuite/tests/concurrent && make slow`
      * validate
      
      Reviewers: niteria, bgamari, austin, erikd
      
      Reviewed By: erikd
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #13751
      
      Differential Revision: https://phabricator.haskell.org/D3630
  2. 01 Mar, 2017 1 commit
    • Change catch# demand signature · 701256df
      David Feuer authored
      * Give `catch#` a lazy demand signature, to make it more honest.
      
      * Make `catchException` and `catchAny` force their arguments so they
      actually behave as advertised.
      
      * Use `catch` rather than `catchException` in `forkIO`, `forkOn`, and
      `forkOS` to avoid losing exceptions.
      
      Fixes #13330
      
      Reviewers: rwbarton, simonpj, simonmar, bgamari, hvr, austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3244
  3. 22 Jan, 2017 1 commit
  4. 22 Oct, 2016 2 commits
    • Skip T5611 on OSX as it fails non-deterministically. · a662f46c
      Matthew Pickering authored
      Reviewers: austin, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2622
      
      GHC Trac Issues: #12751
    • Fix failure in setnumcapabilities001 (#12728) · acc98510
      Simon Marlow authored
      The value of enabled_capabilities can change across a call to
      requestSync(), and we were erroneously using an old value, causing
      things to go wrong later.  It manifested as an assertion failure.  I'm
      not sure whether there are worse consequences, but we should
      get this fix into 8.0.2 anyway.
      
      The failure didn't happen for me because it only shows up on machines
      with fewer than 4 processors, due to the new logic to enable -qn
      automatically.  I've bumped the test parameter to 8 to make it more
      likely to exercise that code.
      
      Test Plan: Ran setnumcapabilities001 many times
      
      Reviewers: niteria, austin, erikd, rwbarton, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2617
      
      GHC Trac Issues: #12728
  5. 12 Sep, 2016 1 commit
    • Add hs_try_putmvar() · 454033b5
      Simon Marlow authored
      Summary:
      This is a fast, non-blocking, asynchronous interface to tryPutMVar that
      can be called from C/C++.
      
      It's useful for callback-based C/C++ APIs: the idea is that the callback
      invokes hs_try_putmvar(), and the Haskell code waits for the callback to
      run by blocking in takeMVar.
      
      The callback doesn't block - this is often a requirement of
      callback-based APIs.  The callback wakes up the Haskell thread with
      minimal overhead and no unnecessary context-switches.
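
      The waiting pattern described above can be sketched in pure Haskell.
      This is only an analogue, not the C interface itself:
      `registerCallback` is a hypothetical stand-in for a callback-based
      C API, and the callback calls `tryPutMVar` directly rather than
      going through hs_try_putmvar() from C.

      ```haskell
      import Control.Concurrent (forkIO)
      import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar, tryPutMVar)
      import Control.Monad (void)

      -- Hypothetical stand-in for a callback-based C API: it runs the
      -- callback later, on some other thread.
      registerCallback :: IO () -> IO ()
      registerCallback callback = void (forkIO callback)

      -- The callback never blocks: tryPutMVar returns immediately whether
      -- or not the MVar was empty.
      notify :: MVar () -> IO ()
      notify done = void (tryPutMVar done ())

      -- Register the callback, then block in takeMVar until it has run.
      waitForCallback :: IO ()
      waitForCallback = do
        done <- newEmptyMVar
        registerCallback (notify done)
        takeMVar done

      main :: IO ()
      main = do
        waitForCallback
        putStrLn "callback ran"
      ```

      In the real setting the callback runs on a C thread, which is why a
      non-blocking entry point is needed instead of a foreign export.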
      
      There are a couple of benchmarks in
      testsuite/tests/concurrent/should_run.  Some example results comparing
      hs_try_putmvar() with using a standard foreign export:
      
          ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4     0.49s
          ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4     2.30s
      
      hs_try_putmvar() is 4x faster for this workload (see the source for
      hs_try_putmvar003.hs for details of the workload).
      
      An alternative solution is to use the IO Manager for this.  We've tried
      it, but there are problems with that approach:
      * Need to create a new file descriptor for each callback
      * The IO Manager thread(s) become a bottleneck
      * More potential for things to go wrong, e.g. throwing an exception in
        an IO Manager callback kills the IO Manager thread.
      
      Test Plan: validate; new unit tests
      
      Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2501
  6. 29 Jun, 2016 1 commit
    • Testsuite: use ignore_stderr/stdout instead of ignore_output · 1084d375
      thomie authored
      The problem with ignore_output is that it hides errors for WAY=ghci.
      GHCi always returns with exit code 0 (unless it is broken itself).
      
      For example: ghci015 must have been failing with compile errors for
      years, but we didn't notice because all output was ignored.
      
      Therefore, replace all uses of ignore_output with either ignore_stderr
      or ignore_stdout. In some cases I opted for adding the expected output.
      
      Update submodule hpc and stm.
      
      Reviewed by: simonmar
      
      Differential Revision: https://phabricator.haskell.org/D2367
  7. 10 Jun, 2016 1 commit
    • NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free list in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
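
      The capability-to-node assignment above (cap C on node C%N) can be
      illustrated with a small sketch.  `nodeForCap` is a hypothetical
      helper written for this example, not a function in the RTS, and the
      node and capability counts are simply those of the 24-core 2-socket
      machine used below.

      ```haskell
      -- Hypothetical helper: the NUMA node for capability c, given n nodes.
      nodeForCap :: Int -> Int -> Int
      nodeForCap n c = c `mod` n

      -- All capabilities in [0, caps) that land on the given node.
      capsOnNode :: Int -> Int -> Int -> [Int]
      capsOnNode n caps node = [c | c <- [0 .. caps - 1], nodeForCap n c == node]

      main :: IO ()
      main = do
        let nodes = 2   -- assumed: one node per socket
            caps  = 24  -- as with +RTS -N24
        mapM_
          (\node -> putStrLn ("node " ++ show node ++ ": caps "
                              ++ show (capsOnNode nodes caps node)))
          [0 .. nodes - 1]
      ```

      With this scheme consecutive capabilities alternate between nodes, so
      each node ends up with an equal share when N divides the capability
      count.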
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      was reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
  8. 11 May, 2016 2 commits
  9. 25 Feb, 2016 1 commit
    • Testsuite: for tests that use TH, omit *all* prof_ways · e02b8c8d
      thomie authored
      Instead of just profasm and profthreaded. And at least until
      -fexternal-interpreter is the default.
      
      Also:
        * WAY=profc doesn't exist anymore.
        * Omit all threaded_ways for conc039, not just a few.
  10. 17 Dec, 2015 1 commit
  11. 30 Oct, 2015 1 commit
  12. 09 Sep, 2015 1 commit
  13. 02 Sep, 2015 2 commits
  14. 21 Jul, 2015 1 commit
  15. 18 Jul, 2015 1 commit
  16. 13 Jul, 2015 1 commit
  17. 26 Jun, 2015 1 commit
    • Fix deadlock (#10545) · 111ba4be
      Simon Marlow authored
      yieldCapability() was not prepared to be called by a Task that is not
      either a worker or a bound Task.  This could happen if we ended up in
      yieldCapability via this call stack:
      
      performGC()
      scheduleDoGC()
      requestSync()
      yieldCapability()
      
      and there were a few other ways this could happen via requestSync.
      The fix is to handle this case in yieldCapability(): when the Task is
      not a worker or a bound Task, we put it on the returning_workers
      queue, where it will be woken up again.
      
      Summary of changes:
      
      * `yieldCapability`: factored out subroutine `waitForWorkerCapability`
      * `waitForReturnCapability` renamed to `waitForCapability`, and
        factored out subroutine `waitForReturnCapability`
      * `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does
        not take a lock and no longer tests if `!isBoundTask()`
      * `yieldCapability` adjusted for refactorings, only change in behavior
        is when it is not a worker or bound task.
      
      Test Plan:
      * new test concurrent/should_run/performGC
      * validate
      
      Reviewers: niteria, austin, ezyang, bgamari
      
      Subscribers: thomie, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D997
      
      GHC Trac Issues: #10545
  18. 19 Jun, 2015 1 commit
  19. 12 Jun, 2015 1 commit
  20. 09 Jun, 2015 1 commit
  21. 12 Nov, 2014 2 commits
  22. 28 Aug, 2014 1 commit
  23. 01 Aug, 2014 1 commit
    • interruptible() was not returning true for BlockedOnSTM (#9379) · 9d9a5546
      Simon Marlow authored
      Summary:
      There's a knock-on fix in HeapStackCheck.c which is potentially
      scary, but which I'm pretty confident is OK.  See comment for details.
      
      Test Plan:
      I've run all the STM tests I can find, including
      libraries/stm/tests/stm049 with +RTS -N8 and some of the constants
      bumped to make it more of a stress test.
      
      Reviewers: hvr, rwbarton, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D104
      
      GHC Trac Issues: #9379
  24. 28 Jul, 2014 1 commit
    • use GHC-7.8.3's values for thread block reason (fixes #9333) · 4ee8c273
      Jost Berthold authored
      Summary:
      For now, BlockedOnMVar and BlockedOnMVarRead are not distinguished.
      Making the distinction would mean changing an exported datatype
      (an API change). Code for this change is included but commented out.
      
      The patch adds a test for the threadstatus, which retrieves status
      BlockedOnMVar for two threads blocked on writing and reading an MVar.
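
      A minimal sketch of such a check, using `threadStatus` from GHC.Conc.
      This is not the test added by the patch; the names and the 0.1 s
      delay are assumptions made for the example.

      ```haskell
      import Control.Concurrent (forkIO, threadDelay)
      import Control.Concurrent.MVar (newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
      import GHC.Conc (BlockReason (..), ThreadStatus (..), threadStatus)

      -- Fork one thread that blocks writing an MVar and one that blocks
      -- reading an MVar, then report the status of each.
      blockedStatuses :: IO [ThreadStatus]
      blockedStatuses = do
        full  <- newMVar ()      -- already full, so putMVar blocks
        empty <- newEmptyMVar    -- empty, so readMVar blocks
        writer <- forkIO (putMVar full ())
        reader <- forkIO (readMVar empty)
        threadDelay 100000       -- assumed long enough for both to block
        statuses <- mapM threadStatus [writer, reader]
        -- Unblock both threads; using the MVars here also keeps them
        -- reachable while the threads are blocked.
        takeMVar full
        putMVar empty ()
        return statuses

      main :: IO ()
      main = blockedStatuses >>= mapM_ print
      ```

      Since the two block reasons are not distinguished, both threads are
      expected to report the same status.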
      
      Test Plan: ran validate, including the new test
      
      Reviewers: simonmar, austin, ezyang
      
      Reviewed By: austin, ezyang
      
      Subscribers: phaskell, simonmar, relrod, carter
      
      Differential Revision: https://phabricator.haskell.org/D83
  25. 30 Jun, 2014 1 commit
    • Re-add more primops for atomic ops on byte arrays · 4ee4ab01
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#
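
      As a rough illustration of how some of these primops are called, here
      is a sketch using the GHC.Exts exports.  `atomicDemo` is a name made
      up for this example.  It relies on the documented behaviour that
      fetchAddIntArray# returns the element's value from before the
      addition.

      ```haskell
      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts
      import GHC.IO (IO (..))

      -- Write 40 into a one-element array, atomically add 2, then read the
      -- element back.  Returns (value fetched by the add, value read after).
      atomicDemo :: IO (Int, Int)
      atomicDemo = IO $ \s0 ->
        case newByteArray# 8# s0 of          -- 8 bytes: room for one Int
          (# s1, mba #) ->
            case atomicWriteIntArray# mba 0# 40# s1 of
              s2 -> case fetchAddIntArray# mba 0# 2# s2 of
                (# s3, old #) -> case atomicReadIntArray# mba 0# s3 of
                  (# s4, new #) -> (# s4, (I# old, I# new) #)

      main :: IO ()
      main = atomicDemo >>= print
      ```

      Offsets are counted in machine words, not bytes, so index 0# refers
      to the first Int-sized slot of the array.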
  26. 26 Jun, 2014 1 commit
  27. 24 Jun, 2014 1 commit
    • Add more primops for atomic ops on byte arrays · d8abf85f
      tibbe authored
      Summary:
      Add more primops for atomic ops on byte arrays
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#
  28. 30 May, 2014 1 commit
  29. 04 May, 2014 1 commit
  30. 02 May, 2014 1 commit
    • Per-thread allocation counters and limits · b0534f78
      Simon Marlow authored
      This tracks the amount of memory allocation by each thread in a
      counter stored in the TSO.  Optionally, when the counter drops below
      zero (it counts down), the thread can be sent an asynchronous
      exception: AllocationLimitExceeded.  When this happens, the thread is
      given a small additional limit so that it can handle the exception.  See
      documentation in GHC.Conc for more details.
      
      Allocation limits are similar to timeouts, but
      
        - timeouts use real time, not CPU time.  Allocation limits do not
          count anything while the thread is blocked or in foreign code.
      
        - timeouts don't re-trigger if the thread catches the exception,
          allocation limits do.
      
        - timeouts can catch non-allocating loops, if you use
          -fno-omit-yields.  This doesn't work for allocation limits.
      
      I couldn't measure any impact on benchmarks with these changes, even
      for nofib/smp.
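
      A minimal usage sketch, assuming the functions as exported from
      System.Mem in current versions of base.  The 1 MiB budget and the
      helper names are arbitrary choices for this example.

      ```haskell
      import Control.Concurrent (forkIO)
      import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
      import Control.Exception (AllocationLimitExceeded (..), evaluate, try)
      import System.Mem (enableAllocationLimit, setAllocationCounter)

      -- Builds and walks a long String, which allocates far more than 1 MiB.
      allocateALot :: Int -> Int
      allocateALot n = length (show [1 .. n])

      -- Run the allocating computation in a thread with a 1 MiB budget and
      -- report how it ended.
      runWithLimit :: IO String
      runWithLimit = do
        done <- newEmptyMVar
        _ <- forkIO $ do
          setAllocationCounter (1024 * 1024)  -- the counter counts down
          enableAllocationLimit
          r <- try (evaluate (allocateALot 1000000))
          putMVar done $ case r of
            Left AllocationLimitExceeded -> "allocation limit exceeded"
            Right n                      -> "finished: " ++ show n
        takeMVar done

      main :: IO ()
      main = runWithLimit >>= putStrLn
      ```

      The handler here does very little work, so it should fit within the
      small additional limit granted after the exception is raised.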
  31. 09 Oct, 2013 1 commit
  32. 02 Oct, 2013 1 commit
  33. 30 Sep, 2013 1 commit
    • Deal with failures for T367, T367_letnoescape under ghci · 18f2895d
      rwbarton authored
      These tests had a very short timeout (0.3 s). With WAY=ghci,
      the time ghci takes to start up and compile the test modules
      is counted in this timeout, and that causes the tests to fail.
      
      T367 really needs the very short timeout, so this commit disables
      the ghci way for T367. T367_letnoescape can handle any timeout,
      so I bumped up the timeout to 6 s to give ghci time to start up.
  34. 21 Aug, 2013 2 commits
  35. 24 Jul, 2013 1 commit