Lightweight concurrency implementation resides in the `ghc-lwc` branch in the GHC git repository.

- [SCont Affinity](lightweight-concurrency#scont-affinity)
- [Bound SCont](lightweight-concurrency#bound-scont)
- [Scheduler Interaction with RTS](lightweight-concurrency#scheduler-interaction-with-rts)
- [Blocked Indefinitely](lightweight-concurrency#blocked-indefinitely)
- [Unreachable Concurrent Datastructure](lightweight-concurrency#unreachable-concurrent-datastructure)
- [Unreachable Scheduler](lightweight-concurrency#unreachable-scheduler)
- [Safe Foreign Calls](lightweight-concurrency#safe-foreign-calls)
- [Related Work](lightweight-concurrency#related-work)
## Introduction

GHC has rich support for concurrency (forkIO, MVars, STM, asynchronous exceptions).

But why would we be interested in modifying GHC's concurrency environment? There are several good reasons to believe that no single concurrent programming model or scheduling policy would suit every application. With the emergence of many-core processors, NUMA effects are becoming more prominent, and applications might benefit from NUMA-aware scheduling and load balancing policies. Moreover, an application might have better knowledge of its scheduling requirements -- a thread involved in user interaction should be given higher priority than threads performing background processing. We might want to experiment with various work-stealing or work-sharing policies. More ambitiously, we might choose to build X10-style async-finish or Cilk-style spawn-sync task-parallel abstractions. Ideally, we would like to allow the programmer to write an application that can seamlessly combine all of these different programming abstractions, with pluggable scheduling and load balancing policies.
While we want to provide flexibility to the Haskell programmer, this should not come at the cost of added complexity and decreased performance. This idea is reflected in the synchronization abstractions exposed to the programmer -- [Primitive Transactional Memory (PTM)](lightweight-concurrency#ptm) -- and in our decision to keep certain pieces of the concurrency puzzle in the RTS ([Safe Foreign Calls](lightweight-concurrency#safe-foreign-calls), [Blackholes](lightweight-concurrency#)). One would think that lifting parts of the runtime system to Haskell, while retaining other parts in C, would complicate the interactions between the concurrency primitives and schedulers. We abstract the scheduler interface using PTM monads, which simplifies these interactions. The figure below captures the key design principles of the proposed system.

[GHC_LWC_Key.jpg](/trac/ghc/attachment/wiki/LightweightConcurrency/GHC_LWC_Key.jpg)
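The scheduler abstraction sketched above can be illustrated with a small mock. Here GHC's STM stands in for PTM, `SCont` is mocked as a plain `Int`, and the scheduler is a FIFO run queue in a `TVar`; the names `scheduleSContAction` and `yieldControlAction` follow the proposal, but the types and implementation are illustrative assumptions, not the substrate's real API.

```haskell
import Control.Concurrent.STM

-- Stand-in for a real stack continuation.
type SCont = Int

-- A user-level scheduler: a FIFO run queue held in a TVar.
newtype Sched = Sched (TVar [SCont])

newSched :: IO Sched
newSched = Sched <$> newTVarIO []

-- Enqueue an SCont: the action a blocked SCont runs to rejoin its scheduler.
scheduleSContAction :: Sched -> SCont -> STM ()
scheduleSContAction (Sched q) sc = modifyTVar q (++ [sc])

-- Dequeue the next SCont to run, if any: the action run to yield control.
yieldControlAction :: Sched -> STM (Maybe SCont)
yieldControlAction (Sched q) = do
  scs <- readTVar q
  case scs of
    []     -> return Nothing
    (x:xs) -> writeTVar q xs >> return (Just x)

main :: IO ()
main = do
  sched <- newSched
  atomically $ mapM_ (scheduleSContAction sched) [1, 2, 3]
  next <- atomically $ yieldControlAction sched
  print next  -- Just 1
```

Because both actions live in the (here, STM) monad, composing them with other transactional state is atomic by construction, which is the point of abstracting the scheduler interface as PTM actions.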
Next, we shall look at various RTS interactions with the user-level scheduler.
#### Unreachable Concurrent Datastructure
The [BlockedIndefinitelyOnMVar](http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Exception-Base.html#t:BlockedIndefinitelyOnMVar) exception is raised on a thread that is blocked on an MVar when the MVar has become unreachable. This is performed at the end of a garbage collection, just before resuming execution of the Haskell threads. In the vanilla RTS, after raising the BlockedIndefinitelyOnMVar exception on a blocked thread, the thread is added back to the run queue. In the lightweight concurrency (LWC) implementation, however, this is not so straightforward. In particular:

- How do we know whether the SCont is blocked on a concurrent data structure in Haskell?
- How do we safely add the thread to the Haskell scheduler?

We know that any SCont blocked with status `SContSwitched BlockedInHaskell` is blocked on a concurrent data structure. For an SCont that is blocked on a concurrent data structure which has become unreachable, we raise the `BlockedIndefinitelyOnConcDS` exception. Subsequently, we utilize the SCont's scheduleSContAction to put the SCont back into its corresponding scheduler. Importantly, since the scheduler actions are PTM actions, the necessary synchronization is taken care of by the PTM layer.
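The vanilla RTS behavior described above can be observed directly with stock GHC: a thread blocked on an MVar that no other thread can fill receives `BlockedIndefinitelyOnMVar` at the end of a garbage collection. The snippet below blocks the main thread on an empty MVar and catches the resulting exception.

```haskell
import Control.Concurrent.MVar
import Control.Exception

main :: IO ()
main = do
  -- The MVar is reachable only from this blocked thread, so the RTS's
  -- deadlock detection (run after GC) raises BlockedIndefinitelyOnMVar here.
  r <- try (newEmptyMVar >>= takeMVar)
         :: IO (Either BlockedIndefinitelyOnMVar ())
  case r of
    Left _  -> putStrLn "caught BlockedIndefinitelyOnMVar"
    Right _ -> putStrLn "unexpectedly unblocked"
```

In the LWC design the analogous step raises `BlockedIndefinitelyOnConcDS` and then reschedules the SCont via its scheduleSContAction instead of the RTS run queue.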
#### Unreachable Scheduler

Unlike the vanilla RTS, schedulers can become unreachable in the LWC implementation. The concurrency substrate allows the programmer to install a finalizer on an SCont with the following primitive:
```wiki
setFinalizer :: SCont -> IO () -> IO ()
```

If an SCont blocked with status `SContSwitched Yielded` has become unreachable, we run the SCont's finalizer, if one is installed.

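The finalizer mechanism can be sketched as follows. This is a mock, not the substrate's implementation: `SCont` is modeled as a record carrying a mutable finalizer slot, `setFinalizer` mirrors the primitive's shape, and `runFinalizer` stands in for what the RTS would do upon finding an unreachable, yielded SCont.

```haskell
import Data.IORef

-- Mock SCont: just a mutable slot for an optional finalizer.
newtype SCont = SCont (IORef (Maybe (IO ())))

newSCont :: IO SCont
newSCont = SCont <$> newIORef Nothing

-- Install (or replace) the SCont's finalizer.
setFinalizer :: SCont -> IO () -> IO ()
setFinalizer (SCont ref) fin = writeIORef ref (Just fin)

-- What the RTS would do on finding an unreachable SCont with status
-- SContSwitched Yielded: run the finalizer, if one is installed.
runFinalizer :: SCont -> IO ()
runFinalizer (SCont ref) = readIORef ref >>= maybe (return ()) id

main :: IO ()
main = do
  sc <- newSCont
  setFinalizer sc (putStrLn "scheduler unreachable; finalizing SCont")
  runFinalizer sc
```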
## Safe Foreign Calls

A [safe foreign call](http://community.haskell.org/~simonmar/papers/conc-ffi.pdf) does not impede the execution of other Haskell threads on the same scheduler if the foreign call blocks. Before performing the foreign call, the task, say `T1`, releases the capability that it currently owns. This might wake up other tasks that are waiting to acquire a free capability. After the foreign call has completed, `T1` tries to reacquire the capability it last owned. In the fast path, the foreign call completes quickly and `T1` reacquires the capability. In the slow path, some other task, say `T2`, acquires the capability.

### Slow-path in Vanilla RTS

In the vanilla RTS, `T2` picks the next available thread from the current capability's run queue and resumes execution. When `T1` eventually completes the foreign call, it tries to reacquire the capability. Thus, performing a safe foreign call does not block all the threads on that capability.

### Slow-path in LWC RTS

The fast path in the LWC implementation is the same as in the vanilla implementation. In the slow path, however, we need a way for `T2` to resume the scheduler, and a way for `T1` to join the scheduler when the foreign call eventually completes. Assume that the Haskell thread running on task `T1` is `t1`. We utilize `t1`'s yieldControlAction to enable `T2` to resume execution of other threads on the scheduler. When `T1` eventually resumes execution after the foreign call, it finds that it has lost the race for the capability to `T2`. At this point, `T1` executes `t1`'s scheduleSContAction to join the scheduler.
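The slow path above can be traced with a deliberately sequential sketch. The capability is modeled as an `MVar` holding the owning task's name, and the yieldControlAction/scheduleSContAction steps appear only as prints; the real substrate would run PTM actions here, so this shows only the control flow, not the mechanism.

```haskell
import Control.Concurrent.MVar

main :: IO ()
main = do
  capability <- newMVar "T1"
  -- T1 releases the capability before making the safe foreign call.
  _ <- takeMVar capability
  -- Slow path: T2 acquires the capability while the call is in progress,
  -- and runs t1's yieldControlAction to resume other threads.
  putMVar capability "T2"
  putStrLn "T2: resuming scheduler via t1's yieldControlAction"
  -- T1 returns from the foreign call and tries to reacquire the capability.
  owner <- readMVar capability
  if owner == "T1"
    then putStrLn "T1: fast path; reacquired the capability"
    else putStrLn "T1: lost the race; joining via t1's scheduleSContAction"
```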

## Asynchronous Exceptions

## PTM retry

## Black-hole Handling

## Related Work

- [Concurrent Programming in GHC](lightweight-concurrency#)