Ben Gamari · 1fe58f02
--- a/lightweight-concurrency.md
+++ b/lightweight-concurrency.md
@@ -17,7 +17,7 @@ But, why would we be interested in modifying GHC's concurrency environment? Ther

 While we want to provide flexibility to the Haskell programmer, this should not come at a cost of added complexity and decreased performance. This idea reflects in the synchronization abstractions exposed to the programmer - [Primitive Transactional Memory(PTM)](lightweight-concurrency#ptm)), and our decision to keep certain pieces of the concurrency puzzle in the RTS ([Safe Foreign Calls](lightweight-concurrency#safe-foreign-calls),[Blackholes](lightweight-concurrency#)). One would think lifting parts of the runtime system to Haskell, and retaining other parts in C, would complicate the interactions between the concurrency primitives and schedulers. We abstract the scheduler interface using PTM monads, which simplifies the interactions. The figure below captures the key design principles of the proposed system.

-[](/trac/ghc/attachment/wiki/LightweightConcurrency/GHC_LWC_Key.jpg)
+![](LightweightConcurrency/GHC_LWC_Key.jpg)


 Although implementing concurrency primitives as a library is hardly a novel idea, the aim of this work is to bring it to the GHC programmer, without having to give up any of the existing concurrency features in return.
@@ -37,7 +37,7 @@ The idea of the concurrency substrate is to provide a minimal set of primitives

 The only synchronization mechanism exposed by the concurrency substrate is a primitive transactional memory (PTM). Locks and condition variables can be notoriously difficult to use, especially in an environment with user-level threads, where they tend to interfere with the scheduler. Moreover, they do not compose elegantly with lazy evaluation. PTM interface is shown below:

-```wiki
+```haskell
 data PTM a
 data PVar a
 instance Monad PTM
@@ -57,7 +57,7 @@ A PTM transaction may allocate, read and write transactional variables of type `

 The concurrency substrate enables creation and scheduling of I/O-performing computations through *one-shot continuations*. An SCont (stack continuation) is a suspended I/O computation, which is in fact just a reference to a TSO object. Capturing the current continuation is just getting a reference to the current TSO, and hence is very fast. In the following discussion, we use SConts when referring to the Haskell object and threads when referring to its corresponding RTS version. SCont interface is shown below:

-```wiki
+```haskell
 data SCont
 newSCont         :: IO () -> IO SCont
 switch           :: (SCont -> PTM SCont) -> IO ()
@@ -76,7 +76,7 @@ Performing the body of switch atomically in a transaction avoids the nasty race

 Since switchTo eagerly commits the transaction, the code that follows switchTo is not evaluated. This is a problem if the transaction that contains switchTo has a type different than PTM (). Consider the following code:

-```wiki
+```haskell
 s :: String <- atomically $ do {
  sc <- getCurrentSCont;
  -- save the current SCont somewhere
@@ -94,7 +94,7 @@ The type of the transaction that contains switchTo is PTM string, and atomically

 Of course, care must be taken to ensure that the control does not switch to an SCont that is either running, blocked on an MVar, or completed. But how do we know whether the given SCont is ready to run? We expect the scheduler writer or library implementer to indicate the status of SCont before switching. SCont status API is show below.

-```wiki
+```haskell
 data ResumeToken

 data SContStatus = SContRunning | 
@@ -130,7 +130,7 @@ Resume tokens are utilized for supporting asynchronous exceptions. Resume tokens

 SCont-local storage (SLS) provides a solution for associating arbitrary state with an SCont. Each SCont has a single slot with type [Dynamic](http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Dynamic.html). SLS interface is give below:

-```wiki
+```haskell
 setSLS :: SCont -> Dynamic -> IO ()
 getSLS :: SCont -> PTM Dynamic
 ```
@@ -142,7 +142,7 @@ getSLS :: SCont -> PTM Dynamic

 Concurrency substrate does not impose any structure on the user-level schedulers. The programmer might choose to implement a single scheduler for the entire system or a scheduler per capability. The schedulers might also be hierarchical, with pluggable load-balancing policies. However, we need a uniform interface such that the concurrency libraries, STM, asynchronous exceptions, safe-foreign calls, blackholes and other such mechanisms can interact with the user-level scheduler. For this purpose, we introduce the notion of `scheduler actions`, which is expected for every SCont. The substrate interface for scheduler actions is shown below:

-```wiki
+```haskell
 ------ Schedule SCont Action :: SCont -> PTM () ------

 getScheduleSContAction :: PTM (SCont -> PTM ())
@@ -157,7 +157,7 @@ setYieldControlAction :: SCont -> PTM () -> PTM ()

 Abstractly, given an SCont, the scheduleSContAction appends the SCont to a scheduler. The yieldControlAction picks an SCont from a scheduler and switches to it. The `get*` functions will fetch the scheduler actions of the current SCont. In order to make the ideas more concrete, let us assume that we have a very simple round-robin scheduler, implemented as a `PVar[SCont]`. One possible implementation of scheduler actions for this scheduler is given below.

-```wiki
+```haskell
 scheduleSContAction :: SCont -> PTM () 
 scheduleSContAction sc = do
  sched :: PVar [SCont] <- -- get sched 
@@ -191,7 +191,7 @@ Now that we have defined an abstract scheduler interface, lets look at how to co

 Since our first goal is to implement GHC's concurrency support in Haskell, let us start with`forkIO` and `yield`. These two primitives are sufficient for a simple cooperatively scheduled lightweight thread system. In order to make the presentation cleaner, assume that we have the following helper functions.

-```wiki
+```haskell
 getSSA = getScheduleSContAction
 setSSA = setScheduleSContAction
 getYCA = getYieldControlAction
@@ -201,7 +201,7 @@ setYCA = setYieldControlAction

 Primitive `yield` appends the current SCont to the scheduler, picks the next SCont from the scheduler and switches to it. We utilize the scheduler actions to achieve this. The implementation of yield is shown below. It is important to notice that yield does not assume anything about the implementation of the scheduler except for the scheduler actions. Hence, there is no need to re-implement yield primitive for every scheduler, thus minimizing the overhead of implementing new schedulers.

-```wiki
+```haskell
 yield :: IO ()
 yield = atomically $ do
  -- Append current SCont to scheduler
@@ -215,7 +215,7 @@ yield = atomically $ do

 Primitive `forkIO` also follows the strategy of utilizing scheduler actions. The implementation of forkIO is shown below. 

-```wiki
+```haskell
 forkIO :: IO () -> IO SCont
 forkIO f = do
  -- Switch to next thread after completion
@@ -248,7 +248,7 @@ A full implementation of a round-robin scheduler can be found [here](https://git

 [MVars](http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent-MVar.html) are one of the basic synchronization mechanisms exposed by GHC's concurrency library. A simple user-level implementation of MVar might look like:

-```wiki
+```haskell
 newtype MVar a = MVar (PVar (State a))
 data State a = Full a [(a, PTM ())] | Empty [(PVar a, PTM ())]
 ```
@@ -256,7 +256,7 @@ data State a = Full a [(a, PTM ())] | Empty [(PVar a, PTM ())]

 MVar is either empty with a list of pending takers, or full with a value and a list of pending putters. The PTM () action in the full and empty list represents the logic necessary for waking up the pending putters and takers. The following snippet shows the implementation of `takeMVar`.

-```wiki
+```haskell
 takeMVar :: MVar a -> IO a 
 takeMVar (MVar ref) = do
  hole <- atomically $ newPVar undefined 
@@ -300,7 +300,7 @@ We retain the task model of the current runtime system. There is a one-to-one ma

 A program always boots up on 1 core running the `main` function. Additional capabilities can be created on demand using the following primitives.

-```wiki
+```haskell
 getNumCapabilities   :: IO Int
 getCurrentCapability :: PTM Int
 newCapability :: SCont -> IO ()
@@ -309,7 +309,7 @@ newCapability :: SCont -> IO ()

 Primitive `newCapability` runs the given SCont on a free capability. If there are no free capabilities, a runtime error is raised. A typical, initial task spawned on another core would pull work from the scheduler and switch to it. For example,

-```wiki
+```haskell
 initialTask :: IO ()
 initialTask = atomically $ do
  s <- getCurrentSCont
@@ -326,14 +326,14 @@ When a program boots up with `N` capabilities, the programmer can choose to crea

 In a multicore setting, runnable SConts might not always be available on a capability. In this case, the capability must wait for SConts to be added to the scheduler. In our system, this must be handled under `yieldControlAction`, where it is expected that the control switches to another SCont. The concurrency substrate provides 

-```wiki
+```haskell
 sleepCapability :: PTM ()
 ```


 primitive that aborts the current transaction and blocks the current capability. The capability is implicitly woken up when one of the PVars that it has read from has been updated. Then, the original transaction is re-executed. Under yieldControlAction, one of the PVars read before sleeping will be the scheduler data structure. Hence, the capability is woken up when the scheduler data structure is updated. The complete implementation of yieldControlAction example introduced [earlier](lightweight-concurrency#abstracting-the-scheduler) is given below.

-```wiki
+```haskell
 yieldControlAction :: PTM () 
 yieldControlAction = do
  sched :: PVar [SCont] <- -- get sched 
@@ -351,7 +351,7 @@ yieldControlAction = do

 Every SCont is bound to a particular capability and only that capability is capable of running the SCont. Switching to an SCont that is not bound to the current capability raises a runtime error. SCont affinity interface is shown below.

-```wiki
+```haskell
 setSContCapability   :: SCont -> Int -> IO ()
 getSContCapability   :: SCont -> PTM Int
 ```
@@ -364,7 +364,7 @@ A newly created SCont is bound to the current capability. Primitive `setSContCap

 Similar to [bound threads](http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html#g:9) concurrency substrate supports bound SConts. The interface is shown below.

-```wiki
+```haskell
 newBoundSCont          :: IO () -> IO SCont
 isCurrentSContBound    :: IO Bool
 rtsSupportsBoundSConts :: Bool
@@ -405,7 +405,7 @@ We know that any SCont blocked with status `SContSwitched BlockedInHaskell t` is

 Unlike the vanilla RTS, schedulers can become unreachable in the LWC implementation. Concurrency substrate allows the programmer to install finalizers with the following primitive.

-```wiki
+```haskell
 setFinalizer :: SCont -> IO () -> IO()
 ```

@@ -482,7 +482,7 @@ Every SCont has a top-level exception handler, which catches all exceptions and

 The substrate exposes 

-```wiki
+```haskell
 throwTo :: Exception e => SCont -> e -> IO ()
 ```

@@ -498,7 +498,7 @@ For blocking actions in the RTS, such as STM, and blackholes, RTS knows how to r

 Alternatively, instead of eagerly removing the SCont from the user-level blocking queue, we can defer it until the SCont is about to be unblocked from the blocking queue. In this case, on receiving the asynchronous exception, we will raise the exception on the SCont, eagerly append it to the scheduler, and mark the blocked action as invalid. The invalidation is achieved through resume tokens.

-```wiki
+```haskell
 data ResumeToken

 newResumeToken     :: PTM ResumeToken
@@ -510,7 +510,7 @@ data SContSwitchReason = BlockedInHaskell ResumeToken | ...

 Primitive `newResumeToken` allocates a new, valid resume token. The validity of a resume token can be queried using the primitive `isResumeTokenValid`. Whenever an SCont blocks on a user-level data structure (i.e. updating switch reason to `BlockedInHaskell`), it is expected that it is provided a new, valid resume token. If an asynchronous exception is raised on this blocked SCont, the resume token is transparently invalidated. Eventually, when the SCont is about to be unblocked from the concurrent data-structure, the resume token can be queried for validity. If the resume token is invalid, then the blocked SCont has been resumed already and hence it should not be resumed again. The following snippet shows the implementation of `takeMVar` primitive that can tolerate asynchronous exceptions. The only change is to the `wakeup` function.

-```wiki
+```haskell
 takeMVar :: MVar a -> IO a 
 takeMVar (MVar ref) = do
  hole <- atomically $ newPVar undefined