Skip to content
Snippets Groups Projects

Initial commit of Note [Thunks, blackholes, and indirections]

Closed Ben Gamari requested to merge wip/thunk-note into master
+ 341
0
@@ -12,6 +12,347 @@
#include "BeginPrivate.h"
#endif
/* Note [Thunks, blackholes, and indirections]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Consider the following STG binding:
*
* thnk = {fv_0, fv_1} \u [] g x y;
*
* This binding is a updatable thunk carrying free variables `fv_0` and `fv_1`.
* Over its lifetime, this closure may transition through three states:
*
* 1. it starts life as a thunk, which carries with it free variables
* 2. if a thread enters it, it may turn into a blackhole
* 3. if evaluation finishes, it will be "updated", turning it into an
* indirection pointing to the result of the evaluation
*
* On the heap, state (1) is represented as a closure with the following layout
* (embodied in the C runtime by the `StgThunk` struct):
*
* thnk
* ┌───────────────────────┐ ╮
* │ blah_info │ │
* ├────────────┬──────────┤ │ StgThunkHeader
* │ indirectee │ NULL │ │
* ├────────────┼──────────┤ ╯
* │ payload[0] │ fv_0 │
* ├────────────┼──────────┤
* │ payload[1] │ fv_1 │
* └────────────┴──────────┘
*
* Here `blah_info` is a pointer to the thunk's info table, which will be of
* type THUNK. The `indirectee` field (also known as the "SMP header") is
* initially NULL and is unused while the closure remains a thunk. However, as
* we will see, it will eventually point to the result of the thunk's
* evaluation.
*
*
* Entry
* -----
* As usual, to enter `thnk` (step (2) of the lifetime) a mutator thread
* `tso_0` will jump to its entry code (which is recorded in or next to its
* info table, depending upon whether tables-next-to-code is enabled). This
* entry code will push an "update frame" (namely, either `stg_upd_frame` or
* `stg_bh_upd_frame`) to `tso_0`'s stack and begin the evaluation of the
* thunk's RHS.
*
* However, before commencing evaluation, the entry code may also mark the
* thunk as being under evaluation; this process is known a "blackholing".
* Whether this happens depends upon which of GHC's two blackholing strategies
* which was selected during the compilation of the defining module. We will
* discuss the simpler "eager blackholing" case first and later introduce the
* more-efficient "lazy blackholing" strategy.
*
*
* Eager blackholing
* -----------------
* Under the eager blackholing strategy (which is enabled via the
* `-feager-blackholing` flag), a thunk's entry code (generated by
* `GHC.StgToCmm.Bind.emitBlackHoleCode`) will immediately turn the thunk into
* a blackhole, indicating that the thunk is under evaluation. Additionally,
* the indirectee field will be updated to point to the thread performing the
* evaluation, `tso_0`. Since we know statically that the thunk is now a
* `BLACKHOLE`, the thunk entry code will push an `stg_bh_upd_frame` to the
* stack in this case (in contrast to the lazy strategy, as we will see later).
*
* After this, `thnk` will look like,
*
* thnk
* ┌───────────────────────┐
* │ EAGER_BLACKHOLE_info │
* ├────────────┬──────────┤ tso_0
* │ indirectee │ tso_0 │─────────►┌──────┐
* ├────────────┼──────────┤ │ ... │
* │ payload[0] │ fv_0 │ └──────┘
* ├────────────┼──────────┤
* │ payload[1] │ fv_1 │
* └────────────┴──────────┘
*
* Note that blackholing in this way does not guarantee mutual exclusion: two
* threads may indeed race to enter `thnk`. This will result in both threads
* performing evaluation and, in some cases, the blackhole being updated
* multiple times.
*
*
* Updating a thunk
* ----------------
* When `tso_0` finishes the evaluation of `thnk`, it will return to the entry
* code of the update frame pushed when the thunk was entered (e.g.
* `stg_bh_upd_frame`). This code first checks whether the blackhole
* has already been updated by another thread; if it has then `tso_0` will
* throw out its result and reuse that which the earlier thread recorded in the
* blackhole's `indirectee` field.
*
* If the blackhole has not yet been updated then `tso_0` will:
*
* 2A. set `thnk`'s `indirectee` field to point to the result of its
* evaluation, and
* 2B. set its info table to `BLACKHOLE_info`
*
* N.B. You might notice that step (2B) here appears to be redundant as we
* already set the info table pointer to `EAGER_BLACKHOLE_info` above. However,
* as we will see below, this may not be the case when lazy blackholing is in
* use.
*
* After these updates, we will have the following:
*
* thnk
* ┌───────────────────────┐
* │ BLACKHOLE_info │
* ├────────────┬──────────┤ result
* │ indirectee │ result │─────────►┌────────┐
* ├────────────┼──────────┤ │ ... │
* │ payload[0] │ fv_0 │ └────────┘
* ├────────────┼──────────┤
* │ payload[1] │ fv_1 │
* └────────────┴──────────┘
*
* In addition, the code will check the blocking queues that were added to the
* blackhole (recorded in the `indirectee` field, as we will see below) and
* wake them up (see `Threads.c:updateThunk`).
*
* Note that we are using `BLACKHOLE_info` to represent two distinct states of
* a thunk:
*
* - if the indirectee points to a `TSO` or `BLOCKING_QUEUE`, then the
* `BLACKHOLE` represents a thunk currently being evaluated
*
* - otherwise `BLACKHOLE` is merely representing an evaluated thunk and
* serves as an indirection to the final result.
*
* This overloading may seem odd given that we also have `stg_IND_info`, which
* also represents an indirection. However, the overloading serves a purpose:
* it means that safely updating a blackhole (step (3) of the lifetime above)
* requires only a single store (namely the store to the `indirectee` field).
*
* If we were to instead use `stg_IND` to represent the updated thunk, we would
* require two stores and consequently have an awkward period where the info
* table and indirectee fields are inconsistent:
*
* - if we were to update the info table first, there would be a period where
* the `indirectee` field pointed to the TSO which did the evaluation and
* not the result as one would expect.
*
* - if we were to update the indirectee first, there would be a period where
* the closure is still a `BLACKHOLE_info` yet the indirectee points to the
* evaluation result.
*
* For this reason, it is simpler to use `BLACKHOLE` to represent both states
* (2) and (3), distinguishing them using the identity of the indirectee. The
* uses of `stg_IND` are few and will be discussed below.
*
*
* Lazy blackholing
* ----------------
* While the strict blackholing strategy described above is simple and is
* faithful to the semantics of the STG machine, it is fairly costly on modern
* hardware. Specifically, thunk entry can be extremely common and in a
* parallel program blackholing may induce considerable pressure on the
* machine's memory subsystem.
*
* To mitigate this, GHC uses a lazy blackholing strategy by default. Here we
* take advantage of the fact that redundant evaluation of a thunk is
* acceptable, and defer blackholing until the thread returns to the scheduler.
* This is an optimisation, because we will often finish evaluation *before*
* yielding; in this case we avoid incurring the memory writes necessary to
* blackhole the thunk (step (2)), and rather update the thunk straight to an
* indirection.
*
* When entering a thunk compiled with lazy blackholing, we push an
* `stg_upd_frame` (not `stg_upd_bh_frame`) frame to the stack and do not
* modify the thunk closure itself.
*
* If the thread yields before finishing evaluation, the thunk will be turned
* into a `BLACKHOLE` in `ThreadPaused.c:threadPaused`. This function traverses
* the stack of the yielding thread looking for update frames; when such a
* frame is encountered, it checks the info table of the updatee and:
*
* - if it is `BLACKHOLE`, then the thunk has already been claimed for evaluation
* by another thread, and the yielding thread is instead added to the
* `BLACKHOLE`'s blocking queue (see Note [suspend duplicate work] in
* `ThreadPaused.c`).
*
* - if not, then it blackholes the thunk as done in eager blackholing (but
* using the `BLACKHOLE_info` info table instead of `EAGER_BLACKHOLE_info`).
*
* Update frames processed in this manner are rewritten to become
* `stg_marked_upd_frame`s. The stack traversal continues until a
* `stg_marked_upd_frame_info` frame is encountered, at which point we know
* that all subsequent frames have already been processed in a previous yield.
*
* The entry code of `stg_upd_frame` is considerably simpler than that of
* `stg_bh_upd_frame` since we know that the thunk has not accumulated any
* `BLOCKING_QUEUE`s in need of waking (since it was never blackhole'd). This
* is itself a small optimisation for the common case of uncontended thunk
* update. By contrast, the entry code of `stg_marked_upd_frame` is identical
* to that of `stg_bh_upd_frame` and must deal with waking of blocked threads.
*
* See `Note [suspend duplicate work]` in `ThreadPaused.c` for a subtle case
* involving the interaction between lazy and eager blackholing.
*
* See `Note [upd-black-hole]` in `Scav.c` for another subtle case.
*
*
* Blocking on a blackhole'd thunk
* -------------------------------
* If another thread `tso_1` enters `thnk` while it is blackholed by `tso_0`,
* the entry code of `BLACKHOLE` will allocate a `MSG_BLACKHOLE` object
* `msg_bh_0`. This message will be sent to the capability where the thread
* owning the thunk resides (see `Messages.c:messageBlackHole`). This
* capability will allocate a `BLOCKING_QUEUE` object `bq_0` recording the fact
* that `tso_1` is waiting for the result of `thnk`'s evaluation, and link it to
* `thnk` as follows:
*
* thnk
* ┌─►┌───────────────────────┐
* │ │ EAGER_BLACKHOLE_info │
* │ ├────────────┬──────────┤
* │ │ indirectee │ bq_0 ├──────┐
* │ ├────────────┼──────────┤ │
* │ │ payload[0] │ fv_0 │ │
* │ ├────────────┼──────────┤ │
* │ │ payload[1] │ fv_1 │ │
* │ └────────────┴──────────┘ │
* │ │ msg_bh_0
* │ ┌───────────────────────────┘ ┌──►┌───────────────────────────┐
* │ │ │ │ MSG_BLACKHOLE_info │
* │ │ │ ├───────────┬───────────────┤
* │ │ bq_0 │ │ link │ END_TSO_QUEUE │
* │ └─►┌──────────────────────┐ │ ├───────────┼───────────────┤
* │ │ BLOCKING_QUEUE_info │ │ │ result │ NULL │
* │ ├───────────┬──────────┤ │ ├───────────┼───────────────┤ tso_1
* │ │ link │ NULL │ │ │ tso │ tso_1 ├────►┌──────┐
* │ ├───────────┼──────────┤ │ └───────────┴───────────────┘ │ ... │
* │ │ queue │ msg_bh_0 ├────┘ └──────┘
* │ ├───────────┼──────────┤ tso_0
* │ │ owner │ tso_0 ├───────►┌──────┐
* │ ├───────────┼──────────┤ │ ... │
* │ │ bh │ thnk ├────┐ └──────┘
* │ └───────────┴──────────┘ │
* │ │
* └────────────────────────────────────┘
*
* Additionally, the `BLOCKING_QUEUE` is added to the `bq` list of the owning
* TSO, which collects all blocking queue objects which are blocked on thunks
* owned by the thread.
*
* In addition to this book-keeping, the `MSG_BLACKHOLE` message results in
* `tso_0` being promoted in its capability's run queue in the hope that
* `tso_1` and other blocked threads may be unblocked more quickly.
*
*
* Exception handling
* ------------------
* When an exception is thrown to a thread which is evaluating a thunk, it is
* important that we put things back into a state in which evaluation can
* be resumed by another thread. This is done by
* `RaiseAsync.c:raiseAsync` which walks the stack looking for update
* frames and rewrites the updatees into indirections pointing to an
* `AP_STACK` closure recording the aborted execution state.
* See `RaiseAsync.c:raiseAsync` for details.
*
*
* CAFs
* ----
* Top-level thunks (CAFs) reuse much of this machinery. The only differences
* are:
*
* - CAF entry ensures mutual exclusion (see `Note [atomic CAF entry]`
* in `Storage.c` for why)
*
* - we have a distinct blackhole type, `stg_CAF_BLACKHOLE_info`; it is not
* clear that maintaining this distinction from `stg_EAGER_BLACKHOLE_info`
* is strictly necessary.
*
* See `Note [CAF management]` in `Storage.c` .
*
*
* Memory ordering
* ---------------
* The memory orderings necessary for safe concurrent thunk evaluation
* are rather subtle and described in Note [Heap memory barriers] in `SMP.h`.
*
*
* The uses of `stg_IND`
* ---------------------
* As noted above, `stg_IND_info` is not used for thunk evaluation. Instead, it
* merely serves as a general-purpose indirection in a few miscellaneous cases:
*
* * it is used to "short-out" `BLOCKING_QUEUE`s and `MVAR_TSO_QUEUES` that have
* already been woken-up. See Note [BLACKHOLE pointing to IND] in `Evac.c`.
*
* * It is used to perform indirection of selector thunks (see
* `Evac.c:unchain_thunk_selectors`).
*
*
* Indirection shortcutting
* ------------------------
* Note that the garbage collector can "shortcut" both `IND` and
* `BLACKHOLE` indirections. That is, the heap:
*
* ref
* ┌──────────┐ evald_thunk
* │ ├────────►┌───────────────┬──────┐
* └──────────┘ │ stg_IND_info │ │
* ├───────────────┼──────┤ x
* │ indirectee │ ├────────►┌──────────────┐
* └───────────────┴──────┘ │ ... │
* │ │
* └──────────────┘
* Can be rewritten to:
*
* ref ┌────────────────────────────────────┐
* ┌──────────┐ │ evald_thunk │
* │ ├────┘ ┌───────────────┬──────┐ │
* └──────────┘ │ stg_IND_info │ │ │
* ├───────────────┼──────┤ ▼ x
* │ indirectee │ ├────────►┌──────────────┐
* └───────────────┴──────┘ │ ... │
* │ │
* └──────────────┘
*
*
* Selector optimisation
* ---------------------
* In addition to shortcutting indirections, the garbage collector can do a
* limited form of evaluation known as the "selector optimisation"
* [Wadler1987]. Specifically, the GC knows to rewrite a certain class of thunk
* (so-called "selector thunks", which are applications of selector functions
* like `fst`) into their result. For instance, given
*
* x = (a,b)
* y = fst x
*
* the GC can rewrite `y` into an indirection to `a`. References to `y` can
* then be further shortcutted via indirection shortcutting as described above.
*
*
* [Wadler1987]:
* Wadler, P. (1987), Fixing some space leaks with a garbage collector.
* Softw: Pract. Exper., 17: 595-608. https://doi.org/10.1002/spe.4380170904
*/
/* -----------------------------------------------------------------------------
Updates
-------------------------------------------------------------------------- */
Loading