Ben Gamari · 6567852f · d34a358a · 6936926f · cd48959f · 6567852f
--- a/rts/Updates.h

+ 341

− 0
+++ b/rts/Updates.h

+ 341

− 0
 @@ -12,6 +12,347 @@
 #include "BeginPrivate.h"
 #endif

+/* Note [Thunks, blackholes, and indirections]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Consider the following STG binding:
+ *
+ *     thnk = {fv_0, fv_1} \u [] g x y;
+ *
+ * This binding is a updatable thunk carrying free variables `fv_0` and `fv_1`.
+ * Over its lifetime, this closure may transition through three states:
+ *
+ *  1. it starts life as a thunk, which carries with it free variables
+ *  2. if a thread enters it, it may turn into a blackhole
+ *  3. if evaluation finishes, it will be "updated", turning it into an
+ *     indirection pointing to the result of the evaluation
+ *
+ * On the heap, state (1) is represented as a closure with the following layout
+ * (embodied in the C runtime by the `StgThunk` struct):
+ *
+ *      thnk
+ *     ┌───────────────────────┐  ╮
+ *     │ blah_info             │  │
+ *     ├────────────┬──────────┤  │ StgThunkHeader
+ *     │ indirectee │ NULL     │  │
+ *     ├────────────┼──────────┤  ╯
+ *     │ payload[0] │ fv_0     │
+ *     ├────────────┼──────────┤
+ *     │ payload[1] │ fv_1     │
+ *     └────────────┴──────────┘
+ *
+ * Here `blah_info` is a pointer to the thunk's info table, which will be of
+ * type THUNK. The `indirectee` field (also known as the "SMP header") is
+ * initially NULL and is unused while the closure remains a thunk. However, as
+ * we will see, it will eventually point to the result of the thunk's
+ * evaluation.
+ *
+ *
+ * Entry
+ * -----
+ * As usual, to enter `thnk` (step (2) of the lifetime) a mutator thread
+ * `tso_0` will jump to its entry code (which is recorded in or next to its
+ * info table, depending upon whether tables-next-to-code is enabled). This
+ * entry code will push an "update frame" (namely, either `stg_upd_frame` or
+ * `stg_bh_upd_frame`) to `tso_0`'s stack and begin the evaluation of the
+ * thunk's RHS.
+ *
+ * However, before commencing evaluation, the entry code may also mark the
+ * thunk as being under evaluation; this process is known a "blackholing".
+ * Whether this happens depends upon which of GHC's two blackholing strategies
+ * which was selected during the compilation of the defining module. We will
+ * discuss the simpler "eager blackholing" case first and later introduce the
+ * more-efficient "lazy blackholing" strategy.
+ *
+ *
+ * Eager blackholing
+ * -----------------
+ * Under the eager blackholing strategy (which is enabled via the
+ * `-feager-blackholing` flag), a thunk's entry code (generated by
+ * `GHC.StgToCmm.Bind.emitBlackHoleCode`) will immediately turn the thunk into
+ * a blackhole, indicating that the thunk is under evaluation. Additionally,
+ * the indirectee field will be updated to point to the thread performing the
+ * evaluation, `tso_0`. Since we know statically that the thunk is now a
+ * `BLACKHOLE`, the thunk entry code will push an `stg_bh_upd_frame` to the
+ * stack in this case (in contrast to the lazy strategy, as we will see later).
+ *
+ * After this, `thnk` will look like,
+ *
+ *      thnk
+ *     ┌───────────────────────┐
+ *     │ EAGER_BLACKHOLE_info  │
+ *     ├────────────┬──────────┤           tso_0
+ *     │ indirectee │ tso_0    │─────────►┌──────┐
+ *     ├────────────┼──────────┤          │ ...  │
+ *     │ payload[0] │ fv_0     │          └──────┘
+ *     ├────────────┼──────────┤
+ *     │ payload[1] │ fv_1     │
+ *     └────────────┴──────────┘
+ *
+ * Note that blackholing in this way does not guarantee mutual exclusion: two
+ * threads may indeed race to enter `thnk`. This will result in both threads
+ * performing evaluation and, in some cases, the blackhole being updated
+ * multiple times.
+ *
+ *
+ * Updating a thunk
+ * ----------------
+ * When `tso_0` finishes the evaluation of `thnk`, it will return to the entry
+ * code of the update frame pushed when the thunk was entered (e.g.
+ * `stg_bh_upd_frame`). This code first checks whether the blackhole
+ * has already been updated by another thread; if it has then `tso_0` will
+ * throw out its result and reuse that which the earlier thread recorded in the
+ * blackhole's `indirectee` field.
+ *
+ * If the blackhole has not yet been updated then `tso_0` will:
+ *
+ *  2A. set `thnk`'s `indirectee` field to point to the result of its
+ *     evaluation, and
+ *  2B. set its info table to `BLACKHOLE_info`
+ *
+ * N.B. You might notice that step (2B) here appears to be redundant as we
+ * already set the info table pointer to `EAGER_BLACKHOLE_info` above. However,
+ * as we will see below, this may not be the case when lazy blackholing is in
+ * use.
+ *
+ * After these updates, we will have the following:
+ *
+ *      thnk
+ *     ┌───────────────────────┐
+ *     │ BLACKHOLE_info        │
+ *     ├────────────┬──────────┤           result
+ *     │ indirectee │ result   │─────────►┌────────┐
+ *     ├────────────┼──────────┤          │ ...    │
+ *     │ payload[0] │ fv_0     │          └────────┘
+ *     ├────────────┼──────────┤
+ *     │ payload[1] │ fv_1     │
+ *     └────────────┴──────────┘
+ *
+ * In addition, the code will check the blocking queues that were added to the
+ * blackhole (recorded in the `indirectee` field, as we will see below) and
+ * wake them up (see `Threads.c:updateThunk`).
+ *
+ * Note that we are using `BLACKHOLE_info` to represent two distinct states of
+ * a thunk:
+ *
+ *  - if the indirectee points to a `TSO` or `BLOCKING_QUEUE`, then the
+ *    `BLACKHOLE` represents a thunk currently being evaluated
+ *
+ *  - otherwise `BLACKHOLE` is merely representing an evaluated thunk and
+ *    serves as an indirection to the final result.
+ *
+ * This overloading may seem odd given that we also have `stg_IND_info`, which
+ * also represents an indirection.  However, the overloading serves a purpose:
+ * it means that safely updating a blackhole (step (3) of the lifetime above)
+ * requires only a single store (namely the store to the `indirectee` field).
+ *
+ * If we were to instead use `stg_IND` to represent the updated thunk, we would
+ * require two stores and consequently have an awkward period where the info
+ * table and indirectee fields are inconsistent:
+ *
+ *  - if we were to update the info table first, there would be a period where
+ *    the `indirectee` field pointed to the TSO which did the evaluation and
+ *    not the result as one would expect.
+ *
+ *  - if we were to update the indirectee first, there would be a period where
+ *    the closure is still a `BLACKHOLE_info` yet the indirectee points to the
+ *    evaluation result.
+ *
+ * For this reason, it is simpler to use `BLACKHOLE` to represent both states
+ * (2) and (3), distinguishing them using the identity of the indirectee. The
+ * uses of `stg_IND` are few and will be discussed below.
+ *
+ *
+ * Lazy blackholing
+ * ----------------
+ * While the strict blackholing strategy described above is simple and is
+ * faithful to the semantics of the STG machine, it is fairly costly on modern
+ * hardware. Specifically, thunk entry can be extremely common and in a
+ * parallel program blackholing may induce considerable pressure on the
+ * machine's memory subsystem.
+ *
+ * To mitigate this, GHC uses a lazy blackholing strategy by default. Here we
+ * take advantage of the fact that redundant evaluation of a thunk is
+ * acceptable, and defer blackholing until the thread returns to the scheduler.
+ * This is an optimisation, because we will often finish evaluation *before*
+ * yielding; in this case we avoid incurring the memory writes necessary to
+ * blackhole the thunk (step (2)), and rather update the thunk straight to an
+ * indirection.
+ *
+ * When entering a thunk compiled with lazy blackholing, we push an
+ * `stg_upd_frame` (not `stg_upd_bh_frame`) frame to the stack and do not
+ * modify the thunk closure itself.
+ *
+ * If the thread yields before finishing evaluation, the thunk will be turned
+ * into a `BLACKHOLE` in `ThreadPaused.c:threadPaused`. This function traverses
+ * the stack of the yielding thread looking for update frames; when such a
+ * frame is encountered, it checks the info table of the updatee and:
+ *
+ *  - if it is `BLACKHOLE`, then the thunk has already been claimed for evaluation
+ *    by another thread, and the yielding thread is instead added to the
+ *    `BLACKHOLE`'s blocking queue (see Note [suspend duplicate work] in
+ *    `ThreadPaused.c`).
+ *
+ *  - if not, then it blackholes the thunk as done in eager blackholing (but
+ *    using the `BLACKHOLE_info` info table instead of `EAGER_BLACKHOLE_info`).
+ *
+ * Update frames processed in this manner are rewritten to become
+ * `stg_marked_upd_frame`s. The stack traversal continues until a
+ * `stg_marked_upd_frame_info` frame is encountered, at which point we know
+ * that all subsequent frames have already been processed in a previous yield.
+ *
+ * The entry code of `stg_upd_frame` is considerably simpler than that of
+ * `stg_bh_upd_frame` since we know that the thunk has not accumulated any
+ * `BLOCKING_QUEUE`s in need of waking (since it was never blackhole'd). This
+ * is itself a small optimisation for the common case of uncontended thunk
+ * update. By contrast, the entry code of `stg_marked_upd_frame` is identical
+ * to that of `stg_bh_upd_frame` and must deal with waking of blocked threads.
+ *
+ * See `Note [suspend duplicate work]` in `ThreadPaused.c` for a subtle case
+ * involving the interaction between lazy and eager blackholing.
+ *
+ * See `Note [upd-black-hole]` in `Scav.c` for another subtle case.
+ *
+ *
+ * Blocking on a blackhole'd thunk
+ * -------------------------------
+ * If another thread `tso_1` enters `thnk` while it is blackholed by `tso_0`,
+ * the entry code of `BLACKHOLE` will allocate a `MSG_BLACKHOLE` object
+ * `msg_bh_0`. This message will be sent to the capability where the thread
+ * owning the thunk resides (see `Messages.c:messageBlackHole`). This
+ * capability will allocate a `BLOCKING_QUEUE` object `bq_0` recording the fact
+ * that `tso_1` is waiting for the result of `thnk`'s evaluation, and link it to
+ * `thnk` as follows:
+ *
+ *         thnk
+ *     ┌─►┌───────────────────────┐
+ *     │  │ EAGER_BLACKHOLE_info  │
+ *     │  ├────────────┬──────────┤
+ *     │  │ indirectee │ bq_0     ├──────┐
+ *     │  ├────────────┼──────────┤      │
+ *     │  │ payload[0] │ fv_0     │      │
+ *     │  ├────────────┼──────────┤      │
+ *     │  │ payload[1] │ fv_1     │      │
+ *     │  └────────────┴──────────┘      │
+ *     │                                 │       msg_bh_0
+ *     │     ┌───────────────────────────┘  ┌──►┌───────────────────────────┐
+ *     │     │                              │   │ MSG_BLACKHOLE_info        │
+ *     │     │                              │   ├───────────┬───────────────┤
+ *     │     │   bq_0                       │   │ link      │ END_TSO_QUEUE │
+ *     │     └─►┌──────────────────────┐    │   ├───────────┼───────────────┤
+ *     │        │ BLOCKING_QUEUE_info  │    │   │ result    │ NULL          │
+ *     │        ├───────────┬──────────┤    │   ├───────────┼───────────────┤      tso_1
+ *     │        │ link      │ NULL     │    │   │ tso       │ tso_1         ├────►┌──────┐
+ *     │        ├───────────┼──────────┤    │   └───────────┴───────────────┘     │ ...  │
+ *     │        │ queue     │ msg_bh_0 ├────┘                                     └──────┘
+ *     │        ├───────────┼──────────┤         tso_0
+ *     │        │ owner     │ tso_0    ├───────►┌──────┐
+ *     │        ├───────────┼──────────┤        │ ...  │
+ *     │        │ bh        │ thnk     ├────┐   └──────┘
+ *     │        └───────────┴──────────┘    │
+ *     │                                    │
+ *     └────────────────────────────────────┘
+ *
+ * Additionally, the `BLOCKING_QUEUE` is added to the `bq` list of the owning
+ * TSO, which collects all blocking queue objects which are blocked on thunks
+ * owned by the thread.
+ *
+ * In addition to this book-keeping, the `MSG_BLACKHOLE` message results in
+ * `tso_0` being promoted in its capability's run queue in the hope that
+ * `tso_1` and other blocked threads may be unblocked more quickly.
+ *
+ *
+ * Exception handling
+ * ------------------
+ * When an exception is thrown to a thread which is evaluating a thunk, it is
+ * important that we put things back into a state in which evaluation can
+ * be resumed by another thread. This is done by
+ * `RaiseAsync.c:raiseAsync` which walks the stack looking for update
+ * frames and rewrites the updatees into indirections pointing to an
+ * `AP_STACK` closure recording the aborted execution state.
+ * See `RaiseAsync.c:raiseAsync` for details.
+ *
+ *
+ * CAFs
+ * ----
+ * Top-level thunks (CAFs) reuse much of this machinery. The only differences
+ * are:
+ *
+ *  - CAF entry ensures mutual exclusion (see `Note [atomic CAF entry]`
+ *    in `Storage.c` for why)
+ *
+ *  - we have a distinct blackhole type, `stg_CAF_BLACKHOLE_info`; it is not
+ *    clear that maintaining this distinction from `stg_EAGER_BLACKHOLE_info`
+ *    is strictly necessary.
+ *
+ * See `Note [CAF management]` in `Storage.c` .
+ *
+ *
+ * Memory ordering
+ * ---------------
+ * The memory orderings necessary for safe concurrent thunk evaluation
+ * are rather subtle and described in Note [Heap memory barriers] in `SMP.h`.
+ *
+ *
+ * The uses of `stg_IND`
+ * ---------------------
+ * As noted above, `stg_IND_info` is not used for thunk evaluation. Instead, it
+ * merely serves as a general-purpose indirection in a few miscellaneous cases:
+ *
+ *  * it is used to "short-out" `BLOCKING_QUEUE`s and `MVAR_TSO_QUEUES` that have
+ *    already been woken-up. See Note [BLACKHOLE pointing to IND] in `Evac.c`.
+ *
+ *  * It is used to perform indirection of selector thunks (see
+ *    `Evac.c:unchain_thunk_selectors`).
+ *
+ *
+ * Indirection shortcutting
+ * ------------------------
+ * Note that the garbage collector can "shortcut" both `IND` and
+ * `BLACKHOLE` indirections. That is, the heap:
+ *
+ *  ref
+ * ┌──────────┐          evald_thunk
+ * │          ├────────►┌───────────────┬──────┐
+ * └──────────┘         │ stg_IND_info  │      │
+ *                      ├───────────────┼──────┤          x
+ *                      │ indirectee    │      ├────────►┌──────────────┐
+ *                      └───────────────┴──────┘         │ ...          │
+ *                                                       │              │
+ *                                                       └──────────────┘
+ * Can be rewritten to:
+ *
+ *  ref            ┌────────────────────────────────────┐
+ * ┌──────────┐    │     evald_thunk                    │
+ * │          ├────┘    ┌───────────────┬──────┐        │
+ * └──────────┘         │ stg_IND_info  │      │        │
+ *                      ├───────────────┼──────┤        ▼ x
+ *                      │ indirectee    │      ├────────►┌──────────────┐
+ *                      └───────────────┴──────┘         │ ...          │
+ *                                                       │              │
+ *                                                       └──────────────┘
+ *
+ *
+ * Selector optimisation
+ * ---------------------
+ * In addition to shortcutting indirections, the garbage collector can do a
+ * limited form of evaluation known as the "selector optimisation"
+ * [Wadler1987]. Specifically, the GC knows to rewrite a certain class of thunk
+ * (so-called "selector thunks", which are applications of selector functions
+ * like `fst`) into their result. For instance, given
+ *
+ *     x = (a,b)
+ *     y = fst x
+ *
+ * the GC can rewrite `y` into an indirection to `a`. References to `y` can
+ * then be further shortcutted via indirection shortcutting as described above.
+ *
+ *
+ * [Wadler1987]:
+ *   Wadler, P. (1987), Fixing some space leaks with a garbage collector.
+ *   Softw: Pract. Exper., 17: 595-608. https://doi.org/10.1002/spe.4380170904
+ */
+
+
 /* -----------------------------------------------------------------------------
   Updates
   -------------------------------------------------------------------------- */