Commit c1c09854 authored by Ben Gamari's avatar Ben Gamari Committed by Ben Gamari

Eagerly blackhole AP_STACKs

This fixes #13615. See the rather lengthy Note [AP_STACKs must be
eagerly blackholed] for details.

Reviewers: simonmar, austin, erikd, dfeuer

Subscribers: duog, dfeuer, hsyl20, rwbarton, thomie

GHC Trac Issues: #13615

Differential Revision: https://phabricator.haskell.org/D3695

(cherry picked from commit fd7a7a63)
parent d94aebd3
......@@ -454,6 +454,172 @@ for:
directly to a function, so we have to enter it using stg_ap_0.
-------------------------------------------------------------------------- */
/*
Note [AP_STACKs must be eagerly blackholed]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trac #13615 describes a nasty concurrency issue where we can enter into the
middle of an ST action multiple times, resulting in duplication of effects.
In short, the construction of an AP_STACK allows us to suspend a computation
which should not be duplicated. When running with lazy blackholing, we can then
enter this AP_STACK multiple times, duplicating the computation with potentially
disasterous consequences.
For instance, consider the case of a simple ST program which computes a sum
using in─place mutation,
inplaceSum :: Num a => [a] ─> a
inplaceSum xs0 = runST $ do
y <─ newSTRef 0
let go [] = readSTRef y
go (x : xs) = do
modifySTRef y (+x)
go xs
go xs0
Of course, it is fine if we enter an inplaceSum thunk more than once: the two
threads will inhabit different worlds with different STRefs. However, if we
suspend some part of inplaceSum (for instance, due to the heap check at the
beginning of go) and then multiple threads resume that suspension (as is safe in
pure computation) we will have multiple threads concurrently mutating the same
STRef. Disaster!
Let's consider how this might happen: Consider this situation,
┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
│ TSO 1 │ ╭────→│ go │ │ fun │ │ TSO 2 │
└─────────┘ │ └───────┘ └───────┘ └─────────┘
│ │
┌─────────┐ │ │ ┌─────────┐
│ │──────╯ ╰──────────────│ │
├─────────┤ ┌─────────┐ ├─────────┤
│ UPDATE_ │──────────→│ THUNK A │ ╭────────│ UPDATE_ │
│ FRAME │ updatee └─────────┘ │updatee │ FRAME │
├─────────┤ │ ├─────────┤
│ ... │ │ │ etc. │
├─────────┤ updatee ┌─────────┐ │
│ UPDATE_ │─────────────────────→│ THUNK B │←───╯
│ FRAME │ └─────────┘
├─────────┤
│ etc. │
Here we have two threads (TSO 1 and TSO 2) which are in currently pausing (e.g.
in threadPaused). Since they are pausing, their stacks are headed by a pointer
to the continuation code which we will run on resumption (go and fun,
respectively). We also see that there are two thunks on the heap: THUNK A and
THUNK B where THUNK A is reachable from THUNK B (for instance, as an argument or
free variable). We see that thread 1 has THUNK A under evaluation, and both
threads have THUNK B under evaluation.
As each thread enters threadPaused, threadPaused will walk its stack looking for
duplicate computation (see Note [suspend duplicate work], although there is some
background described below as well). Let's consider what this check does:
Say that TSO 2 begins this check first. The check will walk TSO 2's stack, until
it finds the first update frame, which updates THUNK B. Upon finding this frame,
it will try to lock THUNK B, replacing it with a BLACKHOLE owned by its TSO. We
now have,
┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
│ TSO 1 │ ╭────→│ go │ │ fun │ ╭────────→│ TSO 2 │
└─────────┘ │ └───────┘ └───────┘ │ └─────────┘
│ ↑ ╭─────╯
┌─────────┐ │ │ │ ┌─────────┐
│ │──────╯ ╰─────────────────│ │
├─────────┤ updatee ┌─────────┐ │ ├─────────┤
│ UPDATE_ │──────────→│ THUNK A │ │ ╭──────────│ UPDATE_ │
│ FRAME │ └─────────┘ │ │ updatee │ FRAME │
├─────────┤ │ │ ├─────────┤
│ ... │ owner│ │ │ etc. │
├─────────┤ updatee ┌────────────┐ │
│ UPDATE_ │──────────────────→│ BLACKHOLE │←─╯
│ FRAME │ └────────────┘
├─────────┤
│ etc. │
Now consider what happens when TSO 1 runs its duplicate-computation check.
Again, we start walking the stack from the top, where we we find the update
frame updating THUNK A. We will lock this thunk, replacing it with a BLACKHOLE
owned by its TSO. We now have,
┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
│ TSO 1 │←──╮ ╭────→│ go │ │ fun │ ╭────────→│ TSO 2 │
└─────────┘ │ │ └───────┘ └───────┘ │ └─────────┘
│ │ ↑ ╭─────╯
┌─────────┐ ╰──│─────────╮ │ │ ┌─────────┐
│ │──────╯ │owner ╰─────────────────│ │
├─────────┤ ┌───────────┐ │ ├─────────┤
│ UPDATE_ │──────────→│ BLACKHOLE │ │ ╭──────────│ UPDATE_ │
│ FRAME │ updatee └───────────┘ │ │ updatee │ FRAME │
├─────────┤ │ │ ├─────────┤
│ ... │ owner│ │ │ etc. │
├─────────┤ updatee ┌────────────┐ │
│ UPDATE_ │──────────────────→│ BLACKHOLE │←─╯
│ FRAME │ └────────────┘
├─────────┤
│ etc. │
Now we will continue walking down TSO 1's stack, next coming across the second
update frame, pointing to the now-BLACKHOLE'd THUNK B. At this point
threadPaused will correctly conclude that TSO 1 is duplicating a computation
being carried out by TSO 2 and attempt to suspend it.
The suspension process proceeds by invoking raiseAsync, which walks the stack
from the top looking for update frames. For each update frame we take any stack
frames preceeding it and construct an AP_STACK heap object from them. We then
replace the updatee of the frame with an indirection pointing to the AP_STACK.
So, after suspending the first update frame we have,
┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
│ TSO 1 │ ╭────────→│ go │←─╮ │ fun │ ╭───────→│ TSO 2 │
└─────────┘ │ └───────┘ │ └───────┘ │ └─────────┘
│ ┌───────────┐ │ ↑ ╭─────╯
┌─────────┐ │ │ AP_STACK │ │ │ │ ┌─────────┐
│ │──╯ ├───────────┤ │ ╰────────────────│ │
├─────────┤ │ │─╯ │ ├─────────┤
│ UPDATE_ │───────╮ └───────────┘ │ ╭──────────│ UPDATE_ │
│ FRAME │updatee│ ↑ │ │ updatee │ FRAME │
├─────────┤ │ │indirectee │ │ ├─────────┤
│ ... │ ╰→┌───────────┐ │ │ │ etc. │
├─────────┤updatee │ BLACKHOLE │ │ │
│ UPDATE_ │──╮ └───────────┘ owner│ │
│ FRAME │ │ ┌────────────┐ │
├─────────┤ ╰───────────────→│ BLACKHOLE │←─╯
│ etc. │ └────────────┘
Finally, we will replace the second update frame with a blackhole so that TSO 1
will block on TSO 2's computation of THUNK B,
┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
│ TSO 1 │ ╭────────→│ go │←─╮ │ fun │ ╭───────→│ TSO 2 │
└─────────┘ │ └───────┘ │ └───────┘ │ └─────────┘
│ ┌───────────┐ │ ↑ ╭─────╯
┌─────────┐ │ │ AP_STACK │ │ │ │ ┌─────────┐
│ │──╯ ├───────────┤ │ ╰────────────────│ │
├─────────┤ │ │─╯ │ ├─────────┤
│ UPDATE_ │───────╮ └───────────┘ │ ╭──────────│ UPDATE_ │
│ FRAME │updatee│ ↑ │ │ updatee │ FRAME │
├─────────┤ │ │indirectee │ │ ├─────────┤
│ ... │ ╰→┌───────────┐ │ │ │ etc. │
├─────────┤ │ BLACKHOLE │ │ │
│ BLACK │ └───────────┘ owner│ │
│ HOLE │───────────╮ ┌────────────┐ │
├─────────┤indirectee ╰──────→│ BLACKHOLE │←─╯
│ etc. │ └────────────┘
At first glance there's still nothing terribly alarming here. However, consider
what would happen if some other closure held a reference to THUNK A. We would
now have leaked an AP_STACK capturing the state of a potentially
non-duplicatable computation to heap. Even worse, if two threads had references
to THUNK A and both attempted to enter at the same time, they would both succeed
if we allowed AP_STACKs to be lazily blackholed. This is the reason why we must
be very careful when entering AP_STACKS: they introduce the possibility that we
duplicate a computation which could never otherwise be duplicated.
For this reason we employ an atomic blackholing strategy when entering AP_STACK
closures.
*/
INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
/* no args => explicit stack */
{
......@@ -477,6 +643,20 @@ INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
PUSH_UPD_FRAME(Sp - SIZEOF_StgUpdateFrame, R1);
Sp = Sp - SIZEOF_StgUpdateFrame - WDS(Words);
/*
* It is imperative that we blackhole lest we may duplicate computation which
* must not be duplicated. See Note [AP_STACKs must be eagerly blackholed].
*/
W_ old_info;
(old_info) = prim %cmpxchgW(ap, stg_AP_STACK_info, stg_WHITEHOLE_info);
if (old_info != stg_AP_STACK_info) {
/* someone else beat us to it */
jump ENTRY_LBL(stg_WHITEHOLE) (ap);
}
StgInd_indirectee(ap) = CurrentTSO;
prim_write_barrier;
SET_INFO(ap, __stg_EAGER_BLACKHOLE_info);
TICK_ENT_AP();
LDV_ENTER(ap);
ENTER_CCS_THUNK(ap);
......
......@@ -872,6 +872,7 @@ raiseAsync(Capability *cap, StgTSO *tso, StgClosure *exception,
ap->size = words;
ap->fun = (StgClosure *)sp[0];
sp++;
for(i=0; i < words; ++i) {
ap->payload[i] = (StgClosure *)*sp++;
......
......@@ -275,10 +275,12 @@ threadPaused(Capability *cap, StgTSO *tso)
// deadlocked on itself. See #5226 for an instance of
// this bug.
//
if ((bh_info == &stg_WHITEHOLE_info ||
bh_info == &stg_BLACKHOLE_info)
&&
((StgInd*)bh)->indirectee != (StgClosure*)tso)
// Note that great care is required when entering computations
// suspended by this mechanism. See Note [AP_STACKs must be eagerly
// blackholed] for details.
if (((bh_info == &stg_BLACKHOLE_info)
&& ((StgInd*)bh)->indirectee != (StgClosure*)tso)
|| (bh_info == &stg_WHITEHOLE_info))
{
debugTrace(DEBUG_squeeze,
"suspending duplicate work: %ld words of stack",
......@@ -304,6 +306,12 @@ threadPaused(Capability *cap, StgTSO *tso)
continue;
}
// We should never have made it here in the event of blackholes that
// we already own; they should have been marked when we blackholed
// them and consequently we should have stopped our stack walk
// above.
ASSERT(!((bh_info == &stg_BLACKHOLE_info)
&& (((StgInd*)bh)->indirectee == (StgClosure*)tso)));
// zero out the slop so that the sanity checker can tell
// where the next closure is.
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment