Commit 11bac115 authored by Travis Whitaker, committed by Ben Gamari

Correct closure observation, construction, and mutation on weak memory machines.

The following changes are introduced:
    - A read barrier machine op is added to Cmm.
    - The order in which a closure's fields are read and written is changed.
    - Memory barriers are added to RTS code to ensure correctness on
      out-of-order machines with weak memory ordering.

Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures reads occurring after it in program
order are not performed before reads that precede it in program order. On
machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode)
no such instruction is necessary, so MO_ReadBarrier is simply erased. However,
such an instruction is necessary on weakly ordered machines, e.g. ARM and
PowerPC.
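
For illustration, here is a rough C sketch of what such a read barrier amounts
to on a few architectures. The per-architecture instructions below are
assumptions for illustration only, not GHC's actual lowering tables; the RTS
exposes the same fence to C code as load_load_barrier():

    /* Hypothetical load/load fence, mirroring the RTS's load_load_barrier(). */
    #if defined(__i386__) || defined(__x86_64__)
    /* TSO already orders loads against loads; only restrain the compiler. */
    #define load_load_barrier() __asm__ __volatile__ ("" ::: "memory")
    #elif defined(__powerpc__) || defined(__powerpc64__)
    #define load_load_barrier() __asm__ __volatile__ ("lwsync" ::: "memory")
    #elif defined(__aarch64__)
    #define load_load_barrier() __asm__ __volatile__ ("dmb sy" ::: "memory")
    #else
    #define load_load_barrier() __sync_synchronize() /* conservative fallback */
    #endif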

Weak memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
closure has an indirectee pointer that is safe to follow.

When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:

- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.

Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues;
consider:

- An observer is about to case on a value in a closure's info table.
- The observer speculatively reads one or more of the closure's fields.
- An updater writes to the closure's info table.
- The observer takes a branch based on the new info table value, but with the
  old closure fields!
- The updater writes to the closure's other fields, but it's too late.

Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:

- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.

Observing a closure's fields must follow the following pattern (a C sketch of
both patterns follows the list):

- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
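
As a concrete illustration, here is a hedged C sketch of both patterns using
the barrier names the RTS defines in SMP.h. The helper functions and the use
of stg_BLACKHOLE_info here are illustrative, not the RTS's actual update code:

    /* Updater: payload first, then barrier, then info pointer. */
    static void update_with_ind(StgInd *bh, StgClosure *value)
    {
        bh->indirectee = value;              /* 1. write the fields      */
        write_barrier();                     /* 2. order fields vs. info */
        SET_INFO((StgClosure *)bh, &stg_BLACKHOLE_info); /* 3. publish   */
    }

    /* Observer: info pointer first, then barrier, then fields. */
    static StgClosure *observe_ind(StgInd *bh)
    {
        const StgInfoTable *info = GET_INFO((StgClosure *)bh);
        load_load_barrier();                 /* fields must be read after */
        if (INFO_PTR_TO_STRUCT(info)->type == BLACKHOLE) {
            return bh->indirectee;           /* safe to follow now */
        }
        return (StgClosure *)bh;             /* not yet updated */
    }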

This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
parent ef6d9a50
......@@ -593,6 +593,7 @@ data CallishMachOp
| MO_SubIntC Width
| MO_U_Mul2 Width
| MO_ReadBarrier
| MO_WriteBarrier
| MO_Touch -- Keep variables live (when using interior pointers)
......
......@@ -1001,6 +1001,7 @@ machOps = listToUFM $
callishMachOps :: UniqFM ([CmmExpr] -> (CallishMachOp, [CmmExpr]))
callishMachOps = listToUFM $
map (\(x, y) -> (mkFastString x, y)) [
( "read_barrier", (MO_ReadBarrier,)),
( "write_barrier", (MO_WriteBarrier,)),
( "memcpy", memcpyLikeTweakArgs MO_Memcpy ),
( "memset", memcpyLikeTweakArgs MO_Memset ),
......
......@@ -812,6 +812,7 @@ pprCallishMachOp_for_C mop
MO_F32_ExpM1 -> text "expm1f"
MO_F32_Sqrt -> text "sqrtf"
MO_F32_Fabs -> text "fabsf"
MO_ReadBarrier -> text "load_load_barrier"
MO_WriteBarrier -> text "write_barrier"
MO_Memcpy _ -> text "memcpy"
MO_Memset _ -> text "memset"
......
......@@ -632,6 +632,7 @@ emitBlackHoleCode node = do
when eager_blackholing $ do
emitStore (cmmOffsetW dflags node (fixedHdrSizeW dflags)) currentTSOExpr
-- See Note [Heap memory barriers] in SMP.h.
emitPrimCall [] MO_WriteBarrier []
emitStore node (CmmReg (CmmGlobal EagerBlackholeInfo))
......
......@@ -169,17 +169,25 @@ barrier = do
let s = Fence False SyncSeqCst
return (unitOL s, [])
-- | Insert a 'barrier', unless the target platform is in the provided list of
-- exceptions (in which case no code is emitted).
barrierUnless :: [Arch] -> LlvmM StmtData
barrierUnless exs = do
platform <- getLlvmPlatform
if platformArch platform `elem` exs
then return (nilOL, [])
else barrier
-- | Foreign Calls
genCall :: ForeignTarget -> [CmmFormal] -> [CmmActual]
-> LlvmM StmtData
-- Write barrier needs to be handled specially as it is implemented as an LLVM
-- intrinsic function.
-- Barriers need to be handled specially as they are implemented as LLVM
-- intrinsic functions.
genCall (PrimTarget MO_ReadBarrier) _ _ =
barrierUnless [ArchX86, ArchX86_64, ArchSPARC]
genCall (PrimTarget MO_WriteBarrier) _ _ = do
platform <- getLlvmPlatform
if platformArch platform `elem` [ArchX86, ArchX86_64, ArchSPARC]
then return (nilOL, [])
else barrier
barrierUnless [ArchX86, ArchX86_64, ArchSPARC]
genCall (PrimTarget MO_Touch) _ _
= return (nilOL, [])
......@@ -831,6 +839,7 @@ cmmPrimOpFunctions mop = do
-- We support MO_U_Mul2 through ordinary LLVM mul instruction, see the
-- appropriate case of genCall.
MO_U_Mul2 {} -> unsupported
MO_ReadBarrier -> unsupported
MO_WriteBarrier -> unsupported
MO_Touch -> unsupported
MO_UF_Conv _ -> unsupported
......
......@@ -1123,6 +1123,8 @@ genCCall :: ForeignTarget -- function to call
-> [CmmFormal] -- where to put the result
-> [CmmActual] -- arguments (of mixed type)
-> NatM InstrBlock
genCCall (PrimTarget MO_ReadBarrier) _ _
= return $ unitOL LWSYNC
genCCall (PrimTarget MO_WriteBarrier) _ _
= return $ unitOL LWSYNC
......@@ -2030,6 +2032,7 @@ genCCall' dflags gcp target dest_regs args
MO_AddIntC {} -> unsupported
MO_SubIntC {} -> unsupported
MO_U_Mul2 {} -> unsupported
MO_ReadBarrier -> unsupported
MO_WriteBarrier -> unsupported
MO_Touch -> unsupported
MO_Prefetch_Data _ -> unsupported
......
......@@ -401,6 +401,8 @@ genCCall
--
-- In the SPARC case we don't need a barrier.
--
genCCall (PrimTarget MO_ReadBarrier) _ _
= return $ nilOL
genCCall (PrimTarget MO_WriteBarrier) _ _
= return $ nilOL
......@@ -691,6 +693,7 @@ outOfLineMachOp_table mop
MO_AddIntC {} -> unsupported
MO_SubIntC {} -> unsupported
MO_U_Mul2 {} -> unsupported
MO_ReadBarrier -> unsupported
MO_WriteBarrier -> unsupported
MO_Touch -> unsupported
(MO_Prefetch_Data _) -> unsupported
......
......@@ -1891,8 +1891,9 @@ genCCall dflags _ (PrimTarget (MO_Memset align)) _
possibleWidth = minimum [left, sizeBytes]
dst_addr = AddrBaseIndex (EABaseReg dst) EAIndexNone (ImmInteger (n - left))
genCCall _ _ (PrimTarget MO_ReadBarrier) _ _ _ = return nilOL
genCCall _ _ (PrimTarget MO_WriteBarrier) _ _ _ = return nilOL
-- write barrier compiles to no code on x86/x86-64;
-- barriers compile to no code on x86/x86-64;
-- we keep it this long in order to prevent earlier optimisations.
genCCall _ _ (PrimTarget MO_Touch) _ _ _ = return nilOL
......@@ -2948,6 +2949,7 @@ outOfLineCmmOp bid mop res args
MO_AddWordC {} -> unsupported
MO_SubWordC {} -> unsupported
MO_U_Mul2 {} -> unsupported
MO_ReadBarrier -> unsupported
MO_WriteBarrier -> unsupported
MO_Touch -> unsupported
(MO_Prefetch_Data _ ) -> unsupported
......
......@@ -308,7 +308,9 @@
#define ENTER_(ret,x) \
again: \
W_ info; \
LOAD_INFO(ret,x) \
/* See Note [Heap memory barriers] in SMP.h */ \
prim_read_barrier; \
switch [INVALID_OBJECT .. N_CLOSURE_TYPES] \
(TO_W_( %INFO_TYPE(%STD_INFO(info)) )) { \
case \
......@@ -631,6 +633,14 @@
#define OVERWRITING_CLOSURE_OFS(c,n) /* nothing */
#endif
// Memory barriers.
// For discussion of how these are used to fence heap object
// accesses see Note [Heap memory barriers] in SMP.h.
#if defined(THREADED_RTS)
#define prim_read_barrier prim %read_barrier()
#else
#define prim_read_barrier /* nothing */
#endif
#if defined(THREADED_RTS)
#define prim_write_barrier prim %write_barrier()
#else
......
......@@ -96,6 +96,151 @@ EXTERN_INLINE void write_barrier(void);
EXTERN_INLINE void store_load_barrier(void);
EXTERN_INLINE void load_load_barrier(void);
/*
* Note [Heap memory barriers]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Machines with weak memory ordering semantics have consequences for how
* closures are observed and mutated. For example, consider a thunk that needs
* to be updated to an indirection. In order for the indirection to be safe for
* concurrent observers to enter, said observers must read the indirection's
* info table before they read the indirectee. Furthermore, the indirectee must
* be set before the info table pointer. This ensures that if the observer sees
* an IND info table then the indirectee is valid.
*
* When a closure is updated with an indirection, both its info table and its
* indirectee must be written. With weak memory ordering, these two writes can
* be arbitrarily reordered, and perhaps even interleaved with other threads'
* reads and writes (in the absence of memory barrier instructions). Consider
* this example of a bad reordering:
*
* - An updater writes to a closure's info table (INFO_TYPE is now IND).
* - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
* - A concurrent observer reads the closure's indirectee and enters it.
* - An updater writes the closure's indirectee.
*
* Here the update to the indirectee comes too late and the concurrent observer
* has jumped off into the abyss. Speculative execution can also cause us
* issues; consider:
*
* - an observer is about to case on a value in a closure's info table.
* - the observer speculatively reads one or more of the closure's fields.
* - an updater writes to the closure's info table.
* - the observer takes a branch based on the new info table value, but with the
*   old closure fields!
* - the updater writes to the closure's other fields, but it's too late.
*
* Because of these effects, reads and writes to a closure's info table must be
* ordered carefully with respect to reads and writes to the closure's other
* fields, and memory barriers must be placed to ensure that reads and writes
* occur in program order. Specifically, updates to an already existing closure
* must follow the following pattern:
*
* - Update the closure's (non-info table) fields.
* - Write barrier.
* - Update the closure's info table.
*
* Observing the fields of an updateable closure (e.g. a THUNK) must follow the
* following pattern:
*
* - Read the closure's info pointer.
* - Read barrier.
* - Read the closure's (non-info table) fields.
*
* We must also take care when we expose a newly-allocated closure to other cores
* by writing a pointer to it into some shared data structure (e.g. an MVar#,
* a Message, or a MutVar#). Specifically, we need to ensure that all writes
* constructing the closure are visible *before* the write exposing the new
* closure is made visible:
*
* - Allocate memory for the closure
* - Write the closure's info pointer and fields (the ordering between these
*   doesn't matter, since the closure isn't yet visible to anyone else).
* - Write barrier
* - Make the closure visible to other cores (see the sketch below)
*
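* For example, a C-flavoured sketch of this publication pattern, modelled on
* the MVAR_TSO_QUEUE allocation in stg_takeMVarzh (PrimOps.cmm); a sketch,
* not the literal code:
*
*     StgMVarTSOQueue *q =
*         (StgMVarTSOQueue *)allocate(cap, sizeofW(StgMVarTSOQueue));
*     q->link = (StgMVarTSOQueue *)END_TSO_QUEUE;  // initialise the fields
*     q->tso  = tso;
*     SET_HDR(q, &stg_MVAR_TSO_QUEUE_info, CCS_SYSTEM);
*     write_barrier();                             // commit the contents
*     mvar->head = q;                              // now publish it
*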
* Note that thread stacks are inherently thread-local; consequently, allocating
* an object and introducing a reference to it on our stack needs no barrier.
*
* There are several ways in which the mutator may make a newly-allocated
* closure visible to other cores:
*
* - Eager blackholing a THUNK:
* This is protected by an explicit write barrier in the eager blackholing
* code produced by the codegen. See StgCmmBind.emitBlackHoleCode.
*
* - Lazy blackholing a THUNK:
*   This is protected by an explicit write barrier in the thread suspension
* code. See ThreadPaused.c:threadPaused.
*
* - Updating a BLACKHOLE:
*   This case is protected by explicit write barriers in the update frame
* entry code (see rts/Updates.h).
*
* - Blocking on an MVar# (e.g. takeMVar#):
*   In this case the appropriate MVar primops (e.g. stg_takeMVarzh) include
*   explicit memory barriers to ensure that the newly-allocated
* MVAR_TSO_QUEUE is visible to other cores.
*
* - Write to an MVar# (e.g. putMVar#):
*   This is protected by the full barrier implied by the CAS in putMVar#.
*
* - Write to a TVar#:
* This is protected by the full barrier implied by the CAS in STM.c:lock_stm.
*
* - Write to an Array#, ArrayArray#, or SmallArray#:
* This case is protected by an explicit write barrier in the code produced
* for this primop by the codegen. See StgCmmPrim.doWritePtrArrayOp and
* StgCmmPrim.doWriteSmallPtrArrayOp. Relevant issue: #12469.
*
* - Write to MutVar# via writeMutVar#:
* This case is protected by an explicit write barrier in the code produced
* for this primop by the codegen.
*
* - Write to MutVar# via atomicModifyMutVar# or casMutVar#:
* This is protected by the full barrier implied by the cmpxchg operations
*   in these primops.
*
* - Sending a Message to another capability:
*   This is protected by the acquisition and release of the target capability's
* lock in Messages.c:sendMessage.
*
* Finally, we must ensure that we flush all cores' store buffers before
* entering and leaving GC, since stacks may be read by other cores. This
* happens as a side-effect of taking and releasing mutexes (which imply
* acquire and release barriers, respectively).
*
* N.B. recordClosureMutated places a reference to the mutated object on
* the capability-local mut_list. Consequently this does not require any memory
* barrier.
*
* During parallel GC we need to be careful during evacuation: before replacing
* a closure with a forwarding pointer we must issue a write barrier to ensure
* that the copy we made in to-space is visible to other cores.
*
* However, we can be a bit lax when *reading* during GC. Specifically, the GC
* can only make a very limited set of changes to existing closures:
*
* - it can replace a closure's info table with stg_WHITEHOLE.
* - it can replace a previously-whitehole'd closure's info table with a
* forwarding pointer
* - it can replace a previously-whitehole'd closure's info table with a
* valid info table pointer (done in eval_thunk_selector)
* - it can update the value of a pointer field after evacuating it
*
* This is quite nice since we don't need to worry about an interleaving
* of writes producing an invalid state: a closure's fields remain valid after
* an update of its info table pointer and vice-versa.
*
* After a round of parallel scavenging we must also ensure that any writes the
* GC thread workers made are visible to the main GC thread. This is ensured by
* the full barrier implied by the atomic decrement in
* GC.c:scavenge_until_all_done.
*
* The work-stealing queue (WSDeque) also requires barriers; these are
* documented in WSDeque.c.
*
*/
/* ----------------------------------------------------------------------------
Implementations
------------------------------------------------------------------------- */
......
......@@ -62,6 +62,8 @@ again:
W_ info;
P_ untaggedfun;
W_ arity;
// We must obey the correct heap object observation pattern in
// Note [Heap memory barriers] in SMP.h.
untaggedfun = UNTAG(fun);
info = %INFO_PTR(untaggedfun);
switch [INVALID_OBJECT .. N_CLOSURE_TYPES]
......
......@@ -53,6 +53,9 @@ import CLOSURE base_GHCziIOziException_cannotCompactPinned_closure;
// data structure. It takes the location to store the address of the
// compacted object as an argument, so that it can be tail-recursive.
//
// N.B. No memory barrier (see Note [Heap memory barriers] in SMP.h) is needed
// here since this is essentially an allocation of a new object which won't
// be visible to other cores until after we return.
stg_compactAddWorkerzh (
P_ compact, // The Compact# object
P_ p, // The object to compact
......
......@@ -266,7 +266,6 @@ StgClosure * copyPAP (Capability *cap, StgPAP *oldpap)
uint32_t size = PAP_sizeW(oldpap->n_args);
StgPAP *pap = (StgPAP *)allocate(cap, size);
enterFunCCS(&cap->r, oldpap->header.prof.ccs);
SET_HDR(pap, &stg_PAP_info, cap->r.rCCCS);
pap->arity = oldpap->arity;
pap->n_args = oldpap->n_args;
pap->fun = oldpap->fun;
......@@ -274,6 +273,8 @@ StgClosure * copyPAP (Capability *cap, StgPAP *oldpap)
for (i = 0; i < ((StgPAP *)pap)->n_args; i++) {
pap->payload[i] = oldpap->payload[i];
}
// No write barrier is needed here as this is a new allocation
SET_HDR(pap, &stg_PAP_info, cap->r.rCCCS);
return (StgClosure *)pap;
}
......@@ -799,7 +800,6 @@ do_apply:
// build a new PAP and return it.
StgPAP *new_pap;
new_pap = (StgPAP *)allocate(cap, PAP_sizeW(pap->n_args + m));
SET_HDR(new_pap,&stg_PAP_info,cap->r.rCCCS);
new_pap->arity = pap->arity - n;
new_pap->n_args = pap->n_args + m;
new_pap->fun = pap->fun;
......@@ -809,6 +809,8 @@ do_apply:
for (i = 0; i < m; i++) {
new_pap->payload[pap->n_args + i] = (StgClosure *)SpW(i);
}
// No write barrier is needed here as this is a new allocation
SET_HDR(new_pap,&stg_PAP_info,cap->r.rCCCS);
tagged_obj = (StgClosure *)new_pap;
Sp_addW(m);
goto do_return;
......@@ -844,13 +846,14 @@ do_apply:
StgPAP *pap;
uint32_t i;
pap = (StgPAP *)allocate(cap, PAP_sizeW(m));
SET_HDR(pap, &stg_PAP_info,cap->r.rCCCS);
pap->arity = arity - n;
pap->fun = obj;
pap->n_args = m;
for (i = 0; i < m; i++) {
pap->payload[i] = (StgClosure *)SpW(i);
}
// No write barrier is needed here as this is a new allocation
SET_HDR(pap, &stg_PAP_info,cap->r.rCCCS);
tagged_obj = (StgClosure *)pap;
Sp_addW(m);
goto do_return;
......@@ -1081,7 +1084,6 @@ run_BCO:
// the BCO
size_words = BCO_BITMAP_SIZE(obj) + 2;
new_aps = (StgAP_STACK *) allocate(cap, AP_STACK_sizeW(size_words));
SET_HDR(new_aps,&stg_AP_STACK_info,cap->r.rCCCS);
new_aps->size = size_words;
new_aps->fun = &stg_dummy_ret_closure;
......@@ -1095,6 +1097,9 @@ run_BCO:
new_aps->payload[i] = (StgClosure *)SpW(i-2);
}
// No write barrier is needed here as this is a new allocation
SET_HDR(new_aps,&stg_AP_STACK_info,cap->r.rCCCS);
// Arrange the stack to call the breakpoint IO action, and
// continue execution of this BCO when the IO action returns.
//
......@@ -1423,6 +1428,8 @@ run_BCO:
ap = (StgAP*)allocate(cap, AP_sizeW(n_payload));
SpW(-1) = (W_)ap;
ap->n_args = n_payload;
// No write barrier is needed here as this is a new allocation
// visible only from our stack
SET_HDR(ap, &stg_AP_info, cap->r.rCCCS)
Sp_subW(1);
goto nextInsn;
......@@ -1434,6 +1441,8 @@ run_BCO:
ap = (StgAP*)allocate(cap, AP_sizeW(n_payload));
SpW(-1) = (W_)ap;
ap->n_args = n_payload;
// No write barrier is needed here as this is a new allocation
// visible only from our stack
SET_HDR(ap, &stg_AP_NOUPD_info, cap->r.rCCCS)
Sp_subW(1);
goto nextInsn;
......@@ -1447,6 +1456,8 @@ run_BCO:
SpW(-1) = (W_)pap;
pap->n_args = n_payload;
pap->arity = arity;
// No write barrier is needed here as this is a new allocation
// visible only from our stack
SET_HDR(pap, &stg_PAP_info, cap->r.rCCCS)
Sp_subW(1);
goto nextInsn;
......@@ -1522,12 +1533,14 @@ run_BCO:
itbl->layout.payload.nptrs );
StgClosure* con = (StgClosure*)allocate_NONUPD(cap,request);
ASSERT( itbl->layout.payload.ptrs + itbl->layout.payload.nptrs > 0);
SET_HDR(con, (StgInfoTable*)BCO_LIT(o_itbl), cap->r.rCCCS);
for (i = 0; i < n_words; i++) {
con->payload[i] = (StgClosure*)SpW(i);
}
Sp_addW(n_words);
Sp_subW(1);
// No write barrier is needed here as this is a new allocation
// visible only from our stack
SET_HDR(con, (StgInfoTable*)BCO_LIT(o_itbl), cap->r.rCCCS);
SpW(0) = (W_)con;
IF_DEBUG(interpreter,
debugBelch("\tBuilt ");
......
......@@ -173,6 +173,7 @@ uint32_t messageBlackHole(Capability *cap, MessageBlackHole *msg)
"blackhole %p", (W_)msg->tso->id, msg->bh);
info = bh->header.info;
load_load_barrier(); // See Note [Heap memory barriers] in SMP.h
// If we got this message in our inbox, it might be that the
// BLACKHOLE has already been updated, and GC has shorted out the
......@@ -196,6 +197,7 @@ loop:
// and turns this into an infinite loop.
p = UNTAG_CLOSURE((StgClosure*)VOLATILE_LOAD(&((StgInd*)bh)->indirectee));
info = p->header.info;
load_load_barrier(); // See Note [Heap memory barriers] in SMP.h
if (info == &stg_IND_info)
{
......@@ -226,7 +228,6 @@ loop:
bq = (StgBlockingQueue*)allocate(cap, sizeofW(StgBlockingQueue));
// initialise the BLOCKING_QUEUE object
SET_HDR(bq, &stg_BLOCKING_QUEUE_DIRTY_info, CCS_SYSTEM);
bq->bh = bh;
bq->queue = msg;
bq->owner = owner;
......@@ -238,6 +239,11 @@ loop:
// a collision to update a BLACKHOLE and a BLOCKING_QUEUE
// becomes orphaned (see updateThunk()).
bq->link = owner->bq;
SET_HDR(bq, &stg_BLOCKING_QUEUE_DIRTY_info, CCS_SYSTEM);
// We are about to make the newly-constructed BLOCKING_QUEUE visible to other
// cores; a barrier is necessary to ensure that all writes constructing it
// are visible.
// See Note [Heap memory barriers] in SMP.h.
write_barrier();
owner->bq = bq;
dirty_TSO(cap, owner); // we modified owner->bq
......@@ -255,7 +261,7 @@ loop:
}
// point to the BLOCKING_QUEUE from the BLACKHOLE
write_barrier(); // make the BQ visible
write_barrier(); // make the BQ visible, see Note [Heap memory barriers].
((StgInd*)bh)->indirectee = (StgClosure *)bq;
recordClosureMutated(cap,bh); // bh was mutated
......@@ -286,10 +292,14 @@ loop:
msg->link = bq->queue;
bq->queue = msg;
// No barrier is necessary here: we are only exposing the
// closure to the GC. See Note [Heap memory barriers] in SMP.h.
recordClosureMutated(cap,(StgClosure*)msg);
if (info == &stg_BLOCKING_QUEUE_CLEAN_info) {
bq->header.info = &stg_BLOCKING_QUEUE_DIRTY_info;
// No barrier is necessary here: we are only exposing the
// closure to the GC. See Note [Heap memory barriers] in SMP.h.
recordClosureMutated(cap,(StgClosure*)bq);
}
......
......@@ -102,6 +102,7 @@ stg_newPinnedByteArrayzh ( W_ n )
to BA_ALIGN bytes: */
p = p + ((-p - SIZEOF_StgArrBytes) & BA_MASK);
/* No write barrier needed since this is a new allocation. */
SET_HDR(p, stg_ARR_WORDS_info, CCCS);
StgArrBytes_bytes(p) = n;
return (p);
......@@ -144,6 +145,7 @@ stg_newAlignedPinnedByteArrayzh ( W_ n, W_ alignment )
<alignment> is a power of 2, which is technically not guaranteed */
p = p + ((-p - SIZEOF_StgArrBytes) & (alignment - 1));
/* No write barrier needed since this is a new allocation. */
SET_HDR(p, stg_ARR_WORDS_info, CCCS);
StgArrBytes_bytes(p) = n;
return (p);
......@@ -254,6 +256,7 @@ stg_newArrayzh ( W_ n /* words */, gcptr init )
}
TICK_ALLOC_PRIM(SIZEOF_StgMutArrPtrs, WDS(size), 0);
/* No write barrier needed since this is a new allocation. */
SET_HDR(arr, stg_MUT_ARR_PTRS_DIRTY_info, CCCS);
StgMutArrPtrs_ptrs(arr) = n;
StgMutArrPtrs_size(arr) = size;
......@@ -405,6 +408,7 @@ stg_newSmallArrayzh ( W_ n /* words */, gcptr init )
}
TICK_ALLOC_PRIM(SIZEOF_StgSmallMutArrPtrs, WDS(n), 0);
/* No write barrier needed since this is a new allocation. */
SET_HDR(arr, stg_SMALL_MUT_ARR_PTRS_DIRTY_info, CCCS);
StgSmallMutArrPtrs_ptrs(arr) = n;
......@@ -522,6 +526,7 @@ stg_newMutVarzh ( gcptr init )
ALLOC_PRIM_P (SIZEOF_StgMutVar, stg_newMutVarzh, init);
mv = Hp - SIZEOF_StgMutVar + WDS(1);
/* No write barrier needed since this is a new allocation. */
SET_HDR(mv,stg_MUT_VAR_DIRTY_info,CCCS);
StgMutVar_var(mv) = init;
......@@ -700,6 +705,7 @@ stg_mkWeakzh ( gcptr key,
ALLOC_PRIM (SIZEOF_StgWeak)
w = Hp - SIZEOF_StgWeak + WDS(1);
// No memory barrier needed as this is a new allocation.
SET_HDR(w, stg_WEAK_info, CCCS);
StgWeak_key(w) = key;
......@@ -815,6 +821,7 @@ stg_deRefWeakzh ( gcptr w )
gcptr val;
info = GET_INFO(w);
prim_read_barrier;
if (info == stg_WHITEHOLE_info) {
// w is locked by another thread. Now it's not immediately clear if w is
......@@ -1385,11 +1392,13 @@ stg_readTVarzh (P_ tvar)
stg_readTVarIOzh ( P_ tvar /* :: TVar a */ )
{
W_ result;
W_ result, resultinfo;
again:
result = StgTVar_current_value(tvar);
if (%INFO_PTR(result) == stg_TREC_HEADER_info) {
resultinfo = %INFO_PTR(result);
prim_read_barrier;
if (resultinfo == stg_TREC_HEADER_info) {
goto again;
}
return (result);
......@@ -1458,6 +1467,7 @@ stg_newMVarzh ()
ALLOC_PRIM_ (SIZEOF_StgMVar, stg_newMVarzh);
mvar = Hp - SIZEOF_StgMVar + WDS(1);
// No memory barrier needed as this is a new allocation.
SET_HDR(mvar,stg_MVAR_DIRTY_info,CCCS);
// MVARs start dirty: generation 0 has no mutable list
StgMVar_head(mvar) = stg_END_TSO_QUEUE_closure;
......@@ -1482,7 +1492,7 @@ stg_newMVarzh ()
stg_takeMVarzh ( P_ mvar /* :: MVar a */ )
{
W_ val, info, tso, q;
W_ val, info, tso, q, qinfo;
LOCK_CLOSURE(mvar, info);
......@@ -1504,9 +1514,12 @@ stg_takeMVarzh ( P_ mvar /* :: MVar a */ )
q = Hp - SIZEOF_StgMVarTSOQueue + WDS(1);
SET_HDR(q, stg_MVAR_TSO_QUEUE_info, CCS_SYSTEM);
StgMVarTSOQueue_link(q) = END_TSO_QUEUE;
StgMVarTSOQueue_tso(q) = CurrentTSO;
SET_HDR(q, stg_MVAR_TSO_QUEUE_info, CCS_SYSTEM);
// Write barrier before we make the new MVAR_TSO_QUEUE
// visible to other cores.
prim_write_barrier;
if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
StgMVar_head(mvar) = q;
......@@ -1536,8 +1549,10 @@ loop:
unlockClosure(mvar, info);
return (val);
}
if (StgHeader_info(q) == stg_IND_info ||
StgHeader_info(q) == stg_MSG_NULL_info) {
qinfo = StgHeader_info(q);
prim_read_barrier;
if (qinfo == stg_IND_info ||
qinfo == stg_MSG_NULL_info) {
q = StgInd_indirectee(q);
goto loop;
}
......@@ -1575,7 +1590,7 @@ loop:
stg_tryTakeMVarzh ( P_ mvar /* :: MVar a */ )
{
W_ val, info, tso, q;
W_ val, info, tso, q, qinfo;
LOCK_CLOSURE(mvar, info);
......@@ -1602,8 +1617,11 @@ loop:
return (1, val);
}
if (StgHeader_info(q) == stg_IND_info ||
StgHeader_info(q) == stg_MSG_NULL_info) {
qinfo = StgHeader_info(q);
prim_read_barrier;
if (qinfo == stg_IND_info ||
qinfo == stg_MSG_NULL_info) {
q = StgInd_indirectee(q);
goto loop;
}
......@@ -1642,7 +1660,7 @@ loop:
stg_putMVarzh ( P_ mvar, /* :: MVar a */
P_ val, /* :: a */ )
{
W_ info, tso, q;
W_ info, tso, q, qinfo;
LOCK_CLOSURE(mvar, info);
......@@ -1662,10 +1680,12 @@ stg_putMVarzh ( P_ mvar, /* :: MVar a */
q = Hp - SIZEOF_StgMVarTSOQueue + WDS(1);
SET_HDR(q, stg_MVAR_TSO_QUEUE_info, CCS_SYSTEM);
StgMVarTSOQueue_link(q) = END_TSO_QUEUE;
StgMVarTSOQueue_tso(q) = CurrentTSO;
SET_HDR(q, stg_MVAR_TSO_QUEUE_info, CCS_SYSTEM);
prim_write_barrier;
if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
StgMVar_head(mvar) = q;