Commit d7017446 authored and committed by Ben Gamari

rts: Implement concurrent collection in the nonmoving collector

This extends the non-moving collector to allow concurrent collection.

The full design of the collector implemented here is described in detail
in a technical note

    B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
    Compiler" (2018)

This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects that may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations, which is enabled only while a concurrent mark is underway.
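
In C terms the barrier discipline is roughly the following (a minimal
sketch: `overwrite_field` is a hypothetical helper, while
`updateRemembSetPushClosure` and `nonmoving_write_barrier_enabled` are
the RTS entry points introduced by this patch):

    static void overwrite_field(Capability *cap, StgClosure **field,
                                StgClosure *new_val)
    {
        /* Preserve the old referent for the concurrent mark, but only
         * while a mark is actually in progress. */
        if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
            updateRemembSetPushClosure(cap, *field);
        }
        *field = new_val;
    }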

The update remembered set representation is similar to that of the
nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushes
when (a) the chunk is filled, or (b) the nonmoving collector enters its
post-mark synchronization phase.
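
Concretely, the representation described above looks roughly like this
(a sketch only: `MarkEntry` is named above, but the chunk capacity and
the other type and field names here are illustrative):

    #define CHUNK_ENTRIES 256          /* illustrative chunk capacity */

    typedef struct {
        StgClosure *referent;          /* object that may need re-marking */
    } MarkEntry;

    typedef struct MarkChunk_ {
        struct MarkChunk_ *next;       /* filled chunks are chained */
        size_t filled;                 /* entries used in this chunk */
        MarkEntry entries[CHUNK_ENTRIES];
    } MarkChunk;

    typedef struct {
        MarkChunk *accum;              /* capability-local accumulator;
                                        * flushed when full or at the
                                        * post-mark synchronization */
    } UpdRemSet;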

While the write barrier touches a significant amount of code, it is
conceptually straightforward: the mutator must ensure that the referent
of any pointer it overwrites is added to the update remembered set.
However, there are a few details:

 * In the case of objects with a dirty flag (e.g. `MVar`s) we can
   exploit the fact that only the *first* mutation requires a write
   barrier; see the sketch following this list.

 * Weak references, as usual, complicate things. In particular, we must
   ensure that the referent of a weak object is marked if dereferenced by
   the mutator. For this we (unfortunately) must introduce a read
   barrier, as described in Note [Concurrent read barrier on deRefWeak#]
   (in `NonMovingMark.c`).

 * Stable names are also a bit tricky as described in Note [Sweeping
   stable names in the concurrent collector] (`NonMovingSweep.c`).
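
The dirty-flag optimization can be illustrated with a simplified form
of the three-argument `dirty_MUT_VAR` that this patch introduces (a
sketch: the real implementation also does generational book-keeping):

    void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old)
    {
        Capability *cap = regTableToCapability(reg);
        if (mv->header.info == &stg_MUT_VAR_CLEAN_info) {
            /* First mutation since the last GC: dirty the MutVar and
             * push the overwritten value. Later writes find the object
             * already dirty and skip the barrier entirely. */
            mv->header.info = &stg_MUT_VAR_DIRTY_info;
            recordClosureMutated(cap, (StgClosure*)mv);
            if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
                updateRemembSetPushClosure(cap, old);
            }
        }
    }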

We take some pains to ensure that the high thread counts often seen in
parallel Haskell applications don't affect pause times. To this end we
allow thread stacks to be marked either by the thread itself (when it
runs or its stack underflows) or by the concurrent mark thread (if the
thread owning the stack is never scheduled). There is a non-trivial
handshake to ensure that this happens without racing, which is described
in Note [StgStack dirtiness flags and concurrent marking].

Co-authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
parent 26ffa5a2
@@ -631,6 +631,7 @@ emitBlackHoleCode node = do
-- work with profiling.
when eager_blackholing $ do
whenUpdRemSetEnabled dflags $ emitUpdRemSetPushThunk node
emitStore (cmmOffsetW dflags node (fixedHdrSizeW dflags)) currentTSOExpr
-- See Note [Heap memory barriers] in SMP.h.
emitPrimCall [] MO_WriteBarrier []
......
@@ -42,6 +42,7 @@ import BlockId
import MkGraph
import StgSyn
import Cmm
import Module ( rtsUnitId )
import Type ( Type, tyConAppTyCon )
import TyCon
import CLabel
@@ -339,14 +340,20 @@ dispatchPrimop dflags = \case
emitAssign (CmmLocal res) (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
WriteMutVarOp -> \[mutv, var] -> OpDest_AllDone $ \res@[] -> do
old_val <- CmmLocal <$> newTemp (cmmExprType dflags var)
emitAssign old_val (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
-- Without this write barrier, other CPUs may see this pointer before
-- the writes for the closure it points to have occurred.
-- Note that this also must come after we read the old value to ensure
-- that the read of old_val comes before another core's write to the
-- MutVar's value.
emitPrimCall res MO_WriteBarrier []
emitStore (cmmOffsetW dflags mutv (fixedHdrSizeW dflags)) var
emitCCall
[{-no results-}]
(CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
[(baseExpr, AddrHint), (mutv,AddrHint)]
[(baseExpr, AddrHint), (mutv, AddrHint), (CmmReg old_val, AddrHint)]
-- #define sizzeofByteArrayzh(r,a) \
-- r = ((StgArrBytes *)(a))->bytes
@@ -1983,17 +1990,21 @@ doWritePtrArrayOp :: CmmExpr
doWritePtrArrayOp addr idx val
= do dflags <- getDynFlags
let ty = cmmExprType dflags val
hdr_size = arrPtrsHdrSize dflags
-- Update remembered set for non-moving collector
whenUpdRemSetEnabled dflags
$ emitUpdRemSetPush (cmmLoadIndexOffExpr dflags hdr_size ty addr ty idx)
-- This write barrier is to ensure that the heap writes to the object
-- referred to by val have happened before we write val into the array.
-- See #12469 for details.
emitPrimCall [] MO_WriteBarrier []
mkBasicIndexedWrite (arrPtrsHdrSize dflags) Nothing addr ty idx val
mkBasicIndexedWrite hdr_size Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
-- the write barrier. We must write a byte into the mark table:
-- bits8[a + header_size + StgMutArrPtrs_size(a) + x >> N]
emit $ mkStore (
cmmOffsetExpr dflags
(cmmOffsetExprW dflags (cmmOffsetB dflags addr (arrPtrsHdrSize dflags))
(cmmOffsetExprW dflags (cmmOffsetB dflags addr hdr_size)
(loadArrPtrsSize dflags addr))
(CmmMachOp (mo_wordUShr dflags) [idx,
mkIntExpr dflags (mUT_ARR_PTRS_CARD_BITS dflags)])
@@ -2584,6 +2595,9 @@ emitCopyArray copy src0 src_off dst0 dst_off0 n =
dst <- assignTempE dst0
dst_off <- assignTempE dst_off0
-- Nonmoving collector write barrier
emitCopyUpdRemSetPush dflags (arrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
@@ -2646,6 +2660,9 @@ emitCopySmallArray copy src0 src_off dst0 dst_off n =
src <- assignTempE src0
dst <- assignTempE dst0
-- Nonmoving collector write barrier
emitCopyUpdRemSetPush dflags (smallArrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
@@ -2774,6 +2791,12 @@ doWriteSmallPtrArrayOp :: CmmExpr
doWriteSmallPtrArrayOp addr idx val = do
dflags <- getDynFlags
let ty = cmmExprType dflags val
-- Update remembered set for non-moving collector
tmp <- newTemp ty
mkBasicIndexedRead (smallArrPtrsHdrSize dflags) Nothing ty tmp addr ty idx
whenUpdRemSetEnabled dflags $ emitUpdRemSetPush (CmmReg (CmmLocal tmp))
emitPrimCall [] MO_WriteBarrier [] -- #12469
mkBasicIndexedWrite (smallArrPtrsHdrSize dflags) Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
@@ -2953,3 +2976,31 @@ emitCtzCall res x width = do
[ res ]
(MO_Ctz width)
[ x ]
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
-- | Push to the update remembered set a range of pointer-array elements that
-- are about to be overwritten by an array copy.
emitCopyUpdRemSetPush :: DynFlags
-> WordOff -- ^ array header size
-> CmmExpr -- ^ destination array
-> CmmExpr -- ^ offset in destination array (in words)
-> Int -- ^ number of elements to copy
-> FCode ()
emitCopyUpdRemSetPush _dflags _hdr_size _dst _dst_off 0 = return ()
emitCopyUpdRemSetPush dflags hdr_size dst dst_off n =
whenUpdRemSetEnabled dflags $ do
updfr_off <- getUpdFrameOff
graph <- mkCall lbl (NativeNodeCall,NativeReturn) [] args updfr_off []
emit graph
where
lbl = mkLblExpr $ mkPrimCallLabel
$ PrimCall (fsLit "stg_copyArray_barrier") rtsUnitId
args =
[ mkIntExpr dflags hdr_size
, dst
, dst_off
, mkIntExpr dflags n
]
@@ -39,6 +39,11 @@ module GHC.StgToCmm.Utils (
mkWordCLit,
newStringCLit, newByteStringCLit,
blankWord,
-- * Update remembered set operations
whenUpdRemSetEnabled,
emitUpdRemSetPush,
emitUpdRemSetPushThunk,
) where
#include "HsVersions.h"
@@ -576,3 +581,40 @@ assignTemp' e
let reg = CmmLocal lreg
emitAssign reg e
return (CmmReg reg)
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
whenUpdRemSetEnabled :: DynFlags -> FCode a -> FCode ()
whenUpdRemSetEnabled dflags code = do
do_it <- getCode code
the_if <- mkCmmIfThenElse' is_enabled do_it mkNop (Just False)
emit the_if
where
enabled = CmmLoad (CmmLit $ CmmLabel mkNonmovingWriteBarrierEnabledLabel) (bWord dflags)
zero = zeroExpr dflags
is_enabled = cmmNeWord dflags enabled zero
-- | Emit code to add a now-overwritten pointer to the update
-- remembered set.
emitUpdRemSetPush :: CmmExpr -- ^ value of pointer which was overwritten
-> FCode ()
emitUpdRemSetPush ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushClosure_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
emitUpdRemSetPushThunk :: CmmExpr -- ^ the thunk
-> FCode ()
emitUpdRemSetPushThunk ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushThunk_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
@@ -40,6 +40,7 @@ module CLabel (
mkAsmTempDieLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel,
mkIndStaticInfoLabel,
@@ -484,7 +485,9 @@ mkBlockInfoTableLabel name c = IdLabel name c BlockInfoTable
-- See Note [Proc-point local block entry-point].
-- Constructing Cmm Labels
mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel, mkIndStaticInfoLabel, mkMainCapabilityLabel,
mkMAP_FROZEN_CLEAN_infoLabel, mkMAP_FROZEN_DIRTY_infoLabel,
mkMAP_DIRTY_infoLabel,
@@ -494,6 +497,8 @@ mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkSMAP_FROZEN_CLEAN_infoLabel, mkSMAP_FROZEN_DIRTY_infoLabel,
mkSMAP_DIRTY_infoLabel, mkBadAlignmentLabel :: CLabel
mkDirty_MUT_VAR_Label = mkForeignLabel (fsLit "dirty_MUT_VAR") Nothing ForeignLabelInExternalPackage IsFunction
mkNonmovingWriteBarrierEnabledLabel
= CmmLabel rtsUnitId (fsLit "nonmoving_write_barrier_enabled") CmmData
mkUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_upd_frame") CmmInfo
mkBHUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_bh_upd_frame" ) CmmInfo
mkIndStaticInfoLabel = CmmLabel rtsUnitId (fsLit "stg_IND_STATIC") CmmInfo
......
@@ -842,6 +842,10 @@
__gen = TO_W_(bdescr_gen_no(__bd)); \
if (__gen > 0) { recordMutableCap(__p, __gen); }
/* -----------------------------------------------------------------------------
Update remembered set write barrier
-------------------------------------------------------------------------- */
/* -----------------------------------------------------------------------------
Arrays
-------------------------------------------------------------------------- */
@@ -944,3 +948,21 @@
prim %memcpy(dst_p, src_p, n * SIZEOF_W, SIZEOF_W); \
\
return (dst);
#if defined(THREADED_RTS)
#define IF_WRITE_BARRIER_ENABLED \
if (W_[nonmoving_write_barrier_enabled] != 0) (likely: False)
#else
// A similar measure is also taken in rts/NonMoving.h, but that isn't visible from C--
#define IF_WRITE_BARRIER_ENABLED \
if (0)
#define nonmoving_write_barrier_enabled 0
#endif
// A useful helper for pushing a pointer to the update remembered set.
// See Note [Update remembered set] in NonMovingMark.c.
#define updateRemembSetPushPtr(p) \
IF_WRITE_BARRIER_ENABLED { \
ccall updateRemembSetPushClosure_(BaseReg "ptr", p "ptr"); \
}
@@ -197,6 +197,7 @@ void _assertFail(const char *filename, unsigned int linenum)
#include "rts/storage/ClosureMacros.h"
#include "rts/storage/MBlock.h"
#include "rts/storage/GC.h"
#include "rts/NonMoving.h"
/* Other RTS external APIs */
#include "rts/Parallel.h"
......
/* -----------------------------------------------------------------------------
*
* (c) The GHC Team, 2018-2019
*
* Non-moving garbage collector
*
* Do not #include this file directly: #include "Rts.h" instead.
*
* To understand the structure of the RTS headers, see the wiki:
* http://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
*
* -------------------------------------------------------------------------- */
#pragma once
/* This is called by the code generator */
extern DLL_IMPORT_RTS
void updateRemembSetPushClosure_(StgRegTable *reg, StgClosure *p);
void updateRemembSetPushClosure(Capability *cap, StgClosure *p);
void updateRemembSetPushThunk_(StgRegTable *reg, StgThunk *p);
extern StgWord DLL_IMPORT_DATA_VAR(nonmoving_write_barrier_enabled);
@@ -107,6 +107,20 @@ INLINE_HEADER const StgConInfoTable *get_con_itbl(const StgClosure *c)
return CON_INFO_PTR_TO_STRUCT((c)->header.info);
}
/* Used when we expect another thread to be mutating the info table pointer of
* a closure (e.g. when busy-waiting on a WHITEHOLE).
*/
INLINE_HEADER const StgInfoTable *get_volatile_itbl(StgClosure *c) {
// The volatile here is important to ensure that the compiler does not
// optimise away multiple loads, e.g. in a busy-wait loop. Note that
// we can't use VOLATILE_LOAD here as the casts result in strict aliasing
// rule violations and this header may be compiled outside of the RTS
// (where we use -fno-strict-aliasing).
StgInfoTable * *volatile p = (StgInfoTable * *volatile) &c->header.info;
return INFO_PTR_TO_STRUCT(*p);
}
INLINE_HEADER StgHalfWord GET_TAG(const StgClosure *con)
{
return get_itbl(con)->srt;
......
@@ -234,7 +234,7 @@ void setKeepCAFs (void);
and is put on the mutable list.
-------------------------------------------------------------------------- */
void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p);
void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old);
/* set to disable CAF garbage collection in GHCi. */
/* (needed when dynamic libraries are used). */
......
@@ -185,6 +185,53 @@ typedef struct StgTSO_ {
} *StgTSOPtr; // StgTSO defined in rts/Types.h
/* Note [StgStack dirtiness flags and concurrent marking]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Without concurrent collection by the nonmoving collector the stack dirtiness story
* is quite simple: The stack is either STACK_DIRTY (meaning it has been added to mut_list)
* or not.
*
* However, things are considerably more complicated with concurrent collection
* (namely, when nonmoving_write_barrier_enabled is set): In addition to adding
* the stack to mut_list and flagging it as STACK_DIRTY, we also must ensure
* that stacks are marked in accordance with the nonmoving collector's snapshot
 * invariant. That is: every stack alive at the time the snapshot is taken
 * must be marked at some point after the snapshot is taken and before it is
 * mutated or the sweep phase commences.
*
* This marking may be done by the concurrent mark phase (in the case of a
* thread that never runs during the concurrent mark) or by the mutator when
* dirtying the stack. However, it is unsafe for the concurrent collector to
* traverse the stack while it is under mutation. Consequently, the following
* handshake is obeyed by the mutator's write barrier and the concurrent mark to
* ensure this doesn't happen:
*
 * 1. The entity seeking to mark first checks whether the stack lives in the
 *    nonmoving generation; if not, the stack was not alive at the time the
 *    snapshot was taken and therefore we need not mark it.
*
* 2. The entity seeking to mark checks the stack's mark bit. If it is set then
* no mark is necessary.
*
* 3. The entity seeking to mark tries to lock the stack for marking by
* atomically setting its `marking` field to the current non-moving mark
* epoch:
*
* a. If the mutator finds the concurrent collector has already locked the
* stack then it waits until it is finished (indicated by the mark bit
* being set) before proceeding with execution.
*
* b. If the concurrent collector finds that the mutator has locked the stack
* then it moves on, leaving the mutator to mark it. There is no need to wait;
* the mark is guaranteed to finish before sweep due to the post-mark
* synchronization with mutators.
*
* c. Whoever succeeds in locking the stack is responsible for marking it and
* setting the stack's mark bit (either the BF_MARKED bit for large objects
* or otherwise its bit in its segment's mark bitmap).
*
*/
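
/* A minimal illustration of step 3 of the handshake above (a sketch
 * only: the helper is hypothetical and elides the large-object case and
 * mark-epoch management):
 *
 *   static bool try_lock_stack_for_marking(StgStack *stack, StgWord epoch)
 *   {
 *       // Claim the right to mark by CAS-ing 'marking' from 0 to the
 *       // current mark epoch. The winner marks the stack and then sets
 *       // its mark bit; a losing mutator waits for the mark bit (3a),
 *       // while a losing collector simply moves on (3b).
 *       return cas(&stack->marking, 0, epoch) == 0;
 *   }
 */
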
#define STACK_DIRTY 1
// used by sanity checker to verify that all dirty stacks are on the mutable list
@@ -193,7 +240,8 @@ typedef struct StgTSO_ {
typedef struct StgStack_ {
StgHeader header;
StgWord32 stack_size; // stack size in *words*
StgWord32 dirty; // non-zero => dirty
StgWord dirty; // non-zero => dirty
StgWord marking; // non-zero => someone is currently marking the stack
StgPtr sp; // current stack pointer
StgWord stack[];
} StgStack;
......
@@ -542,5 +542,6 @@ void * pushCostCentre (void *ccs, void *cc);
// Capability.c
extern unsigned int n_capabilities;
extern void updateRemembSetPushThunk_(void *reg, void *p1);
#endif
@@ -654,6 +654,8 @@ INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
/* someone else beat us to it */
jump ENTRY_LBL(stg_WHITEHOLE) (ap);
}
// Can't add StgInd_indirectee(ap) to UpdRemSet here because the old value is
// not reachable.
StgInd_indirectee(ap) = CurrentTSO;
prim_write_barrier;
SET_INFO(ap, __stg_EAGER_BLACKHOLE_info);
......
@@ -292,6 +292,11 @@ initCapability (Capability *cap, uint32_t i)
RtsFlags.GcFlags.generations,
"initCapability");
// At this point the storage manager is not yet initialized, so the
// update remembered set will be initialized in initStorage().
cap->upd_rem_set.queue.blocks = NULL;
for (g = 0; g < RtsFlags.GcFlags.generations; g++) {
cap->mut_lists[g] = NULL;
}
@@ -861,16 +866,27 @@ yieldCapability (Capability** pCap, Task *task, bool gcAllowed)
{
PendingSync *sync = pending_sync;
if (sync && sync->type == SYNC_GC_PAR) {
if (! sync->idle[cap->no]) {
traceEventGcStart(cap);
gcWorkerThread(cap);
traceEventGcEnd(cap);
traceSparkCounters(cap);
// See Note [migrated bound threads 2]
if (task->cap == cap) {
return true;
if (sync) {
switch (sync->type) {
case SYNC_GC_PAR:
if (! sync->idle[cap->no]) {
traceEventGcStart(cap);
gcWorkerThread(cap);
traceEventGcEnd(cap);
traceSparkCounters(cap);
// See Note [migrated bound threads 2]
if (task->cap == cap) {
return true;
}
}
break;
case SYNC_FLUSH_UPD_REM_SET:
debugTrace(DEBUG_nonmoving_gc, "Flushing update remembered set blocks...");
break;
default:
break;
}
}
}
......
@@ -85,6 +85,9 @@ struct Capability_ {
bdescr **mut_lists;
bdescr **saved_mut_lists; // tmp use during GC
// The update remembered set for the non-moving collector
UpdRemSet upd_rem_set;
// block for allocating pinned objects into
bdescr *pinned_object_block;
// full pinned object blocks allocated since the last GC
@@ -257,7 +260,8 @@ extern Capability **capabilities;
typedef enum {
SYNC_OTHER,
SYNC_GC_SEQ,
SYNC_GC_PAR
SYNC_GC_PAR,
SYNC_FLUSH_UPD_REM_SET
} SyncType;
//
......
@@ -318,6 +318,7 @@ stg_killThreadzh (P_ target, P_ exception)
return ();
} else {
StgTSO_why_blocked(CurrentTSO) = BlockedOnMsgThrowTo;
updateRemembSetPushPtr(StgTSO_block_info(CurrentTSO));
StgTSO_block_info(CurrentTSO) = msg;
// we must block, and unlock the message before returning
jump stg_block_throwto (target, exception);
@@ -489,6 +490,8 @@ retry_pop_stack:
ccall stmAbortTransaction(MyCapability() "ptr", trec "ptr");
ccall stmFreeAbortedTRec(MyCapability() "ptr", trec "ptr");
// No need to push `trec` to the update remembered set; it will no longer
// be reachable after we overwrite StgTSO.trec.
StgTSO_trec(CurrentTSO) = NO_TREC;
if (r != 0) {
// Transaction was valid: continue searching for a catch frame
@@ -607,6 +610,8 @@ retry_pop_stack:
outer = StgTRecHeader_enclosing_trec(trec);
ccall stmAbortTransaction(MyCapability() "ptr", trec "ptr");
ccall stmFreeAbortedTRec(MyCapability() "ptr", trec "ptr");
// No need to push `trec` to the update remembered set since we just freed
// it; it is no longer reachable.
StgTSO_trec(CurrentTSO) = outer;
Sp = Sp + SIZEOF_StgCatchSTMFrame;
}
......
@@ -244,8 +244,8 @@ loop:
// a barrier is necessary to ensure that all writes are visible.
// See Note [Heap memory barriers] in SMP.h.
write_barrier();
dirty_TSO(cap, owner); // we will modify owner->bq
owner->bq = bq;
dirty_TSO(cap, owner); // we modified owner->bq
// If the owner of the blackhole is currently runnable, then
// bump it to the front of the run queue. This gives the
@@ -262,6 +262,9 @@ loop:
// point to the BLOCKING_QUEUE from the BLACKHOLE
write_barrier(); // make the BQ visible, see Note [Heap memory barriers].
if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
updateRemembSetPushClosure(cap, (StgClosure*)p);
}
((StgInd*)bh)->indirectee = (StgClosure *)bq;
recordClosureMutated(cap,bh); // bh was mutated
@@ -290,6 +293,11 @@ loop:
}
#endif
if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
// We are about to overwrite bq->queue; make sure its current value
// makes it into the update remembered set
updateRemembSetPushClosure(cap, (StgClosure*)bq->queue);
}
msg->link = bq->queue;
bq->queue = msg;
// No barrier is necessary here: we are only exposing the
......
@@ -349,8 +349,13 @@ stg_casArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new )
// Compare and Swap Succeeded:
SET_HDR(arr, stg_MUT_ARR_PTRS_DIRTY_info, CCCS);
len = StgMutArrPtrs_ptrs(arr);
// The write barrier. We must write a byte into the mark table:
I8[arr + SIZEOF_StgMutArrPtrs + WDS(len) + (ind >> MUT_ARR_PTRS_CARD_BITS )] = 1;
// Concurrent GC write barrier
updateRemembSetPushPtr(old);
return (0,new);
}
}
@@ -462,16 +467,45 @@ stg_thawSmallArrayzh ( gcptr src, W_ offset, W_ n )
cloneSmallArray(stg_SMALL_MUT_ARR_PTRS_DIRTY_info, src, offset, n)
}
// Concurrent GC write barrier for pointer array copies
//
// hdr_size in bytes. dst_off in words, n in words.
stg_copyArray_barrier ( W_ hdr_size, gcptr dst, W_ dst_off, W_ n)
{
W_ end, p;
ASSERT(n > 0); // Assumes n==0 is handled by caller
p = dst + hdr_size + WDS(dst_off);
end = p + WDS(n);
again:
IF_WRITE_BARRIER_ENABLED {
ccall updateRemembSetPushClosure_(BaseReg "ptr", W_[p] "ptr");
}
p = p + WDS(1);
if (p < end) {
goto again;
}
return ();
}
stg_copySmallArrayzh ( gcptr src, W_ src_off, gcptr dst, W_ dst_off, W_ n)
{
W_ dst_p, src_p, bytes;
SET_INFO(dst, stg_SMALL_MUT_ARR_PTRS_DIRTY_info);
if (n > 0) {
IF_WRITE_BARRIER_ENABLED {
call stg_copyArray_barrier(SIZEOF_StgSmallMutArrPtrs,
dst, dst_off, n);
}
dst_p = dst + SIZEOF_StgSmallMutArrPtrs + WDS(dst_off);
src_p = src + SIZEOF_StgSmallMutArrPtrs + WDS(src_off);
bytes = WDS(n);
prim %memcpy(dst_p, src_p, bytes, SIZEOF_W);
SET_INFO(dst, stg_SMALL_MUT_ARR_PTRS_DIRTY_info);
dst_p = dst + SIZEOF_StgSmallMutArrPtrs + WDS(dst_off);
src_p = src + SIZEOF_StgSmallMutArrPtrs + WDS(src_off);
bytes = WDS(n);
prim %memcpy(dst_p, src_p, bytes, SIZEOF_W);
}
return ();
}
@@ -480,15 +514,22 @@ stg_copySmallMutableArrayzh ( gcptr src, W_ src_off, gcptr dst, W_ dst_off, W_ n
{
W_ dst_p, src_p, bytes;
SET_INFO(dst, stg_SMALL_MUT_ARR_PTRS_DIRTY_info);
if (n > 0) {
IF_WRITE_BARRIER_ENABLED {
call stg_copyArray_barrier(SIZEOF_StgSmallMutArrPtrs,
dst, dst_off, n);
}
dst_p = dst + SIZEOF_StgSmallMutArrPtrs + WDS(dst_off);
src_p = src + SIZEOF_StgSmallMutArrPtrs + WDS(src_off);
bytes = WDS(n);
if (src == dst) {
prim %memmove(dst_p, src_p, bytes, SIZEOF_W);
} else {
prim %memcpy(dst_p, src_p, bytes, SIZEOF_W);
SET_INFO(dst, stg_SMALL_MUT_ARR_PTRS_DIRTY_info);
dst_p = dst + SIZEOF_StgSmallMutArrPtrs + WDS(dst_off);
src_p = src + SIZEOF_StgSmallMutArrPtrs + WDS(src_off);
bytes = WDS(n);
if (src == dst) {
prim %memmove(dst_p, src_p, bytes, SIZEOF_W);
} else {
prim %memcpy(dst_p, src_p, bytes, SIZEOF_W);
}
}
return ();
@@ -510,6 +551,10 @@ stg_casSmallArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new )
} else {
// Compare and Swap Succeeded:
SET_HDR(arr, stg_SMALL_MUT_ARR_PTRS_DIRTY_info, CCCS);
// Concurrent GC write barrier
updateRemembSetPushPtr(old);
return (0,new);
}
}
@@ -549,7 +594,7 @@ stg_casMutVarzh ( gcptr mv, gcptr old, gcptr new )
return (1,h);
} else {
if (GET_INFO(mv) == stg_MUT_VAR_CLEAN_info) {
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr");
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr", old);
}
return (0,new);
}
@@ -562,7 +607,7 @@ stg_casMutVarzh ( gcptr mv, gcptr old, gcptr new )
} else {
StgMutVar_var(mv) = new;
if (GET_INFO(mv) == stg_MUT_VAR_CLEAN_info) {
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr");
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr", old);
}
return (0,new);
}
@@ -629,11 +674,12 @@ stg_atomicModifyMutVar2zh ( gcptr mv, gcptr f )
(h) = prim %cmpxchgW(mv + SIZEOF_StgHeader + OFFSET_StgMutVar_var, x, y);
if (h != x) { goto retry; }
#else
h = StgMutVar_var(mv);
StgMutVar_var(mv) = y;
#endif
if (GET_INFO(mv) == stg_MUT_VAR_CLEAN_info) {
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr");
ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr", h);
}
return (x,z);
@@ -755,6 +801,9 @@ stg_addCFinalizzerToWeakzh ( W_ fptr, // finalizer
return (0);
}
// Write barrier for concurrent non-moving collector
updateRemembSetPushPtr(StgWeak_cfinalizers(w))
StgCFinalizerList_link(c) = StgWeak_cfinalizers(w);
StgWeak_cfinalizers(w) = c;
@@ -835,6 +884,8 @@ stg_deRefWeakzh ( gcptr w )
if (info == stg_WEAK_info) {
code = 1;
val = StgWeak_value(w);
// See Note [Concurrent read barrier on deRefWeak#] in NonMovingMark.c
updateRemembSetPushPtr(val);
} else {
code = 0;
val = w;
@@ -1501,7 +1552,7 @@ stg_takeMVarzh ( P_ mvar /* :: MVar a */ )
*/
if (StgMVar_value(mvar) == stg_END_TSO_QUEUE_closure) {
if (info == stg_MVAR_CLEAN_info) {
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr");
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr", StgMVar_value(mvar) "ptr");
}
// We want to put the heap check down here in the slow path,
@@ -1547,6 +1598,9 @@ loop:
// If the MVar is not already dirty, then we don't need to make
// it dirty, as it is empty with nothing blocking on it.
unlockClosure(mvar, info);
// However, we do need to ensure that the nonmoving collector
// knows about the reference to the value that we just removed...
updateRemembSetPushPtr(val);
return (val);
}
qinfo = StgHeader_info(q);
@@ -1560,7 +1614,7 @@ loop:
// There are putMVar(s) waiting... wake up the first thread on the queue
if (info == stg_MVAR_CLEAN_info) {
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr");
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr", val "ptr");
}
tso = StgMVarTSOQueue_tso(q);
@@ -1629,7 +1683,7 @@ loop:
// There are putMVar(s) waiting... wake up the first thread on the queue
if (info == stg_MVAR_CLEAN_info) {
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr");
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr", val "ptr");
}
tso = StgMVarTSOQueue_tso(q);
@@ -1667,7 +1721,7 @@ stg_putMVarzh ( P_ mvar, /* :: MVar a */
if (StgMVar_value(mvar) != stg_END_TSO_QUEUE_closure) {
if (info == stg_MVAR_CLEAN_info) {
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr");
ccall dirty_MVAR(BaseReg "ptr", mvar "ptr", StgMVar_value(mvar) "ptr");
}
// We want to put the heap check down here in the slow path,
@@ -1701,14 +1755,20 @@ stg_putMVarzh ( P_ mvar, /* :: MVar a */
jump stg_block_putmvar(mvar,val);
}