Commit d7017446 authored by Ben Gamari, committed by Ben Gamari

rts: Implement concurrent collection in the nonmoving collector

This extends the non-moving collector to allow concurrent collection.

The full design of the collector implemented here is described in detail
in a technical note

    B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
    Compiler" (2018)

This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects that may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations, which is enabled only while a concurrent mark is underway.

The update remembered set representation is similar to that of the
nonmoving mark queue: a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushes
when (a) the chunk is filled, or (b) the nonmoving collector enters
its post-mark synchronization phase.
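
For illustration, here is a minimal C sketch of this chunked scheme.
The chunk capacity and the names other than `MarkEntry` are
illustrative, not the RTS's:

    #include <stdlib.h>

    #define CHUNK_SIZE 254                 /* entries per chunk (illustrative) */

    typedef struct { void *referent; } MarkEntry;

    typedef struct Chunk_ {
        size_t head;                       /* next free slot */
        MarkEntry entries[CHUNK_SIZE];
    } Chunk;

    typedef struct {
        Chunk *accum;                      /* capability-local accumulator */
    } UpdRemSet;

    /* Hand a filled chunk to the collector's global mark queue (stub). */
    static void flush_chunk(Chunk *c) { (void)c; }

    static void upd_rem_set_push(UpdRemSet *rs, void *overwritten)
    {
        if (rs->accum->head == CHUNK_SIZE) {   /* case (a): chunk filled */
            flush_chunk(rs->accum);
            rs->accum = calloc(1, sizeof(Chunk));
        }
        rs->accum->entries[rs->accum->head++].referent = overwritten;
    }

    /* Case (b), the post-mark synchronization, would instead call
     * flush_chunk(rs->accum) on every capability's accumulator. */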

While the write barrier touches a significant amount of code, it is
conceptually straightforward: the mutator must ensure that the referent
of any pointer it overwrites is added to the update remembered set.
However, there are a few details:

 * In the case of objects with a dirty flag (e.g. `MVar`s) we can
   exploit the fact that only the *first* mutation requires a write
   barrier; see the sketch after this list.

 * Weak references, as usual, complicate things. In particular, we must
   ensure that the referent of a weak object is marked if dereferenced
   by the mutator. For this we (unfortunately) must introduce a read
   barrier, as described in Note [Concurrent read barrier on deRefWeak#]
   (in `NonMovingMark.c`).

 * Stable names are also a bit tricky as described in Note [Sweeping
   stable names in the concurrent collector] (`NonMovingSweep.c`).
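
To illustrate the dirty-flag point, a hedged sketch of a write barrier
that fires only on the first mutation; `write_barrier_enabled`,
`upd_rem_set_push`, and `MutCell` are stand-ins, not RTS names:

    #include <stdbool.h>

    extern bool write_barrier_enabled;      /* concurrent mark underway? */
    extern void upd_rem_set_push(void *p);  /* as sketched above */

    typedef struct {
        bool  dirty;    /* stands in for the object's DIRTY header/flag */
        void *value;
    } MutCell;

    void write_cell(MutCell *c, void *new_val)
    {
        if (!c->dirty) {
            /* First mutation since the snapshot: the overwritten value
             * may still be needed by the collector, so push it. Later
             * writes only overwrite values stored after this point,
             * which the barrier has already accounted for. */
            if (write_barrier_enabled)
                upd_rem_set_push(c->value);
            c->dirty = true;   /* cleared again by the collector */
        }
        c->value = new_val;
    }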

We take quite some pains to ensure that the high thread count often seen
in parallel Haskell applications doesn't affect pause times. To this end
we allow thread stacks to be marked either by the thread itself (when it
is executed or its stack underflows) or by the concurrent mark thread
(if the thread owning the stack is never scheduled). There is a
non-trivial handshake to ensure that this happens without racing, which
is described in Note [StgStack dirtiness flags and concurrent marking]
and sketched below.
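
A simplified sketch of that handshake using C11 atomics; the field and
function names here are illustrative stand-ins for `StgStack` and the
RTS machinery:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        _Atomic unsigned long marking;   /* non-zero: stack locked for marking */
        _Atomic bool mark_bit;           /* set once marking has finished */
    } Stack;

    extern unsigned long mark_epoch;     /* current non-moving mark epoch */
    extern void mark_stack(Stack *s);    /* stub: traverse and mark the stack */

    /* Returns true if the caller won the right (and duty) to mark. */
    static bool try_lock_stack(Stack *s)
    {
        unsigned long expected = 0;
        return atomic_compare_exchange_strong(&s->marking, &expected,
                                              mark_epoch);
    }

    /* Called by the mutator before mutating a stack during a mark. */
    void dirty_stack(Stack *s)
    {
        if (atomic_load(&s->mark_bit))   /* step 2: already marked */
            return;
        if (try_lock_stack(s)) {
            mark_stack(s);                         /* step 3c */
            atomic_store(&s->mark_bit, true);
        } else {
            /* The collector holds the lock (step 3a): wait for the mark
             * bit before mutating the stack. */
            while (!atomic_load(&s->mark_bit)) { /* spin */ }
        }
    }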

Co-authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
parent 26ffa5a2
@@ -631,6 +631,7 @@ emitBlackHoleCode node = do
   -- work with profiling.
   when eager_blackholing $ do
+    whenUpdRemSetEnabled dflags $ emitUpdRemSetPushThunk node
     emitStore (cmmOffsetW dflags node (fixedHdrSizeW dflags)) currentTSOExpr
     -- See Note [Heap memory barriers] in SMP.h.
     emitPrimCall [] MO_WriteBarrier []
...
@@ -42,6 +42,7 @@ import BlockId
 import MkGraph
 import StgSyn
 import Cmm
+import Module ( rtsUnitId )
 import Type ( Type, tyConAppTyCon )
 import TyCon
 import CLabel
@@ -339,14 +340,20 @@ dispatchPrimop dflags = \case
     emitAssign (CmmLocal res) (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))

   WriteMutVarOp -> \[mutv, var] -> OpDest_AllDone $ \res@[] -> do
+    old_val <- CmmLocal <$> newTemp (cmmExprType dflags var)
+    emitAssign old_val (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
+
     -- Without this write barrier, other CPUs may see this pointer before
     -- the writes for the closure it points to have occurred.
+    -- Note that this also must come after we read the old value to ensure
+    -- that the read of old_val comes before another core's write to the
+    -- MutVar's value.
     emitPrimCall res MO_WriteBarrier []
     emitStore (cmmOffsetW dflags mutv (fixedHdrSizeW dflags)) var
     emitCCall
             [{-no results-}]
             (CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
-            [(baseExpr, AddrHint), (mutv,AddrHint)]
+            [(baseExpr, AddrHint), (mutv, AddrHint), (CmmReg old_val, AddrHint)]

   -- #define sizzeofByteArrayzh(r,a) \
   --   r = ((StgArrBytes *)(a))->bytes
@@ -1983,17 +1990,21 @@ doWritePtrArrayOp :: CmmExpr
 doWritePtrArrayOp addr idx val
   = do dflags <- getDynFlags
        let ty = cmmExprType dflags val
+           hdr_size = arrPtrsHdrSize dflags
+       -- Update remembered set for non-moving collector
+       whenUpdRemSetEnabled dflags
+         $ emitUpdRemSetPush (cmmLoadIndexOffExpr dflags hdr_size ty addr ty idx)
        -- This write barrier is to ensure that the heap writes to the object
        -- referred to by val have happened before we write val into the array.
        -- See #12469 for details.
        emitPrimCall [] MO_WriteBarrier []
-       mkBasicIndexedWrite (arrPtrsHdrSize dflags) Nothing addr ty idx val
+       mkBasicIndexedWrite hdr_size Nothing addr ty idx val
        emit (setInfo addr (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
        -- the write barrier.  We must write a byte into the mark table:
        -- bits8[a + header_size + StgMutArrPtrs_size(a) + x >> N]
        emit $ mkStore (
         cmmOffsetExpr dflags
-          (cmmOffsetExprW dflags (cmmOffsetB dflags addr (arrPtrsHdrSize dflags))
+          (cmmOffsetExprW dflags (cmmOffsetB dflags addr hdr_size)
           (loadArrPtrsSize dflags addr))
          (CmmMachOp (mo_wordUShr dflags) [idx,
                                           mkIntExpr dflags (mUT_ARR_PTRS_CARD_BITS dflags)])
@@ -2584,6 +2595,9 @@ emitCopyArray copy src0 src_off dst0 dst_off0 n =
         dst <- assignTempE dst0
         dst_off <- assignTempE dst_off0

+        -- Nonmoving collector write barrier
+        emitCopyUpdRemSetPush dflags (arrPtrsHdrSizeW dflags) dst dst_off n
+
         -- Set the dirty bit in the header.
         emit (setInfo dst (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
@@ -2646,6 +2660,9 @@ emitCopySmallArray copy src0 src_off dst0 dst_off n =
         src <- assignTempE src0
         dst <- assignTempE dst0

+        -- Nonmoving collector write barrier
+        emitCopyUpdRemSetPush dflags (smallArrPtrsHdrSizeW dflags) dst dst_off n
+
         -- Set the dirty bit in the header.
         emit (setInfo dst (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
@@ -2774,6 +2791,12 @@ doWriteSmallPtrArrayOp :: CmmExpr
 doWriteSmallPtrArrayOp addr idx val = do
     dflags <- getDynFlags
     let ty = cmmExprType dflags val

+    -- Update remembered set for non-moving collector
+    tmp <- newTemp ty
+    mkBasicIndexedRead (smallArrPtrsHdrSize dflags) Nothing ty tmp addr ty idx
+    whenUpdRemSetEnabled dflags $ emitUpdRemSetPush (CmmReg (CmmLocal tmp))
+
     emitPrimCall [] MO_WriteBarrier [] -- #12469
     mkBasicIndexedWrite (smallArrPtrsHdrSize dflags) Nothing addr ty idx val
     emit (setInfo addr (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
@@ -2953,3 +2976,31 @@ emitCtzCall res x width = do
         [ res ]
         (MO_Ctz width)
         [ x ]
+
+---------------------------------------------------------------------------
+-- Pushing to the update remembered set
+---------------------------------------------------------------------------
+
+-- | Push a range of pointer-array elements that are about to be copied over to
+-- the update remembered set.
+emitCopyUpdRemSetPush :: DynFlags
+                      -> WordOff  -- ^ array header size
+                      -> CmmExpr  -- ^ destination array
+                      -> CmmExpr  -- ^ offset in destination array (in words)
+                      -> Int      -- ^ number of elements to copy
+                      -> FCode ()
+emitCopyUpdRemSetPush _dflags _hdr_size _dst _dst_off 0 = return ()
+emitCopyUpdRemSetPush dflags hdr_size dst dst_off n =
+    whenUpdRemSetEnabled dflags $ do
+        updfr_off <- getUpdFrameOff
+        graph <- mkCall lbl (NativeNodeCall,NativeReturn) [] args updfr_off []
+        emit graph
+  where
+    lbl = mkLblExpr $ mkPrimCallLabel
+          $ PrimCall (fsLit "stg_copyArray_barrier") rtsUnitId
+    args =
+      [ mkIntExpr dflags hdr_size
+      , dst
+      , dst_off
+      , mkIntExpr dflags n
+      ]
@@ -39,6 +39,11 @@ module GHC.StgToCmm.Utils (
   mkWordCLit,
   newStringCLit, newByteStringCLit,
   blankWord,
+
+  -- * Update remembered set operations
+  whenUpdRemSetEnabled,
+  emitUpdRemSetPush,
+  emitUpdRemSetPushThunk,
   ) where

 #include "HsVersions.h"
@@ -576,3 +581,40 @@ assignTemp' e
        let reg = CmmLocal lreg
        emitAssign reg e
        return (CmmReg reg)
+
+---------------------------------------------------------------------------
+-- Pushing to the update remembered set
+---------------------------------------------------------------------------
+
+whenUpdRemSetEnabled :: DynFlags -> FCode a -> FCode ()
+whenUpdRemSetEnabled dflags code = do
+    do_it <- getCode code
+    the_if <- mkCmmIfThenElse' is_enabled do_it mkNop (Just False)
+    emit the_if
+  where
+    enabled = CmmLoad (CmmLit $ CmmLabel mkNonmovingWriteBarrierEnabledLabel) (bWord dflags)
+    zero = zeroExpr dflags
+    is_enabled = cmmNeWord dflags enabled zero
+
+-- | Emit code to add a now-overwritten pointer to the update remembered set.
+emitUpdRemSetPush :: CmmExpr  -- ^ value of pointer which was overwritten
+                  -> FCode ()
+emitUpdRemSetPush ptr = do
+    emitRtsCall
+      rtsUnitId
+      (fsLit "updateRemembSetPushClosure_")
+      [(CmmReg (CmmGlobal BaseReg), AddrHint),
+       (ptr, AddrHint)]
+      False
+
+emitUpdRemSetPushThunk :: CmmExpr -- ^ the thunk
+                       -> FCode ()
+emitUpdRemSetPushThunk ptr = do
+    emitRtsCall
+      rtsUnitId
+      (fsLit "updateRemembSetPushThunk_")
+      [(CmmReg (CmmGlobal BaseReg), AddrHint),
+       (ptr, AddrHint)]
+      False
@@ -40,6 +40,7 @@ module CLabel (
   mkAsmTempDieLabel,

   mkDirty_MUT_VAR_Label,
+  mkNonmovingWriteBarrierEnabledLabel,
   mkUpdInfoLabel,
   mkBHUpdInfoLabel,
   mkIndStaticInfoLabel,
@@ -484,7 +485,9 @@ mkBlockInfoTableLabel name c = IdLabel name c BlockInfoTable
 -- See Note [Proc-point local block entry-point].

 -- Constructing Cmm Labels
-mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
+mkDirty_MUT_VAR_Label,
+  mkNonmovingWriteBarrierEnabledLabel,
+  mkUpdInfoLabel,
   mkBHUpdInfoLabel, mkIndStaticInfoLabel, mkMainCapabilityLabel,
   mkMAP_FROZEN_CLEAN_infoLabel, mkMAP_FROZEN_DIRTY_infoLabel,
   mkMAP_DIRTY_infoLabel,
@@ -494,6 +497,8 @@ mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
   mkSMAP_FROZEN_CLEAN_infoLabel, mkSMAP_FROZEN_DIRTY_infoLabel,
   mkSMAP_DIRTY_infoLabel, mkBadAlignmentLabel :: CLabel
 mkDirty_MUT_VAR_Label    = mkForeignLabel (fsLit "dirty_MUT_VAR") Nothing ForeignLabelInExternalPackage IsFunction
+mkNonmovingWriteBarrierEnabledLabel
+                         = CmmLabel rtsUnitId (fsLit "nonmoving_write_barrier_enabled") CmmData
 mkUpdInfoLabel           = CmmLabel rtsUnitId (fsLit "stg_upd_frame") CmmInfo
 mkBHUpdInfoLabel         = CmmLabel rtsUnitId (fsLit "stg_bh_upd_frame" ) CmmInfo
 mkIndStaticInfoLabel     = CmmLabel rtsUnitId (fsLit "stg_IND_STATIC") CmmInfo
...
@@ -842,6 +842,10 @@
     __gen = TO_W_(bdescr_gen_no(__bd));                         \
     if (__gen > 0) { recordMutableCap(__p, __gen); }

+/* -----------------------------------------------------------------------------
+   Update remembered set write barrier
+   -------------------------------------------------------------------------- */
+
 /* -----------------------------------------------------------------------------
    Arrays
    -------------------------------------------------------------------------- */
@@ -944,3 +948,21 @@
     prim %memcpy(dst_p, src_p, n * SIZEOF_W, SIZEOF_W);         \
                                                                 \
     return (dst);
+
+#if defined(THREADED_RTS)
+#define IF_WRITE_BARRIER_ENABLED                                \
+    if (W_[nonmoving_write_barrier_enabled] != 0) (likely: False)
+#else
+// A similar measure is also taken in rts/NonMoving.h, but that isn't visible from C--
+#define IF_WRITE_BARRIER_ENABLED                                \
+    if (0)
+#define nonmoving_write_barrier_enabled 0
+#endif
+
+// A useful helper for pushing a pointer to the update remembered set.
+// See Note [Update remembered set] in NonMovingMark.c.
+#define updateRemembSetPushPtr(p)                                   \
+    IF_WRITE_BARRIER_ENABLED {                                      \
+        ccall updateRemembSetPushClosure_(BaseReg "ptr", p "ptr");  \
+    }
@@ -197,6 +197,7 @@ void _assertFail(const char *filename, unsigned int linenum)
 #include "rts/storage/ClosureMacros.h"
 #include "rts/storage/MBlock.h"
 #include "rts/storage/GC.h"
+#include "rts/NonMoving.h"

 /* Other RTS external APIs */
 #include "rts/Parallel.h"
...
+/* -----------------------------------------------------------------------------
+ *
+ * (c) The GHC Team, 2018-2019
+ *
+ * Non-moving garbage collector
+ *
+ * Do not #include this file directly: #include "Rts.h" instead.
+ *
+ * To understand the structure of the RTS headers, see the wiki:
+ *   http://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
+ *
+ * -------------------------------------------------------------------------- */
+
+#pragma once
+
+/* This is called by the code generator */
+extern DLL_IMPORT_RTS
+void updateRemembSetPushClosure_(StgRegTable *reg, StgClosure *p);
+
+void updateRemembSetPushClosure(Capability *cap, StgClosure *p);
+
+void updateRemembSetPushThunk_(StgRegTable *reg, StgThunk *p);
+
+extern StgWord DLL_IMPORT_DATA_VAR(nonmoving_write_barrier_enabled);
@@ -107,6 +107,20 @@ INLINE_HEADER const StgConInfoTable *get_con_itbl(const StgClosure *c)
    return CON_INFO_PTR_TO_STRUCT((c)->header.info);
 }

+/* Used when we expect another thread to be mutating the info table pointer of
+ * a closure (e.g. when busy-waiting on a WHITEHOLE).
+ */
+INLINE_HEADER const StgInfoTable *get_volatile_itbl(StgClosure *c) {
+    // The volatile here is important to ensure that the compiler does not
+    // optimise away multiple loads, e.g. in a busy-wait loop. Note that
+    // we can't use VOLATILE_LOAD here as the casts result in strict aliasing
+    // rule violations and this header may be compiled outside of the RTS
+    // (where we use -fno-strict-aliasing).
+    StgInfoTable * *volatile p = (StgInfoTable * *volatile) &c->header.info;
+    return INFO_PTR_TO_STRUCT(*p);
+}
+
 INLINE_HEADER StgHalfWord GET_TAG(const StgClosure *con)
 {
    return get_itbl(con)->srt;
...
@@ -234,7 +234,7 @@ void setKeepCAFs (void);
    and is put on the mutable list.
    -------------------------------------------------------------------------- */

-void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p);
+void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old);

 /* set to disable CAF garbage collection in GHCi. */
 /* (needed when dynamic libraries are used). */
...
@@ -185,6 +185,53 @@ typedef struct StgTSO_ {
 } *StgTSOPtr; // StgTSO defined in rts/Types.h

+/* Note [StgStack dirtiness flags and concurrent marking]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * Without concurrent collection by the nonmoving collector the stack dirtiness
+ * story is quite simple: the stack is either STACK_DIRTY (meaning it has been
+ * added to mut_list) or not.
+ *
+ * However, things are considerably more complicated with concurrent collection
+ * (namely, when nonmoving_write_barrier_enabled is set): in addition to adding
+ * the stack to mut_list and flagging it as STACK_DIRTY, we must also ensure
+ * that stacks are marked in accordance with the nonmoving collector's snapshot
+ * invariant. That is: every stack alive at the time the snapshot is taken must
+ * be marked at some point after the moment the snapshot is taken and before it
+ * is mutated or the commencement of the sweep phase.
+ *
+ * This marking may be done by the concurrent mark phase (in the case of a
+ * thread that never runs during the concurrent mark) or by the mutator when
+ * dirtying the stack. However, it is unsafe for the concurrent collector to
+ * traverse the stack while it is under mutation. Consequently, the following
+ * handshake is obeyed by the mutator's write barrier and the concurrent mark
+ * to ensure this doesn't happen:
+ *
+ * 1. The entity seeking to mark first checks that the stack lives in the
+ *    nonmoving generation; if not then the stack was not alive at the time the
+ *    snapshot was taken and therefore we need not mark it.
+ *
+ * 2. The entity seeking to mark checks the stack's mark bit. If it is set then
+ *    no mark is necessary.
+ *
+ * 3. The entity seeking to mark tries to lock the stack for marking by
+ *    atomically setting its `marking` field to the current non-moving mark
+ *    epoch:
+ *
+ *    a. If the mutator finds the concurrent collector has already locked the
+ *       stack then it waits until it is finished (indicated by the mark bit
+ *       being set) before proceeding with execution.
+ *
+ *    b. If the concurrent collector finds that the mutator has locked the
+ *       stack then it moves on, leaving the mutator to mark it. There is no
+ *       need to wait; the mark is guaranteed to finish before sweep due to the
+ *       post-mark synchronization with mutators.
+ *
+ *    c. Whoever succeeds in locking the stack is responsible for marking it
+ *       and setting the stack's mark bit (either the BF_MARKED bit for large
+ *       objects or otherwise its bit in its segment's mark bitmap).
+ *
+ */
+
 #define STACK_DIRTY 1
 // used by sanity checker to verify that all dirty stacks are on the mutable list
@@ -193,7 +240,8 @@ typedef struct StgTSO_ {
 typedef struct StgStack_ {
     StgHeader  header;
     StgWord32  stack_size;  // stack size in *words*
-    StgWord32  dirty;       // non-zero => dirty
+    StgWord    dirty;       // non-zero => dirty
+    StgWord    marking;     // non-zero => someone is currently marking the stack
     StgPtr     sp;          // current stack pointer
     StgWord    stack[];
 } StgStack;
...
@@ -542,5 +542,6 @@ void * pushCostCentre (void *ccs, void *cc);
 // Capability.c
 extern unsigned int n_capabilities;
+extern void updateRemembSetPushThunk_(void *reg, void *p1);

 #endif
@@ -654,6 +654,8 @@ INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
         /* someone else beat us to it */
         jump ENTRY_LBL(stg_WHITEHOLE) (ap);
     }
+    // Can't add StgInd_indirectee(ap) to UpdRemSet here because the old value
+    // is not reachable.
     StgInd_indirectee(ap) = CurrentTSO;
     prim_write_barrier;
     SET_INFO(ap, __stg_EAGER_BLACKHOLE_info);
...
@@ -292,6 +292,11 @@ initCapability (Capability *cap, uint32_t i)
                                      RtsFlags.GcFlags.generations,
                                      "initCapability");

+    // At this point the storage manager is not initialized yet, so this will
+    // be initialized in initStorage().
+    cap->upd_rem_set.queue.blocks = NULL;
+
     for (g = 0; g < RtsFlags.GcFlags.generations; g++) {
         cap->mut_lists[g] = NULL;
     }
@@ -861,16 +866,27 @@ yieldCapability (Capability** pCap, Task *task, bool gcAllowed)
     {
         PendingSync *sync = pending_sync;

-        if (sync && sync->type == SYNC_GC_PAR) {
-            if (! sync->idle[cap->no]) {
-                traceEventGcStart(cap);
-                gcWorkerThread(cap);
-                traceEventGcEnd(cap);
-                traceSparkCounters(cap);
-                // See Note [migrated bound threads 2]
-                if (task->cap == cap) {
-                    return true;
-                }
-            }
-        }
+        if (sync) {
+            switch (sync->type) {
+            case SYNC_GC_PAR:
+                if (! sync->idle[cap->no]) {
+                    traceEventGcStart(cap);
+                    gcWorkerThread(cap);
+                    traceEventGcEnd(cap);
+                    traceSparkCounters(cap);
+                    // See Note [migrated bound threads 2]
+                    if (task->cap == cap) {
+                        return true;
+                    }
+                }
+                break;
+
+            case SYNC_FLUSH_UPD_REM_SET:
+                debugTrace(DEBUG_nonmoving_gc, "Flushing update remembered set blocks...");
+                break;
+
+            default:
+                break;
+            }
+        }
     }
...
@@ -85,6 +85,9 @@ struct Capability_ {
     bdescr **mut_lists;
     bdescr **saved_mut_lists; // tmp use during GC

+    // The update remembered set for the non-moving collector
+    UpdRemSet upd_rem_set;
+
     // block for allocating pinned objects into
     bdescr *pinned_object_block;
     // full pinned object blocks allocated since the last GC
@@ -257,7 +260,8 @@ extern Capability **capabilities;
 typedef enum {
     SYNC_OTHER,
     SYNC_GC_SEQ,
-    SYNC_GC_PAR
+    SYNC_GC_PAR,
+    SYNC_FLUSH_UPD_REM_SET
 } SyncType;

 //
...
@@ -318,6 +318,7 @@ stg_killThreadzh (P_ target, P_ exception)
         return ();
     } else {
         StgTSO_why_blocked(CurrentTSO) = BlockedOnMsgThrowTo;
+        updateRemembSetPushPtr(StgTSO_block_info(CurrentTSO));
         StgTSO_block_info(CurrentTSO) = msg;
         // we must block, and unlock the message before returning
         jump stg_block_throwto (target, exception);
@@ -489,6 +490,8 @@ retry_pop_stack:
         ccall stmAbortTransaction(MyCapability() "ptr", trec "ptr");
         ccall stmFreeAbortedTRec(MyCapability() "ptr", trec "ptr");
+        // No need to push `trec` to the update remembered set; it will no
+        // longer be reachable after we overwrite StgTSO.trec.
         StgTSO_trec(CurrentTSO) = NO_TREC;
         if (r != 0) {
             // Transaction was valid: continue searching for a catch frame
@@ -607,6 +610,8 @@ retry_pop_stack:
             outer = StgTRecHeader_enclosing_trec(trec);
             ccall stmAbortTransaction(MyCapability() "ptr", trec "ptr");
             ccall stmFreeAbortedTRec(MyCapability() "ptr", trec "ptr");
+            // No need to push `trec` to the update remembered set since we
+            // just freed it; it is no longer reachable.
             StgTSO_trec(CurrentTSO) = outer;
             Sp = Sp + SIZEOF_StgCatchSTMFrame;
         }
...
@@ -244,8 +244,8 @@ loop:
     // a barrier is necessary to ensure that all writes are visible.
     // See Note [Heap memory barriers] in SMP.h.
     write_barrier();
-    dirty_TSO(cap, owner); // we will modify owner->bq
     owner->bq = bq;
+    dirty_TSO(cap, owner); // we modified owner->bq

     // If the owner of the blackhole is currently runnable, then
     // bump it to the front of the run queue.  This gives the
@@ -262,6 +262,9 @@ loop:
     // point to the BLOCKING_QUEUE from the BLACKHOLE
     write_barrier(); // make the BQ visible, see Note [Heap memory barriers].
+    if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
+        updateRemembSetPushClosure(cap, (StgClosure*)p);
+    }
     ((StgInd*)bh)->indirectee = (StgClosure *)bq;
     recordClosureMutated(cap,bh); // bh was mutated
@@ -290,6 +293,11 @@ loop:
     }
 #endif

+    if (RTS_UNLIKELY(nonmoving_write_barrier_enabled)) {
+        // We are about to overwrite bq->queue; make sure its current value
+        // makes it into the update remembered set
+        updateRemembSetPushClosure(cap, (StgClosure*)bq->queue);
+    }
     msg->link = bq->queue;
     bq->queue = msg;
     // No barrier is necessary here: we are only exposing the
...
@@ -349,8 +349,13 @@ stg_casArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new )
         // Compare and Swap Succeeded:
         SET_HDR(arr, stg_MUT_ARR_PTRS_DIRTY_info, CCCS);
         len = StgMutArrPtrs_ptrs(arr);

         // The write barrier.  We must write a byte into the mark table:
         I8[arr + SIZEOF_StgMutArrPtrs + WDS(len) + (ind >> MUT_ARR_PTRS_CARD_BITS )] = 1;
+
+        // Concurrent GC write barrier
+        updateRemembSetPushPtr(old);
+
         return (0,new);
     }
 }
@@ -462,16 +467,45 @@ stg_thawSmallArrayzh ( gcptr src, W_ offset, W_ n )
     cloneSmallArray(stg_SMALL_MUT_ARR_PTRS_DIRTY_info, src, offset, n)
 }

+// Concurrent GC write barrier for pointer array copies
+//
+// hdr_size in bytes. dst_off in words, n in words.
+stg_copyArray_barrier ( W_ hdr_size, gcptr dst, W_ dst_off, W_ n)
+{
+    W_ end, p;
+    ASSERT(n > 0);  // Assumes n==0 is handled by caller
+    p = dst + hdr_size + WDS(dst_off);
+    end = p + WDS(n);
+
+again:
+    IF_WRITE_BARRIER_ENABLED {
+        ccall updateRemembSetPushClosure_(BaseReg "ptr", W_[p] "ptr");
+    }
+    p = p + WDS(1);
+    if (p < end) {
+        goto again;
+    }