Commit 7f72b540 authored by Ben Gamari

Merge non-moving garbage collector

This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.

Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.

Collector design
================

The full design of the collector implemented here is described in detail
in a technical note

> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)

This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by

> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)

This design is intended to allow both marking and sweeping to proceed
concurrently with the execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.

To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (also used to
track liveness).
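As a rough illustration of this layout (field names and sizes here are hypothetical, not the actual `NonmovingSegment` structure), a segment pairs a run of fixed-size allocation blocks with a liveness bitmap, and the allocator advances past blocks the last mark found live:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of a fixed-size sub-allocator segment.
 * Each segment serves a single block size; the bitmap records
 * which blocks the last mark phase found live. */
#define SEG_BLOCKS 128

struct segment {
    uint8_t  bitmap[SEG_BLOCKS]; /* 1 = live, 0 = free            */
    uint16_t next_free;          /* index to resume allocation at */
    size_t   block_size;         /* the fixed size served here    */
    uint8_t *payload;            /* SEG_BLOCKS * block_size bytes */
};

/* Scan forward from next_free, skipping live blocks.
 * Returns the allocated block index, or -1 if the segment is full. */
static int seg_alloc(struct segment *seg) {
    for (uint16_t i = seg->next_free; i < SEG_BLOCKS; i++) {
        if (!seg->bitmap[i]) {
            seg->bitmap[i] = 1;
            seg->next_free = (uint16_t)(i + 1);
            return i;
        }
    }
    return -1;
}
```

Because every block in a segment has the same size, freeing is just clearing bitmap bits during sweep, which is what lets the heap avoid fragmentation without moving objects.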

This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
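A snapshot-at-the-beginning scheme preserves every object reachable at the moment marking starts: before the mutator overwrites a pointer field while marking is in progress, the old value is pushed to a remembered set so the marker can still trace it. A minimal sketch of that barrier (the RTS entry point for this is `updateRemembSetPushClosure_`; the names and fixed-size buffer below are purely illustrative):

```c
#include <stddef.h>

/* Illustrative snapshot-at-the-beginning write barrier. While the
 * mark phase is running, the overwritten (old) pointer is recorded
 * so the collector can still reach the snapshot's objects. */
enum { REM_SET_CAP = 64 };

static void *rem_set[REM_SET_CAP];
static int   rem_set_len = 0;
static int   write_barrier_enabled = 0; /* set during the mark phase */

static void write_field(void **field, void *new_val) {
    if (write_barrier_enabled && *field != NULL
        && rem_set_len < REM_SET_CAP) {
        rem_set[rem_set_len++] = *field; /* push the old value */
    }
    *field = new_val;
}
```

This is why the barrier can be compiled away entirely when concurrent marking is not running: outside the mark phase the old value needs no preservation.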

Implementation structure
========================

The majority of the collector is implemented in a handful of files:

 * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
   to the nonmoving collector (`nonmoving_collect`), as well as the allocator
   (`nonmoving_allocate`) and a number of utilities for manipulating the heap.

 * `rts/NonmovingMark.c` implements the mark queue functionality, update
   remembered set, and mark loop.

 * `rts/NonmovingSweep.c` implements the sweep loop.

 * `rts/NonmovingScav.c` implements the logic necessary to scavenge the
   nonmoving heap.

Branch overview
===============

```
 * wip/gc/opt-pause:
 |   A variety of small optimisations to further reduce pause times.
 |
 * wip/gc/compact-nfdata:
 |   Introduce support for compact regions into the non-moving
 |\  collector
 | \
 |  \
 | | * wip/gc/segment-header-to-bdescr:
 | | |   Another optimization that we are considering, pushing
 | | |   some segment metadata into the segment descriptor for
 | | |   the sake of locality during mark
 | | |
 | * | wip/gc/shortcutting:
 | | |   Support for indirection shortcutting and the selector optimization
 | | |   in the non-moving heap.
 | | |
 * | | wip/gc/docs:
 | |/    Work on implementation documentation.
 | /
 |/
 * wip/gc/everything:
 |   A roll-up of everything below.
 |\
 | \
 | |\
 | | \
 | | * wip/gc/optimize:
 | | |   A variety of optimizations, primarily to the mark loop.
 | | |   Some of these are microoptimizations but a few are quite
 | | |   significant. In particular, the prefetch patches have
 | | |   produced a nontrivial improvement in mark performance.
 | | |
 | | * wip/gc/aging:
 | | |   Enable support for aging in major collections.
 | | |
 | * | wip/gc/test:
 | | |   Fix up the testsuite to more or less pass.
 | | |
 * | | wip/gc/instrumentation:
 | | |   A variety of runtime instrumentation including statistics
 | | /   support, the nonmoving census, and eventlog support.
 | |/
 | /
 |/
 * wip/gc/nonmoving-concurrent:
 |   The concurrent write barriers.
 |
 * wip/gc/nonmoving-nonconcurrent:
 |   The nonmoving collector without the write barriers necessary
 |   for concurrent collection.
 |
 * wip/gc/preparation:
 |   A merge of the various preparatory patches that aren't directly
 |   implementing the GC.
 |
 |
 * GHC HEAD
 .
 .
 .
```
parents 8abddac8 984745b0
Pipeline #11760 failed in 435 minutes and 8 seconds
......@@ -631,6 +631,7 @@ emitBlackHoleCode node = do
-- work with profiling.
when eager_blackholing $ do
whenUpdRemSetEnabled dflags $ emitUpdRemSetPushThunk node
emitStore (cmmOffsetW dflags node (fixedHdrSizeW dflags)) currentTSOExpr
-- See Note [Heap memory barriers] in SMP.h.
emitPrimCall [] MO_WriteBarrier []
......
......@@ -42,6 +42,7 @@ import BlockId
import MkGraph
import StgSyn
import Cmm
import Module ( rtsUnitId )
import Type ( Type, tyConAppTyCon )
import TyCon
import CLabel
......@@ -339,14 +340,20 @@ dispatchPrimop dflags = \case
emitAssign (CmmLocal res) (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
WriteMutVarOp -> \[mutv, var] -> OpDest_AllDone $ \res@[] -> do
old_val <- CmmLocal <$> newTemp (cmmExprType dflags var)
emitAssign old_val (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
-- Without this write barrier, other CPUs may see this pointer before
-- the writes for the closure it points to have occurred.
-- Note that this also must come after we read the old value to ensure
-- that the read of old_val comes before another core's write to the
-- MutVar's value.
emitPrimCall res MO_WriteBarrier []
emitStore (cmmOffsetW dflags mutv (fixedHdrSizeW dflags)) var
emitCCall
[{-no results-}]
(CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
[(baseExpr, AddrHint), (mutv,AddrHint)]
[(baseExpr, AddrHint), (mutv, AddrHint), (CmmReg old_val, AddrHint)]
-- #define sizzeofByteArrayzh(r,a) \
-- r = ((StgArrBytes *)(a))->bytes
......@@ -1983,17 +1990,21 @@ doWritePtrArrayOp :: CmmExpr
doWritePtrArrayOp addr idx val
= do dflags <- getDynFlags
let ty = cmmExprType dflags val
hdr_size = arrPtrsHdrSize dflags
-- Update remembered set for non-moving collector
whenUpdRemSetEnabled dflags
$ emitUpdRemSetPush (cmmLoadIndexOffExpr dflags hdr_size ty addr ty idx)
-- This write barrier is to ensure that the heap writes to the object
-- referred to by val have happened before we write val into the array.
-- See #12469 for details.
emitPrimCall [] MO_WriteBarrier []
mkBasicIndexedWrite (arrPtrsHdrSize dflags) Nothing addr ty idx val
mkBasicIndexedWrite hdr_size Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
-- the write barrier. We must write a byte into the mark table:
-- bits8[a + header_size + StgMutArrPtrs_size(a) + x >> N]
emit $ mkStore (
cmmOffsetExpr dflags
(cmmOffsetExprW dflags (cmmOffsetB dflags addr (arrPtrsHdrSize dflags))
(cmmOffsetExprW dflags (cmmOffsetB dflags addr hdr_size)
(loadArrPtrsSize dflags addr))
(CmmMachOp (mo_wordUShr dflags) [idx,
mkIntExpr dflags (mUT_ARR_PTRS_CARD_BITS dflags)])
......@@ -2584,6 +2595,9 @@ emitCopyArray copy src0 src_off dst0 dst_off0 n =
dst <- assignTempE dst0
dst_off <- assignTempE dst_off0
-- Nonmoving collector write barrier
emitCopyUpdRemSetPush dflags (arrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
......@@ -2646,6 +2660,9 @@ emitCopySmallArray copy src0 src_off dst0 dst_off n =
src <- assignTempE src0
dst <- assignTempE dst0
-- Nonmoving collector write barrier
emitCopyUpdRemSetPush dflags (smallArrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
......@@ -2774,6 +2791,12 @@ doWriteSmallPtrArrayOp :: CmmExpr
doWriteSmallPtrArrayOp addr idx val = do
dflags <- getDynFlags
let ty = cmmExprType dflags val
-- Update remembered set for non-moving collector
tmp <- newTemp ty
mkBasicIndexedRead (smallArrPtrsHdrSize dflags) Nothing ty tmp addr ty idx
whenUpdRemSetEnabled dflags $ emitUpdRemSetPush (CmmReg (CmmLocal tmp))
emitPrimCall [] MO_WriteBarrier [] -- #12469
mkBasicIndexedWrite (smallArrPtrsHdrSize dflags) Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
......@@ -2953,3 +2976,31 @@ emitCtzCall res x width = do
[ res ]
(MO_Ctz width)
[ x ]
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
-- | Push to the update remembered set a range of pointer-array elements
-- that are about to be overwritten by a copy.
emitCopyUpdRemSetPush :: DynFlags
-> WordOff -- ^ array header size
-> CmmExpr -- ^ destination array
-> CmmExpr -- ^ offset in destination array (in words)
-> Int -- ^ number of elements to copy
-> FCode ()
emitCopyUpdRemSetPush _dflags _hdr_size _dst _dst_off 0 = return ()
emitCopyUpdRemSetPush dflags hdr_size dst dst_off n =
whenUpdRemSetEnabled dflags $ do
updfr_off <- getUpdFrameOff
graph <- mkCall lbl (NativeNodeCall,NativeReturn) [] args updfr_off []
emit graph
where
lbl = mkLblExpr $ mkPrimCallLabel
$ PrimCall (fsLit "stg_copyArray_barrier") rtsUnitId
args =
[ mkIntExpr dflags hdr_size
, dst
, dst_off
, mkIntExpr dflags n
]
......@@ -39,6 +39,11 @@ module GHC.StgToCmm.Utils (
mkWordCLit,
newStringCLit, newByteStringCLit,
blankWord,
-- * Update remembered set operations
whenUpdRemSetEnabled,
emitUpdRemSetPush,
emitUpdRemSetPushThunk,
) where
#include "HsVersions.h"
......@@ -576,3 +581,40 @@ assignTemp' e
let reg = CmmLocal lreg
emitAssign reg e
return (CmmReg reg)
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
whenUpdRemSetEnabled :: DynFlags -> FCode a -> FCode ()
whenUpdRemSetEnabled dflags code = do
do_it <- getCode code
the_if <- mkCmmIfThenElse' is_enabled do_it mkNop (Just False)
emit the_if
where
enabled = CmmLoad (CmmLit $ CmmLabel mkNonmovingWriteBarrierEnabledLabel) (bWord dflags)
zero = zeroExpr dflags
is_enabled = cmmNeWord dflags enabled zero
-- | Emit code to add an entry for a now-overwritten pointer to the update
-- remembered set.
emitUpdRemSetPush :: CmmExpr -- ^ value of pointer which was overwritten
-> FCode ()
emitUpdRemSetPush ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushClosure_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
emitUpdRemSetPushThunk :: CmmExpr -- ^ the thunk
-> FCode ()
emitUpdRemSetPushThunk ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushThunk_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
......@@ -40,6 +40,7 @@ module CLabel (
mkAsmTempDieLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel,
mkIndStaticInfoLabel,
......@@ -484,7 +485,9 @@ mkBlockInfoTableLabel name c = IdLabel name c BlockInfoTable
-- See Note [Proc-point local block entry-point].
-- Constructing Cmm Labels
mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel, mkIndStaticInfoLabel, mkMainCapabilityLabel,
mkMAP_FROZEN_CLEAN_infoLabel, mkMAP_FROZEN_DIRTY_infoLabel,
mkMAP_DIRTY_infoLabel,
......@@ -494,6 +497,8 @@ mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkSMAP_FROZEN_CLEAN_infoLabel, mkSMAP_FROZEN_DIRTY_infoLabel,
mkSMAP_DIRTY_infoLabel, mkBadAlignmentLabel :: CLabel
mkDirty_MUT_VAR_Label = mkForeignLabel (fsLit "dirty_MUT_VAR") Nothing ForeignLabelInExternalPackage IsFunction
mkNonmovingWriteBarrierEnabledLabel
= CmmLabel rtsUnitId (fsLit "nonmoving_write_barrier_enabled") CmmData
mkUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_upd_frame") CmmInfo
mkBHUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_bh_upd_frame" ) CmmInfo
mkIndStaticInfoLabel = CmmLabel rtsUnitId (fsLit "stg_IND_STATIC") CmmInfo
......
......@@ -313,6 +313,24 @@ collection. Hopefully, you won't need any of these in normal operation,
but there are several things that can be tweaked for maximum
performance.
.. rts-flag:: -xn
:default: off
:since: 8.8.1
  • @bgamari Maybe I am being silly as I am not familiar with how the release process works and if this document is mangled later on by the build pipeline, but shouldn’t this be 8.10.1? :)

  • Oh dear, thank you!

.. index::
single: concurrent mark and sweep
Enable the concurrent mark-and-sweep garbage collector for old generation
collectors. Typically GHC uses a stop-the-world copying garbage collector
for all generations. This can cause long pauses in execution during major
garbage collections. :rts-flag:`-xn` enables the use of a concurrent
mark-and-sweep garbage collector for oldest generation collections.
Under this collection strategy oldest-generation garbage collection
can proceed concurrently with mutation.
Note that :rts-flag:`-xn` cannot be used with ``-G1`` or :rts-flag:`-c`.
.. rts-flag:: -A ⟨size⟩
:default: 1MB
......
......@@ -842,6 +842,10 @@
__gen = TO_W_(bdescr_gen_no(__bd)); \
if (__gen > 0) { recordMutableCap(__p, __gen); }
/* -----------------------------------------------------------------------------
Update remembered set write barrier
-------------------------------------------------------------------------- */
/* -----------------------------------------------------------------------------
Arrays
-------------------------------------------------------------------------- */
......@@ -944,3 +948,25 @@
prim %memcpy(dst_p, src_p, n * SIZEOF_W, SIZEOF_W); \
\
return (dst);
//
// Nonmoving write barrier helpers
//
// See Note [Update remembered set] in NonMovingMark.c.
#if defined(THREADED_RTS)
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (W_[nonmoving_write_barrier_enabled] != 0) (likely: False)
#else
// A similar measure is also taken in rts/NonMoving.h, but that isn't visible from C--
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (0)
#define nonmoving_write_barrier_enabled 0
#endif
// A useful helper for pushing a pointer to the update remembered set.
#define updateRemembSetPushPtr(p) \
IF_NONMOVING_WRITE_BARRIER_ENABLED { \
ccall updateRemembSetPushClosure_(BaseReg "ptr", p "ptr"); \
}
......@@ -80,6 +80,10 @@ extern "C" {
#define RTS_UNREACHABLE abort()
#endif
/* Prefetch primitives */
#define prefetchForRead(ptr) __builtin_prefetch(ptr, 0)
#define prefetchForWrite(ptr) __builtin_prefetch(ptr, 1)
/* Fix for mingw stat problem (done here so it's early enough) */
#if defined(mingw32_HOST_OS)
#define __MSVCRT__ 1
......@@ -203,6 +207,7 @@ void _assertFail(const char *filename, unsigned int linenum)
#include "rts/storage/ClosureMacros.h"
#include "rts/storage/MBlock.h"
#include "rts/storage/GC.h"
#include "rts/NonMoving.h"
/* Other RTS external APIs */
#include "rts/Parallel.h"
......@@ -287,26 +292,27 @@ TICK_VAR(2)
#define IF_RTSFLAGS(c,s) if (RtsFlags.c) { s; } doNothing()
#if defined(DEBUG)
/* See Note [RtsFlags is a pointer in STG code] */
#if IN_STG_CODE
#define IF_DEBUG(c,s) if (RtsFlags[0].DebugFlags.c) { s; } doNothing()
#else
#define IF_DEBUG(c,s) if (RtsFlags.DebugFlags.c) { s; } doNothing()
#endif
#endif /* IN_STG_CODE */
#else
#define IF_DEBUG(c,s) doNothing()
#endif
#endif /* DEBUG */
#if defined(DEBUG)
#define DEBUG_ONLY(s) s
#else
#define DEBUG_ONLY(s) doNothing()
#endif
#endif /* DEBUG */
#if defined(DEBUG)
#define DEBUG_IS_ON 1
#else
#define DEBUG_IS_ON 0
#endif
#endif /* DEBUG */
/* -----------------------------------------------------------------------------
Useful macros and inline functions
......
......@@ -597,3 +597,4 @@ typedef union {
c; \
})
#endif
......@@ -185,12 +185,21 @@
#define EVENT_USER_BINARY_MSG 181
#define EVENT_CONC_MARK_BEGIN 200
#define EVENT_CONC_MARK_END 201
#define EVENT_CONC_SYNC_BEGIN 202
#define EVENT_CONC_SYNC_END 203
#define EVENT_CONC_SWEEP_BEGIN 204
#define EVENT_CONC_SWEEP_END 205
#define EVENT_CONC_UPD_REM_SET_FLUSH 206
#define EVENT_NONMOVING_HEAP_CENSUS 207
/*
* The highest event code +1 that ghc itself emits. Note that some event
* ranges higher than this are reserved but not currently emitted by ghc.
* This must match the size of the EventDesc[] array in EventLog.c
*/
#define NUM_GHC_EVENT_TAGS 182
#define NUM_GHC_EVENT_TAGS 208
#if 0 /* DEPRECATED EVENTS: */
/* we don't actually need to record the thread, it's implicit */
......
......@@ -52,6 +52,9 @@ typedef struct _GC_FLAGS {
double oldGenFactor;
double pcFreeHeap;
bool useNonmoving; // default = false
bool nonmovingSelectorOpt; // Do selector optimization in the
// non-moving heap, default = false
uint32_t generations;
bool squeezeUpdFrames;
......@@ -95,6 +98,7 @@ typedef struct _DEBUG_FLAGS {
bool weak; /* 'w' */
bool gccafs; /* 'G' */
bool gc; /* 'g' */
bool nonmoving_gc; /* 'n' */
bool block_alloc; /* 'b' */
bool sanity; /* 'S' warning: might be expensive! */
bool zero_on_gc; /* 'Z' */
......@@ -168,6 +172,7 @@ typedef struct _TRACE_FLAGS {
bool timestamp; /* show timestamp in stderr output */
bool scheduler; /* trace scheduler events */
bool gc; /* trace GC events */
bool nonmoving_gc; /* trace nonmoving GC events */
bool sparks_sampled; /* trace spark events by a sampled method */
bool sparks_full; /* trace spark events 100% accurately */
bool user; /* trace user events (emitted from Haskell code) */
......@@ -268,7 +273,11 @@ typedef struct _RTS_FLAGS {
#if defined(COMPILING_RTS_MAIN)
extern DLLIMPORT RTS_FLAGS RtsFlags;
#elif IN_STG_CODE
/* Hack because the C code generator can't generate '&label'. */
/* Note [RtsFlags is a pointer in STG code]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* When compiling with IN_STG_CODE the RtsFlags symbol is defined as a pointer.
* This is necessary because the C code generator can't generate '&label'.
*/
extern RTS_FLAGS RtsFlags[];
#else
extern RTS_FLAGS RtsFlags;
......
/* -----------------------------------------------------------------------------
*
* (c) The GHC Team, 2018-2019
*
* Non-moving garbage collector
*
* Do not #include this file directly: #include "Rts.h" instead.
*
* To understand the structure of the RTS headers, see the wiki:
* http://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
*
* -------------------------------------------------------------------------- */
#pragma once
// Forward declaration for Stg.h
struct StgClosure_;
struct StgThunk_;
struct Capability_;
/* This is called by the code generator */
extern DLL_IMPORT_RTS
void updateRemembSetPushClosure_(StgRegTable *reg, struct StgClosure_ *p);
extern DLL_IMPORT_RTS
void updateRemembSetPushThunk_(StgRegTable *reg, struct StgThunk_ *p);
// Forward declaration for unregisterised backend.
EF_(stg_copyArray_barrier);
// Note that RTS code should not condition on this directly but rather
// use the IF_NONMOVING_WRITE_BARRIER_ENABLED macro to ensure that
// the barrier is eliminated in the non-threaded RTS.
extern StgWord DLL_IMPORT_DATA_VAR(nonmoving_write_barrier_enabled);
// A similar macro is defined in includes/Cmm.h for C-- code.
#if defined(THREADED_RTS)
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (RTS_UNLIKELY(nonmoving_write_barrier_enabled))
#else
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (0)
#endif
......@@ -88,15 +88,23 @@ typedef struct bdescr_ {
StgPtr start; // [READ ONLY] start addr of memory
StgPtr free; // First free byte of memory.
// allocGroup() sets this to the value of start.
// NB. during use this value should lie
// between start and start + blocks *
// BLOCK_SIZE. Values outside this
// range are reserved for use by the
// block allocator. In particular, the
// value (StgPtr)(-1) is used to
// indicate that a block is unallocated.
union {
StgPtr free; // First free byte of memory.
// allocGroup() sets this to the value of start.
// NB. during use this value should lie
// between start and start + blocks *
// BLOCK_SIZE. Values outside this
// range are reserved for use by the
// block allocator. In particular, the
// value (StgPtr)(-1) is used to
// indicate that a block is unallocated.
//
// Unused by the non-moving allocator.
struct NonmovingSegmentInfo {
StgWord8 log_block_size;
StgWord16 next_free_snap;
} nonmoving_segment;
};
struct bdescr_ *link; // used for chaining blocks together
......@@ -141,7 +149,8 @@ typedef struct bdescr_ {
#define BF_LARGE 2
/* Block is pinned */
#define BF_PINNED 4
/* Block is to be marked, not copied */
/* Block is to be marked, not copied. Also used for marked large objects in
* non-moving heap. */
#define BF_MARKED 8
/* Block is executable */
#define BF_EXEC 32
......@@ -153,6 +162,12 @@ typedef struct bdescr_ {
#define BF_SWEPT 256
/* Block is part of a Compact */
#define BF_COMPACT 512
/* A non-moving allocator segment (see NonMoving.c) */
#define BF_NONMOVING 1024
/* A large object which has been moved off of oldest_gen->large_objects and
* onto nonmoving_large_objects. The mark phase ignores objects which aren't
* so-flagged */
#define BF_NONMOVING_SWEEPING 2048
/* Maximum flag value (do not define anything higher than this!) */
#define BF_FLAG_MAX (1 << 15)
......@@ -290,6 +305,13 @@ EXTERN_INLINE bdescr* allocBlock(void)
bdescr *allocGroupOnNode(uint32_t node, W_ n);
// Allocate n blocks, aligned at n-block boundary. The returned bdescr will
// have this invariant
//
// bdescr->start % BLOCK_SIZE*n == 0
//
bdescr *allocAlignedGroupOnNode(uint32_t node, W_ n);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node)
{
......
......@@ -107,6 +107,20 @@ INLINE_HEADER const StgConInfoTable *get_con_itbl(const StgClosure *c)
return CON_INFO_PTR_TO_STRUCT((c)->header.info);
}
/* Used when we expect another thread to be mutating the info table pointer of
* a closure (e.g. when busy-waiting on a WHITEHOLE).
*/
INLINE_HEADER const StgInfoTable *get_volatile_itbl(StgClosure *c) {
// The volatile here is important to ensure that the compiler does not
// optimise away multiple loads, e.g. in a busy-wait loop. Note that
// we can't use VOLATILE_LOAD here as the casts result in strict aliasing
// rule violations and this header may be compiled outside of the RTS
// (where we use -fno-strict-aliasing).
StgInfoTable * *volatile p = (StgInfoTable * *volatile) &c->header.info;
return INFO_PTR_TO_STRUCT(*p);
}
INLINE_HEADER StgHalfWord GET_TAG(const StgClosure *con)
{
return get_itbl(con)->srt;
......
......@@ -94,7 +94,7 @@ typedef struct StgClosure_ {
struct StgClosure_ *payload[];
} *StgClosurePtr; // StgClosure defined in rts/Types.h
typedef struct {
typedef struct StgThunk_ {
StgThunkHeader header;
struct StgClosure_ *payload[];
} StgThunk;
......
......@@ -234,15 +234,23 @@ void setKeepCAFs (void);
and is put on the mutable list.
-------------------------------------------------------------------------- */
void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p);
void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old);
/* set to disable CAF garbage collection in GHCi. */
/* (needed when dynamic libraries are used). */
extern bool keepCAFs;
#include "rts/Flags.h"
INLINE_HEADER void initBdescr(bdescr *bd, generation *gen, generation *dest)
{
bd->gen = gen;
bd->gen_no = gen->no;
bd->dest_no = dest->no;
#if !IN_STG_CODE
/* See Note [RtsFlags is a pointer in STG code] */
ASSERT(gen->no < RtsFlags.GcFlags.generations);
ASSERT(dest->no < RtsFlags.GcFlags.generations);
#endif
}
......@@ -355,7 +355,7 @@ typedef struct StgConInfoTable_ {
*/
#if defined(TABLES_NEXT_TO_CODE)
#define GET_CON_DESC(info) \
((const char *)((StgWord)((info)+1) + (info->con_desc)))
((const char *)((StgWord)((info)+1) + ((info)->con_desc)))
#else
#define GET_CON_DESC(info) ((const char *)(info)->con_desc)
#endif
......
......@@ -185,10 +185,66 @@ typedef struct StgTSO_ {
} *StgTSOPtr; // StgTSO defined in rts/Types.h
/* Note [StgStack dirtiness flags and concurrent marking]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Without concurrent collection by the nonmoving collector the stack dirtiness story
* is quite simple: The stack is either STACK_DIRTY (meaning it has been added to mut_list)
* or not.
*
* However, things are considerably more complicated with concurrent collection
* (namely, when nonmoving_write_barrier_enabled is set): In addition to adding
* the stack to mut_list and flagging it as STACK_DIRTY, we also must ensure
* that stacks are marked in accordance with the nonmoving collector's snapshot
* invariant. This is: every stack alive at the time the snapshot is taken must
* be marked at some point after the moment the snapshot is taken and before it
* is mutated or the commencement of the sweep phase.
*
* This marking may be done by the concurrent mark phase (in the case of a
* thread that never runs during the concurrent mark) or by the mutator when
* dirtying the stack. However, it is unsafe for the concurrent collector to
* traverse the stack while it is under mutation. Consequently, the following
* handshake is obeyed by the mutator's write barrier and the concurrent mark to
* ensure this doesn't happen:
*
* 1. The entity seeking to mark first checks that the stack lives in the nonmoving
* generation; if not then the stack was not alive at the time the snapshot
* was taken and therefore we need not mark it.
*
* 2. The entity seeking to mark checks the stack's mark bit. If it is set then
* no mark is necessary.
*
* 3. The entity seeking to mark tries to lock the stack for marking by
* atomically setting its `marking` field to the current non-moving mark
* epoch:
*
* a. If the mutator finds the concurrent collector has already locked the
* stack then it waits until it is finished (indicated by the mark bit
* being set) before proceeding with execution.
*
* b. If the concurrent collector finds that the mutator has locked the stack
* then it moves on, leaving the mutator to mark it. There is no need to wait;
* the mark is guaranteed to finish before sweep due to the post-mark
* synchronization with mutators.
*
* c. Whoever succeeds in locking the stack is responsible for marking it and
* setting the stack's mark bit (either the BF_MARKED bit for large objects
* or otherwise its bit in its segment's mark bitmap).
*
 * To ensure that mutation does not proceed until the stack is fully marked,
 * the mark phase must not set the mark bit until it has finished tracing.
*
*/
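The locking step (3) of the handshake above can be sketched with a compare-and-swap on the `marking` field: whoever swings it from "unclaimed" to the current mark epoch owns the job of tracing the stack. This is an illustrative sketch only (epoch encoding and helper names are assumptions, not the RTS implementation), using the GCC/Clang `__sync` builtin:

```c
#include <stdint.h>

/* Illustrative sketch of the stack-marking lock described in the
 * note above: the entity that CASes the stack's marking word from
 * 0 (unclaimed) to the current mark epoch must trace the stack. */
struct stack_hdr {
    uint8_t marking;  /* 0 = unclaimed, else the epoch that locked it */
    uint8_t mark_bit; /* set only once tracing is complete */
};

/* Returns 1 if the caller won the race and must mark the stack;
 * 0 if someone else (mutator or collector) already holds the lock. */
static int try_lock_stack(struct stack_hdr *s, uint8_t epoch) {
    return __sync_bool_compare_and_swap(&s->marking, 0, epoch);
}

/* Per the note, the mark bit is set strictly after tracing finishes,
 * so a mutator spinning on it cannot resume while marking is partial. */
static void finish_marking(struct stack_hdr *s) {
    s->mark_bit = 1;
}
```

A mutator that loses the race (case 3a) would busy-wait on `mark_bit`; the concurrent collector that loses (case 3b) simply moves on.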
#define STACK_DIRTY 1
// used by sanity checker to verify that all dirty stacks are on the mutable list
#define STACK_SANE 64
typedef struct StgStack_ {
StgHeader header;
StgWord32 stack_size; // stack size in *words*
StgWord32 dirty; // non-zero => dirty
StgWord8 dirty; // non-zero => dirty
StgWord8 marking; // non-zero => someone is currently marking the stack
StgPtr sp; // current stack pointer
StgWord stack[];
} StgStack;
......
......@@ -543,4 +543,11 @@ void * pushCostCentre (void *ccs, void *cc);
// Capability.c
extern unsigned int n_capabilities;
/* -----------------------------------------------------------------------------
Nonmoving GC write barrier
-------------------------------------------------------------------------- */
#include <rts/NonMoving.h>
#endif
......@@ -49,6 +49,7 @@ EXTERN_INLINE StgWord xchg(StgPtr p, StgWord w);
* }
*/
EXTERN_INLINE StgWord cas(StgVolatilePtr p, StgWord o, StgWord n);
EXTERN_INLINE StgWord8 cas_word8(StgWord8 *volatile p, StgWord8 o, StgWord8 n);
/*
* Atomic addition by the provided quantity
......@@ -283,6 +284,12 @@ cas(StgVolatilePtr p, StgWord o, StgWord n)
return __sync_val_compare_and_swap(p, o, n);
}
EXTERN_INLINE StgWord8
cas_word8(StgWord8 *volatile p, StgWord8 o, StgWord8 n)
{
return __sync_val_compare_and_swap(p, o, n);
}
// RRN: Generalized to arbitrary increments to enable fetch-and-add in
// Haskell code (fetchAddIntArray#).
// PT: add-and-fetch, returns new value
......@@ -434,6 +441,18 @@ cas(StgVolatilePtr p, StgWord o, StgWord n)
return result;
}
EXTERN_INLINE StgWord8 cas_word8(StgWord8 *volatile p, StgWord8 o, StgWord8 n);
EXTERN_INLINE StgWord8
cas_word8(StgWord8 *volatile p, StgWord8 o, StgWord8 n)
{
StgWord8 result;
result = *p;
if (result == o) {
*p = n;
}
return result;
}
EXTERN_INLINE StgWord atomic_inc(StgVolatilePtr p, StgWord incr);
EXTERN_INLINE StgWord
atomic_inc(StgVolatilePtr p, StgWord incr)
......
......@@ -192,3 +192,10 @@ typedef StgWord8* StgByteArray;
typedef void *(*(*StgFunPtr)(void))(void);
typedef StgFunPtr StgFun(void);
// Forward declarations for the unregisterised backend, which
// only depends upon Stg.h and not the entirety of Rts.h, which
// is where these are defined.
struct StgClosure_;
struct StgThunk_;
struct Capability_;
......@@ -150,21 +150,22 @@ data MiscFlags = MiscFlags
--
-- @since 4.8.0.0
data DebugFlags = DebugFlags
{ scheduler :: Bool -- ^ @s@
, interpreter :: Bool -- ^ @i@
, weak :: Bool -- ^ @w@