Commits on Source (38)
  • rts: Non-concurrent mark and sweep · b03ec7ea
    Ömer Sinan Ağacan authored and Ben Gamari committed
    
    
    This implements the core heap structure and a serial mark/sweep
    collector which can be used to manage the oldest-generation heap.
    This is the first step towards a concurrent mark-and-sweep collector
    aimed at low-latency applications.
    
    The full design of the collector implemented here is described in detail
    in a technical note
    
        B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
        Compiler" (2018)
    
    The basic heap structure used in this design is heavily inspired by
    
        K. Ueno & A. Ohori. "A fully concurrent garbage collector for
        functional programs on multicore processors." /ACM SIGPLAN Notices/
        Vol. 51, No. 9 (presented at ICFP 2016)
    
    This design is intended to allow both marking and sweeping to proceed
    concurrently with the execution of a multi-core mutator. Unlike the Ueno
    design, which requires no global synchronization pauses, the collector
    introduced here requires a stop-the-world pause at the beginning and end
    of the mark phase.
    
    To avoid heap fragmentation, the allocator consists of a number of
    fixed-size /sub-allocators/. Each of these sub-allocators allocates into
    its own set of /segments/, themselves allocated from the block
    allocator. Each segment is broken into a set of fixed-size allocation
    blocks (which back allocations), a bitmap (used to track the liveness
    of blocks), and some additional metadata (also used to track liveness).
    
    This heap structure enables collection via mark-and-sweep, which can be
    performed concurrently via a snapshot-at-the-beginning scheme (although
    concurrent collection is not implemented in this patch).
    
    The mark queue is a fairly straightforward chunked-array structure.
    The representation is a bit more verbose than a typical mark queue to
    accommodate a combination of two features:
    
     * a mark FIFO, which improves the locality of marking, reducing one of
       the major overheads seen in mark/sweep allocators (see [1] for
       details)
    
     * the selector optimization and indirection shortcutting, which
       requires that we track where we found each reference to an object
       in case we need to update the reference at a later point (e.g. when
       we find that it is an indirection). See Note [Origin references in
       the nonmoving collector] (in `NonMovingMark.h`) for details.
    
    Beyond this the mark/sweep is fairly run-of-the-mill.
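
    For concreteness, such a chunked queue might look roughly like the sketch
    below (field names and capacities are illustrative; the real definitions
    live in `NonMovingMark.h`):

    ```
    #include <stdint.h>

    /* Illustrative sketch only; see NonMovingMark.h for the real layout. */
    typedef struct {
        void  *object;          /* closure to be marked                          */
        void **origin;          /* slot where the reference was found, so it can
                                 * be updated later (selector optimisation and
                                 * indirection shortcutting)                     */
    } MarkEntrySketch;

    #define CHUNK_CAPACITY 254   /* assumed entries per chunk */

    typedef struct MarkChunkSketch_ {
        struct MarkChunkSketch_ *next;                /* older, already-full chunks */
        uint32_t                 head;                /* next free slot in entries  */
        MarkEntrySketch          entries[CHUNK_CAPACITY];
    } MarkChunkSketch;
    ```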
    
    [1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
        Mark-Sweep Garbage Collection." ISMM 2007.
    
    Co-Authored-By: Ben Gamari <ben@well-typed.com>
    b03ec7ea
  • testsuite: Add nonmoving WAY · 9cd98caa
    Ben Gamari authored and committed
    This simply runs the compile_and_run tests with `-xn`, enabling the
    nonmoving oldest generation.
    9cd98caa
  • rts: Implement concurrent collection in the nonmoving collector · d475002b
    Ben Gamari authored and committed
    
    
    This extends the non-moving collector to allow concurrent collection.
    
    The full design of the collector implemented here is described in detail
    in a technical note
    
        B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
        Compiler" (2018)
    
    This extension involves the introduction of a capability-local
    remembered set, known as the /update remembered set/, which tracks
    objects which may no longer be visible to the collector due to mutation.
    To maintain this remembered set we introduce a write barrier on
    mutations which is enabled while a concurrent mark is underway.
    
    The update remembered set representation is similar to that of the
    nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
    `Capability` maintains a single accumulator chunk, which it flushes
    (a) when the chunk fills, or (b) when the nonmoving collector enters its
    post-mark synchronization phase.
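
    Conceptually the barrier and the accumulator behave roughly as in the
    self-contained sketch below (all names here are assumptions, not the RTS
    code; the real flush paths live in `NonMovingMark.c`):

    ```
    #include <stddef.h>
    #include <stdio.h>

    /* Self-contained sketch only -- not the RTS code; names are assumptions. */
    #define CHUNK_ENTRIES 4                   /* tiny capacity, just for the demo */

    typedef struct {
        void  *entries[CHUNK_ENTRIES];
        size_t head;
    } ChunkSketch;

    static int         nonmoving_write_barrier_enabled = 1;  /* set while marking */
    static ChunkSketch accumulator;                           /* capability-local  */

    static void flush(ChunkSketch *c)         /* hand the chunk to the collector */
    {
        printf("flushing %zu entries\n", c->head);
        c->head = 0;
    }

    /* The conceptual barrier: before overwriting a pointer field, remember its
     * old referent so the concurrent mark can still reach it. */
    static void write_field(void **field, void *new_value)
    {
        void *old = *field;
        if (nonmoving_write_barrier_enabled && old != NULL) {
            if (accumulator.head == CHUNK_ENTRIES)
                flush(&accumulator);          /* (a) the chunk filled up */
            accumulator.entries[accumulator.head++] = old;
        }
        *field = new_value;
        /* (b) the post-mark synchronization also flushes every capability's
         *     accumulator; that path is not shown here. */
    }

    int main(void)
    {
        void *slot = &accumulator;            /* some arbitrary old referent */
        write_field(&slot, NULL);
        flush(&accumulator);
        return 0;
    }
    ```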
    
    While the write barrier touches a significant amount of code it is
    conceptually straightforward: the mutator must ensure that the referent
    of any pointer it overwrites is added to the update remembered set.
    However, there are a few details:
    
     * In the case of objects with a dirty flag (e.g. `MVar`s) we can
       exploit the fact that only the *first* mutation requires a write
       barrier.
    
     * Weak references, as usual, complicate things. In particular, we must
       ensure that the referent of a weak object is marked if dereferenced by
       the mutator. For this we (unfortunately) must introduce a read
       barrier, as described in Note [Concurrent read barrier on deRefWeak#]
       (in `NonMovingMark.c`).
    
     * Stable names are also a bit tricky as described in Note [Sweeping
       stable names in the concurrent collector] (`NonMovingSweep.c`).
    
    We take quite some pains to ensure that the high thread count often seen
    in parallel Haskell applications doesn't affect pause times. To this end
    we allow thread stacks to be marked either by the thread itself (when it
    is executed or underflows its stack) or by the concurrent mark thread (if
    the thread owning the stack is never scheduled). There is a non-trivial
    handshake to ensure that this happens without racing, which is described
    in Note [StgStack dirtiness flags and concurrent marking].
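
    A compressed, single-threaded sketch of the mutator's side of that
    handshake is given below (field and function names are assumptions; the
    authoritative description is the Note itself):

    ```
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Illustrative sketch of the stack-marking handshake; field and function
     * names are assumptions, not the RTS definitions. */
    typedef struct {
        _Atomic unsigned marking;    /* non-zero => someone is marking the stack */
        _Atomic bool     mark_bit;   /* set once the stack has been marked       */
    } StackSketch;

    static unsigned mark_epoch = 1;  /* the current non-moving mark epoch */

    /* Returns true if the caller won the right (and the duty) to mark the stack. */
    static bool try_lock_stack_for_marking(StackSketch *stack)
    {
        unsigned expected = 0;
        return atomic_compare_exchange_strong(&stack->marking, &expected, mark_epoch);
    }

    /* The mutator's side: called when it is about to dirty (mutate) the stack. */
    static void mutator_dirty_stack(StackSketch *stack)
    {
        if (atomic_load(&stack->mark_bit))
            return;                               /* already marked; nothing to do */
        if (try_lock_stack_for_marking(stack)) {
            /* We won the lock: mark the stack ourselves, then publish the mark. */
            atomic_store(&stack->mark_bit, true);
        } else {
            /* The concurrent collector holds the lock: wait for it to finish
             * (i.e. for the mark bit to be set) before mutating the stack. */
            while (!atomic_load(&stack->mark_bit)) { /* spin */ }
        }
    }

    int main(void)
    {
        StackSketch s = { 0 };
        mutator_dirty_stack(&s);    /* single-threaded: the mutator wins the lock */
        return atomic_load(&s.mark_bit) ? 0 : 1;
    }
    ```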
    
    Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
    d475002b
  • rts: Tracing support for nonmoving collection events · 13a27dcd
    Ben Gamari authored and committed
    This introduces a few events to mark key points in the nonmoving
    garbage collection cycle. These include:
    
     * `EVENT_CONC_MARK_BEGIN`, denoting the beginning of a round of
       marking. This may happen more than once in a single major collection
       since the major collector iterates until it reaches a fixed point.
    
     * `EVENT_CONC_MARK_END`, denoting the end of a round of marking.
    
     * `EVENT_CONC_SYNC_BEGIN`, denoting the beginning of the post-mark
       synchronization phase
    
     * `EVENT_CONC_UPD_REM_SET_FLUSH`, indicating that a capability has
       flushed its update remembered set.
    
     * `EVENT_CONC_SYNC_END`, denoting that all mutators have flushed their
       update remembered sets.
    
     * `EVENT_CONC_SWEEP_BEGIN`, denoting the beginning of the sweep portion
       of the major collection.
    
     * `EVENT_CONC_SWEEP_END`, denoting the end of the sweep portion of the
       major collection.
    13a27dcd
  • rts: Introduce non-moving heap census · 69794713
    Ben Gamari authored and committed
    This introduces a simple census of the non-moving heap (not to be
    confused with the heap census used by the heap profiler). This
    collects basic heap usage information (number of allocated and free
    blocks) which is useful when characterising fragmentation of the
    nonmoving heap.
    69794713
  • rts/Eventlog: More descriptive error message · 6124d57d
    Ben Gamari authored
    6124d57d
  • Allow census without live word count · c46d5d87
    Ben Gamari authored
    Otherwise the census is unsafe when mutators are running due to
    concurrent mutation.
    c46d5d87
  • NonmovingCensus: Emit samples to eventlog · 4d802665
    Ben Gamari authored
    4d802665
  • rts: Add GetMyThreadCPUTime helper · 5121e50d
    Ben Gamari authored
    5121e50d
  • 4c49e6da
  • Nonmoving: Allow aging and refactor static objects logic · dfd014a4
    Ben Gamari authored
    This commit does two things:
    
     * Allow aging of objects during the preparatory minor GC
     * Refactor handling of static objects to avoid the use of a hashtable
    dfd014a4
  • ebff426c
  • More comments for aging · 3dad5792
    Ben Gamari authored
    3dad5792
  • testsuite: Add nonmoving_thr way · d85d4b3d
    Ben Gamari authored
    d85d4b3d
  • testsuite: Add nonmoving_thr_ghc way · c2b47db3
    Ben Gamari authored
    This uses the nonmoving collector when compiling the testcases.
    c2b47db3
  • testsuite: Don't run T15892 in nonmoving ways · 3eafa1c6
    Ben Gamari authored
    The nonmoving GC doesn't support `+RTS -G1`, which this test insists on.
    3eafa1c6
  • 3d5ddefd
  • testsuite: Skip T15892 in nonmoving_thr_ghc · dae8089b
    Ben Gamari authored
    dae8089b
  • ghc-heap: Skip heap_all test with debugged RTS · 74f26f43
    Ben Gamari authored
    The debugged RTS initializes the heap with 0xaa, which breaks the
    (admittedly rather fragile) assumption that uninitialized fields are set
    to 0x00:
    ```
    Wrong exit code for heap_all(nonmoving)(expected 0 , actual 1 )
    Stderr ( heap_all ):
    heap_all: user error (assertClosuresEq: Closures do not match
    Expected: FunClosure {info = StgInfoTable {entry = Nothing, ptrs = 0, nptrs = 1, tipe = FUN_0_1, srtlen = 0, code = Nothing}, ptrArgs = [], dataArgs = [0]}
    Actual:   FunClosure {info = StgInfoTable {entry = Nothing, ptrs = 0, nptrs = 1, tipe = FUN_0_1, srtlen = 1032832, code = Nothing}, ptrArgs = [], dataArgs = [12297829382473034410]}
    
    CallStack (from HasCallStack):
      assertClosuresEq, called at heap_all.hs:230:9 in main:Main
    )
    ```
    74f26f43
  • Skip ghc_heap_all test in nonmoving ways · 91745287
    Ben Gamari authored
    91745287
  • NonMoving: Eliminate integer division in nonmovingBlockCount · 3af63ac7
    Ben Gamari authored
    Perf showed that this single div was capturing up to 10% of samples in
    nonmovingMark. However, the overwhelming majority of cases involve small
    block sizes, which we can easily compute explicitly, allowing the
    compiler to turn the division into a significantly more efficient
    division-by-constant.

    While the increase in source code looks scary, this all optimises down
    to very nice-looking assembly. At this point the only remaining hotspots
    in nonmovingBlockCount are due to memory access.
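
    The trick, in rough form (illustrative constants; the real segment layout
    and constants live in the RTS headers, not here), is to special-case the
    common small block sizes so each division is by a compile-time constant:

    ```
    /* Illustrative sketch of the special-casing trick; the real constants and
     * segment layout live in the RTS, not here. */
    #define SEG_PAYLOAD_BYTES (32 * 1024)      /* assumed usable bytes per segment */

    unsigned block_count_sketch(unsigned log_block_size)
    {
        /* Each case divides by a compile-time constant, which the compiler
         * strength-reduces to a multiply/shift instead of a runtime div.
         * Every block also needs one bitmap byte, hence the "+ 1". */
        switch (log_block_size) {
        case 3:  return SEG_PAYLOAD_BYTES / (8   + 1);
        case 4:  return SEG_PAYLOAD_BYTES / (16  + 1);
        case 5:  return SEG_PAYLOAD_BYTES / (32  + 1);
        case 6:  return SEG_PAYLOAD_BYTES / (64  + 1);
        case 7:  return SEG_PAYLOAD_BYTES / (128 + 1);
        default: /* rare large block sizes fall back to a genuine division */
            return SEG_PAYLOAD_BYTES / ((1u << log_block_size) + 1);
        }
    }
    ```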
    3af63ac7
  • Allocate mark queues in larger block groups · aadf70d0
    Ben Gamari authored
    aadf70d0
  • NonMovingMark: Optimize representation of mark queue · 57ed3211
    Ben Gamari authored
    This shortens MarkQueueEntry by 30% (one word)
    57ed3211
  • NonMoving: Optimize bitmap search during allocation · 05c68558
    Ben Gamari authored
    Use memchr instead of an open-coded loop. This is nearly twice as fast
    in a synthetic benchmark.
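
    The idea, roughly (an illustrative sketch, not the actual allocator code):
    with one bitmap byte per block, `memchr` can find the first free (zero)
    byte directly:

    ```
    #include <stdint.h>
    #include <string.h>

    /* Illustrative sketch; the real search lives in the nonmoving allocator.
     * Each block has one byte in the bitmap: non-zero means "in use". */
    int find_free_block_sketch(const uint8_t *bitmap, size_t n_blocks, size_t start)
    {
        /* memchr is typically vectorised by libc, beating an open-coded loop. */
        const uint8_t *hit = memchr(bitmap + start, 0, n_blocks - start);
        return hit ? (int)(hit - bitmap) : -1;    /* -1: no free block found */
    }
    ```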
    05c68558
  • rts: Add prefetch macros · 35ea3341
    Ben Gamari authored
    35ea3341
  • NonMoving: Prefetch when clearing bitmaps · 92e76eba
    Ben Gamari authored
    Ensure that the bitmap of the segment that we will clear next is in
    cache by the time we reach it.
    92e76eba
  • NonMoving: Inline nonmovingClearAllBitmaps · f65d3e77
    Ben Gamari authored
    f65d3e77
  • b117a9e4
  • NonMoving: Pre-fetch during mark · 73a58b2c
    Ben Gamari authored
    This improved overall runtime on nofib's constraints test by nearly 10%.
    73a58b2c
  • NonMoving: Prefetch segment header · 4f7323a1
    Ben Gamari authored
    4f7323a1
  • NonMoving: Optimise allocator cache behavior · d0e4ca99
    Ben Gamari authored
    Previously we would look at the segment header to determine the block
    size despite the fact that we already had the block size at hand.
    d0e4ca99
  • b06d9731
  • NonMoving: Don't do major GC if one is already running · 24b3946d
    Ben Gamari authored
    Previously we would perform a preparatory moving collection, resulting
    in many things being added to the mark queue. When we finished with this
    we would realize in nonmovingCollect that there was already a collection
    running, in which case we would simply not run the nonmoving collector.

    However, it was very easy to end up in a "treadmilling" situation: all
    subsequent GCs following the first failed major GC would be scheduled as
    major GCs. Consequently we would continuously feed the concurrent
    collector with more mark queue entries and it would never finish.

    This patch aborts the major collection far earlier, meaning that we
    avoid adding nonmoving objects to the mark queue, allowing the
    concurrent collector to finish.
    24b3946d
......@@ -40,6 +40,7 @@ module CLabel (
mkAsmTempDieLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel,
mkIndStaticInfoLabel,
......@@ -484,7 +485,9 @@ mkBlockInfoTableLabel name c = IdLabel name c BlockInfoTable
-- See Note [Proc-point local block entry-point].
-- Constructing Cmm Labels
mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkDirty_MUT_VAR_Label,
mkNonmovingWriteBarrierEnabledLabel,
mkUpdInfoLabel,
mkBHUpdInfoLabel, mkIndStaticInfoLabel, mkMainCapabilityLabel,
mkMAP_FROZEN_CLEAN_infoLabel, mkMAP_FROZEN_DIRTY_infoLabel,
mkMAP_DIRTY_infoLabel,
......@@ -494,6 +497,8 @@ mkDirty_MUT_VAR_Label, mkUpdInfoLabel,
mkSMAP_FROZEN_CLEAN_infoLabel, mkSMAP_FROZEN_DIRTY_infoLabel,
mkSMAP_DIRTY_infoLabel, mkBadAlignmentLabel :: CLabel
mkDirty_MUT_VAR_Label = mkForeignLabel (fsLit "dirty_MUT_VAR") Nothing ForeignLabelInExternalPackage IsFunction
mkNonmovingWriteBarrierEnabledLabel
= CmmLabel rtsUnitId (fsLit "nonmoving_write_barrier_enabled") CmmData
mkUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_upd_frame") CmmInfo
mkBHUpdInfoLabel = CmmLabel rtsUnitId (fsLit "stg_bh_upd_frame" ) CmmInfo
mkIndStaticInfoLabel = CmmLabel rtsUnitId (fsLit "stg_IND_STATIC") CmmInfo
......
......@@ -631,6 +631,7 @@ emitBlackHoleCode node = do
-- work with profiling.
when eager_blackholing $ do
whenUpdRemSetEnabled dflags $ emitUpdRemSetPushThunk node
emitStore (cmmOffsetW dflags node (fixedHdrSizeW dflags)) currentTSOExpr
emitPrimCall [] MO_WriteBarrier []
emitStore node (CmmReg (CmmGlobal EagerBlackholeInfo))
......
......@@ -37,6 +37,7 @@ import BlockId
import MkGraph
import StgSyn
import Cmm
import Module ( rtsUnitId )
import Type ( Type, tyConAppTyCon )
import TyCon
import CLabel
......@@ -314,14 +315,21 @@ emitPrimOp dflags [res] ReadMutVarOp [mutv]
= emitAssign (CmmLocal res) (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
emitPrimOp dflags res@[] WriteMutVarOp [mutv,var]
= do -- Without this write barrier, other CPUs may see this pointer before
= do old_val <- CmmLocal <$> newTemp (cmmExprType dflags var)
emitAssign old_val (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))
-- Without this write barrier, other CPUs may see this pointer before
-- the writes for the closure it points to have occurred.
-- Note that this also must come after we read the old value to ensure
-- that the read of old_val comes before another core's write to the
-- MutVar's value.
emitPrimCall res MO_WriteBarrier []
emitStore (cmmOffsetW dflags mutv (fixedHdrSizeW dflags)) var
emitCCall
[{-no results-}]
(CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
[(baseExpr, AddrHint), (mutv,AddrHint)]
[(baseExpr, AddrHint), (mutv, AddrHint), (CmmReg old_val, AddrHint)]
-- #define sizzeofByteArrayzh(r,a) \
-- r = ((StgArrBytes *)(a))->bytes
......@@ -1622,17 +1630,21 @@ doWritePtrArrayOp :: CmmExpr
doWritePtrArrayOp addr idx val
= do dflags <- getDynFlags
let ty = cmmExprType dflags val
hdr_size = arrPtrsHdrSize dflags
-- Update remembered set for non-moving collector
whenUpdRemSetEnabled dflags
$ emitUpdRemSetPush (cmmLoadIndexOffExpr dflags hdr_size ty addr ty idx)
-- This write barrier is to ensure that the heap writes to the object
-- referred to by val have happened before we write val into the array.
-- See #12469 for details.
emitPrimCall [] MO_WriteBarrier []
mkBasicIndexedWrite (arrPtrsHdrSize dflags) Nothing addr ty idx val
mkBasicIndexedWrite hdr_size Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
-- the write barrier. We must write a byte into the mark table:
-- bits8[a + header_size + StgMutArrPtrs_size(a) + x >> N]
-- the write barrier. We must write a byte into the mark table:
-- bits8[a + header_size + StgMutArrPtrs_size(a) + x >> N]
emit $ mkStore (
cmmOffsetExpr dflags
(cmmOffsetExprW dflags (cmmOffsetB dflags addr (arrPtrsHdrSize dflags))
(cmmOffsetExprW dflags (cmmOffsetB dflags addr hdr_size)
(loadArrPtrsSize dflags addr))
(CmmMachOp (mo_wordUShr dflags) [idx,
mkIntExpr dflags (mUT_ARR_PTRS_CARD_BITS dflags)])
......@@ -2223,6 +2235,8 @@ emitCopyArray copy src0 src_off dst0 dst_off0 n =
dst <- assignTempE dst0
dst_off <- assignTempE dst_off0
emitCopyUpdRemSetPush dflags (arrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkMAP_DIRTY_infoLabel)))
......@@ -2285,6 +2299,8 @@ emitCopySmallArray copy src0 src_off dst0 dst_off n =
src <- assignTempE src0
dst <- assignTempE dst0
emitCopyUpdRemSetPush dflags (smallArrPtrsHdrSizeW dflags) dst dst_off n
-- Set the dirty bit in the header.
emit (setInfo dst (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
......@@ -2413,6 +2429,12 @@ doWriteSmallPtrArrayOp :: CmmExpr
doWriteSmallPtrArrayOp addr idx val = do
dflags <- getDynFlags
let ty = cmmExprType dflags val
-- Update remembered set for non-moving collector
tmp <- newTemp ty
mkBasicIndexedRead (smallArrPtrsHdrSize dflags) Nothing ty tmp addr ty idx
whenUpdRemSetEnabled dflags $ emitUpdRemSetPush (CmmReg (CmmLocal tmp))
emitPrimCall [] MO_WriteBarrier [] -- #12469
mkBasicIndexedWrite (smallArrPtrsHdrSize dflags) Nothing addr ty idx val
emit (setInfo addr (CmmLit (CmmLabel mkSMAP_DIRTY_infoLabel)))
......@@ -2592,3 +2614,31 @@ emitCtzCall res x width = do
[ res ]
(MO_Ctz width)
[ x ]
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
-- | Push a range of pointer-array elements that are about to be copied over to
-- the update remembered set.
emitCopyUpdRemSetPush :: DynFlags
-> WordOff -- ^ array header size
-> CmmExpr -- ^ destination array
-> CmmExpr -- ^ offset in destination array (in words)
-> Int -- ^ number of elements to copy
-> FCode ()
emitCopyUpdRemSetPush _dflags _hdr_size _dst _dst_off 0 = return ()
emitCopyUpdRemSetPush dflags hdr_size dst dst_off n =
whenUpdRemSetEnabled dflags $ do
updfr_off <- getUpdFrameOff
graph <- mkCall lbl (NativeNodeCall,NativeReturn) [] args updfr_off []
emit graph
where
lbl = mkLblExpr $ mkPrimCallLabel
$ PrimCall (fsLit "stg_copyArray_barrier") rtsUnitId
args =
[ mkIntExpr dflags hdr_size
, dst
, dst_off
, mkIntExpr dflags n
]
......@@ -39,6 +39,11 @@ module StgCmmUtils (
mkWordCLit,
newStringCLit, newByteStringCLit,
blankWord,
-- * Update remembered set operations
whenUpdRemSetEnabled,
emitUpdRemSetPush,
emitUpdRemSetPushThunk,
) where
#include "HsVersions.h"
......@@ -576,3 +581,40 @@ assignTemp' e
let reg = CmmLocal lreg
emitAssign reg e
return (CmmReg reg)
---------------------------------------------------------------------------
-- Pushing to the update remembered set
---------------------------------------------------------------------------
whenUpdRemSetEnabled :: DynFlags -> FCode a -> FCode ()
whenUpdRemSetEnabled dflags code = do
do_it <- getCode code
the_if <- mkCmmIfThenElse' is_enabled do_it mkNop (Just False)
emit the_if
where
enabled = CmmLoad (CmmLit $ CmmLabel mkNonmovingWriteBarrierEnabledLabel) (bWord dflags)
zero = zeroExpr dflags
is_enabled = cmmNeWord dflags enabled zero
-- | Emit code to add an entry to a now-overwritten pointer to the update
-- remembered set.
emitUpdRemSetPush :: CmmExpr -- ^ value of pointer which was overwritten
-> FCode ()
emitUpdRemSetPush ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushClosure_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
emitUpdRemSetPushThunk :: CmmExpr -- ^ the thunk
-> FCode ()
emitUpdRemSetPushThunk ptr = do
emitRtsCall
rtsUnitId
(fsLit "updateRemembSetPushThunk_")
[(CmmReg (CmmGlobal BaseReg), AddrHint),
(ptr, AddrHint)]
False
......@@ -832,6 +832,10 @@
__gen = TO_W_(bdescr_gen_no(__bd)); \
if (__gen > 0) { recordMutableCap(__p, __gen); }
/* -----------------------------------------------------------------------------
Update remembered set write barrier
-------------------------------------------------------------------------- */
/* -----------------------------------------------------------------------------
Arrays
-------------------------------------------------------------------------- */
......@@ -934,3 +938,25 @@
prim %memcpy(dst_p, src_p, n * SIZEOF_W, SIZEOF_W); \
\
return (dst);
//
// Nonmoving write barrier helpers
//
// See Note [Update remembered set] in NonMovingMark.c.
#if defined(THREADED_RTS)
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (W_[nonmoving_write_barrier_enabled] != 0) (likely: False)
#else
// A similar measure is also taken in rts/NonMoving.h, but that isn't visible from C--
#define IF_NONMOVING_WRITE_BARRIER_ENABLED \
if (0)
#define nonmoving_write_barrier_enabled 0
#endif
// A useful helper for pushing a pointer to the update remembered set.
#define updateRemembSetPushPtr(p) \
IF_NONMOVING_WRITE_BARRIER_ENABLED { \
ccall updateRemembSetPushClosure_(BaseReg "ptr", p "ptr"); \
}
......@@ -74,6 +74,10 @@ extern "C" {
#define RTS_UNREACHABLE abort()
#endif
/* Prefetch primitives */
#define prefetchForRead(ptr) __builtin_prefetch(ptr, 0)
#define prefetchForWrite(ptr) __builtin_prefetch(ptr, 1)
/* Fix for mingw stat problem (done here so it's early enough) */
#if defined(mingw32_HOST_OS)
#define __MSVCRT__ 1
......@@ -189,6 +193,7 @@ void _assertFail(const char *filename, unsigned int linenum)
#include "rts/storage/ClosureMacros.h"
#include "rts/storage/MBlock.h"
#include "rts/storage/GC.h"
#include "rts/NonMoving.h"
/* Other RTS external APIs */
#include "rts/Parallel.h"
......
......@@ -151,6 +151,23 @@ typedef struct GCDetails_ {
Time cpu_ns;
// The time elapsed during GC itself
Time elapsed_ns;
//
// Concurrent garbage collector
//
// The CPU time used during the post-mark pause phase of the concurrent
// nonmoving GC.
Time nonmoving_gc_sync_cpu_ns;
// The time elapsed during the post-mark pause phase of the concurrent
// nonmoving GC.
Time nonmoving_gc_sync_elapsed_ns;
// The CPU time used during the concurrent nonmoving GC.
Time nonmoving_gc_cpu_ns;
// The time elapsed during the concurrent nonmoving GC.
Time nonmoving_gc_elapsed_ns;
} GCDetails;
//
......@@ -241,6 +258,28 @@ typedef struct _RTSStats {
// The number of times a GC thread has iterated it's outer loop across all
// parallel GCs
uint64_t scav_find_work;
// ----------------------------------
// Concurrent garbage collector
// The CPU time used during the post-mark pause phase of the concurrent
// nonmoving GC.
Time nonmoving_gc_sync_cpu_ns;
// The time elapsed during the post-mark pause phase of the concurrent
// nonmoving GC.
Time nonmoving_gc_sync_elapsed_ns;
// The maximum time elapsed during the post-mark pause phase of the
// concurrent nonmoving GC.
Time nonmoving_gc_sync_max_elapsed_ns;
// The CPU time used during the concurrent nonmoving GC.
Time nonmoving_gc_cpu_ns;
// The time elapsed during the concurrent nonmoving GC.
Time nonmoving_gc_elapsed_ns;
// The maximum time elapsed during the concurrent nonmoving GC.
Time nonmoving_gc_max_elapsed_ns;
} RTSStats;
void getRTSStats (RTSStats *s);
......
......@@ -182,12 +182,21 @@
#define EVENT_USER_BINARY_MSG 181
#define EVENT_CONC_MARK_BEGIN 200
#define EVENT_CONC_MARK_END 201
#define EVENT_CONC_SYNC_BEGIN 202
#define EVENT_CONC_SYNC_END 203
#define EVENT_CONC_SWEEP_BEGIN 204
#define EVENT_CONC_SWEEP_END 205
#define EVENT_CONC_UPD_REM_SET_FLUSH 206
#define EVENT_NONMOVING_HEAP_CENSUS 207
/*
* The highest event code +1 that ghc itself emits. Note that some event
* ranges higher than this are reserved but not currently emitted by ghc.
* This must match the size of the EventDesc[] array in EventLog.c
*/
#define NUM_GHC_EVENT_TAGS 182
#define NUM_GHC_EVENT_TAGS 208
#if 0 /* DEPRECATED EVENTS: */
/* we don't actually need to record the thread, it's implicit */
......
......@@ -169,6 +169,7 @@ typedef struct _TRACE_FLAGS {
bool timestamp; /* show timestamp in stderr output */
bool scheduler; /* trace scheduler events */
bool gc; /* trace GC events */
bool nonmoving_gc; /* trace nonmoving GC events */
bool sparks_sampled; /* trace spark events by a sampled method */
bool sparks_full; /* trace spark events 100% accurately */
bool user; /* trace user events (emitted from Haskell code) */
......
/* -----------------------------------------------------------------------------
*
* (c) The GHC Team, 2018-2019
*
* Non-moving garbage collector
*
* Do not #include this file directly: #include "Rts.h" instead.
*
* To understand the structure of the RTS headers, see the wiki:
* http://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
*
* -------------------------------------------------------------------------- */
#pragma once
/* This is called by the code generator */
extern DLL_IMPORT_RTS
void updateRemembSetPushClosure_(StgRegTable *reg, StgClosure *p);
void updateRemembSetPushClosure(Capability *cap, StgClosure *p);
void updateRemembSetPushThunk_(StgRegTable *reg, StgThunk *p);
// Note that RTS code should not condition on this directly but rather
// use the IF_NONMOVING_WRITE_BARRIER_ENABLED macro to ensure that
// the barrier is eliminated in the non-threaded RTS.
extern StgWord DLL_IMPORT_DATA_VAR(nonmoving_write_barrier_enabled);
......@@ -97,6 +97,8 @@ typedef struct bdescr_ {
// block allocator. In particular, the
// value (StgPtr)(-1) is used to
// indicate that a block is unallocated.
//
// Unused by the non-moving allocator.
struct bdescr_ *link; // used for chaining blocks together
......@@ -141,7 +143,8 @@ typedef struct bdescr_ {
#define BF_LARGE 2
/* Block is pinned */
#define BF_PINNED 4
/* Block is to be marked, not copied */
/* Block is to be marked, not copied. Also used for marked large objects in
* non-moving heap. */
#define BF_MARKED 8
/* Block is executable */
#define BF_EXEC 32
......@@ -153,6 +156,12 @@ typedef struct bdescr_ {
#define BF_SWEPT 256
/* Block is part of a Compact */
#define BF_COMPACT 512
/* A non-moving allocator segment (see NonMoving.c) */
#define BF_NONMOVING 1024
/* A large object which has been moved off of oldest_gen->large_objects and
* onto nonmoving_large_objects. The mark phase ignores objects which aren't
* so-flagged */
#define BF_NONMOVING_SWEEPING 2048
/* Maximum flag value (do not define anything higher than this!) */
#define BF_FLAG_MAX (1 << 15)
......
......@@ -107,6 +107,14 @@ INLINE_HEADER const StgConInfoTable *get_con_itbl(const StgClosure *c)
return CON_INFO_PTR_TO_STRUCT((c)->header.info);
}
/* Used when we expect another thread to be mutating the info table pointer of
* a closure (e.g. when busy-waiting on a WHITEHOLE).
*/
INLINE_HEADER const StgInfoTable *get_volatile_itbl(StgClosure *c) {
return INFO_PTR_TO_STRUCT((StgInfoTable*) VOLATILE_LOAD(&c->header.info));
}
INLINE_HEADER StgHalfWord GET_TAG(const StgClosure *con)
{
return get_itbl(con)->srt;
......
......@@ -234,7 +234,7 @@ void setKeepCAFs (void);
and is put on the mutable list.
-------------------------------------------------------------------------- */
void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p);
void dirty_MUT_VAR(StgRegTable *reg, StgMutVar *mv, StgClosure *old);
/* set to disable CAF garbage collection in GHCi. */
/* (needed when dynamic libraries are used). */
......
......@@ -185,6 +185,53 @@ typedef struct StgTSO_ {
} *StgTSOPtr; // StgTSO defined in rts/Types.h
/* Note [StgStack dirtiness flags and concurrent marking]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Without concurrent collection by the nonmoving collector the stack dirtiness story
* is quite simple: The stack is either STACK_DIRTY (meaning it has been added to mut_list)
* or not.
*
* However, things are considerably more complicated with concurrent collection
* (namely, when nonmoving_write_barrier_enabled is set): In addition to adding
* the stack to mut_list and flagging it as STACK_DIRTY, we also must ensure
* that stacks are marked in accordance with the nonmoving collector's snapshot
* invariant. This is: every stack alive at the time the snapshot is taken must
* be marked at some point after the moment the snapshot is taken and before it
* is mutated or the commencement of the sweep phase.
*
* This marking may be done by the concurrent mark phase (in the case of a
* thread that never runs during the concurrent mark) or by the mutator when
* dirtying the stack. However, it is unsafe for the concurrent collector to
* traverse the stack while it is under mutation. Consequently, the following
* handshake is obeyed by the mutator's write barrier and the concurrent mark to
* ensure this doesn't happen:
*
* 1. The entity seeking to mark first checks that the stack lives in the nonmoving
* generation; if not then the stack was not alive at the time the snapshot
* was taken and therefore we need not mark it.
*
* 2. The entity seeking to mark checks the stack's mark bit. If it is set then
* no mark is necessary.
*
* 3. The entity seeking to mark tries to lock the stack for marking by
* atomically setting its `marking` field to the current non-moving mark
* epoch:
*
* a. If the mutator finds the concurrent collector has already locked the
* stack then it waits until it is finished (indicated by the mark bit
* being set) before proceeding with execution.
*
* b. If the concurrent collector finds that the mutator has locked the stack
* then it moves on, leaving the mutator to mark it. There is no need to wait;
* the mark is guaranteed to finish before sweep due to the post-mark
* synchronization with mutators.
*
* c. Whoever succeeds in locking the stack is responsible for marking it and
* setting the stack's mark bit (either the BF_MARKED bit for large objects
* or otherwise its bit in its segment's mark bitmap).
*
*/
#define STACK_DIRTY 1
// used by sanity checker to verify that all dirty stacks are on the mutable list
......@@ -193,7 +240,8 @@ typedef struct StgTSO_ {
typedef struct StgStack_ {
StgHeader header;
StgWord32 stack_size; // stack size in *words*
StgWord32 dirty; // non-zero => dirty
StgWord dirty; // non-zero => dirty
StgWord marking; // non-zero => someone is currently marking the stack
StgPtr sp; // current stack pointer
StgWord stack[];
} StgStack;
......
......@@ -542,5 +542,6 @@ void * pushCostCentre (void *ccs, void *cc);
// Capability.c
extern unsigned int n_capabilities;
extern void updateRemembSetPushThunk_(void *reg, void *p1);
#endif
......@@ -292,6 +292,8 @@ data TraceFlags = TraceFlags
, timestamp :: Bool -- ^ show timestamp in stderr output
, traceScheduler :: Bool -- ^ trace scheduler events
, traceGc :: Bool -- ^ trace GC events
, traceNonmovingGc
:: Bool -- ^ trace nonmoving GC heap census samples
, sparksSampled :: Bool -- ^ trace spark events by a sampled method
, sparksFull :: Bool -- ^ trace spark events 100% accurately
, user :: Bool -- ^ trace user events (emitted from Haskell code)
......@@ -525,6 +527,8 @@ getTraceFlags = do
(#{peek TRACE_FLAGS, scheduler} ptr :: IO CBool))
<*> (toBool <$>
(#{peek TRACE_FLAGS, gc} ptr :: IO CBool))
<*> (toBool <$>
(#{peek TRACE_FLAGS, nonmoving_gc} ptr :: IO CBool))
<*> (toBool <$>
(#{peek TRACE_FLAGS, sparks_sampled} ptr :: IO CBool))
<*> (toBool <$>
......
......@@ -103,6 +103,25 @@ data RTSStats = RTSStats {
-- | Total elapsed time (at the previous GC)
, elapsed_ns :: RtsTime
-- | The CPU time used during the post-mark pause phase of the concurrent
-- nonmoving GC.
, nonmoving_gc_sync_cpu_ns :: RtsTime
-- | The time elapsed during the post-mark pause phase of the concurrent
-- nonmoving GC.
, nonmoving_gc_sync_elapsed_ns :: RtsTime
-- | The maximum time elapsed during the post-mark pause phase of the
-- concurrent nonmoving GC.
, nonmoving_gc_sync_max_elapsed_ns :: RtsTime
-- | The CPU time used during the concurrent nonmoving GC.
, nonmoving_gc_cpu_ns :: RtsTime
-- | The time elapsed during the concurrent nonmoving GC.
, nonmoving_gc_elapsed_ns :: RtsTime
-- | The maximum time elapsed during the concurrent nonmoving GC.
, nonmoving_gc_max_elapsed_ns :: RtsTime
-- | Details about the most recent GC
, gc :: GCDetails
} deriving ( Read -- ^ @since 4.10.0.0
......@@ -146,6 +165,13 @@ data GCDetails = GCDetails {
, gcdetails_cpu_ns :: RtsTime
-- | The time elapsed during GC itself
, gcdetails_elapsed_ns :: RtsTime
-- | The CPU time used during the post-mark pause phase of the concurrent
-- nonmoving GC.
, gcdetails_nonmoving_gc_sync_cpu_ns :: RtsTime
-- | The time elapsed during the post-mark pause phase of the concurrent
-- nonmoving GC.
, gcdetails_nonmoving_gc_sync_elapsed_ns :: RtsTime
} deriving ( Read -- ^ @since 4.10.0.0
, Show -- ^ @since 4.10.0.0
)
......@@ -192,6 +218,12 @@ getRTSStats = do
gc_elapsed_ns <- (# peek RTSStats, gc_elapsed_ns) p
cpu_ns <- (# peek RTSStats, cpu_ns) p
elapsed_ns <- (# peek RTSStats, elapsed_ns) p
nonmoving_gc_sync_cpu_ns <- (# peek RTSStats, nonmoving_gc_sync_cpu_ns) p
nonmoving_gc_sync_elapsed_ns <- (# peek RTSStats, nonmoving_gc_sync_elapsed_ns) p
nonmoving_gc_sync_max_elapsed_ns <- (# peek RTSStats, nonmoving_gc_sync_max_elapsed_ns) p
nonmoving_gc_cpu_ns <- (# peek RTSStats, nonmoving_gc_cpu_ns) p
nonmoving_gc_elapsed_ns <- (# peek RTSStats, nonmoving_gc_elapsed_ns) p
nonmoving_gc_max_elapsed_ns <- (# peek RTSStats, nonmoving_gc_max_elapsed_ns) p
let pgc = (# ptr RTSStats, gc) p
gc <- do
gcdetails_gen <- (# peek GCDetails, gen) pgc
......@@ -211,5 +243,7 @@ getRTSStats = do
gcdetails_sync_elapsed_ns <- (# peek GCDetails, sync_elapsed_ns) pgc
gcdetails_cpu_ns <- (# peek GCDetails, cpu_ns) pgc
gcdetails_elapsed_ns <- (# peek GCDetails, elapsed_ns) pgc
gcdetails_nonmoving_gc_sync_cpu_ns <- (# peek GCDetails, nonmoving_gc_sync_cpu_ns) pgc
gcdetails_nonmoving_gc_sync_elapsed_ns <- (# peek GCDetails, nonmoving_gc_sync_elapsed_ns) pgc
return GCDetails{..}
return RTSStats{..}
......@@ -2,7 +2,11 @@ test('heap_all',
[when(have_profiling(), extra_ways(['prof'])),
# These ways produce slightly different heap representations.
# Currently we don't test them.
omit_ways(['ghci', 'hpc'])
omit_ways(['ghci', 'hpc',
'nonmoving', 'nonmoving_thr', 'nonmoving_thr_ghc']),
# The debug RTS initializes some fields with 0xaa and so
# this test spuriously fails.
when(compiler_debugged(), skip)
],
compile_and_run, [''])
......
......@@ -652,6 +652,8 @@ INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
/* someone else beat us to it */
jump ENTRY_LBL(stg_WHITEHOLE) (ap);
}
// Can't add StgInd_indirectee(ap) to UpdRemSet here because the old value is
// not reachable.
StgInd_indirectee(ap) = CurrentTSO;
prim_write_barrier;
SET_INFO(ap, __stg_EAGER_BLACKHOLE_info);
......
......@@ -27,6 +27,7 @@
#include "STM.h"
#include "RtsUtils.h"
#include "sm/OSMem.h"
#include "sm/BlockAlloc.h" // for countBlocks()
#if !defined(mingw32_HOST_OS)
#include "rts/IOManager.h" // for setIOManagerControlFd()
......@@ -291,6 +292,11 @@ initCapability (Capability *cap, uint32_t i)
RtsFlags.GcFlags.generations,
"initCapability");
// At this point storage manager is not initialized yet, so this will be
// initialized in initStorage().
cap->upd_rem_set.queue.blocks = NULL;
for (g = 0; g < RtsFlags.GcFlags.generations; g++) {
cap->mut_lists[g] = NULL;
}
......@@ -860,16 +866,27 @@ yieldCapability (Capability** pCap, Task *task, bool gcAllowed)
{
PendingSync *sync = pending_sync;
if (sync && sync->type == SYNC_GC_PAR) {
if (! sync->idle[cap->no]) {
traceEventGcStart(cap);
gcWorkerThread(cap);
traceEventGcEnd(cap);
traceSparkCounters(cap);
// See Note [migrated bound threads 2]
if (task->cap == cap) {
return true;
if (sync) {
switch (sync->type) {
case SYNC_GC_PAR:
if (! sync->idle[cap->no]) {
traceEventGcStart(cap);
gcWorkerThread(cap);
traceEventGcEnd(cap);
traceSparkCounters(cap);
// See Note [migrated bound threads 2]
if (task->cap == cap) {
return true;
}
}
break;
case SYNC_FLUSH_UPD_REM_SET:
debugTrace(DEBUG_nonmoving_gc, "Flushing update remembered set blocks...");
break;
default:
break;
}
}
}
......