
Non-moving GC unsoundness due to deadlock-detection GC

The non-moving collector's deadlock-detection GC can result in unsoundness due to a subtle bug in alloc_for_copy. I laid out the reason in this Note:

/*
 * Note [Double-checking dirtyness]
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * In the case of the moving collector it is always safe to add an object to
 * the mut_list. Consequently, the moving collector's `evacuate` function can
 * set failed_to_evac with no ill effects.
 *
 * However, in the case of the non-moving collector, evacuate claiming that we
 * failed to evacuate an object can be catastrophic. This is because we use
 * the dirty bit to achieve an optimisation for MUT_VARs and TVARs, as
 * described in Note [Dirty flags in the non-moving collector] in NonMoving.c.
 * Failing to mark as clean an object that contains a reference into the
 * non-moving heap is a one-way ticket to unsoundness-town.
 *
 * To see why, consider a program with this heap structure before a
 * preparatory collection:
 *
 *    Non-moving heap     |      Moving heap
 *         gen 1          |         gen 0
 *  -------------------------------------------------
 *                        |
 *       MUT_VAR A        |          X
 *        (dirty)         |
 *           |            |
 *                        |
 *                        |
 *                        |
 *           Z            |
 *                        |
 *
 * During the preparatory collection we promote X to the non-moving heap.
 * However, due to a bug, evacuate claims that we failed to evacuate it, so A
 * is left dirty. This gives us:
 *
 *    Non-moving heap     |      Moving heap
 *         gen 1          |         gen 0
 *  -------------------------------------------------
 *                        |
 *       MUT_VAR A        |
 *        (dirty)         |
 *           |            |
 *                        |
 *           X'           |
 *           |            |
 *                        |
 *           Z            |
 *                        |
 *
 * Now we resume mutation and some unwitting mutator goes to mutate MUT_VAR A.
 * It will see that A is already dirty and consequently fail to push A's old
 * contents to the update remembered set, breaching the snapshot invariant.
 * Bad news!
 *
 * In principle there is no reason why the moving collector should incorrectly
 * set failed_to_evac. However, I once encountered a nasty bug which resulted
 * in exactly this behavior. Consequently, there are now rather paranoid
 * ASSERTs in nonmovingScavengeOne to ensure that an object that we failed to
 * evacuate truly is dirty.
 *
 */
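To make the failure mode concrete, here is a minimal, self-contained sketch of the dirty-flag optimisation the Note relies on. It is a toy model, not the RTS code: the types `Obj`, `MutVar` and `UpdRemSet` and the functions `upd_rem_set_push`, `write_mut_var` and `simulate_write` are all hypothetical names invented for illustration. The only assumption carried over from the Note is that the write barrier records the old contents when a variable goes from clean to dirty, and does nothing extra when the variable is already dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for heap objects; not GHC's RTS types. */
typedef struct Obj { int id; } Obj;

typedef struct MutVar {
    bool dirty;   /* models the MUT_VAR clean/dirty distinction */
    Obj *value;   /* current contents of the variable */
} MutVar;

#define UPD_REM_SET_CAP 8

/* Toy update remembered set: just a fixed-size array of pointers. */
typedef struct UpdRemSet {
    Obj *entries[UPD_REM_SET_CAP];
    size_t len;
} UpdRemSet;

static void upd_rem_set_push(UpdRemSet *rs, Obj *o) {
    assert(rs->len < UPD_REM_SET_CAP);
    rs->entries[rs->len++] = o;
}

/* Write barrier with the dirty-flag optimisation (assumed behaviour):
 * only the clean -> dirty transition records the old contents. */
static void write_mut_var(MutVar *mv, Obj *new_value, UpdRemSet *rs) {
    if (!mv->dirty) {
        mv->dirty = true;
        upd_rem_set_push(rs, mv->value);  /* preserve the snapshot */
    }
    mv->value = new_value;
}

/* Models one mutator write to A after the preparatory collection.
 * stale_dirty == true models the bug: A was left dirty even though its
 * contents (X') now live in the non-moving heap.
 * Returns how many objects were pushed to the update remembered set. */
size_t simulate_write(bool stale_dirty) {
    Obj x_prime = { 1 };   /* X', the promoted copy of X */
    Obj other   = { 2 };   /* the new value the mutator stores */
    MutVar a    = { .dirty = stale_dirty, .value = &x_prime };
    UpdRemSet rs = { .len = 0 };

    write_mut_var(&a, &other, &rs);
    return rs.len;
}
```

In this model, `simulate_write(false)` pushes X' to the update remembered set, while `simulate_write(true)` pushes nothing. That second case is the snapshot-invariant breach described above: X' was reachable when the snapshot was taken, but the concurrent mark is never told about it.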
Edited by Ben Gamari