rts/nonmoving: Don't scavenge objects which weren't evacuated
This fixes a rather subtle bug in the logic responsible for scavenging objects evacuated to the non-moving generation. In particular, objects can be allocated into the non-moving generation by two ways:
a. evacuation out of from-space by the garbage collector
b. direct allocation by the mutator
Like all evacuation, objects moved by (a) must be scavenged, since they may contain references to other objects located in from-space. To accomplish this we have the following scheme:
-
each nonmoving segment's block descriptor has a scan pointer which points to the first object which has yet to be scavenged
-
the GC tracks a set of "todo" segments which have pending scavenging work
-
to scavenge a segment, we scavenge each of the unmarked blocks between the scan pointer and segment's
next_free
pointer.We skip marked blocks since we know the allocator wouldn't have allocated into marked blocks (since they contain presumably live data).
We can stop at
next_free
since, by definition, the GC could not have evacuated any objects to blocks abovenext_free
(otherwise `next_free wouldn't be the first free block).
However, this neglected to consider objects allocated by path (b). In short, the problem is that objects directly allocated by the mutator may become unreachable (but not swept, since the containing segment is not yet full), at which point they may contain references to swept objects. Specifically, we observed this in #21885 (closed) in the following way:
- the mutator (specifically in #21885 (closed), a
lockCAF
) allocates an object (specifically a blackhole, which here we will callblkh
; see Note [Static objects under the nonmoving collector] for the reason why) on the non-moving heap. The bitmap of the allocated block remains 0 (since allocation doesn't affect the bitmap) and the containing segment's (which we will callblkh_seg
)next_free
is advanced. - We enter the blackhole, evaluating the blackhole to produce a result (specificaly a cons cell) in the nursery
- The blackhole gets updated into an indirection pointing to the cons cell; it is pushed to the generational remembered set
- we perform a GC, the cons cell is evacuated into the nonmoving heap
(into segment
cons_seg
) - the cons cell is marked
- the GC concludes
- the CAF and blackhole become unreachable
-
cons_seg
is filled - we start another GC; the cons cell is swept
- we start a new GC
- something is evacuated into
blkh_seg
, adding it to the "todo" list - we attempt to scavenge
blkh_seg
(namely, all unmarked blocks betweenscan
andnext_free
, which includesblkh
). We attempt to evacuateblkh
's indirectee, which is the previously-swept cons cell. This is unsafe, since the indirectee is no longer a valid heap object.
The problem here was that the scavenging logic assumed that (a) was
the only source of allocations into the non-moving heap and therefore
all unmarked blocks between scan
and next_free
were evacuated.
However, due to (b) this is not true.
The solution is to ensure that that the scanned region only encompasses
the region of objects allocated during evacuation. We do this by
updating scan
as we push the segment to the todo-segment list to
point to the block which was evacuated into.
Doing this required changing the nonmoving scavenging implementation's
update of the scan
pointer to bump it once, instead of after
scavenging each block as was done previously. This is because we may end
up evacuating into the segment being scavenged as we scavenge it. This
was quite tricky to discover but the result is quite simple,
demonstrating yet again that global mutable state should be used
exceedingly sparingly.
Fixes #21885 (closed)