An idea for a different nonmoving collector mark queue
While reading the nonmoving collector code I had an idea for alternative mark queue structure. I'm not sure if this is any good, but I thought I'd write it down in any case before I forget about it.
Motivation
As laid out in the design docs for the nonmoving collector, we currently have to keep track of the nonmoving epoch to distinguish between things that have been marked during this GC or a previous GC to deal with cycles on the heap. There's currently a TODO in the codebase to get rid of this.
Also the current marking strategy tends to jump around the heap (I think), which isn't great for locality.
I have a vague idea that might do better in these two regards.
Idea
The core of the idea is to split the nonmoving segment bitmaps in two. Currently it is used during marking to track marked blocks, and it is used during allocation to find free blocks.
Under the idea, nonmoving segments retains a bitmap but now it not changed or read during marking at all but instead updated when marking finishes.
At the start of marking we create an empty bitmap-like data structure that holds the marking state and since no information from previous marking is available here we don't need to make use of the nonmoving epoch anymore. I have a somewhat fuzzy idea of what this structure would look like but the core of it is that it holds for each segment, a queue of block indexes to be marked and a bitmap of already marked blocks. The key idea here is that we do things on the segment level to allow us to (as much as possible) to stay marking blocks from the same segment until we are forced elsewhere by an empty queue. There's also potential for parallelising marking through different workers marking segments in parallel.
So the algorithm is:
- Mark all the blocks in the segment
- Pick the next segment with to-be marked blocks and repeat.
- When everything is marked overwrite the bitmaps on the segments with the ones from the mark phase up to the snapshot pointer.
This has the advantage of:
- not requiring the marking epoch state
- having a clear split between marking and allocator state
- maybe encouraging a bit more locality that might lead to better cache usage
I haven't worked through these ideas completely so maybe there's some fundamental issues with it, but it was fun to think about.