Skip to content

WIP: Garbage collect profiling data, permit object unloading in profiled programs

Andrew Farmer requested to merge xich/ghc:profiled_object_unload_test into master

Picking the thread of #11776 (closed) back up.

Immediate Problem

https://phabricator.haskell.org/D2069 added a check in the GC to keep an object alive if some part of the CCS tree lives in it. This prevents a profiled program that loads/unloads objects at runtime from segfaulting on exit when it tries to print the profile.

Unfortunately it also prevents any object from being unloaded until the program ends, because the CCS tree only ever grows. In particular, this means a profiled ghci can never unload code... effectively a memory leak.

Details

Previously, the CCS tree grew forever, and then was traversed/pruned/written out at program exit. This MR effectively implements garbage collection for the two profiling data structures (CCS tree and CC_LIST) and a means of snapshotting the profile data.

What happened before:

  1. Object is marked as unloaded, moved to unloaded_objects list.
  2. During major GC, checkUnload walks the heap, the CCS tree, and CC_LIST, and marks referenced bit on owning object.
  3. Unreferenced objects in unloaded_objects list are actually unloaded.

What happens now:

  1. Object is marked as unloaded, moved to unloaded_objects list.
  2. During major GC, checkUnload:
    1. Walks the heap, marking the referenced bit on objects AND marking a new referenced bit on CCSs.
    2. CCS tree is pruned via post-order traversal. CCS is kept alive if any of these are true:
      • it is referenced by the heap
      • it has any entries/ticks/alloc
      • it has any live children.
    3. CCS tree (the smaller, pruned one) is traversed to mark the referenced bit on owning objects.
  3. Unreferenced objects in unloaded_objects list are actually unloaded.
    1. Any CC that lives in an unreferenced object in unloaded_objects is removed from CC_LIST.

In order to meet the pruning condition for CCS tree nodes, we need to snapshot the data in it and reset it to zero. To do that, I fixed report generation so it could be run more than once per program, and not just at end-of-program. Generating a report resets the data in the CCS tree to zero.

There were several things to fix:

  • reportPerCCCosts destructively mutated CC_LIST in order to build a sorted list. Now it does the extra bookkeeping so it can tack sorted_cc_list back on to the front of CC_LIST when it is done. Also takes the right mutex to prevent weirdness if the linker runs concurrently.

  • Got rid of the CCS tree pruning during report generation, since it may prune CCSs referenced by the heap. Pruning was done to hide cost centres in the report. Instead, we tag each CCS with CCS_VSIBILE while traversing the tree to aggregate data. This exposed a bug in the existing pruning, where a back edge would keep a node alive even if it had no entries/cost, so the profile contained lots of needless extra rows of all zeros. I updated the tests to remove these rows from expected output.

  • Wrapped the sorting of children in the mutex, so sorting doesn't break a concurrent call of pushCostCentre. Also fixed a bug in pushCostCentre where we were checking the old memo table (the one before taking the mutex) instead of the new one.

  • Combined several traversals of the CCS tree to aggregate data into a single traversal.

Diffs

I organized the diffs into a stack for easier reviewing. Hopefully Gitlab allows you to review it that way. If people prefer, I can split into multiple merge requests.

Edited by Andrew Farmer

Merge request reports