Out of process heap traversal

In #16788 (closed) @DanielG brought up an idea that I have long been tossing around in the back of my mind:

> While working on this and seeing how horrible traversing the heap is in C a thought occurred. Can we use ghc-heap to do all of this stuff in Haskell instead of C? At first that sounds kind of crazy. If we attempt to do this in the same process as the code being profiled we have to make sure GC doesn't run or else heap objects might move. It could still be possible by reserving a large nursery before running the profiling code. I'm not sure.

I have also considered this possibility in the past. As you point out, GC is a significant problem. I don't see any way to address this without severely limiting the utility of the mechanism (e.g. you restrict the amount of traversal that can be done to that which can fit in your nursery). Intriguingly, using the new nonmoving garbage collector (~"nonmoving-gc") would avoid the problem of objects being moved during traversal, although it would not save you from measurement artifacts due to the traversal logic finding objects that it itself creates (which in some cases may dwarf the original heap we are measuring).
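For concreteness, here is a minimal sketch of what an in-process traversal with ghc-heap might look like. The helper names `closuresWithin` and `inspect` are made up for illustration; only `asBox`, `getBoxedClosureData` and `allClosures` come from `GHC.Exts.Heap`. Note that the traversal allocates lists and `Closure` values on the very heap it is inspecting, which is exactly the measurement artifact described above.

```haskell
import GHC.Exts.Heap (Box, Closure, allClosures, asBox, getBoxedClosureData)

-- | Collect the closures reachable from a boxed value, following at most
-- `depth` pointer hops. There is no cycle or sharing detection, so a shared
-- closure is reported once per path; keep `depth` small.
closuresWithin :: Int -> Box -> IO [Closure]
closuresWithin depth box
  | depth < 0 = pure []
  | otherwise = do
      closure <- getBoxedClosureData box
      -- `allClosures` lists the pointer fields of the decoded closure.
      children <- mapM (closuresWithin (depth - 1)) (allClosures closure)
      pure (closure : concat children)

-- | Convenience wrapper (hypothetical name): inspect any lifted value.
inspect :: Int -> a -> IO [Closure]
inspect depth x = closuresWithin depth (asBox x)
```

In GHCi, something like `inspect 2 (Just [1 :: Int, 2, 3]) >>= mapM_ print` shows the decoded closures. What you see depends on evaluation state and optimisation level (thunks, indirections, static constructors), and nothing here prevents a moving GC from running between steps.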

For this reason I would suggest that we instead think about how to allow the heap traversal to run out of process. This is similar to the remote debugging capabilities available in most JVM implementations: one process (the debuggee) allows another process (the debugger) to attach to it. The debugger can then request that the debuggee pause, giving it a consistent view into the debuggee's heap (via mmap, for instance). At this point the debugger can do whatever traversals it needs (with the ghc-heap API). When it is finished, the debugger can signal the debuggee to continue execution.

The basic idea here is fairly straightforward but there are a few details to be worked out:

How does the debugger enumerate the heap's "roots"?
How does signalling between the debugger and the debuggee happen? In principle this whole mechanism could be implemented with ptrace, at the expense of cross-platform support. Alternatively, the RTS could expose a socket interface, like the JVM does.
How does the debuggee provide access to its heap?
1. If ptrace is used then PTRACE_PEEK* could be used by the debugger.
2. On platforms that provide it (e.g. Linux), the debugger could read the debuggee's /proc/*/mem file directly (with pread; as far as I know this file cannot be mmap'd).
3. If the socket interface were a domain socket: on some platforms the debuggee might be able to send an fd representing its heap mapping to the debugger, which the debugger could then mmap.
4. The socket protocol could define read operations allowing the debugger to request heap objects from the debuggee
Of these, (4) is by far the simplest and least platform dependent. However, it is also the least efficient, requiring two context switches per request. The cost of, e.g., walking a long linked list via this mechanism would likely be prohibitive if naively implemented. For this reason I suspect the protocol would need to provide some sort of batching.
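To make the batching argument concrete, here is a small pure simulation. Everything in it is an assumption made for illustration: the `Obj` shape, the `readBatch` operation (return the requested object plus up to `limit - 1` objects reachable from it, breadth-first) and the `Map` standing in for the paused debuggee's heap. It only counts round trips; it says nothing about how a real protocol should encode closures.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Word (Word64)

type Addr = Word64

-- | Hypothetical wire-level object: an opaque payload plus outgoing pointers.
data Obj = Obj { payload :: Int, pointers :: [Addr] }
  deriving (Eq, Show)

-- | Stand-in for the paused debuggee's heap.
type Heap = Map.Map Addr Obj

-- | One request/response exchange: the object at `root` plus further objects
-- reachable from it (breadth-first), at most `limit` objects in total.
readBatch :: Heap -> Int -> Addr -> [(Addr, Obj)]
readBatch heap limit root = go limit [root] Set.empty
  where
    go n queue seen
      | n <= 0 = []
      | otherwise = case queue of
          [] -> []
          (a : rest)
            | a `Set.member` seen -> go n rest seen
            | otherwise -> case Map.lookup a heap of
                Nothing -> go n rest seen   -- dangling pointer: skip it
                Just o  -> (a, o) : go (n - 1) (rest ++ pointers o) (Set.insert a seen)

-- | Visit everything reachable from `root`, asking for `limit` objects per
-- request. Returns (round trips, distinct objects received).
walk :: Heap -> Int -> Addr -> (Int, Int)
walk heap limit root = go 0 Map.empty [root]
  where
    go trips cache [] = (trips, Map.size cache)
    go trips cache (a : todo)
      | a `Map.member` cache = go trips cache todo   -- already have it
      | otherwise =
          let batch  = readBatch heap limit a
              cache' = foldr (uncurry Map.insert) cache batch
              next   = concatMap (pointers . snd) batch
          in go (trips + 1) cache' (next ++ todo)

-- | A singly linked list of `n` cells at addresses 1..n.
listHeap :: Word64 -> Heap
listHeap n = Map.fromList
  [ (i, Obj (fromIntegral i) [i + 1 | i < n]) | i <- [1 .. n] ]
```

With this model, `walk (listHeap 1000) 1 1` needs 1000 round trips for the 1000-cell list, while `walk (listHeap 1000) 100 1` needs 10. A real protocol would also have to decide who chooses the traversal order and how the client avoids being re-sent objects it already holds.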

On the whole I suspect a socket protocol built around option (4) could be reasonably simple. Moreover, the entire thing could be implemented in a pair of libraries:

a ghc-debugee library would include a C start-up hook¹ which would open the socket and start a thread to serve requests, calling into the RTS to pause/resume execution when necessary (although figuring out how best to do this will take a bit of digging).
a ghc-debugger-raw library would provide the Haskell client bits. This would just be a small shim putting a safe interface around connecting to the debuggee, requesting a pause, requesting objects from the debuggee, and deserialising them into the ghc-heap representation.
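As a rough sketch of what the "safe interface" in the client shim could mean: the request vocabulary, the `Debuggee` handle and all function names below are hypothetical, and the transport is abstracted as a function so that either a real socket or a test double can be plugged in. Deserialising the response bytes into the ghc-heap representation is deliberately left out.

```haskell
import Control.Exception (bracket_)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Word (Word64, Word8)

-- | Hypothetical request vocabulary for the socket protocol.
data Request
  = Pause
  | Resume
  | ReadClosure Word64   -- address of the closure to fetch
  deriving (Eq, Show)

-- | A connection, modelled as "send one request, receive raw response bytes".
newtype Debuggee = Debuggee { sendRequest :: Request -> IO [Word8] }

-- | Run an action while the debuggee is paused. `bracket_` guarantees that
-- Resume is sent even if the action throws.
withPaused :: Debuggee -> IO a -> IO a
withPaused d = bracket_ (sendRequest d Pause) (sendRequest d Resume)

-- | Fetch the raw bytes of one closure; only meaningful while paused.
readClosure :: Debuggee -> Word64 -> IO [Word8]
readClosure d addr = sendRequest d (ReadClosure addr)

-- | Test double: records every request and answers with an empty body.
-- The second component returns the requests seen so far, oldest first.
recordingDebuggee :: IO (Debuggee, IO [Request])
recordingDebuggee = do
  logRef <- newIORef []
  let handle req = do
        modifyIORef' logRef (req :)
        pure []
  pure (Debuggee handle, reverse <$> readIORef logRef)
```

For example, running `withPaused d (readClosure d 4096)` against the recording double should log `Pause`, `ReadClosure 4096`, `Resume` in that order. A stricter design could make `readClosure` take a token that only `withPaused` can produce, so that reading from a running debuggee is a type error.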

Finally, a ghc-debugger library could offer a high-level interface which would provide a "pure" view into the debuggee's heap via laziness (which could even be automatically prefetched and batched into larger requests for efficiency). The only potential issue with this final layer is the possibility of leaks (e.g. it would be very easy for the debugger to inadvertently retain a full copy of the debuggee's heap in the form of the (much larger) ghc-heap representation).
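One way such a lazy "pure" view could be built is with `unsafeInterleaveIO`, so that a fetch is only issued when the corresponding node is forced. This is a sketch under assumptions: `RawObj`, `HeapNode`, `lazyView` and the `countingChain` test double are invented names, the fetch function stands in for a request to a paused debuggee, and no prefetching or batching is attempted.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Word (Word64)
import System.IO.Unsafe (unsafeInterleaveIO)

type Addr = Word64

-- | Hypothetical decoded response for one object.
data RawObj = RawObj { rawPayload :: Int, rawPointers :: [Addr] }

-- | Pure-looking view of the remote heap, unfolded on demand.
data HeapNode = HeapNode
  { nodeAddr     :: Addr
  , nodePayload  :: Int
  , nodeChildren :: [HeapNode]
  }

-- | Build a lazy view rooted at an address. Each node is fetched the first
-- time it is forced. Cycles in the remote heap unfold into an infinite tree,
-- which is fine as long as consumers only force a finite part of it.
lazyView :: (Addr -> IO RawObj) -> Addr -> IO HeapNode
lazyView fetch = go
  where
    go addr = unsafeInterleaveIO $ do
      RawObj p ptrs <- fetch addr
      kids <- mapM go ptrs   -- each child is itself deferred
      pure (HeapNode addr p kids)

-- | Test double: an endless chain where object n points to object n + 1,
-- together with an action reporting how many fetches have been issued.
countingChain :: IO (Addr -> IO RawObj, IO Int)
countingChain = do
  counter <- newIORef (0 :: Int)
  let fetch addr = do
        modifyIORef' counter (+ 1)
        pure (RawObj (fromIntegral addr) [addr + 1])
  pure (fetch, readIORef counter)
```

The leak concern is visible here: as long as the root `HeapNode` is retained, every node forced so far stays alive in the debugger. The view is also only valid while the debuggee remains paused, since a deferred fetch may run arbitrarily late.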

¹ I do a trick similar to this in my ghc-eventlog-socket experiment.

Edited Jun 09, 2019 by Ben Gamari
