Out of process heap traversal
In #16788 (closed) @DanielG brought up an idea that I have long been tossing around in the back of my mind:
While working on this and seeing how horrible traversing the heap is in C a thought occurred. Can we use ghc-heap to do all of this stuff in Haskell instead of C? At first that sounds kind of crazy. If we attempt to do this in the same process as the code being profiled we have to make sure GC doesn't run or else heap objects might move. It could still be possible by reserving a large nursery before running the profiling code. I'm not sure.
I have also considered this possibility in the past. As you point out, GC is a significant problem. I don't see any way to address this without severely limiting the utility of the mechanism (e.g. you restrict the amount of traversal that can be done to that which can fit in your nursery). Intriguingly, using the new nonmoving garbage collector (~"nonmoving-gc") would avoid the problem of objects being moved during traversal, although it would not save you from measurement artifacts due to the traversal logic finding objects that it itself allocates (which in some cases may dwarf the original heap we are measuring).
For this reason I would suggest that we instead think about how to allow the heap traversal to run out of process. This is similar to the remote debugging capabilities available in most JVM implementations: a process (the debuggee) allows another process (the debugger) to attach to it. The debugger can then request that the debuggee pause, giving it a consistent view into the debuggee's heap (via mmap, for instance). At this point the debugger can do whatever traversals it needs (with the ghc-heap API). When it is finished the debugger can signal the debuggee to continue execution.
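To make the pause/traverse/resume cycle concrete, here is a minimal sketch of how the debugger side might bracket a traversal. Everything named here (Debuggee, attach, requestPause, requestResume, withPaused) is hypothetical, and an IORef stands in for the real connection so that the snippet is self-contained; none of this is an existing RTS or ghc-heap interface.

```haskell
import Control.Exception (bracket_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Whether the (simulated) debuggee is currently executing.
data RunState = Running | Paused
  deriving (Eq, Show)

-- | Hypothetical handle to an attached debuggee. A real implementation
-- would wrap a socket or a ptrace session; here an IORef holding the
-- run state stands in for it.
newtype Debuggee = Debuggee (IORef RunState)

-- | Hypothetical attach operation; the simulated debuggee starts out running.
attach :: IO Debuggee
attach = Debuggee <$> newIORef Running

-- | Hypothetical pause and resume requests.
requestPause, requestResume :: Debuggee -> IO ()
requestPause (Debuggee ref) = writeIORef ref Paused
requestResume (Debuggee ref) = writeIORef ref Running

-- | Observe the simulated run state.
runState :: Debuggee -> IO RunState
runState (Debuggee ref) = readIORef ref

-- | Run a traversal while the debuggee is paused. bracket_ ensures the
-- resume request is sent even if the traversal throws an exception.
withPaused :: Debuggee -> IO a -> IO a
withPaused d = bracket_ (requestPause d) (requestResume d)
```

The point of the bracket is that a bug in the traversal code should not leave the debuggee paused forever.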
The basic idea here is fairly straightforward but there are a few details to be worked out:
- How does the debugger enumerate the heap's "roots"?
- How does signalling between the debugger and the debuggee happen? In principle this whole mechanism could be implemented with ptrace, at the expense of cross-platform support. Alternatively, the RTS could expose a socket interface, like the JVM does.
- How does the debuggee provide access to its heap?
  1. If ptrace is used then PTRACE_PEEK* could be used by the debugger.
  2. On POSIX platforms the debugger could mmap the debuggee's /proc/*/mem file.
  3. If the socket interface were a domain socket: on some platforms the debuggee might be able to send an fd representing its heap mapping to the debugger, which the debugger could then mmap.
  4. The socket protocol could define read operations allowing the debugger to request heap objects from the debuggee.

  Of these, (4) is by far the simplest and least platform dependent. However, it is also the least efficient, requiring two context switches per request. The cost of, e.g., walking a long linked list via this mechanism would likely be prohibitive if naively implemented. For this reason I suspect the protocol would need to provide some sort of batching.
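One possible shape for the batching mentioned under option (4) is a request that asks for an object together with a bounded number of objects reachable from it, so that a long linked list costs one round trip per batch rather than one per cell. The sketch below simulates this with the debuggee's heap modelled as a Map; Addr, Obj, serveBatch and fetchReachable are all hypothetical names, and the toy object representation is an assumption, not the ghc-heap closure type.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Addr = Int

-- | A toy heap object: a payload plus the addresses it points to.
data Obj = Obj { payload :: String, pointers :: [Addr] }
  deriving (Eq, Show)

type Heap = M.Map Addr Obj

-- | Debuggee side (hypothetical): answer one batched request by walking
-- breadth-first from the root and returning at most 'budget' objects.
serveBatch :: Heap -> Int -> Addr -> [(Addr, Obj)]
serveBatch heap budget root = go budget [root] S.empty
  where
    go n queue seen
      | n <= 0 = []
      | otherwise = case queue of
          [] -> []
          (a : rest)
            | a `S.member` seen -> go n rest seen
            | otherwise -> case M.lookup a heap of
                -- Dangling address: skip it without spending budget.
                Nothing -> go n rest (S.insert a seen)
                Just o -> (a, o) : go (n - 1) (rest ++ pointers o) (S.insert a seen)

-- | Debugger side (hypothetical): copy everything reachable from the root,
-- returning the local copy and the number of round trips that were needed.
fetchReachable :: Heap -> Int -> Addr -> (Heap, Int)
fetchReachable remote budget root = go M.empty 0 [root]
  where
    go local trips [] = (local, trips)
    go local trips (a : todo)
      | a `M.member` local = go local trips todo
      | otherwise =
          let batch = serveBatch remote budget a
              local' = M.union local (M.fromList batch)
              -- Pointers mentioned in this batch that we do not hold yet.
              next = [p | (_, o) <- batch, p <- pointers o, p `M.notMember` local']
           in go local' (trips + 1) (next ++ todo)
```

For a 1000-cell linked list, fetchReachable with a budget of 1 takes 1000 round trips, while a budget of 100 takes 10. A real protocol would presumably also want to bound the response size in bytes rather than in objects.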
On the whole I suspect a socket protocol with option (4) could be reasonably simple. Moreover, the entire thing could be implemented in a pair of libraries:
- a ghc-debugee library would include a C start-up hook [1] which would open the socket and start a thread to serve requests, calling into the RTS to pause/resume execution when necessary (although figuring out how best to do this will take a bit of digging).
- a ghc-debugger-raw library would provide the Haskell client bits. This would just be a small shim putting a safe interface around connecting to the debuggee, requesting a pause, and requesting objects from the debuggee, and deserialising them into the ghc-heap representation.
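As a rough illustration of what the request-serving side of such a protocol might look like, here is a sketch of the message types and a pure handler. The proposal has this logic living in a C hook inside the debuggee; it is written in Haskell here purely for exposition. Request, Response, Server and handle are hypothetical names, objects are returned as raw words, and decoding those words into the ghc-heap representation is left out.

```haskell
import qualified Data.Map.Strict as M
import Data.Word (Word64)

type Addr = Word64

-- | Hypothetical requests the debugger can send.
data Request
  = ReqPause
  | ReqResume
  | ReqObjects [Addr] -- ask for the raw words of several objects at once
  deriving (Eq, Show)

-- | Hypothetical replies from the debuggee.
data Response
  = Ack
  | Objects [(Addr, [Word64])]
  | Refused String
  deriving (Eq, Show)

-- | Simulated debuggee state: a paused flag and a toy heap of raw words.
data Server = Server
  { paused :: Bool
  , heapWords :: M.Map Addr [Word64]
  }

-- | Serve a single request. Object reads are refused while the debuggee
-- is running, since the heap is only consistent while it is paused.
-- Addresses with no object behind them are omitted from the reply.
handle :: Server -> Request -> (Server, Response)
handle s ReqPause = (s { paused = True }, Ack)
handle s ReqResume = (s { paused = False }, Ack)
handle s (ReqObjects addrs)
  | not (paused s) = (s, Refused "debuggee is running; pause first")
  | otherwise =
      (s, Objects [(a, ws) | a <- addrs, Just ws <- [M.lookup a (heapWords s)]])
```

Refusing reads from a running debuggee is a design choice made in this sketch; it keeps the "safe interface" property on the server side rather than relying on the client shim alone.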
Finally, a ghc-debugger library could offer a high-level interface which would provide a "pure" view into the debuggee's heap via laziness (which could even be automatically prefetched and batched into larger requests for efficiency). The only potential issue with this final layer is the possibility for leaks (e.g. it would be very easy for the debugger to inadvertently retain a full copy of the debuggee's heap in the form of the (much larger) ghc-heap representation).
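The lazy "pure" view could be built in several ways; one is unsafeInterleaveIO, which defers each fetch until the corresponding node is first inspected. The sketch below assumes a hypothetical per-object Fetch action and a toy Node type (neither is part of ghc-heap), and counts fetches so that the on-demand behaviour is observable.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M
import System.IO.Unsafe (unsafeInterleaveIO)

type Addr = Int

-- | A lazily materialised view of one heap object and its children.
data Node = Node
  { nodeAddr :: Addr
  , nodeLabel :: String
  , nodeChildren :: [Node]
  }

-- | Hypothetical effectful fetch of one object: its label and its pointers.
type Fetch = Addr -> IO (String, [Addr])

-- | Build a view rooted at the given address. No request is made until the
-- resulting Node is forced; forcing a node fetches that object only, and
-- leaves each child as a further deferred fetch.
lazyView :: Fetch -> Addr -> IO Node
lazyView fetch addr = unsafeInterleaveIO $ do
  (label, ptrs) <- fetch addr
  kids <- mapM (lazyView fetch) ptrs
  pure (Node addr label kids)

-- | A simulated fetch against an in-memory heap that counts its calls.
countingFetch :: IORef Int -> M.Map Addr (String, [Addr]) -> Fetch
countingFetch counter heap addr = do
  modifyIORef' counter (+ 1)
  case M.lookup addr heap of
    Just o -> pure o
    Nothing -> ioError (userError ("no object at " ++ show addr))
```

Two caveats apply to this sketch. The deferred fetches are only meaningful while the debuggee stays paused, so the view must not outlive the pause. It also exhibits exactly the leak described above: as long as the root Node is held, every node that has been forced stays reachable from it.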
[1] I do a trick similar to this in my ghc-eventlog-socket experiment.