Propagate breakpoints when inlining
The goal of this MR is to enable breakpoint preservation when inlining code as described in #23394 (closed).
Update: I've gained some confidence about the constraints involved in this topic and came up with a solution that seems reasonable to me. I've kept my previously confused thoughts about it in this description below in the collapsible section.
Quick summary of the changes:
- Add constructor
IfaceBreakpointtoIfaceTickish - Store breakpoint data in interface files
- Store
BreakArrayfor the breakpoint's module, not the current module, in BCOs - Store module name in BCOs instead of
Unique, since theUniquefrom anIfacedoesn't match the modules in GHCi's state - Allocate module name in
ModBreaks, likeBreakArray - Lookup breakpoint by module name in GHCi
- Skip creating breakpoint instructions when no
ModBreaksare available, rather than injectingModBreaksin the linker when breakpoints are enabled, and panicking whenModBreaksis missing
I decided that the contortions made with the ModBreaks in the BCGen are unnecessary, so I implemented that when ModBreaks are Nothing (either from the arg given to byteCodeGen or the HPT for foreign-module breakpoints), we assume that breakpoints are undesired for this expression and completely omit them from the BCOs.
My understanding is that ModBreaks are optional primarily because of TH (and pending !10466 (closed) because the user explicitly disabled them), but please disabuse me of that misconception if I'm mistaken.
In any case, it doesn't seem dangerous to not panic when ModBreaks are missing for obscure reasons (there's a test in which a boot module is reloaded and ModBreaks is reset, which results in a core lint error when adding inlining, but that might better be handled in !10550 (closed) or !10563 (closed)
Work sponsored by Tweag
Previous description
My implementation is comically clunky since I've figured out how all of this works while writing it, so I'd appreciate any advice about more appropriate solutions.
The first assumption I made is that breakpoint data needs to be propagated across the Iface boundary, including the module in which it was first created, which was pretty straightforward to achieve.
As a result, an inlined function's breakpoint would show up in StgToByteCode.schemeER_wrk with its defining module attached.
As a naive attempt, I replaced the Unique that was previously obtained from the currently processed module with the one from the breakpoint, but that did not have the desired result – the Unique stored in GHCi's state turned out to be different from the one in the Iface data.
My next assumptions about what needed to be done were:
- Like cost centres, the breakpoint information relayed to GHCi must contain the module name instead of the
Unique - The array that the RTS consults when encountering a break site needs to be the
modBreaks_flagsfor the breakpoint's module, not the current module's referenced byBCOPtrBreakArray
For the first part, I had a hard time figuring out how to send a string from StgToByteCode through the RTS to GHCi, and after many trials and errors I achieved it by using the GHCi message mallocData.
This is certainly a pretty bad solution, since it allocates a new string for each breakpoint with the same contents.
A nicer solution might be to add another ForeignRef to ModBreaks, but that is only available as Maybe for the current module (although mallocData also isn't unconditionally available, so both solutions are equally bad in that regard).
On the GHCi side, I then use the module to lookup the breakpoints in the HPT, which works fine.
I had some trouble figuring out what type to use for the BreakpointCallback argument that doesn't segfault, and only had success with a Word# that I then assembled into a RemotePtr.
There's surely a better way to do this as well.
For the break array, I looked up the ModBreaks of the breakpoints module in the HPT in StgToByteCode.
I don't know whether it's guaranteed that all local dependencies are present in there, and I assume that this won't work at all for Iface data from an object file – for the latter, I assume we can't use breakpoints anyway, so maybe the breakpoint should just be discarded in that case.
Anyway, right now the code assumes that a missing ModBreaks means that the breakpoint is from the current module.
I modified BCOPtrBreakArray to take a Maybe (ForeignRef BreakArray) since that was the only way I managed to encode the pointer without segfaulting.
Ultimately, I didn't have to make any changes to the RTS – all arguments have the same representation as before.
As you can see from the test I added, it works for a simple case! But there is certainly a lot to take care of still, or do differently, so I'd appreciate any suggestions and elucidations of my mistakes.