CAF reversion is generally broken
While looking at #22417, @AndreasK and I realized that CAF reversion is fragile at best, and has been for quite some time.
In short, the debug RTS assumes that it can clobber CAFs' `saved_info` field in `newCAF`. A nearby comment offers this explanation for why this reuse is safe:

> The `saved_info` field of the CAF is used as the link field for `debug_caf_list`, because this field is only used by `newDynCAF` for revertible CAFs, and we don't put those on the `debug_caf_list`.
Note that the `newDynCAF` mentioned above no longer exists; it now appears to be called `newRetainedCAF`.
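To make the overloading concrete, here is a minimal C sketch of the link-field reuse that comment describes. The `Caf` and `InfoTable` types are hypothetical simplifications (not the RTS's real closure layout), and `debug_list_push` and `would_revert_to` are illustrative names. It assumes the original info table has already been stored in `saved_info` when the CAF was locked; pushing the CAF onto the debug list then overwrites that value with a list pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the RTS's real closure layout. */
typedef struct InfoTable { const char *name; } InfoTable;

typedef struct Caf {
    const InfoTable *info;       /* current info pointer */
    const void      *saved_info; /* original info table, or debug-list link */
} Caf;

static Caf *debug_caf_list = NULL;

/* Mimics the DEBUG-RTS path: saved_info is reused as the "next" pointer of
 * debug_caf_list, clobbering the info table that locking stored there. */
static void debug_list_push(Caf *caf)
{
    assert(caf->saved_info != NULL); /* original info table saved on lock */
    caf->saved_info = debug_caf_list;
    debug_caf_list = caf;
}

/* What a reversion step that trusted saved_info would restore. */
static const InfoTable *would_revert_to(const Caf *caf)
{
    return (const InfoTable *)caf->saved_info;
}
```

Reverting such a CAF would install a pointer to another CAF (or `NULL`) as its info pointer. In the real RTS this is benign only because CAFs placed on `debug_caf_list` are never the ones `revertCAFs` walks.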
This overloading of `saved_info` is fine when static linking is used. In that case, `newRetainedCAF` is used in place of `newCAF` when a CAF from dynamically-loaded code is entered. `newRetainedCAF` then adds the CAF to `revertible_caf_list`, and `lockCAF` preserves the CAF's original info table in its `saved_info` field. CAFs on `revertible_caf_list` are treated as roots by the GC. When `revertCAFs` is called, we walk `revertible_caf_list` and reset the info pointer of each CAF to `caf->saved_info`.
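The static-linking path can be modelled in a few lines of C. This is a sketch under simplifying assumptions, not the RTS code: `Caf`, `lock_caf`, `new_retained_caf` and `revert_cafs` are hypothetical stand-ins for the closure type, `lockCAF`, `newRetainedCAF` and `revertCAFs`; the info pointer is overwritten with a dummy `IND_STATIC`-like marker; and the list is assumed to be threaded through a `link` field. GC rooting is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the RTS's real closure layout. */
typedef struct InfoTable { const char *name; } InfoTable;

typedef struct Caf {
    const InfoTable *info;       /* current info pointer */
    const InfoTable *saved_info; /* original info table, saved when entered */
    struct Caf      *link;       /* assumed link field for revertible_caf_list */
} Caf;

static const InfoTable ind_static_info = {"IND_STATIC"};
static Caf *revertible_caf_list = NULL;

/* Roughly what lockCAF does per the description: keep the original info
 * table in saved_info and overwrite the info pointer. */
static void lock_caf(Caf *caf)
{
    caf->saved_info = caf->info;
    caf->info = &ind_static_info;
}

/* Sketch of newRetainedCAF: lock the CAF, then push it onto the list that
 * revertCAFs walks. */
static void new_retained_caf(Caf *caf)
{
    lock_caf(caf);
    caf->link = revertible_caf_list;
    revertible_caf_list = caf;
}

/* Sketch of revertCAFs: restore each info pointer from saved_info, then
 * empty the list. */
static void revert_cafs(void)
{
    Caf *c = revertible_caf_list;
    while (c != NULL) {
        Caf *next = c->link;
        assert(c->saved_info != NULL); /* must hold a real info table */
        c->info = c->saved_info;
        c->saved_info = NULL;
        c->link = NULL;
        c = next;
    }
    revertible_caf_list = NULL;
}
```

After `revert_cafs()`, every CAF that went through `new_retained_caf` points at its original info table again, so the next entry re-evaluates it.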
However, when dynamic linking is used (as it is on most platforms at this point), we cannot redirect calls to `newCAF` in loaded code to `newRetainedCAF`, since we have no control over the dynamic loader's symbol resolution. Consequently, `newCAF` contains a hack to make it behave somewhat like `newRetainedCAF`. In this path, CAFs are added to a different list, `dyn_caf_list` (which is chained through `static_link`), which `revertCAFs` does not touch. This `keepCAFs` code path also neglects to add the CAFs to `debug_caf_list`, which is perhaps okay since we will never collect these CAFs anyway.
For this reason, CAF reversion cannot be expected to work when code is dynamically linked. This explains some of the strange behavior I have observed in GHCi and while debugging CAF issues in the past.
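Extending the same kind of simplified model shows why reversion silently does nothing in the dynamic case. The names are hypothetical: `keep_cafs` stands in for the `keepCAFs` condition, `new_caf` for the `keepCAFs` branch of `newCAF`, and both lists are assumed to be threaded through `static_link`. The ordinary (non-retained) path, including the DEBUG-only `debug_caf_list` push, is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the RTS's real closure layout. */
typedef struct InfoTable { const char *name; } InfoTable;

typedef struct Caf {
    const InfoTable *info;        /* current info pointer */
    const InfoTable *saved_info;  /* original info table */
    struct Caf      *static_link; /* link for dyn_caf_list / revertible_caf_list */
} Caf;

static const InfoTable ind_static_info = {"IND_STATIC"};
static bool keep_cafs = false;          /* stand-in for the keepCAFs condition */
static Caf *dyn_caf_list = NULL;        /* retained, but never reverted */
static Caf *revertible_caf_list = NULL; /* only filled by newRetainedCAF */

static void lock_caf(Caf *caf)
{
    caf->saved_info = caf->info;
    caf->info = &ind_static_info;
}

/* Sketch of the keepCAFs hack: loaded code calls newCAF directly, so newCAF
 * retains the CAF itself, but on dyn_caf_list, not revertible_caf_list. */
static void new_caf(Caf *caf)
{
    lock_caf(caf);
    if (keep_cafs) {
        caf->static_link = dyn_caf_list;
        dyn_caf_list = caf;
    }
    /* else: the ordinary path (not modelled here) */
}

/* revertCAFs only walks revertible_caf_list; dyn_caf_list is never visited. */
static void revert_cafs(void)
{
    for (Caf *c = revertible_caf_list; c != NULL; ) {
        Caf *next = c->static_link;
        assert(c->saved_info != NULL);
        c->info = c->saved_info;
        c->saved_info = NULL;
        c->static_link = NULL;
        c = next;
    }
    revertible_caf_list = NULL;
}
```

After `revert_cafs()`, a CAF entered through this path still carries the indirection marker rather than its original info table, which is the behavior described above.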
Because broken CAF reversion is quite bad for some uses, at some point we merged a third-party patch introducing the `--high-mem-dynamic` RTS flag. This flag recovers the ability to revert CAFs in dynamic objects by way of a rather fragile address check, which seems entirely untenable given the growing ubiquity of ASLR.
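The fragility of an address check is easy to see in isolation. The sketch below assumes a single fixed threshold. `HIGH_MEM_THRESHOLD`, `load_address` and `caf_above_threshold` are illustrative names, and the constant is not taken from the RTS. Under ASLR, the same object at the same offset can land on either side of the threshold depending on its load base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant only; not necessarily what the RTS uses. */
#define HIGH_MEM_THRESHOLD ((uintptr_t)0x80000000u)

/* Address of an object at `offset` within a module loaded at `base`. */
static uintptr_t load_address(uintptr_t base, uintptr_t offset)
{
    assert(offset <= UINTPTR_MAX - base); /* no wrap-around */
    return base + offset;
}

/* Address-range heuristic: classify a CAF purely by where it lives. */
static bool caf_above_threshold(uintptr_t caf_addr)
{
    return caf_addr > HIGH_MEM_THRESHOLD;
}
```

With a load base of `0x7fff0000` a CAF at offset `0x1000` falls below the threshold, while with a base of `0x80000000` the very same CAF falls above it, so the classification can change from run to run.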
A better solution?
All of the above seems like a rather horrible state of affairs. The implementation is subtle, the semantics are poorly defined and context-dependent, and in some cases we are forced to rely on fragile assumptions about address-space layout to try to recover sensible semantics.
However, the problem we are trying to solve is very simple: we need a way to distinguish CAFs defined in dynamically-loaded code (e.g. code loaded by GHCi) from those defined in the executable image. I suggest the following straightforward mechanism for addressing this problem:
- the RTS defines a global `revertible` variable
- the code generator forms a linked list of all CAFs defined in a compilation unit via the initial values of their `static_link` fields
- each compilation unit carries a static constructor which calls a registration function in the RTS
- when called, the registration function will link all of the unit's CAFs onto `revertible_caf_list` if `revertible` is set; otherwise it will do nothing
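The proposal can be sketched in C as follows. Everything here is hypothetical: `revertible`, `rts_register_cafs` and `unit_init` are illustrative names, `Caf` is a simplified stand-in, and `revertible_caf_list` is assumed to be threaded through `static_link`, which registration overwrites once the compile-time chain has been consumed. In generated code, `unit_init` would run from a static constructor (for instance an `.init_array` entry on ELF); here it is an ordinary function so that its order relative to setting `revertible` is explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RTS's closure types. */
typedef struct InfoTable { const char *name; } InfoTable;

typedef struct Caf {
    const InfoTable *info;
    const InfoTable *saved_info;
    struct Caf      *static_link; /* compile-time chain, then revertible list */
} Caf;

/* ---- RTS side ---- */
static bool revertible = false; /* e.g. set by GHCi before loading code */
static Caf *revertible_caf_list = NULL;

/* Registration function, called once per compilation unit. */
static void rts_register_cafs(Caf *head)
{
    if (!revertible)
        return; /* executable image: leave the CAFs alone */
    for (Caf *c = head; c != NULL; ) {
        Caf *next = c->static_link;
        assert(c->saved_info == NULL); /* runs at load time, before entry */
        c->static_link = revertible_caf_list;
        revertible_caf_list = c;
        c = next;
    }
}

/* ---- "Generated" side for one compilation unit ---- */
static const InfoTable thunk_info = {"THUNK_STATIC"};
static Caf unit_caf_b = {&thunk_info, NULL, NULL};
static Caf unit_caf_a = {&thunk_info, NULL, &unit_caf_b}; /* chain: a -> b */

/* Would run from a static constructor; called explicitly in this sketch. */
static void unit_init(void)
{
    rts_register_cafs(&unit_caf_a);
}
```

In this sketch the registration function is the only place that consults `revertible`, so whether a CAF is revertible is decided once, at load time, without any assumption about where the loader placed it.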
Admittedly, I am a tad worried about the runtime overhead of this. However, we do similar things for IPE information and cost-centre registration, so perhaps it is acceptable.
Relevant notes
- Note [GHCi CAFs] in `rts/sm/Storage.c`
- Note [STATIC_LINK fields] in `rts/sm/Storage.h`
- Note [dyn_caf_list] in `rts/sm/Storage.c`