Fix thunk update ordering
Previously we attempted to ensure soundness of concurrent thunk update by synchronizing on the access of the thunk's info table pointer field. This was believed to be sufficient since the indirectee (which may expose a closure allocated by another core) would not be examined until the info table pointer update is complete.
However, it turns out that this can result in data races in the presence of multiple threads racing a update a single thunk. For instance, consider this interleaving under the old scheme:
Thread A Thread B
--------- ---------
t=0 Enter t
1 Push update frame
2 Begin evaluation
4 Pause thread
5 t.indirectee=tso
6 Release t.info=BLACKHOLE
7 ... (e.g. GC)
8 Resume thread
9 Finish evaluation
10 Relaxed t.indirectee=x
11 Load t.info
12 Acquire fence
13 Inspect t.indirectee
14 Release t.info=BLACKHOLE
Here Thread A enters thunk t
but is soon paused, resulting in t
being lazily blackholed at t=6. Then, at t=10 Thread A finishes
evaluation and updates t.indirectee
with a relaxed store.
Meanwhile, Thread B enters the blackhole. Under the old scheme this
would introduce an acquire-fence but this would only synchronize with
Thread A at t=6. Consequently, the result of the evaluation, x
, is not
visible to Thread B, introducing a data race.
We fix this by treating the indirectee
field as we do all other
mutable fields. This means we must always access this field with
acquire-loads and release-stores.
See #23185. Broken out from !10957 (closed)