noDuplicate# might have a race condition
Looking at the code of stg_noDuplicatezh, I believe I discovered a race condition that could result in the argument function being executed twice (although it should be very hard to trigger).
1: stg_noDuplicatezh /* no arg list: explicit stack layout */
2: {
3: // With a single capability there's no chance of work duplication.
4: CInt n_caps;
5: n_caps = %relaxed CInt[n_capabilities];
6: if (n_caps == 1 :: CInt) {
7: jump %ENTRY_CODE(Sp(0)) [];
8: }
9: ...
Scenario:
- We have one capability and two threads (A,B).
- Both threads want to evaluate a thunk whose RHS is something like
thunk = noDuplicate# (foo x)
- A: Gets to run first. We evaluate as usual and take the fast path in line 6, since we have just one capability.
- A: Jumps to the entry code of the function under evaluation.
- A: Fails a heap/stack check, so we perform a context switch.
- The scheduler switches over to thread B
- B: Evaluates thunk, and starts evaluating stg_noDuplicatezh as expected
- B: We take the fast path
- Now we have two threads both running the same function under evaluation despite noDuplicate#
Maybe I got something wrong; I will have to reread the note and the surrounding code. But if this is right, it's quite bad. Perhaps it would be simpler, safer and more efficient to re-implement noDuplicate# via a compare-and-swap operation instead.
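
To make the compare-and-swap idea a bit more concrete, here is a rough sketch in plain C11 of what "claim the thunk before running it" could look like. This is not RTS code: `thunk_t`, `try_claim`, `enter_thunk` and `simulate_scenario` are names made up for illustration, the state field does not correspond to GHC's actual closure layout, and a real implementation would have to block or yield the losing thread rather than just return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ownership states; not GHC's real closure layout. */
enum { UNCLAIMED = 0, CLAIMED = 1 };

typedef struct {
    atomic_int state;  /* UNCLAIMED until some thread wins the CAS */
    int run_count;     /* how many times the payload actually ran */
} thunk_t;

/* Returns true only for the single caller that wins the claim. */
static bool try_claim(thunk_t *t) {
    int expected = UNCLAIMED;
    return atomic_compare_exchange_strong(&t->state, &expected, CLAIMED);
}

/* Run the payload at most once, however many threads enter. */
static bool enter_thunk(thunk_t *t) {
    if (!try_claim(t)) {
        return false;  /* someone else owns it; a real RTS would block here */
    }
    t->run_count++;    /* stands in for evaluating `foo x` */
    return true;
}

/* Replays the scenario above sequentially: A enters, then B enters
   before A has finished. Returns the number of times the payload ran. */
static int simulate_scenario(void) {
    thunk_t t = { UNCLAIMED, 0 };
    bool a_ran = enter_thunk(&t);  /* thread A claims the thunk */
    bool b_ran = enter_thunk(&t);  /* thread B arrives after the context switch */
    assert(a_ran && !b_ran);
    return t.run_count;
}
```

The point of the sketch is only that the claim happens unconditionally, so it does not depend on the number of capabilities. Whether this would actually be cheaper than the current implementation is something I have not measured.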