
noDuplicate# might have a race condition

Looking at the code of stg_noDuplicatezh, I believe I discovered a race condition that could result in the argument function being executed twice (although it should be very hard to trigger).

1: stg_noDuplicatezh /* no arg list: explicit stack layout */
2: {
3:    // With a single capability there's no chance of work duplication.
4:    CInt n_caps;
5:    n_caps = %relaxed CInt[n_capabilities];
6:    if (n_caps == 1 :: CInt) {
7:        jump %ENTRY_CODE(Sp(0)) [];
8:    }
9:    ...
 

Scenario:

  • We have one capability and two threads (A,B).
  • Both threads want to evaluate a thunk whose RHS is something like thunk = noDuplicate# (foo x)
  • A: Gets to run first. It evaluates as usual and takes the fast path at line 6, since we have just one capability.
  • A: Jump to the entry code of the function under evaluation
  • A: Fails a heap/stack check, so a context switch is performed
  • The scheduler switches over to thread B
  • B: Evaluates thunk and starts executing stg_noDuplicatezh as expected
  • B: Takes the fast path as well
  • Now we have two threads both running the same function under evaluation despite noDuplicate#
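The interleaving above can be modelled in plain Haskell. This is only a toy sketch of the scenario as described, not RTS code: `noDuplicateFastPath` and `raceDemo` are hypothetical names, the context switch is simulated with MVars, and the model deliberately ignores anything else the RTS may do when a thread is descheduled.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical stand-in for the fast path of stg_noDuplicatezh: with a
-- single capability the action is entered directly, without any claim.
noDuplicateFastPath :: Int -> IO a -> IO a
noDuplicateFastPath nCaps act
  | nCaps == 1 = act
  | otherwise  = error "slow path not modelled in this sketch"

-- Replays the scenario and returns how often the body was entered.
raceDemo :: IO Int
raceDemo = do
  entered  <- newIORef (0 :: Int)
  aPaused  <- newEmptyMVar   -- A signals that it has been "descheduled"
  aResume  <- newEmptyMVar   -- lets A continue once B has finished
  finished <- newEmptyMVar
  let body pause = do
        atomicModifyIORef' entered (\n -> (n + 1, ()))
        pause
  -- Thread A: takes the fast path, enters the body, then is paused
  -- (standing in for the failed heap/stack check).
  _ <- forkIO $ do
    noDuplicateFastPath 1 (body (putMVar aPaused () >> takeMVar aResume))
    putMVar finished ()
  takeMVar aPaused
  -- Thread B: takes the same fast path while A is still inside the body.
  _ <- forkIO $ do
    noDuplicateFastPath 1 (body (pure ()))
    putMVar finished ()
  takeMVar finished          -- B completes first; A is still paused
  putMVar aResume ()
  takeMVar finished          -- A completes
  readIORef entered
```

In this model `raceDemo` reports two entries into the body, which is the duplication the scenario worries about. Whether the real RTS permits the same interleaving is exactly the open question.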

Maybe I got something wrong; I will have to reread the note, etc. But if this is true, it's quite bad. Perhaps it would be simpler, safer, and more efficient to reimplement noDuplicate# via a compare-and-swap operation instead.
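To make the compare-and-swap idea a bit more concrete, here is a rough user-level sketch of "claim before you run". It is not a proposal for the actual Cmm implementation: `Guarded`, `newGuarded`, `force`, and `demo` are hypothetical names, and the claim is done with `atomicModifyIORef'` from base as a stand-in for a raw CAS on the thunk.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical model of a computation that must run at most once.
data Guarded a = Guarded
  { claimed :: IORef Bool   -- True once some thread has claimed the work
  , result  :: MVar a       -- filled exactly once, by the claiming thread
  , action  :: IO a
  }

newGuarded :: IO a -> IO (Guarded a)
newGuarded act = Guarded <$> newIORef False <*> newEmptyMVar <*> pure act

-- Atomically claim the computation. Only the thread that flips the flag
-- from False to True runs the action; everyone else waits for the result.
-- Note: this sketch is not exception safe (a failing action leaves
-- waiters blocked).
force :: Guarded a -> IO a
force g = do
  won <- atomicModifyIORef' (claimed g) (\c -> (True, not c))
  if won
    then do
      r <- action g
      putMVar (result g) r
      pure r
    else readMVar (result g)

-- Two threads force the same Guarded value; returns how often the
-- action actually ran.
demo :: IO Int
demo = do
  runs <- newIORef (0 :: Int)
  g <- newGuarded (atomicModifyIORef' runs (\n -> (n + 1, ())) >> pure "value")
  done <- newEmptyMVar
  _ <- forkIO (force g >>= putMVar done)
  _ <- forkIO (force g >>= putMVar done)
  _ <- takeMVar done
  _ <- takeMVar done
  readIORef runs
```

The point of the design is that the claim does not depend on the number of capabilities, so there is no fast path that skips it. The cost is one atomic operation per call, which would have to be weighed against the current check.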

Edited by Andreas Klebinger