Subtle data race in blackhole claim logic in threadPaused

While looking at #20093 (closed) I stumbled upon this data race report from TSAN:

Stderr ( T10414 ):
==================
WARNING: ThreadSanitizer: data race (pid=735365)
  Write of size 8 at 0x7ee01791b4b8 by main thread:
    #0 threadPaused rts/ThreadPaused.c:355 (T10414+0x77c8c3)
    #1 stg_returnToSched <null> (T10414+0x7b27e5)
    #2 scheduleWaitThread rts/Schedule.c:2651 (T10414+0x77395f)
    #3 rts_evalLazyIO rts/RtsAPI.c:566 (T10414+0x7ca573)
    #4 hs_main rts/RtsMain.c:72 (T10414+0x76a8d5)
    #5 main <null> (T10414+0x4166d1)

  Previous atomic read of size 8 at 0x7ee01791b4b8 by thread T6:
    #0 __tsan_atomic64_load <null> (libtsan.so.0+0x67489)
    #1 messageBlackHole rts/Messages.c:188 (T10414+0x7c4924)
    #2 stg_BLACKHOLE_info <null> (T10414+0x7b1a99)
    #3 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
    #4 workerStart rts/Task.c:445 (T10414+0x77b126)
    #5 <null> <null> (libtsan.so.0+0x2e0b6)

  Thread T6 (tid=735382, running) created by thread T4 at:
    #0 pthread_create <null> (libtsan.so.0+0x3055b)
    #1 createOSThread rts/posix/OSThreads.c:166 (T10414+0x7aa1cf)
    #2 startWorkerTask rts/Task.c:497 (T10414+0x77bb7a)
    #3 releaseCapability_ rts/Capability.c:588 (T10414+0x7625e7)
    #4 suspendThread rts/Schedule.c:2502 (T10414+0x773093)
    #5 <null> <null> (T10414+0x6c33da)
    #6 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
    #7 workerStart rts/Task.c:445 (T10414+0x77b126)
    #8 <null> <null> (libtsan.so.0+0x2e0b6)

SUMMARY: ThreadSanitizer: data race rts/ThreadPaused.c:355 in threadPaused

I believe this could come to bite us in the following scenario:

  1. Thread 1 is created
  2. Thread 1 starts evaluation of Thunk A
  3. Thread 1 suspends mutation and calls threadPaused
  4. Thread 2 tries to enter Thunk A and enters messageBlackHole (via stg_BLACKHOLE_info)
  5. Thread 1 in threadPaused successfully whiteholes Thunk A and non-atomically writes its TSO as Thunk A's indirectee
  6. Thread 2 in messageBlackHole reads Thunk A's info table, notes that it's a whitehole, and acquire-reads the indirectee (which now points to Thread 1's TSO)
  7. Thread 2 in messageBlackHole acquire-reads the info table of Thread 1's TSO. However, the write to Thunk A's indirectee field (in step (5)) was not atomic, so the contents of the TSO may not yet be visible to Thread 2, resulting in an undefined read
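
To make the ordering concern concrete, here is a minimal C11 sketch of the claim/read pattern described in the steps above. It is a simplified model under stated assumptions, not the RTS code: the `closure` struct, the `INFO_*` tags, and the functions `claim_blackhole` and `read_owner_info` are hypothetical stand-ins for the real closure layout, info pointers, `threadPaused`, and `messageBlackHole`. The sketch uses a release store for the indirectee, which is one plausible way to pair with the reader's acquire load; it is not necessarily the fix that was adopted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for info pointers; not the RTS's real definitions. */
enum { INFO_THUNK, INFO_WHITEHOLE, INFO_BLACKHOLE, INFO_TSO };

typedef struct closure {
    _Atomic int info;                     /* models header.info */
    _Atomic(struct closure *) indirectee; /* models the indirectee field */
} closure;

/* Owner side (models the claim in threadPaused).
 * Returns 1 if `bh` was claimed for `tso`, 0 if someone else got there first. */
int claim_blackhole(closure *bh, closure *tso)
{
    assert(bh != NULL && tso != NULL);

    /* Lock the closure by swinging its info tag to WHITEHOLE. */
    int expected = INFO_THUNK;
    if (!atomic_compare_exchange_strong_explicit(
            &bh->info, &expected, INFO_WHITEHOLE,
            memory_order_acquire, memory_order_relaxed))
        return 0;

    /* The report describes a plain (non-atomic) store at this point.
     * Assumption: a release store is used here instead, so that earlier
     * writes to *tso happen-before any acquire load that observes `tso`. */
    atomic_store_explicit(&bh->indirectee, tso, memory_order_release);

    /* Publish the final BLACKHOLE state. */
    atomic_store_explicit(&bh->info, INFO_BLACKHOLE, memory_order_release);
    return 1;
}

/* Reader side (models the start of messageBlackHole).
 * Returns the owner's info tag, or -1 if `bh` is not blackholed or has no
 * owner recorded yet. */
int read_owner_info(closure *bh)
{
    int info = atomic_load_explicit(&bh->info, memory_order_acquire);
    if (info != INFO_BLACKHOLE && info != INFO_WHITEHOLE)
        return -1;

    /* Acquire load: synchronizes with the release store in claim_blackhole. */
    closure *p = atomic_load_explicit(&bh->indirectee, memory_order_acquire);
    if (p == NULL)
        return -1;

    return atomic_load_explicit(&p->info, memory_order_relaxed);
}
```

In this model the reader may observe the closure in the intermediate WHITEHOLE state, exactly as in step (6). Its read of the owner's header is only well-defined because the indirectee store is a release operation that the acquire load synchronizes with; an acquire load paired with a plain store provides no such guarantee.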
Edited by Ben Gamari