Subtle data race in blackhole claim logic in threadPaused
While looking at #20093, I stumbled upon this data race report from TSAN:
Stderr ( T10414 ):
==================
WARNING: ThreadSanitizer: data race (pid=735365)
Write of size 8 at 0x7ee01791b4b8 by main thread:
#0 threadPaused rts/ThreadPaused.c:355 (T10414+0x77c8c3)
#1 stg_returnToSched <null> (T10414+0x7b27e5)
#2 scheduleWaitThread rts/Schedule.c:2651 (T10414+0x77395f)
#3 rts_evalLazyIO rts/RtsAPI.c:566 (T10414+0x7ca573)
#4 hs_main rts/RtsMain.c:72 (T10414+0x76a8d5)
#5 main <null> (T10414+0x4166d1)
Previous atomic read of size 8 at 0x7ee01791b4b8 by thread T6:
#0 __tsan_atomic64_load <null> (libtsan.so.0+0x67489)
#1 messageBlackHole rts/Messages.c:188 (T10414+0x7c4924)
#2 stg_BLACKHOLE_info <null> (T10414+0x7b1a99)
#3 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
#4 workerStart rts/Task.c:445 (T10414+0x77b126)
#5 <null> <null> (libtsan.so.0+0x2e0b6)
Thread T6 (tid=735382, running) created by thread T4 at:
#0 pthread_create <null> (libtsan.so.0+0x3055b)
#1 createOSThread rts/posix/OSThreads.c:166 (T10414+0x7aa1cf)
#2 startWorkerTask rts/Task.c:497 (T10414+0x77bb7a)
#3 releaseCapability_ rts/Capability.c:588 (T10414+0x7625e7)
#4 suspendThread rts/Schedule.c:2502 (T10414+0x773093)
#5 <null> <null> (T10414+0x6c33da)
#6 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
#7 workerStart rts/Task.c:445 (T10414+0x77b126)
#8 <null> <null> (libtsan.so.0+0x2e0b6)
SUMMARY: ThreadSanitizer: data race rts/ThreadPaused.c:355 in threadPaused
I believe this could come to bite us in the following scenario:
1. Thread 1 is created.
2. Thread 1 starts evaluating Thunk A.
3. Thread 1 suspends mutation and calls threadPaused.
4. Thread 2 tries to enter Thunk A and enters messageBlackHole (via stg_BLACKHOLE_info).
5. Thread 1, in threadPaused, successfully whiteholes Thunk A and non-atomically writes its TSO as Thunk A's indirectee.
6. Thread 2, in messageBlackHole, reads Thunk A's info table, notes that it's a whitehole, and acquire-reads the indirectee (which now points to Thread 1's TSO).
7. Thread 2, in messageBlackHole, acquire-reads the info table of Thread 1's TSO. However, the previous write to this field (in step (5)) was not atomic and therefore the TSO may not be visible to Thread 2, resulting in an undefined read.
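To make the ordering concern concrete, here is a minimal sketch in C11 atomics of the publication pattern the scenario relies on. It is not RTS code: ToyTSO, ToyThunk, toy_thunk_init, toy_claim, toy_owner_info and the TOY_* tags are hypothetical stand-ins. It also assumes that a release store on the indirectee, paired with the reader's acquire load, is the kind of fix that would apply here; I have not checked that against the actual threadPaused code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tags standing in for info-table pointers. */
enum { TOY_THUNK = 1, TOY_WHITEHOLE = 2, TOY_TSO = 3 };

/* Toy stand-in for a TSO: its info field is plain (non-atomic) memory
 * that the owner initializes before the TSO is published. */
typedef struct {
    int info;
} ToyTSO;

/* Toy stand-in for a thunk that can be claimed by an owning thread. */
typedef struct {
    _Atomic int info;
    ToyTSO *_Atomic indirectee;
} ToyThunk;

static void toy_thunk_init(ToyThunk *thunk) {
    atomic_init(&thunk->info, TOY_THUNK);
    atomic_init(&thunk->indirectee, (ToyTSO *)NULL);
}

/* Owner side: lock the thunk, then publish the owning TSO.
 * Returns 1 if the claim succeeded, 0 if the thunk was already claimed. */
static int toy_claim(ToyThunk *thunk, ToyTSO *tso) {
    assert(tso != NULL);
    int expected = TOY_THUNK;
    if (!atomic_compare_exchange_strong_explicit(
            &thunk->info, &expected, TOY_WHITEHOLE,
            memory_order_acquire, memory_order_relaxed)) {
        return 0;
    }
    /* Release store: every write made to *tso before this point becomes
     * visible to any thread that acquire-loads this pointer. A plain
     * store here would give no such guarantee. */
    atomic_store_explicit(&thunk->indirectee, tso, memory_order_release);
    return 1;
}

/* Reader side: returns the owner's info tag, or -1 if no owner has been
 * published yet. The acquire load pairs with the release store above,
 * so the plain read of owner->info is well defined. */
static int toy_owner_info(ToyThunk *thunk) {
    ToyTSO *owner =
        atomic_load_explicit(&thunk->indirectee, memory_order_acquire);
    return owner == NULL ? -1 : owner->info;
}
```

In this sketch the reader's acquire load only helps because the writer's store is a release. If the store in toy_claim were a plain assignment, the program would contain the same kind of race that TSAN reports above, and the read of owner->info would have no defined result.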