Subtle data race in blackhole claim logic in threadPaused
While looking at #20093 (closed) I stumbled upon this data race report from TSAN:
Stderr ( T10414 ):
==================
WARNING: ThreadSanitizer: data race (pid=735365)
Write of size 8 at 0x7ee01791b4b8 by main thread:
#0 threadPaused rts/ThreadPaused.c:355 (T10414+0x77c8c3)
#1 stg_returnToSched <null> (T10414+0x7b27e5)
#2 scheduleWaitThread rts/Schedule.c:2651 (T10414+0x77395f)
#3 rts_evalLazyIO rts/RtsAPI.c:566 (T10414+0x7ca573)
#4 hs_main rts/RtsMain.c:72 (T10414+0x76a8d5)
#5 main <null> (T10414+0x4166d1)
Previous atomic read of size 8 at 0x7ee01791b4b8 by thread T6:
#0 __tsan_atomic64_load <null> (libtsan.so.0+0x67489)
#1 messageBlackHole rts/Messages.c:188 (T10414+0x7c4924)
#2 stg_BLACKHOLE_info <null> (T10414+0x7b1a99)
#3 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
#4 workerStart rts/Task.c:445 (T10414+0x77b126)
#5 <null> <null> (libtsan.so.0+0x2e0b6)
Thread T6 (tid=735382, running) created by thread T4 at:
#0 pthread_create <null> (libtsan.so.0+0x3055b)
#1 createOSThread rts/posix/OSThreads.c:166 (T10414+0x7aa1cf)
#2 startWorkerTask rts/Task.c:497 (T10414+0x77bb7a)
#3 releaseCapability_ rts/Capability.c:588 (T10414+0x7625e7)
#4 suspendThread rts/Schedule.c:2502 (T10414+0x773093)
#5 <null> <null> (T10414+0x6c33da)
#6 scheduleWorker rts/Schedule.c:2668 (T10414+0x7739df)
#7 workerStart rts/Task.c:445 (T10414+0x77b126)
#8 <null> <null> (libtsan.so.0+0x2e0b6)
SUMMARY: ThreadSanitizer: data race rts/ThreadPaused.c:355 in threadPaused
I believe this could come to bite us in the following scenario:
1. Thread 1 is created
2. Thread 1 starts evaluation of Thunk A
3. Thread 1 suspends mutation and calls threadPaused
4. Thread 2 tries to enter Thunk A and enters messageBlackHole (via stg_BLACKHOLE_info)
5. Thread 1 in threadPaused successfully whiteholes Thunk A and non-atomically writes its TSO as Thunk A's indirectee
6. Thread 2 in messageBlackHole reads Thunk A's info table, notes that it's a whitehole, and acquire-reads the indirectee (which now points to Thread 1's TSO)
7. Thread 2 in messageBlackHole acquire-reads the info table of Thread 1's TSO. However, the previous write to this field (in step (5)) was not atomic, so the contents of the TSO may not yet be visible to Thread 2, resulting in an undefined read
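To make the ordering problem concrete, here is a minimal sketch in portable C11 of the publication pattern the scenario relies on. It is not RTS code: demo_tso, demo_blackhole, claimer, observer and run_publish_demo are hypothetical stand-ins, and the release store shown is only one plausible way to pair with the acquire-read in messageBlackHole, not necessarily the fix that was applied. The point is that an acquire-load only guarantees visibility of earlier writes when it observes a value written by a release (or stronger) store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTS structures; not GHC's real layouts. */
typedef struct {
    const void *info;   /* plays the role of the TSO's info pointer */
    uint64_t    id;
} demo_tso;

typedef struct {
    _Atomic(demo_tso *) indirectee;  /* plays the role of Thunk A's indirectee */
} demo_blackhole;

static const int      demo_tso_info = 0;  /* dummy "info table" */
static demo_tso       the_tso;
static demo_blackhole the_bh;

/* "Thread 1": initialise the TSO with plain writes, then publish it.
 * The release store orders those plain writes before the pointer
 * becomes visible. A plain (non-atomic) store here would be the race. */
static void *claimer(void *arg) {
    (void)arg;
    the_tso.info = &demo_tso_info;
    the_tso.id   = 42;
    atomic_store_explicit(&the_bh.indirectee, &the_tso, memory_order_release);
    return NULL;
}

/* "Thread 2": acquire-load the indirectee, then read through it.
 * Because the load observes a release store, the TSO fields are visible. */
static void *observer(void *arg) {
    uint64_t *out = arg;
    demo_tso *t;
    while ((t = atomic_load_explicit(&the_bh.indirectee,
                                     memory_order_acquire)) == NULL) {
        /* spin until the pointer has been published */
    }
    *out = (t->info == &demo_tso_info) ? t->id : 0;
    return NULL;
}

/* Runs both threads once and returns the id the observer saw. */
static uint64_t run_publish_demo(void) {
    pthread_t a, b;
    uint64_t seen = 0;
    int rc;

    atomic_store_explicit(&the_bh.indirectee, NULL, memory_order_relaxed);
    rc = pthread_create(&b, NULL, observer, &seen);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, claimer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

Calling run_publish_demo() returns the id written by the claimer. If the store in claimer were a plain assignment instead, the program would contain exactly the kind of race TSAN reports above, and the observer's reads through the pointer would no longer be guaranteed to see the initialised fields.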
Edited by Ben Gamari