Logging of "heap allocated" events is racy
While working on !1603 (closed) I encountered the following race:
WARNING: ThreadSanitizer: data race (pid=19339)
Read of size 8 at 0x7b1400000058 by main thread (mutexes: write M13):
#0 hasRoomForEvent rts/eventlog/EventLog.c:1357 (cgrun065+0x000000742d9b)
#1 ensureRoomForEvent rts/eventlog/EventLog.c:1378 (cgrun065+0x000000742d9b)
#2 postHeapEvent rts/eventlog/EventLog.c:890 (cgrun065+0x000000742d9b)
#3 traceHeapEvent_ rts/Trace.c:331 (cgrun065+0x00000073cb3f)
#4 traceEventHeapAllocated rts/Trace.h:720 (cgrun065+0x00000075af46)
#5 calcTotalAllocated rts/sm/Storage.c:1194 (cgrun065+0x00000075af46)
#6 stat_endGC rts/Stats.c:337 (cgrun065+0x000000735951)
#7 GarbageCollect rts/sm/GC.c:862 (cgrun065+0x00000074db5b)
#8 scheduleDoGC rts/Schedule.c:1806 (cgrun065+0x00000072e9b7)
#9 schedule rts/Schedule.c:552 (cgrun065+0x00000072fbf7)
#10 scheduleWaitThread rts/Schedule.c:2549 (cgrun065+0x00000073265d)
#11 rts_evalLazyIO rts/RtsAPI.c:530 (cgrun065+0x00000077a521)
#12 hs_main rts/RtsMain.c:72 (cgrun065+0x000000729eb5)
#13 main <null> (cgrun065+0x000000414199)
Previous write of size 8 at 0x7b1400000058 by thread T4:
#0 postWord8 rts/eventlog/EventLog.c:142 (cgrun065+0x000000745d34)
#1 postWord16 rts/eventlog/EventLog.c:147 (cgrun065+0x000000745d34)
#2 postEventTypeNum rts/eventlog/EventLog.c:189 (cgrun065+0x000000745d34)
#3 postEventHeader rts/eventlog/EventLog.c:220 (cgrun065+0x000000745d34)
#4 postEvent rts/eventlog/EventLog.c:1012 (cgrun065+0x000000745d34)
#5 traceGcEvent_ rts/Trace.c:303 (cgrun065+0x00000073cacf)
#6 traceEventGcIdle rts/Trace.h:654 (cgrun065+0x00000074b6bd)
#7 scavenge_until_all_done rts/sm/GC.c:1111 (cgrun065+0x00000074b6bd)
#8 gcWorkerThread rts/sm/GC.c:1164 (cgrun065+0x00000074eeb0)
#9 yieldCapability rts/Capability.c:899 (cgrun065+0x000000724721)
#10 scheduleYield rts/Schedule.c:681 (cgrun065+0x00000073016a)
#11 schedule rts/Schedule.c:295 (cgrun065+0x00000073016a)
#12 scheduleWorker rts/Schedule.c:2566 (cgrun065+0x0000007326ce)
#13 workerStart rts/Task.c:445 (cgrun065+0x0000007393b5)
#14 <null> <null> (libtsan.so.0+0x000000028d5b)
Location is heap block of size 80 at 0x7b1400000050 allocated by main thread:
#0 malloc <null> (libtsan.so.0+0x00000002b251)
#1 stgMallocBytes rts/RtsUtils.c:64 (cgrun065+0x00000072b10b)
#2 moreCapEventBufs rts/eventlog/EventLog.c:557 (cgrun065+0x00000073eda7)
#3 initEventLogging rts/eventlog/EventLog.c:510 (cgrun065+0x00000073ef66)
#4 initTracing rts/Trace.c:102 (cgrun065+0x00000073c910)
#5 hs_init_ghc rts/RtsStartup.c:228 (cgrun065+0x00000072acc7)
#6 hs_main rts/RtsMain.c:57 (cgrun065+0x000000729e8b)
#7 main <null> (cgrun065+0x000000414199)
Mutex M13 (0x0000009124c0) created at:
#0 pthread_mutex_init <null> (libtsan.so.0+0x00000002c81e)
#1 initMutex rts/posix/OSThreads.c:170 (cgrun065+0x00000075e407)
#2 initStorage rts/sm/Storage.c:148 (cgrun065+0x0000007586eb)
#3 hs_init_ghc rts/RtsStartup.c:245 (cgrun065+0x00000072ace0)
#4 hs_main rts/RtsMain.c:57 (cgrun065+0x000000729e8b)
#5 main <null> (cgrun065+0x000000414199)
Thread T4 (tid=19344, running) created by thread T3 at:
#0 pthread_create <null> (libtsan.so.0+0x00000002c010)
#1 createOSThread rts/posix/OSThreads.c:137 (cgrun065+0x00000075e37f)
#2 startWorkerTask rts/Task.c:497 (cgrun065+0x000000739dca)
#3 releaseCapability_ rts/Capability.c:567 (cgrun065+0x000000723e17)
#4 suspendThread rts/Schedule.c:2414 (cgrun065+0x000000731eb4)
#5 base_GHCziEventziEPoll_new9_info <null> (cgrun065+0x000000676472)
#6 scheduleWorker rts/Schedule.c:2566 (cgrun065+0x0000007326ce)
#7 workerStart rts/Task.c:445 (cgrun065+0x0000007393b5)
#8 <null> <null> (libtsan.so.0+0x000000028d5b)
SUMMARY: ThreadSanitizer: data race rts/eventlog/EventLog.c:1357 in hasRoomForEvent
In short, calcTotalAllocated
uses each capabilities' event buffer to log that capability's "heap allocated" event. However, there was never any sort of memory barrier ensuring that threads' local modifications to their buffer are made visible to the main GC thread.