Data race in GC time accounting
The ThreadSanitizer work (!2642) pointed out a rather nasty data race in our GC time accounting logic, which I believe is responsible for #17964.
Specifically, it reports races of the following form:
```
WARNING: ThreadSanitizer: data race (pid=17126)
  Read of size 8 at 0x7b50000004c0 by thread T4 (mutexes: write M18):
    #0 stat_endGC rts/Stats.c:510 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x587db)
    #1 GarbageCollect rts/sm/GC.c:962 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x76a8a)
    #2 scheduleDoGC rts/Schedule.c:1888 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x50d59)
    #3 schedule rts/Schedule.c:581 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x524f8)
    #4 scheduleWorker rts/Schedule.c:2671 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x54dde)
    #5 workerStart rts/Task.c:446 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x5c624)
    #6 <null> <null> (libtsan.so.0+0x2cb4d)

  Previous write of size 8 at 0x7b50000004c0 by thread T6:
    #0 stat_endGCWorker rts/Stats.c:417 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x5836a)
    #1 gcWorkerThread rts/sm/GC.c:1284 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x73c13)
    #2 yieldCapability rts/Capability.c:924 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x342ad)
    #3 scheduleYield rts/Schedule.c:716 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x51af1)
    #4 schedule rts/Schedule.c:317 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x51af1)
    #5 scheduleWorker rts/Schedule.c:2671 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x54dde)
    #6 workerStart rts/Task.c:446 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x5c624)
    #7 <null> <null> (libtsan.so.0+0x2cb4d)

  Location is heap block of size 512 at 0x7b5000000400 allocated by main thread:
    #0 malloc <null> (libtsan.so.0+0x2e1b3)
    #1 stgMallocBytes rts/RtsUtils.c:64 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x4cf9b)
    #2 initGcThreads rts/sm/GC.c:1075 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x730d5)
    #3 storageAddCapabilities rts/sm/Storage.c:296 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x888a2)
    #4 setNumCapabilities rts/Schedule.c:2338 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x54306)
    #5 <null> <null> (libHSghc-9.1.0.20200917-ghc9.1.0.20200917.so+0x2a28baa)
    #6 scheduleWaitThread rts/Schedule.c:2654 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x54d70)
    #7 rts_evalLazyIO rts/RtsAPI.c:530 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x47c53)
    #8 hs_main rts/RtsMain.c:72 (libHSrts-1.0_thr-ghc9.1.0.20200917.so+0x4b9a6)
    #9 main <null> (ghc+0x435b82)
```
The problem here is that the GC leader thread does not synchronize with the `gcWorkerThread`s after requesting that they terminate. This means that the leader (`T4`, in this case) may arrive in `stat_endGC`, which sums up the GC CPU times of all GC worker threads, before the worker threads have had a chance to call `stat_endGCWorker`.
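To make the missing happens-before edge concrete, here is a minimal, self-contained C sketch of the pattern; this is not the RTS code, and all names (`gc_worker`, `worker_cpu_ns`, `sum_worker_times`, `finished_workers`) are hypothetical stand-ins for `stat_endGCWorker`, the per-`gc_thread` timing fields, and `stat_endGC`. The point is that the leader must block until every worker has published its time before it sums:

```c
/* Hypothetical sketch of the missing synchronization; not the RTS code.
 * Compile with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define N_WORKERS 4

static long worker_cpu_ns[N_WORKERS]; /* per-worker GC CPU time, cf. gc_thread */
static int finished_workers = 0;      /* workers that have reported; guarded by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;

/* Analogue of stat_endGCWorker: each worker publishes its own time,
 * then announces that it has finished reporting. */
static void *gc_worker(void *arg)
{
    int i = (int)(intptr_t)arg;
    worker_cpu_ns[i] = 1000 * (i + 1);  /* stand-in for a measured CPU time */

    pthread_mutex_lock(&lock);
    if (++finished_workers == N_WORKERS)
        pthread_cond_signal(&all_done);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Analogue of stat_endGC: the leader waits for every worker's report
 * before summing. Reading worker_cpu_ns[] without this wait is exactly
 * the race in the trace above. */
static long sum_worker_times(void)
{
    pthread_mutex_lock(&lock);
    while (finished_workers < N_WORKERS)
        pthread_cond_wait(&all_done, &lock);
    pthread_mutex_unlock(&lock);

    long total = 0;
    for (int i = 0; i < N_WORKERS; i++)
        total += worker_cpu_ns[i];
    return total;
}

int main(void)
{
    pthread_t tids[N_WORKERS];
    for (int i = 0; i < N_WORKERS; i++)
        pthread_create(&tids[i], NULL, gc_worker, (void *)(intptr_t)i);

    printf("total GC CPU time: %ld ns\n", sum_worker_times());

    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

The mutex/condition-variable pair here is just one way to establish the required ordering; an actual fix in the RTS could equally reuse its existing GC synchronization machinery, so long as the leader observes every worker's `stat_endGCWorker` before summing.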