Data race in compact_gc

While looking at !1603 (closed), I found the following data race, triggered by the compact_gc test in the threaded2 way:

==================
WARNING: ThreadSanitizer: data race (pid=29004)
  Atomic read of size 2 at 0x7ee5c5700168 by main thread (mutexes: write M16, write M28):
    #0 __tsan_atomic16_load <null> (libtsan.so.0+0x000000060eec)
    #1 evacuate rts/sm/Evac.c:637 (compact_gc+0x000000411982)
    #2 evacuate_hash_entry rts/sm/Scav.c:165 (compact_gc+0x00000087453e)
    #3 mapHashTable rts/Hash.c:394 (compact_gc+0x000000845e27)
    #4 scavenge_compact rts/sm/Scav.c:182 (compact_gc+0x000000875037)
    #5 scavenge_one rts/sm/Scav.c:1551 (compact_gc+0x000000875037)
    #6 scavenge_large rts/sm/Scav.c:2008 (compact_gc+0x000000876afa)
    #7 scavenge_find_work rts/sm/Scav.c:2068 (compact_gc+0x000000876afa)
    #8 scavenge_loop rts/sm/Scav.c:2137 (compact_gc+0x000000876afa)
    #9 scavenge_until_all_done rts/sm/GC.c:1112 (compact_gc+0x00000086a4fb)
    #10 GarbageCollect rts/sm/GC.c:438 (compact_gc+0x00000086bc31)
    #11 scheduleDoGC rts/Schedule.c:1814 (compact_gc+0x00000084db15)
    #12 schedule rts/Schedule.c:552 (compact_gc+0x00000084eea7)
    #13 scheduleWaitThread rts/Schedule.c:2559 (compact_gc+0x000000851b8d)
    #14 rts_evalLazyIO rts/RtsAPI.c:530 (compact_gc+0x0000008991d1)
    #15 hs_main rts/RtsMain.c:72 (compact_gc+0x000000849215)
    #16 main <null> (compact_gc+0x000000414a41)

  Previous write of size 8 at 0x7ee5c5700168 by thread T6:
    #0 mmap <null> (libtsan.so.0+0x00000005d920)
    #1 my_mmap rts/posix/OSMem.c:240 (compact_gc+0x00000087cae2)
    #2 osCommitMemory rts/posix/OSMem.c:599 (compact_gc+0x00000087d345)
    #3 getFreshMBlocks rts/sm/MBlock.c:205 (compact_gc+0x00000086fef3)
    #4 getCommittedMBlocks rts/sm/MBlock.c:216 (compact_gc+0x00000086fef3)
    #5 getMBlocks rts/sm/MBlock.c:580 (compact_gc+0x000000870218)
    #6 alloc_mega_group rts/sm/BlockAlloc.c:378 (compact_gc+0x000000868be1)
    #7 allocGroupOnNode rts/sm/BlockAlloc.c:429 (compact_gc+0x00000086970d)
    #8 allocLargeChunkOnNode rts/sm/BlockAlloc.c:506 (compact_gc+0x000000869e32)
    #9 allocBlocks_sync rts/sm/GCUtils.c:59 (compact_gc+0x00000086f8ee)
    #10 alloc_todo_block rts/sm/GCUtils.c:329 (compact_gc+0x00000086f8ee)
    #11 todo_block_full rts/sm/GCUtils.c:298 (compact_gc+0x00000086fc5d)
    #12 alloc_for_copy rts/sm/Evac.c:80 (compact_gc+0x00000040e3f5)
    #13 copy_tag_nolock rts/sm/Evac.c:157 (compact_gc+0x00000040e3f5)
    #14 evacuate rts/sm/Evac.c:706 (compact_gc+0x00000040e3f5)
    #15 scavenge_block rts/sm/Scav.c:496 (compact_gc+0x000000409024)
    #16 scavenge_find_work rts/sm/Scav.c:2061 (compact_gc+0x0000008768e7)
    #17 scavenge_loop rts/sm/Scav.c:2137 (compact_gc+0x0000008768e7)
    #18 scavenge_until_all_done rts/sm/GC.c:1112 (compact_gc+0x00000086a4fb)
    #19 gcWorkerThread rts/sm/GC.c:1185 (compact_gc+0x00000086e368)
    #20 yieldCapability rts/Capability.c:904 (compact_gc+0x000000843a65)
    #21 scheduleYield rts/Schedule.c:681 (compact_gc+0x00000084f452)
    #22 schedule rts/Schedule.c:295 (compact_gc+0x00000084f452)
    #23 scheduleWorker rts/Schedule.c:2576 (compact_gc+0x000000851bfe)
    #24 workerStart rts/Task.c:445 (compact_gc+0x000000858815)
    #25 <null> <null> (libtsan.so.0+0x000000028d5b)

  Mutex M16 (0x000000a530c0) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x00000002c81e)
    #1 initMutex rts/posix/OSThreads.c:170 (compact_gc+0x00000087d777)
    #2 initStorage rts/sm/Storage.c:148 (compact_gc+0x00000087791b)
    #3 hs_init_ghc rts/RtsStartup.c:245 (compact_gc+0x00000084a040)
    #4 hs_main rts/RtsMain.c:57 (compact_gc+0x0000008491eb)
    #5 main <null> (compact_gc+0x000000414a41)

  Mutex M28 (0x000000a52320) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x00000002c81e)
    #1 initMutex rts/posix/OSThreads.c:170 (compact_gc+0x00000087d777)
    #2 initStablePtrTable rts/StablePtr.c:162 (compact_gc+0x0000008538c1)
    #3 initStablePtrTable rts/StablePtr.c:155 (compact_gc+0x0000008539b4)
    #4 hs_init_ghc rts/RtsStartup.c:248 (compact_gc+0x00000084a045)
    #5 hs_main rts/RtsMain.c:57 (compact_gc+0x0000008491eb)
    #6 main <null> (compact_gc+0x000000414a41)

  Thread T6 (tid=29011, running) created by thread T4 at:
    #0 pthread_create <null> (libtsan.so.0+0x00000002c010)
    #1 createOSThread rts/posix/OSThreads.c:137 (compact_gc+0x00000087d6ef)
    #2 startWorkerTask rts/Task.c:497 (compact_gc+0x00000085922a)
    #3 releaseCapability_ rts/Capability.c:567 (compact_gc+0x000000843197)
    #4 suspendThread rts/Schedule.c:2424 (compact_gc+0x0000008513e4)
    #5 <null> <null> (compact_gc+0x0000007ad139)
    #6 scheduleWorker rts/Schedule.c:2576 (compact_gc+0x000000851bfe)
    #7 workerStart rts/Task.c:445 (compact_gc+0x000000858815)
    #8 <null> <null> (libtsan.so.0+0x000000028d5b)

SUMMARY: ThreadSanitizer: data race (/nix/store/c7hj2bk4aqgpb3q0h5xhq7lag0lq3jm7-gcc-7.4.0-lib/lib/libtsan.so.0+0x60eec) in __tsan_atomic16_load

The load in question in evacuate is:

      StgClosure *e = (StgClosure*)UN_FORWARDING_PTR(info);
      RELAXED_STORE(p, TAG_CLOSURE(tag,e));
      if (gen_no < gct->evac_gen_no) {  // optimisation
          if (RELAXED_LOAD(&Bdescr((P_)e)->gen_no)    // <===== this, I think
                    < gct->evac_gen_no) {
              gct->failed_to_evac = true;
              TICK_GC_FAILED_PROMOTION();
          }
      }
      return;
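
To see why an mmap of fresh memory can show up as the "previous write" for a load of a block descriptor field, it helps to recall that a block's descriptor lives at the start of the same megablock as the block itself. The sketch below is a rough, standalone illustration of that address arithmetic. The constants (1 MiB megablocks, 4 KiB blocks, 64-byte descriptors) and the helper names `mblock_base` and `bdescr_addr` are assumptions made for this example, not the RTS's own definitions, and the closure address used in the check is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit layout: 1 MiB megablocks, 4 KiB blocks, 64-byte descriptors. */
#define MBLOCK_SHIFT 20
#define BLOCK_SHIFT  12
#define BDESCR_SHIFT 6

#define MBLOCK_MASK ((UINT64_C(1) << MBLOCK_SHIFT) - 1)

/* Start of the megablock containing address p. */
static uint64_t mblock_base(uint64_t p) {
    return p & ~MBLOCK_MASK;
}

/* Address of the descriptor for the block containing p:
 * descriptor number i sits at offset i * 64 from the megablock start. */
static uint64_t bdescr_addr(uint64_t p) {
    uint64_t block_index = (p & MBLOCK_MASK) >> BLOCK_SHIFT;
    return mblock_base(p) + (block_index << BDESCR_SHIFT);
}

/* A hypothetical closure in block 5 of the megablock named in the report
 * maps to a descriptor that contains the racing address 0x7ee5c5700168. */
static void check_report_address(void) {
    uint64_t bd = bdescr_addr(UINT64_C(0x7ee5c5705000));
    assert(bd == UINT64_C(0x7ee5c5700140));
    assert(UINT64_C(0x7ee5c5700168) - bd < (UINT64_C(1) << BDESCR_SHIFT));
}
```

Under these assumptions, the racing address falls inside the descriptor area of the megablock starting at 0x7ee5c5700000, which fits the trace: committing that megablock in T6 and reading a descriptor field in the main thread touch the same memory.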

Given that I only see this in compact_gc, I suspect this is a bug in the CNF (compact normal form) compactor, which may be failing to initialize block descriptors correctly.
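
For reference, the kind of ordering ThreadSanitizer looks for between the thread that creates a descriptor and the thread that reads it can be sketched as below. This is a minimal standalone illustration, not the RTS code and not a proposed fix: `struct block_desc`, `gc_worker`, `read_gen_no`, and `run_demo` are hypothetical names, and the static `fresh_block` merely stands in for freshly committed memory. The writer initializes the field with a plain store and then publishes the pointer with a release store; the reader acquires the pointer before doing its relaxed load of `gen_no`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block descriptor; not the RTS's bdescr. */
struct block_desc {
    uint16_t gen_no;
};

static struct block_desc fresh_block;        /* plays the freshly mapped memory */
static struct block_desc *published = NULL;  /* pointer the reader follows */

/* Writer: initialize the descriptor, then publish it with release ordering. */
static void *gc_worker(void *arg) {
    (void)arg;
    fresh_block.gen_no = 1;  /* plain initializing store */
    __atomic_store_n(&published, &fresh_block, __ATOMIC_RELEASE);
    return NULL;
}

/* Reader: the acquire load pairs with the release store above, so the
 * relaxed load of gen_no is ordered after the initialization. */
static uint16_t read_gen_no(void) {
    struct block_desc *bd;
    while ((bd = __atomic_load_n(&published, __ATOMIC_ACQUIRE)) == NULL) {
        /* spin until the descriptor has been published */
    }
    return __atomic_load_n(&bd->gen_no, __ATOMIC_RELAXED);
}

/* Runs one writer thread against the calling thread as reader. */
static uint16_t run_demo(void) {
    pthread_t t;
    int rc = pthread_create(&t, NULL, gc_worker, NULL);
    assert(rc == 0);
    (void)rc;
    uint16_t g = read_gen_no();
    pthread_join(t, NULL);
    return g;
}
```

If the descriptor were reachable by the reader without such a release/acquire edge, or were never initialized after the memory was committed, a relaxed load alone would give no ordering with the earlier write, which is the shape of the report above.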

Edited by Ben Gamari