Segmentation fault in concurrent program
@WJWH has recently been working on an io-uring-based IO manager built on my bindings. However, while testing his branch he encountered a rather suspicious segmentation fault during GC. Specifically, using GHC 8.10.1:
$ git clone https://github.com/bgamari/io-uring segfault
$ cd io-uring
$ cabal new-run -w ghc-8.10.1 segfault &
$ echo 'GET http://localhost:3000/' | nix run nixpkgs.vegeta -c vegeta attack -rate 8000 -body main.hs >log
Within seconds the executable crashes with a segmentation fault similar to the following:
0x000000000087084c in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:244
244 return info->type != INVALID_OBJECT && info->type < N_CLOSURE_TYPES;
(gdb) bt
#0 0x000000000087084c in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:244
#1 0x000000000087089b in LOOKS_LIKE_INFO_PTR (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:249
#2 0x00000000008708d3 in LOOKS_LIKE_CLOSURE_PTR (p=0x4200c17808) at includes/rts/storage/ClosureMacros.h:254
#3 0x0000000000871cdb in checkTSO (tso=0x4200da27e8) at rts/sm/Sanity.c:629
#4 0x0000000000871578 in checkClosure (p=0x4200da27e8) at rts/sm/Sanity.c:432
#5 0x0000000000871f70 in checkMutableList (mut_bd=0x4201203f80, gen=1) at rts/sm/Sanity.c:691
#6 0x000000000087202f in checkLocalMutableLists (cap_no=6) at rts/sm/Sanity.c:710
#7 0x000000000087205c in checkMutableLists () at rts/sm/Sanity.c:719
#8 0x000000000087254c in checkSanity (after_gc=true, major_gc=false) at rts/sm/Sanity.c:871
#9 0x0000000000863764 in GarbageCollect (collect_gen=0, do_heap_census=false, deadlock_detect=false, gc_type=2, cap=0xac80d0, idle_cap=0x7fff9c000f80) at rts/sm/GC.c:863
#10 0x00000000008489dc in scheduleDoGC (pcap=0x7fffa7ffee68, task=0x7fffcc000b70, force_major=false, deadlock_detect=false) at rts/Schedule.c:1853
#11 0x00000000008474ef in scheduleProcessInbox (pcap=0x7fffa7ffee68) at rts/Schedule.c:1009
#12 0x0000000000846d32 in scheduleFindWork (pcap=0x7fffa7ffee68) at rts/Schedule.c:627
#13 0x000000000084629f in schedule (initialCapability=0xac80d0, task=0x7fffcc000b70) at rts/Schedule.c:290
#14 0x0000000000849d0f in scheduleWorker (cap=0xac80d0, task=0x7fffcc000b70) at rts/Schedule.c:2617
#15 0x0000000000850c76 in workerStart (task=0x7fffcc000b70) at rts/Task.c:445
#16 0x00007ffff7f2eef7 in start_thread () from /nix/store/wx1vk75bpdr65g6xwxbj4rw0pk04v5j3-glibc-2.27/lib/libpthread.so.0
#17 0x00007ffff7dcc22f in clone () from /nix/store/wx1vk75bpdr65g6xwxbj4rw0pk04v5j3-glibc-2.27/lib/libc.so.6
(gdb) up
#1 0x000000000087089b in LOOKS_LIKE_INFO_PTR (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:249
249 return p && (IS_FORWARDING_PTR(p) || LOOKS_LIKE_INFO_PTR_NOT_NULL(p));
(gdb)
#2 0x00000000008708d3 in LOOKS_LIKE_CLOSURE_PTR (p=0x4200c17808) at includes/rts/storage/ClosureMacros.h:254
254 return LOOKS_LIKE_INFO_PTR((StgWord)
(gdb)
#3 0x0000000000871cdb in checkTSO (tso=0x4200da27e8) at rts/sm/Sanity.c:629
629 ASSERT(LOOKS_LIKE_CLOSURE_PTR(tso->global_link) &&
(gdb) print tso->global_link
$1 = (struct StgTSO_ *) 0x4200c17808
(gdb) x/8a 0x4200c17808
0x4200c17808: 0xaaaaaaaaaaaaaaaa 0xaaaaaaaaaaaaaaaa
0x4200c17818: 0xaaaaaaaaaaaaaaaa 0xaaaaaaaaaaaaaaaa
0x4200c17828: 0xaaaaaaaaaaaaaaaa 0xaaaaaaaaaaaaaaaa
0x4200c17838: 0xaaaaaaaaaaaaaaaa 0xaaaaaaaaaaaaaaaa
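It appears that we somehow end up with a TSO on the heap whose global_link field is invalid: it points to memory filled with 0xaa, and the info pointer read from that memory (p=12297829382473034410 in frame #0) is the same pattern, 0xaaaaaaaaaaaaaaaa.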
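
To double-check that the decimal `p` value in frames #0 and #1 is the same 0xaa fill seen in the `x/8a` dump, here is a minimal C sketch. The helper names `word_is_filled_with` and `self_check` are hypothetical, made up for illustration; they are not part of the RTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an RTS function): returns true if every
 * byte of the 64-bit word `w` equals `byte`. */
bool word_is_filled_with(uint64_t w, uint8_t byte)
{
    for (int i = 0; i < 8; i++) {
        if (((w >> (8 * i)) & 0xff) != byte) {
            return false;
        }
    }
    return true;
}

/* Checks the two values taken from the gdb session above. */
void self_check(void)
{
    /* p as printed in frames #0 and #1 */
    assert(word_is_filled_with(UINT64_C(12297829382473034410), 0xaa));
    /* tso->global_link itself is an ordinary-looking address */
    assert(!word_is_filled_with(UINT64_C(0x4200c17808), 0xaa));
}
```

Calling `self_check()` passes without tripping an assertion. So the pointer stored in global_link looks like a plausible heap address, but the word it points to, which would be read as the closure's info pointer, is entirely 0xaa bytes.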