Segfault during nonblocking GC
After switching to the new nonblocking (nonmoving) GC in production, we have been experiencing frequent segfaults.
I've been able to extract the following stack trace from a core dump [1]:
#0  isAlive (p=0x4214d682b0) at rts/sm/GCAux.c:96
#1  0x000000000e45de5a in tidyWeakList (gen=<optimized out>) at rts/sm/MarkWeak.c:266
#2  0x000000000e45e119 in traverseWeakPtrList (dead_weak_ptr_list=dead_weak_ptr_list@entry=0x7f156d7f9760, resurrected_threads=resurrected_threads@entry=0x7f156d7f9768) at rts/sm/MarkWeak.c:124
#3  0x000000000e43f47d in GarbageCollect (collect_gen=<optimized out>, collect_gen@entry=1, do_heap_census=do_heap_census@entry=false, deadlock_detect=deadlock_detect@entry=false, gc_type=gc_type@entry=2, cap=0x12194720, cap@entry=0x121dc180, idle_cap=idle_cap@entry=0x7f154801c220) at rts/sm/GC.c:456
#4  0x000000000e431f7e in scheduleDoGC (pcap=pcap@entry=0x7f156d7f99f0, task=task@entry=0x7f1578000b70, force_major=force_major@entry=false, deadlock_detect=deadlock_detect@entry=false) at rts/Schedule.c:1853
#5  0x000000000e432db8 in schedule (initialCapability=initialCapability@entry=0x12194720, task=task@entry=0x7f1578000b70) at rts/Schedule.c:565
#6  0x000000000e433e1c in scheduleWorker (cap=cap@entry=0x12194720, task=task@entry=0x7f1578000b70) at rts/Schedule.c:2617
#7  0x000000000e4300d3 in workerStart (task=0x7f1578000b70) at rts/Task.c:445
#8  0x00007f1595ff06db in start_thread (arg=0x7f156d7fa700) at pthread_create.c:463
#9  0x00007f1595489a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I'm sure I can spare some time to dissect the core dump further (with some guidance) if there is interest.
Steps to reproduce
I've been unable to reproduce the issue during testing or when running in a local development environment; I assume it's triggered by the higher load in production.
The project is a Yesod-based, database-heavy web application of ~60k LOC.
Expected behavior
An absence of segfaults ;)
Environment
GHC version used: 8.10.2
Operating System: Ubuntu 18.04.5 LTS (Bionic Beaver)
Linux 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[1] I'm sadly unable to share the full core dump, since it likely contains personal information of third parties that I'm legally unable to disclose.