Segfault during nonmoving marking of weak ptrs (GHC-9.2.6)
Summary
We've recently been running into segfaults while using the non-moving GC with GHC-9.2.6 in production. Here is a backtrace:
#0 mark_closure (queue=queue@entry=0x7f2340016e70, p0=0x4d5497fc51, origin=origin@entry=0x0) at rts/sm/NonMovingMark.c:1427
#1 0x0000000000417473 in nonmovingMark (budget=budget@entry=0x7f1f4d7f9c68, queue=queue@entry=0x7f2340016e70) at rts/sm/NonMovingMark.c:1762
#2 0x000000000b4d0a1c in nonmovingMarkThreadsWeaks (budget=budget@entry=0x7f1f4d7f9c68, mark_queue=mark_queue@entry=0x7f2340016e70) at rts/sm/NonMoving.c:1024
#3 0x000000000b4d0afd in nonmovingMark_ (mark_queue=0x7f2340016e70, dead_weaks=dead_weaks@entry=0x7f1f4d7f9ca0, resurrected_threads=resurrected_threads@entry=0x7f1f4d7f9ca8) at rts/sm/NonMoving.c:1099
#4 0x000000000b4d0dc2 in nonmovingConcurrentMark (data=<optimized out>) at rts/sm/NonMoving.c:1044
#5 0x00007f23a381e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f23a352f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
As far as I can tell from this and poking around in gdb, at some point during the marking of weak pointers, a rubbish closure gets put on the mark queue.
This is probably related to the fixes for #22264.
As our codebase is proprietary and this has so far only happened in production, it will be tricky to extract a reproducer.
In the meantime, if there's any information you'd like us to extract from the core dumps, do let us know.
Environment
- GHC version used: GHC-9.2.6
Optional:
- Operating System: Ubuntu
- System Architecture: x86_64-linux