
Segfault during nonmoving marking of weak ptrs (GHC-9.2.6)

Summary

We've recently been running into segfaults in production while using the non-moving GC with GHC-9.2.6. Here is a backtrace:

#0  mark_closure (queue=queue@entry=0x7f2340016e70, p0=0x4d5497fc51, origin=origin@entry=0x0) at rts/sm/NonMovingMark.c:1427
#1  0x0000000000417473 in nonmovingMark (budget=budget@entry=0x7f1f4d7f9c68, queue=queue@entry=0x7f2340016e70) at rts/sm/NonMovingMark.c:1762
#2  0x000000000b4d0a1c in nonmovingMarkThreadsWeaks (budget=budget@entry=0x7f1f4d7f9c68, mark_queue=mark_queue@entry=0x7f2340016e70) at rts/sm/NonMoving.c:1024
#3  0x000000000b4d0afd in nonmovingMark_ (mark_queue=0x7f2340016e70, dead_weaks=dead_weaks@entry=0x7f1f4d7f9ca0, resurrected_threads=resurrected_threads@entry=0x7f1f4d7f9ca8)
    at rts/sm/NonMoving.c:1099
#4  0x000000000b4d0dc2 in nonmovingConcurrentMark (data=<optimized out>) at rts/sm/NonMoving.c:1044
#5  0x00007f23a381e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f23a352f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

As far as I can tell from this and poking around in gdb, at some point during the marking of weak pointers, a rubbish closure gets put on the mark queue.
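
For anyone reading the backtrace: the `p0` value in frame #0 is a closure pointer that may carry pointer-tag bits in its low bits, so the address to inspect in gdb is the untagged one. Below is a minimal sketch of splitting it, assuming 3 tag bits as GHC uses on 64-bit platforms. The helper names `untag` and `get_tag` are hypothetical and are not the RTS's own macros.

```c
#include <stdint.h>

/* Assumption: 3 pointer-tag bits, as on 64-bit GHC targets. */
#define TAG_BITS 3
#define TAG_MASK ((UINT64_C(1) << TAG_BITS) - 1)

/* Hypothetical helper: clear the tag bits to get the closure address. */
static uint64_t untag(uint64_t p) {
    return p & ~TAG_MASK;
}

/* Hypothetical helper: extract the tag stored in the low bits. */
static unsigned get_tag(uint64_t p) {
    return (unsigned)(p & TAG_MASK);
}
```

Applied to the value from frame #0, `untag(0x4d5497fc51)` gives `0x4d5497fc50` and `get_tag(0x4d5497fc51)` gives `1`. This only decodes the pointer; it does not say whether the closure it points at is valid.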

This is probably related to the fixes for #22264.

As our codebase is proprietary and this has so far only happened in production, it will be tricky to extract a reproducer.

In the meantime, if there's any information you'd like us to extract from the core dumps, do let us know.

Environment

  • GHC version used: GHC-9.2.6

Optional:

  • Operating System: Ubuntu
  • System Architecture: x86_64-linux
Edited by Teo Camarasu