Seg fault with nonmoving collector and GHC-9.2.3
I've been running into intermittent seg faults when running an app with the nonmoving collector on GHC-9.2.3.
The same application built with the tip of the GHC-8.10 branch hasn't crashed this way, so presumably the problem doesn't occur there.
I am using the following ghc-options:
-threaded -debug -eventlog -O2 -rtsopts "-with-rtsopts=-T -I0 -F1.2 -N12 -qn6 -A64m -n4m --disable-delayed-os-memory-return"
It still happens without -debug.
Here are some backtraces:
[Switching to LWP 26223]
0x0000000005fce271 in evacuate_large (p=0x423305a508) at rts/sm/Evac.c:455
455 rts/sm/Evac.c: No such file or directory.
(gdb) bt
#0 0x0000000005fce271 in evacuate_large (p=0x423305a508) at rts/sm/Evac.c:455
#1 0x0000000005fcea07 in evacuate (p=0x42000388b8) at rts/sm/Evac.c:790
#2 0x0000000005fd22e7 in nonmovingScavengeOne (q=0x42000388b0) at rts/sm/NonMovingScav.c:351
#3 0x0000000005fd2478 in scavengeNonmovingSegment (seg=0x4200038000) at rts/sm/NonMovingScav.c:397
#4 0x0000000005fa3f3b in scavenge_find_work () at rts/sm/Scav.c:2090
#5 0x0000000005fa40b2 in scavenge_loop () at rts/sm/Scav.c:2177
#6 0x0000000005f8eb6a in scavenge_until_all_done () at rts/sm/GC.c:1307
#7 0x0000000005f8cf9b in GarbageCollect (collect_gen=0, do_heap_census=false, is_overflow_gc=true, deadlock_detect=false, gc_type=2,
cap=0x729dec0, idle_cap=0x7fff740015e0) at rts/sm/GC.c:548
#8 0x0000000005f79247 in scheduleDoGC (pcap=0x7fffae7fbd90, task=0x7fffd4000bb0, force_major=false, is_overflow_gc=true,
deadlock_detect=false) at rts/Schedule.c:1860
#9 0x0000000005f774f3 in schedule (initialCapability=0x729dec0, task=0x7fffd4000bb0) at rts/Schedule.c:579
#10 0x0000000005f79ee0 in scheduleWorker (cap=0x729dec0, task=0x7fffd4000bb0) at rts/Schedule.c:2645
#11 0x0000000005f83cdc in workerStart (task=0x7fffd4000bb0) at rts/Task.c:445
#12 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libpthread.so.0
#13 0x00007ffff7a7403f in clone () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
app: internal error: invalid closure, info=(nil)
(GHC version 9.2.3 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
Thread 18 "app" received signal SIGABRT, Aborted.
[Switching to LWP 26596]
0x00007ffff79b3bda in raise () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
(gdb) bt
#0 0x00007ffff79b3bda in raise () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
#1 0x00007ffff799e533 in abort () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
#2 0x0000000005f74d5e in rtsFatalInternalErrorFn (s=0x6362ecd "invalid closure, info=%p", ap=0x7fffaf7fd288) at rts/RtsMessages.c:192
#3 0x0000000005f749eb in barf (s=0x6362ecd "invalid closure, info=%p") at rts/RtsMessages.c:48
#4 0x0000000005fce7cc in evacuate (p=0x42000787b8) at rts/sm/Evac.c:693
#5 0x0000000005fd22e7 in nonmovingScavengeOne (q=0x42000787b0) at rts/sm/NonMovingScav.c:351
#6 0x0000000005fd2478 in scavengeNonmovingSegment (seg=0x4200078000) at rts/sm/NonMovingScav.c:397
#7 0x0000000005fa3f3b in scavenge_find_work () at rts/sm/Scav.c:2090
#8 0x0000000005fa40b2 in scavenge_loop () at rts/sm/Scav.c:2177
#9 0x0000000005f8eb6a in scavenge_until_all_done () at rts/sm/GC.c:1307
#10 0x0000000005f8edf7 in gcWorkerThread (cap=0x728d8b0) at rts/sm/GC.c:1395
#11 0x0000000005f6d3dd in yieldCapability (pCap=0x7fffaf7fdd60, task=0x7fffd0000bb0, gcAllowed=true) at rts/Capability.c:971
#12 0x0000000005f77826 in scheduleYield (pcap=0x7fffaf7fdd90, task=0x7fffd0000bb0) at rts/Schedule.c:705
#13 0x0000000005f76ad4 in schedule (initialCapability=0x728d8b0, task=0x7fffd0000bb0) at rts/Schedule.c:315
#14 0x0000000005f79ee0 in scheduleWorker (cap=0x728d8b0, task=0x7fffd0000bb0) at rts/Schedule.c:2645
#15 0x0000000005f83cdc in workerStart (task=0x7fffd0000bb0) at rts/Task.c:445
#16 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/li
mark_closure (origin=0x0, p0=0x42310fd349, queue=0x7fffd0000df0) at includes/rts/storage/ClosureMacros.h:60
60 includes/rts/storage/ClosureMacros.h: No such file or directory.
(gdb) bt
#0 mark_closure (origin=0x0, p0=0x42310fd349, queue=0x7fffd0000df0) at includes/rts/storage/ClosureMacros.h:60
#1 nonmovingMark (queue=queue@entry=0x7fffd0000df0) at rts/sm/NonMovingMark.c:1689
#2 0x0000000005f884c8 in nonmovingMarkThreadsWeaks (mark_queue=<optimized out>) at rts/sm/NonMoving.c:1016
#3 nonmovingMark_ (mark_queue=0x7fffd0000df0, dead_weaks=dead_weaks@entry=0x7fff4affce40,
resurrected_threads=resurrected_threads@entry=0x7fff4affce48) at rts/sm/NonMoving.c:1076
#4 0x0000000005f88802 in nonmovingConcurrentMark (data=<optimized out>) at rts/sm/NonMoving.c:1032
#5 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libpthread.so.0
#6 0x00007ffff7a7403f in clone () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
I can't share the code for the app that is causing this, but I will try to extract a reproducer. Let me know if there's any more information I can provide in the meantime.