Seg fault with nonmoving collector and GHC-9.2.3
I've been running into some intermittent seg faults when running an app with the nonmoving collector with GHC-9.2.3.
The tip of the GHC-8.10 branch presumably isn't affected, since running the same application built with it hasn't caused any crashes.
I am using the following ghc-options:
-threaded -debug -eventlog -O2 -rtsopts "-with-rtsopts=-T -I0 -F1.2 -N12 -qn6 -A64m -n4m --disable-delayed-os-memory-return"
It still happens without -debug.
Here are some backtraces:
[Switching to LWP 26223]
0x0000000005fce271 in evacuate_large (p=0x423305a508) at rts/sm/Evac.c:455
455 rts/sm/Evac.c: No such file or directory.
(gdb) bt
#0 0x0000000005fce271 in evacuate_large (p=0x423305a508) at rts/sm/Evac.c:455
#1 0x0000000005fcea07 in evacuate (p=0x42000388b8) at rts/sm/Evac.c:790
#2 0x0000000005fd22e7 in nonmovingScavengeOne (q=0x42000388b0) at rts/sm/NonMovingScav.c:351
#3 0x0000000005fd2478 in scavengeNonmovingSegment (seg=0x4200038000) at rts/sm/NonMovingScav.c:397
#4 0x0000000005fa3f3b in scavenge_find_work () at rts/sm/Scav.c:2090
#5 0x0000000005fa40b2 in scavenge_loop () at rts/sm/Scav.c:2177
#6 0x0000000005f8eb6a in scavenge_until_all_done () at rts/sm/GC.c:1307
#7 0x0000000005f8cf9b in GarbageCollect (collect_gen=0, do_heap_census=false, is_overflow_gc=true, deadlock_detect=false, gc_type=2, cap=0x729dec0, idle_cap=0x7fff740015e0) at rts/sm/GC.c:548
#8 0x0000000005f79247 in scheduleDoGC (pcap=0x7fffae7fbd90, task=0x7fffd4000bb0, force_major=false, is_overflow_gc=true, deadlock_detect=false) at rts/Schedule.c:1860
#9 0x0000000005f774f3 in schedule (initialCapability=0x729dec0, task=0x7fffd4000bb0) at rts/Schedule.c:579
#10 0x0000000005f79ee0 in scheduleWorker (cap=0x729dec0, task=0x7fffd4000bb0) at rts/Schedule.c:2645
#11 0x0000000005f83cdc in workerStart (task=0x7fffd4000bb0) at rts/Task.c:445
#12 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libpthread.so.0
#13 0x00007ffff7a7403f in clone () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6

app: internal error: invalid closure, info=(nil)
(GHC version 9.2.3 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
Thread 18 "app" received signal SIGABRT, Aborted.
[Switching to LWP 26596]
0x00007ffff79b3bda in raise () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
(gdb) bt
#0 0x00007ffff79b3bda in raise () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
#1 0x00007ffff799e533 in abort () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
#2 0x0000000005f74d5e in rtsFatalInternalErrorFn (s=0x6362ecd "invalid closure, info=%p", ap=0x7fffaf7fd288) at rts/RtsMessages.c:192
#3 0x0000000005f749eb in barf (s=0x6362ecd "invalid closure, info=%p") at rts/RtsMessages.c:48
#4 0x0000000005fce7cc in evacuate (p=0x42000787b8) at rts/sm/Evac.c:693
#5 0x0000000005fd22e7 in nonmovingScavengeOne (q=0x42000787b0) at rts/sm/NonMovingScav.c:351
#6 0x0000000005fd2478 in scavengeNonmovingSegment (seg=0x4200078000) at rts/sm/NonMovingScav.c:397
#7 0x0000000005fa3f3b in scavenge_find_work () at rts/sm/Scav.c:2090
#8 0x0000000005fa40b2 in scavenge_loop () at rts/sm/Scav.c:2177
#9 0x0000000005f8eb6a in scavenge_until_all_done () at rts/sm/GC.c:1307
#10 0x0000000005f8edf7 in gcWorkerThread (cap=0x728d8b0) at rts/sm/GC.c:1395
#11 0x0000000005f6d3dd in yieldCapability (pCap=0x7fffaf7fdd60, task=0x7fffd0000bb0, gcAllowed=true) at rts/Capability.c:971
#12 0x0000000005f77826 in scheduleYield (pcap=0x7fffaf7fdd90, task=0x7fffd0000bb0) at rts/Schedule.c:705
#13 0x0000000005f76ad4 in schedule (initialCapability=0x728d8b0, task=0x7fffd0000bb0) at rts/Schedule.c:315
#14 0x0000000005f79ee0 in scheduleWorker (cap=0x728d8b0, task=0x7fffd0000bb0) at rts/Schedule.c:2645
#15 0x0000000005f83cdc in workerStart (task=0x7fffd0000bb0) at rts/Task.c:445
#16 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/li

mark_closure (origin=0x0, p0=0x42310fd349, queue=0x7fffd0000df0) at includes/rts/storage/ClosureMacros.h:60
60 includes/rts/storage/ClosureMacros.h: No such file or directory.
(gdb) bt
#0 mark_closure (origin=0x0, p0=0x42310fd349, queue=0x7fffd0000df0) at includes/rts/storage/ClosureMacros.h:60
#1 nonmovingMark (queue=queue@entry=0x7fffd0000df0) at rts/sm/NonMovingMark.c:1689
#2 0x0000000005f884c8 in nonmovingMarkThreadsWeaks (mark_queue=<optimized out>) at rts/sm/NonMoving.c:1016
#3 nonmovingMark_ (mark_queue=0x7fffd0000df0, dead_weaks=dead_weaks@entry=0x7fff4affce40, resurrected_threads=resurrected_threads@entry=0x7fff4affce48) at rts/sm/NonMoving.c:1076
#4 0x0000000005f88802 in nonmovingConcurrentMark (data=<optimized out>) at rts/sm/NonMoving.c:1032
#5 0x00007ffff7fadd40 in start_thread () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libpthread.so.0
#6 0x00007ffff7a7403f in clone () from /nix/store/q29bwjibv9gi9n86203s38n0577w09sx-glibc-2.33-117/lib/libc.so.6
I can't share the code for the app that is causing this. I will try to extract a reproducer. But let me know if there's any more information that I can give in the meantime.