GHC 9.2.8 RTS SIGABRT and SIGSEGV
Summary
The application (cardano-node 8.1.2, compiled with GHC 9.2.8) crashes randomly at runtime with either SIGABRT or SIGSEGV when the RTS -Fd flag is used together with aggressive memory-release settings.
Steps to reproduce
There is no definitive way to reproduce this: the crashes are random and seem to be platform dependent. I only see them on one particular computer (with an Intel(R) Celeron(R) N5095 @ 2.00GHz CPU). On this platform the crashes are frequent, ranging from 1-2 per day to 1 per week.
I have several full crash dumps at your disposal if you need them (just provide me with a place to upload them; each dump is ~12GB).
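If the full dumps turn out to be too large to share, the backtraces of all threads can be extracted from a core file first. The following is only a minimal sketch: it relies on gdb's standard `-batch` and `-ex` options and the `thread apply all bt` command, while the function names and the output file are hypothetical.

```python
import subprocess


def gdb_backtrace_command(binary, core):
    """Build a gdb command line that prints every thread's backtrace and exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


def dump_backtraces(binary, core, out_path):
    """Run gdb on the core file and save its output to a small text file."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; keep whatever output it produced
    )
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode
```

For example, `dump_backtraces("/home/cardano/.local/bin/cardano-node", "core.134094", "bt-all.txt")`, where the core file name is an assumed placeholder.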
I'm using the following RTS settings (trying to leverage the -Fd flag which seems to be the main cause):
--disable-delayed-os-memory-return -I0.001 -Iw1800 -A64M -AL256M -n16m -F1.6 -Fd1 -O4250M -H5000M -T -S
(With -I0.001 and -Iw1800, I'm effectively forcing the idle GC to run every 30 minutes.)
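To make these settings easier to read, here is a rough sketch that decodes the flag string into plain numbers. The flag descriptions are my reading of the GHC user's guide, the helper names are hypothetical, and size suffixes are assumed to be powers of 1024.

```python
import re

# Flags copied from the report above.
RTS_FLAGS = ("--disable-delayed-os-memory-return -I0.001 -Iw1800 -A64M -AL256M "
             "-n16m -F1.6 -Fd1 -O4250M -H5000M -T -S")

# Size suffixes, assumed to be powers of 1024 and case-insensitive.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

# Longer prefixes come first so -Iw, -AL and -Fd are not mistaken for -I, -A, -F.
MEANING = [
    ("-Iw", "minimum seconds between idle GCs"),
    ("-I", "seconds of idleness before an idle GC"),
    ("-AL", "large-object allocation limit (bytes)"),
    ("-A", "allocation area size (bytes)"),
    ("-n", "nursery chunk size (bytes)"),
    ("-Fd", "return decay factor"),
    ("-F", "old generation factor"),
    ("-O", "minimum old generation size (bytes)"),
    ("-H", "suggested heap size (bytes)"),
]


def parse_value(text):
    """Turn '64M' into a byte count and '1.6' into a float."""
    match = re.fullmatch(r"([0-9.]+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    number, suffix = match.groups()
    if suffix:
        return int(float(number) * UNITS[suffix.lower()])
    return float(number)


def decode(flags):
    """Map each flag that carries a value to (value, meaning); skip the rest."""
    decoded = {}
    for flag in flags.split():
        for prefix, meaning in MEANING:
            if flag.startswith(prefix) and len(flag) > len(prefix):
                decoded[prefix] = (parse_value(flag[len(prefix):]), meaning)
                break
    return decoded


settings = decode(RTS_FLAGS)
for prefix, (value, meaning) in settings.items():
    print(f"{prefix:4} {value!s:>12}  {meaning}")

# -Iw is given in seconds; convert it to minutes.
print("idle GC interval (minutes):", settings["-Iw"][0] / 60)
```

Flags without a value (-T, -S, --disable-delayed-os-memory-return) are switches and are deliberately left out of the decoded table.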
I compiled the software on a different x86_64 machine running the same OS, using GHC 9.2.8 from ghcup. Here is the RTS --info output of cardano-node:
cardano-node +RTS --info
[("GHC RTS", "YES")
,("GHC version", "9.2.8")
,("RTS way", "rts_thr")
,("Build platform", "x86_64-unknown-linux")
,("Build architecture", "x86_64")
,("Build OS", "linux")
,("Build vendor", "unknown")
,("Host platform", "x86_64-unknown-linux")
,("Host architecture", "x86_64")
,("Host OS", "linux")
,("Host vendor", "unknown")
,("Target platform", "x86_64-unknown-linux")
,("Target architecture", "x86_64")
,("Target OS", "linux")
,("Target vendor", "unknown")
,("Word size", "64")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
,("Flag -with-rtsopts", "-T -I0 -A16m -N2 --disable-delayed-os-memory-return")
]
Here are the last 3 backtraces (2 SIGABRT and 1 SIGSEGV):
Last crash:
Reading symbols from /home/cardano/.local/bin/cardano-node...
[New LWP 134094]
[New LWP 133928]
[New LWP 133939]
[New LWP 133936]
[New LWP 138575]
[New LWP 133974]
[New LWP 134095]
[New LWP 133940]
[New LWP 133937]
[New LWP 138574]
[New LWP 133938]
[New LWP 133949]
[New LWP 133947]
[New LWP 138573]
[New LWP 133946]
[New LWP 138571]
[New LWP 133945]
[New LWP 133948]
[New LWP 138572]
[New LWP 138570]
[New LWP 133944]
[New LWP 133942]
[New LWP 133941]
[New LWP 133943]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/cardano/.local/bin/cardano-node +RTS -N4 --disable-delayed-os-memory-retu'.
Program terminated with signal SIGABRT, Aborted.
(#)0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f498e7fc700 (LWP 134094))]
(gdb) bt
(#)0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
(#)1 0x00007f49bb324537 in __GI_abort () at abort.c:79
(#)2 0x000000000542f5ea in rtsFatalInternalErrorFn (s=0x55e2190 "evacuate: strange closure type %d", ap=0x7f498e7fb928) at rts/RtsMessages.c:192
(#)3 0x000000000542f70d in barf (s=s@entry=0x55e2190 "evacuate: strange closure type %d") at rts/RtsMessages.c:48
(#)4 0x0000000000411ba9 in evacuate (p=p@entry=0x42cd64a0f8) at rts/sm/Evac.c:1064
(#)5 0x000000000040f034 in scavenge_block (bd=0x42cd601280) at rts/sm/Scav.c:600
(#)6 0x0000000005439ba1 in scavenge_find_work () at rts/sm/Scav.c:2130
(#)7 scavenge_loop () at rts/sm/Scav.c:2177
(#)8 0x00000000054341d5 in scavenge_until_all_done () at rts/sm/GC.c:1315
(#)9 0x00000000054347d1 in gcWorkerThread (cap=cap@entry=0x755a940) at rts/sm/GC.c:1402
(#)10 0x00000000054207e9 in yieldCapability (pCap=pCap@entry=0x7f498e7fbbb8, task=task@entry=0x7f497c000bb0, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:973
(#)11 0x000000000542b4ea in scheduleYield (task=0x7f497c000bb0, pcap=0x7f498e7fbbb0) at rts/Schedule.c:705
(#)12 schedule (initialCapability=initialCapability@entry=0x755a940, task=task@entry=0x7f497c000bb0) at rts/Schedule.c:315
(#)13 0x000000000542c22c in scheduleWorker (cap=cap@entry=0x755a940, task=task@entry=0x7f497c000bb0) at rts/Schedule.c:2647
(#)14 0x0000000005428066 in workerStart (task=0x7f497c000bb0) at rts/Task.c:445
(#)15 0x00007f49bb742ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
(#)16 0x00007f49bb3fda2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Previous crash:
Reading symbols from /home/cardano/.local/bin/cardano-node...
[New LWP 134094]
[New LWP 133928]
[New LWP 133939]
[New LWP 133936]
[New LWP 138575]
[New LWP 133974]
[New LWP 134095]
[New LWP 133940]
[New LWP 133937]
[New LWP 138574]
[New LWP 133938]
[New LWP 133949]
[New LWP 133947]
[New LWP 138573]
[New LWP 133946]
[New LWP 138571]
[New LWP 133945]
[New LWP 133948]
[New LWP 138572]
[New LWP 138570]
[New LWP 133944]
[New LWP 133942]
[New LWP 133941]
[New LWP 133943]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/cardano/.local/bin/cardano-node +RTS -N4 --disable-delayed-os-memory-retu'.
Program terminated with signal SIGABRT, Aborted.
(#)0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f498e7fc700 (LWP 134094))]
(gdb) bt
(#)0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
(#)1 0x00007f49bb324537 in __GI_abort () at abort.c:79
(#)2 0x000000000542f5ea in rtsFatalInternalErrorFn (s=0x55e2190 "evacuate: strange closure type %d", ap=0x7f498e7fb928) at rts/RtsMessages.c:192
(#)3 0x000000000542f70d in barf (s=s@entry=0x55e2190 "evacuate: strange closure type %d") at rts/RtsMessages.c:48
(#)4 0x0000000000411ba9 in evacuate (p=p@entry=0x42cd64a0f8) at rts/sm/Evac.c:1064
(#)5 0x000000000040f034 in scavenge_block (bd=0x42cd601280) at rts/sm/Scav.c:600
(#)6 0x0000000005439ba1 in scavenge_find_work () at rts/sm/Scav.c:2130
(#)7 scavenge_loop () at rts/sm/Scav.c:2177
(#)8 0x00000000054341d5 in scavenge_until_all_done () at rts/sm/GC.c:1315
(#)9 0x00000000054347d1 in gcWorkerThread (cap=cap@entry=0x755a940) at rts/sm/GC.c:1402
(#)10 0x00000000054207e9 in yieldCapability (pCap=pCap@entry=0x7f498e7fbbb8, task=task@entry=0x7f497c000bb0, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:973
(#)11 0x000000000542b4ea in scheduleYield (task=0x7f497c000bb0, pcap=0x7f498e7fbbb0) at rts/Schedule.c:705
(#)12 schedule (initialCapability=initialCapability@entry=0x755a940, task=task@entry=0x7f497c000bb0) at rts/Schedule.c:315
(#)13 0x000000000542c22c in scheduleWorker (cap=cap@entry=0x755a940, task=task@entry=0x7f497c000bb0) at rts/Schedule.c:2647
(#)14 0x0000000005428066 in workerStart (task=0x7f497c000bb0) at rts/Task.c:445
(#)15 0x00007f49bb742ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
(#)16 0x00007f49bb3fda2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Crash before that:
[New LWP 125636]
[New LWP 125625]
[New LWP 126882]
[New LWP 125626]
[New LWP 125645]
[New LWP 125643]
[New LWP 125628]
[New LWP 125627]
[New LWP 125629]
[New LWP 125634]
[New LWP 125630]
[New LWP 125638]
[New LWP 125637]
[New LWP 125646]
[New LWP 125647]
[New LWP 125635]
[New LWP 126891]
[New LWP 125639]
[New LWP 125633]
[New LWP 125644]
[New LWP 125632]
[New LWP 125631]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/cardano/.local/bin/cardano-node +RTS -N4 --disable-delayed-os-memory-retu'.
Program terminated with signal SIGSEGV, Segmentation fault.
(#)0 evacuate (p=p@entry=0x42fb734868) at includes/rts/storage/ClosureMacros.h:60
60 includes/rts/storage/ClosureMacros.h: No such file or directory.
[Current thread is 1 (Thread 0x7ff76effd700 (LWP 125636))]
(gdb) bt
(#)0 evacuate (p=p@entry=0x42fb734868) at includes/rts/storage/ClosureMacros.h:60
(#)1 0x000000000040efcf in scavenge_block (bd=0x42fb700d00) at rts/sm/Scav.c:569
(#)2 0x0000000005439ba1 in scavenge_find_work () at rts/sm/Scav.c:2130
(#)3 scavenge_loop () at rts/sm/Scav.c:2177
(#)4 0x00000000054341d5 in scavenge_until_all_done () at rts/sm/GC.c:1315
(#)5 0x00000000054347d1 in gcWorkerThread (cap=cap@entry=0x6aa0d00) at rts/sm/GC.c:1402
(#)6 0x00000000054207e9 in yieldCapability (pCap=pCap@entry=0x7ff76effcbb8, task=task@entry=0x7ff774000bb0, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:973
(#)7 0x000000000542b4ea in scheduleYield (task=0x7ff774000bb0, pcap=0x7ff76effcbb0) at rts/Schedule.c:705
(#)8 schedule (initialCapability=initialCapability@entry=0x6aa0d00, task=task@entry=0x7ff774000bb0) at rts/Schedule.c:315
(#)9 0x000000000542c22c in scheduleWorker (cap=cap@entry=0x6aa0d00, task=task@entry=0x7ff774000bb0) at rts/Schedule.c:2647
(#)10 0x0000000005428066 in workerStart (task=0x7ff774000bb0) at rts/Task.c:445
(#)11 0x00007ff7912c0ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
(#)12 0x00007ff790f7ba2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Expected behavior
No RTS crash
Environment
- GHC version used: 9.2.8 from ghcup
Optional:
- Operating System: Debian Bullseye 11.7 (5.10.191-1)
- System Architecture: x86_64