RTS crash with 9.8.1
Summary
I'm working on implementing a production system via a Rete network, using STM. When I add certain productions, I get an exception because a lens `Fold` folds over an empty list. While trying to debug this in GHCi, I added a breakpoint to the function where the empty `Fold` occurs. About two seconds after hitting the breakpoint, before the `Fold` even happens, GHCi crashes with the following:
```
ghci> :break Rete.ReteEffect.matchFromToken
Breakpoint 0 activated at src/Rete/ReteEffect.hs:(437,34)-(470,85)
ghci> import ForTesting
ghci> testCommand "add rules {\"rule\" (state <snake> * *) (<n1> <n2> <n3>) (<c> <g> <b>) --> (<n1> <n2> <n3>) (<c> <g> <b>)}"
Stopped in Rete.ReteEffect.matchFromToken, src/Rete/ReteEffect.hs:(437,34)-(470,85)
_result :: Sem r Core.WME.RuleMatch = _
tok :: Core.WME.WMEToken ? Core.WME.IsProductionToken = _
token :: Rete.Types.Tokens.WMEToken' Core.WME.WMEMetadata = _
[src/Rete/ReteEffect.hs:(437,34)-(470,85)] ghci> <interactive>: internal error: evacuate: strange closure type 636340344
Stack trace:
0x7fffef73027f set_initial_registers (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffee1de088 dwfl_thread_getframes (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
0x7fffee1ddbdb get_one_thread_cb (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
0x7fffee1ddeea dwfl_getthreads (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
0x7fffee1de417 dwfl_getthread_frames (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
0x7fffef730887 libdwGetBacktrace (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef73902d rtsFatalInternalErrorFn (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef739200 barf (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef716171 evacuate1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef71beac scavenge_block1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef7658b7 scavenge_loop1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef75a702 scavenge_until_all_done (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef75c193 GarbageCollect (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef73d320 scheduleDoGC (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef73e339 schedule (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef73ebdc scheduleWorker (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffef7431dd workerStart (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
0x7fffee29fdd4 start_thread (/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
0x7fffee3219b0 __clone3 (/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
(GHC version 9.8.1 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
Error: cabal: repl failed for cosmothought-core-0.1.0.0. The build process
terminated with exit code -6
```
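The report doesn't say which lens combinator raised the empty-`Fold` exception. Purely as an illustration, assuming it was a partial operator such as `(^?!)` from the `lens` package, the sketch below contrasts it with the total `(^?)`, which returns `Nothing` when the `Fold` has no targets. `firstUnsafe`, `firstSafe`, and `throwsErrorCall` are hypothetical names, not code from the repository.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
-- Sketch only: assumes the `lens` package. These helpers are hypothetical
-- and not taken from the cosmothought repository.
import Control.Exception (ErrorCall, evaluate, try)
import Control.Lens ((^?), (^?!))

-- Partial: `^?!` calls `error` when the Fold has no targets,
-- e.g. when folding over an empty list.
firstUnsafe :: [a] -> a
firstUnsafe xs = xs ^?! traverse

-- Total: `^?` returns Nothing instead of throwing.
firstSafe :: [a] -> Maybe a
firstSafe xs = xs ^? traverse

-- Force a value to WHNF and report whether doing so threw an ErrorCall.
throwsErrorCall :: a -> IO Bool
throwsErrorCall x = do
  r <- try (evaluate x)
  pure $ case r of
    Left (_ :: ErrorCall) -> True
    Right _ -> False
```

In GHCi, `firstSafe ([] :: [Int])` evaluates to `Nothing`, whereas forcing `firstUnsafe ([] :: [Int])` throws an `ErrorCall`.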
Steps to reproduce
Most convenient as a Nix flake:
- Grab the repository from https://gitlab.com/spacekitteh/cosmothought/-/tree/repro-24506?ref_type=heads and make sure the current branch is `repro-24506`.
- `nix develop`
- `cabal repl core`
- `ghci> import ForTesting`
- Optionally, `ghci> :break Rete.ReteEffect.producePatternMap` for a stack trace of the segfault. It doesn't always produce a stack trace, but it often does.
- `ghci> repro24506`
- Sometimes, a single invocation of `repro24506` doesn't result in the segfault; just run `repro24506` again until it does.
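Since the crash is intermittent, a small helper can rerun the reproducer several times in one GHCi session. This is a minimal sketch assuming `repro24506` has type `IO ()` (the report doesn't state its type); `runRepeatedly` is a hypothetical name, not part of `ForTesting`.

```haskell
-- Hypothetical helper, not part of the cosmothought repository or ForTesting.
import Control.Monad (forM_)
import System.IO (hFlush, stdout)

-- Run an action n times, printing the run number before each attempt.
-- Flushing stdout first means the number of the run that crashed is still
-- visible if the RTS aborts the process partway through that run.
runRepeatedly :: Int -> IO () -> IO ()
runRepeatedly n action =
  forM_ [1 .. n] $ \i -> do
    putStrLn ("run " ++ show i)
    hFlush stdout
    action
```

For example, with the helper loaded alongside `ForTesting`: `ghci> runRepeatedly 20 repro24506`.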
~~Alternatively, it can be reproduced in `gdb` using `cosmothought-core-oneoff`, but this requires setting a breakpoint. If you run `gdb --args cosmothought-core-oneoff +RTS -DS` and set a breakpoint on `cosmothoughtzmcorezm0zi1zi0zi0zminplace_ReteziReteBuilder_createOrShareAlphaNodes_info`, it will barf during sanity checking.~~ Ok, that's due to setting the breakpoint incorrectly; see #24506 (comment 552349).
The exact value for the strange closure type changes from run to run. Sometimes, it just segfaults instead of printing the backtrace.
The problem is almost certainly in the `core/src/Rete` modules, in particular in how I'm using stm-hamt's …
I'm trying to narrow it down in order to find a minimal test case, but it's difficult.
Expected behavior
Not crash.
Things tried
- `+RTS --nonmoving-gc` and `--copying-gc` both segfault
- `-O0 -fno-static-argument-transformation` still segfaults
- `+RTS -C0 -V0` with the non-threaded runtime still segfaults
- Linked with `-debug`
- `+RTS -DS` doesn't produce anything immediately obvious to me
Possibly related
~~#24443~~ Nope, just a coincidence, but a strange one.
Environment
- GHC version used: 9.8.1
- Operating System: NixOS
- System Architecture: x64