
RTS crash with 9.8.1

Summary

I'm implementing a production system as a Rete network, using STM. When I add certain productions, an exception is thrown because a lens Fold folds an empty list. While debugging this in GHCi, I set a breakpoint on the function where the empty Fold occurs. About two seconds after hitting the breakpoint, before the Fold even runs, GHCi crashes with the following:

ghci> :break Rete.ReteEffect.matchFromToken
Breakpoint 0 activated at src/Rete/ReteEffect.hs:(437,34)-(470,85)
ghci> import ForTesting
ghci> testCommand "add rules {\"rule\" (state <snake> * *) (<n1> <n2> <n3>) (<c> <g> <b>) --> (<n1> <n2> <n3>) (<c> <g> <b>)}"
Stopped in Rete.ReteEffect.matchFromToken, src/Rete/ReteEffect.hs:(437,34)-(470,85)
_result :: Sem r Core.WME.RuleMatch = _
tok :: Core.WME.WMEToken ? Core.WME.IsProductionToken = _
token :: Rete.Types.Tokens.WMEToken' Core.WME.WMEMetadata = _
[src/Rete/ReteEffect.hs:(437,34)-(470,85)] ghci> <interactive>: internal error: evacuate: strange closure type 636340344
Stack trace:
            0x7fffef73027f    set_initial_registers (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffee1de088    dwfl_thread_getframes (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
            0x7fffee1ddbdb    get_one_thread_cb (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
            0x7fffee1ddeea    dwfl_getthreads (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
            0x7fffee1de417    dwfl_getthread_frames (/nix/store/8hy9xdnczdpn846qpsybwlpk5l607lbj-elfutils-0.189/lib/libdw-0.189.so)
            0x7fffef730887    libdwGetBacktrace (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef73902d    rtsFatalInternalErrorFn (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef739200    barf (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef716171    evacuate1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef71beac    scavenge_block1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef7658b7    scavenge_loop1 (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef75a702    scavenge_until_all_done (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef75c193    GarbageCollect (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef73d320    scheduleDoGC (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef73e339    schedule (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef73ebdc    scheduleWorker (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffef7431dd    workerStart (/nix/store/20m2ha1x3rq3aiwg87asapqxv2hzgzw1-ghc-9.8.1/lib/ghc-9.8.1/lib/x86_64-linux-ghc-9.8.1/libHSrts-1.0.2_thr-ghc9.8.1.so)
            0x7fffee29fdd4    start_thread (/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
            0x7fffee3219b0    __clone3 (/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)

    (GHC version 9.8.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
Error: cabal: repl failed for cosmothought-core-0.1.0.0. The build process
terminated with exit code -6

Steps to reproduce

Most convenient as a Nix flake:

  1. grab from https://gitlab.com/spacekitteh/cosmothought/-/tree/repro-24506?ref_type=heads, ensure the current branch is repro-24506
  2. nix develop
  3. cabal repl core
  4. ghci> import ForTesting
  5. Optionally, ghci> :break Rete.ReteEffect.producePatternMap to get a stack trace of the segfault. This often, but not always, produces a stack trace.
  6. ghci> repro24506
  7. Sometimes, a single invocation of repro24506 doesn't result in the segfault; just run repro24506 again until it does.
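Since the crash is intermittent, the "run it again until it does" step can be automated. The following is a minimal sketch, not part of the repository: `run_until_failure` is a hypothetical helper name, and the commented usage assumes that GHCi commands can be piped into `cabal repl core` on stdin and that the RTS crash makes cabal exit non-zero (as the "exit code -6" message above suggests).

```shell
# Hypothetical helper: re-run a command until it exits non-zero,
# giving up after a maximum number of attempts.
# Usage: run_until_failure MAX COMMAND [ARGS...]
run_until_failure() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    # A non-zero exit status is treated as "the crash happened".
    if ! "$@"; then
      echo "failed on attempt $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no failure in $max attempts"
  return 1
}

# Assumed usage, from inside `nix develop`:
#   run_until_failure 20 sh -c \
#     'printf "import ForTesting\nrepro24506\n" | cabal repl core'
```

Each attempt starts a fresh GHCi session, so this is slower than re-running repro24506 by hand in one session, and it may not hit the crash at the same rate.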

Alternatively, it can be reproduced in gdb using cosmothought-core-oneoff, but this requires setting a breakpoint. If you run gdb --args cosmothought-core-oneoff +RTS -DS and set a breakpoint on cosmothoughtzmcorezm0zi1zi0zi0zminplace_ReteziReteBuilder_createOrShareAlphaNodes_info, it will barf during sanity checking. Update: that barf is due to setting the breakpoint incorrectly; see #24506 (comment 552349).

The exact value for the strange closure type changes from run to run. Sometimes, it just segfaults instead of printing the backtrace.
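To compare the reported closure type across runs, the values can be pulled out of saved session output. This is only a sketch: `closure_types` is a hypothetical helper, and it assumes the crash output was captured to a file or pipe and keeps the "strange closure type N" wording shown in the log above.

```shell
# Hypothetical helper: read RTS crash output on stdin and print the
# distinct "strange closure type" values, sorted numerically.
closure_types() {
  grep -o 'strange closure type [0-9]*' | awk '{print $4}' | sort -n -u
}

# Assumed usage, with crash output saved to files named run-*.log:
#   cat run-*.log | closure_types
```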

The problem is almost certainly in the core/src/Rete modules, in particular in how I'm using stm-hamt.

I'm trying to narrow it down in order to find a minimal test case, but it's difficult.

Expected behavior

Not crash.

Things tried

  • +RTS --nonmoving-gc and --copying-gc both segfault
  • -O0 -fno-static-argument-transformation still segfaults
  • +RTS -C0 -V0 with non-threaded runtime still segfaults
  • Linked with -debug
  • +RTS -DS doesn't produce anything immediately obvious to me

Possibly related

~~#24443~~ Nope, just a coincidence, but a strange one

Environment

  • GHC version used: 9.8.1
  • Operating System: NixOS
  • System Architecture: x64
Edited by Sophie Taylor