
Many test cases crash at runtime with nonmoving_sanity test way

How to reproduce: first, apply the following patch to a clean checkout (0962b50d as of last update):

diff --git a/testsuite/config/ghc b/testsuite/config/ghc
index 90b11bc57fc..5216366ddd7 100644
--- a/testsuite/config/ghc
+++ b/testsuite/config/ghc
@@ -2,6 +2,8 @@
 
 import re
 
+from testglobals import default_testopts
+
 # Testsuite configuration setup for GHC
 #
 # This file is Python source
@@ -29,10 +31,14 @@ config.other_ways         = ['hpc',
                              'ghci-ext', 'ghci-ext-prof',
                              'ext-interp',
                              'nonmoving',
+                             'nonmoving_sanity',
                              'nonmoving_thr',
                              'nonmoving_thr_sanity',
                              'nonmoving_thr_ghc',
                              'compacting_gc',
+                             'compacting_gc_sanity',
+                             'sweeping_gc',
+                             'sweeping_gc_sanity'
                              ]
 
 
@@ -69,10 +75,8 @@ if windows:
     else:
         config.other_ways += winio_ways
 
-# LLVM
-if not config.unregisterised and not config.arch in {"wasm32", "javascript"} and config.have_llvm:
-    config.compile_ways.append('optllvm')
-    config.run_ways.append('optllvm')
+if True:
+    default_testopts.extra_ways = ['compacting_gc', 'compacting_gc_sanity', 'nonmoving', 'nonmoving_sanity', 'nonmoving_thr', 'nonmoving_thr_sanity', 'sweeping_gc', 'sweeping_gc_sanity', 'sanity', 'threaded2_sanity']
 
 # HPC
 if not config.arch == "javascript":
@@ -122,10 +126,14 @@ config.way_flags = {
     'ghci-ext-prof'    : ['--interactive', '-v0', '-ignore-dot-ghci', '-fno-ghci-history', '-fexternal-interpreter', '-prof', '+RTS', '-I0.1', '-RTS'],
     'ext-interp'   : ['-fexternal-interpreter'],
     'nonmoving'    : [],
+    'nonmoving_sanity' : ['-debug'],
     'nonmoving_thr': ['-threaded'],
     'nonmoving_thr_sanity': ['-threaded', '-debug'],
     'nonmoving_thr_ghc': ['+RTS', '-xn', '-N2', '-RTS', '-threaded'],
     'compacting_gc': [],
+    'compacting_gc_sanity': ['-debug'],
+    'sweeping_gc': [],
+    'sweeping_gc_sanity': ['-debug'],
     'winio': [],
     'winio_threaded': ['-threaded'],
    }
@@ -169,10 +177,14 @@ config.way_rts_flags = {
     'ghci-ext-prof'    : [],
     'ext-interp'       : [],
     'nonmoving'        : ['-xn'],
+    'nonmoving_sanity' : ['-xn', '-DS'],
     'nonmoving_thr'    : ['-xn', '-N2'],
     'nonmoving_thr_sanity'    : ['-xn', '-N2', '-DS'],
     'nonmoving_thr_ghc': ['-xn', '-N2'],
     'compacting_gc': ['-c'],
+    'compacting_gc_sanity': ['-c', '-DS'],
+    'sweeping_gc': ['-w'],
+    'sweeping_gc_sanity': ['-w', '-DS'],
     'winio': ['--io-manager=native'],
     'winio_threaded': ['--io-manager=native'],
    }

The patch above adds several test ways and forces them to run regardless of test speed. Then, on x86_64-linux, in a clean Debian 12 image with ghc-9.8.2 as the bootstrap compiler, build GHC and run the full testsuite as usual. The nonmoving_sanity test way, which tests the single-threaded nonmoving GC combined with sanity checking, produces numerous segmentation faults and RTS internal errors.
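To poke at a single failing test outside the testsuite driver, the flags the patch assigns to nonmoving_sanity can be turned into approximate compile and run commands. This is a minimal sketch, not what the driver actually executes: `manual_commands` is a hypothetical helper, the two dictionaries only copy the nonmoving_sanity entries from the patch, `ghc` stands in for the freshly built compiler, and `-rtsopts` is added on the assumption that the RTS would otherwise reject `-xn` and `-DS` on the command line. The real driver passes further flags that are omitted here.

```python
import shlex

# Entries copied from the patch above, for the one way of interest.
way_flags = {"nonmoving_sanity": ["-debug"]}
way_rts_flags = {"nonmoving_sanity": ["-xn", "-DS"]}


def manual_commands(test, way, ghc="ghc"):
    """Hypothetical helper: approximate compile and run commands for one test.

    `ghc` should point at the compiler under test; the default is a placeholder.
    """
    # -rtsopts is an assumption of this sketch, so that +RTS options are accepted.
    compile_cmd = [ghc, *way_flags[way], "-rtsopts", f"{test}.hs", "-o", test]
    # RTS flags for the way are wrapped in +RTS ... -RTS on the run command line.
    run_cmd = [f"./{test}", "+RTS", *way_rts_flags[way], "-RTS"]
    return shlex.join(compile_cmd), shlex.join(run_cmd)


compile_cmd, run_cmd = manual_commands("arith008", "nonmoving_sanity")
print(compile_cmd)
print(run_cmd)
```

Whether a given test still fails when built this way is not guaranteed, since the driver's remaining flags and test-specific options are not reproduced.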

Full test log: https://gitlab.haskell.org/-/snippets/5750
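Since the log is long, a small script can group the summary lines by test way. The sketch below assumes only the `*** unexpected failure for NAME(WAY)` line format quoted in this report; `failures_by_way` is a hypothetical helper and the sample text is a short excerpt, not the full log.

```python
import re

# Matches summary lines such as:
#   *** unexpected failure for arith008(nonmoving_sanity)
FAILURE_RE = re.compile(r"^\*\*\* unexpected failure for (\S+)\(([\w-]+)\)\s*$")


def failures_by_way(log_text):
    """Return {way: [test names]} for every unexpected-failure line, in log order."""
    result = {}
    for line in log_text.splitlines():
        m = FAILURE_RE.match(line)
        if m:
            test, way = m.groups()
            result.setdefault(way, []).append(test)
    return result


# Short excerpt in the format used by the testsuite summary.
sample = """\
*** unexpected failure for cmp64(nonmoving_sanity)
*** unexpected failure for T18522-dbg-ppr(nonmoving_sanity)
Aborted (core dumped)
*** unexpected failure for arith008(nonmoving_sanity)
"""

for way, tests in failures_by_way(sample).items():
    print(way, len(tests), tests)
```

Running it over the saved log text instead of `sample` gives a per-way count that can be compared against the lists below.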

Test cases failing with Segmentation fault (core dumped):

*** unexpected failure for cmp64(nonmoving_sanity)
*** unexpected failure for conc027(nonmoving_sanity)
*** unexpected failure for conc004(nonmoving_sanity)
*** unexpected failure for arith003(nonmoving_sanity)
*** unexpected failure for CarryOverflow(nonmoving_sanity)
*** unexpected failure for T3245(nonmoving_sanity)
*** unexpected failure for CmpInt16(nonmoving_sanity)
*** unexpected failure for CmpInt32(nonmoving_sanity)
*** unexpected failure for life_space_leak(nonmoving_sanity)
*** unexpected failure for thurston-modular-arith(nonmoving_sanity)
*** unexpected failure for T11108(nonmoving_sanity)
*** unexpected failure for T9532(nonmoving_sanity)
*** unexpected failure for inits1tails1(nonmoving_sanity)
*** unexpected failure for T20107(nonmoving_sanity)
*** unexpected failure for bytestringread001(nonmoving_sanity)
*** unexpected failure for AtomicSwapIORef(nonmoving_sanity)
*** unexpected failure for Timeout001(nonmoving_sanity)
*** unexpected failure for memo001(nonmoving_sanity)
*** unexpected failure for stack_underflow(nonmoving_sanity)

Test cases failing with internal error: checkClosure: found EVACUATED closure:

*** unexpected failure for arr020(nonmoving_sanity)
*** unexpected failure for arr017(nonmoving_sanity)
*** unexpected failure for annrun01(nonmoving_sanity)
*** unexpected failure for conc051(nonmoving_sanity)
*** unexpected failure for throwto003(nonmoving_sanity)
*** unexpected failure for throwto002(nonmoving_sanity)
*** unexpected failure for tryReadMVar2(nonmoving_sanity)
*** unexpected failure for readMVar1(nonmoving_sanity)
*** unexpected failure for hs_try_putmvar002(nonmoving_sanity)
*** unexpected failure for LintEtaExpand(nonmoving_sanity)
*** unexpected failure for AtomicPrimops(nonmoving_sanity)
*** unexpected failure for T11535(nonmoving_sanity)
*** unexpected failure for T5313(nonmoving_sanity)
*** unexpected failure for ffi016(nonmoving_sanity)
*** unexpected failure for fptr02(nonmoving_sanity)
*** unexpected failure for T4221(nonmoving_sanity)
*** unexpected failure for ffi023(nonmoving_sanity)
*** unexpected failure for T10942(nonmoving_sanity)
*** unexpected failure for T9595(nonmoving_sanity)
*** unexpected failure for T18522-dbg-ppr(nonmoving_sanity)
*** unexpected failure for T18181(nonmoving_sanity)
*** unexpected failure for PartialDownsweep(nonmoving_sanity)
*** unexpected failure for OldModLocation(nonmoving_sanity)
*** unexpected failure for T10508_api(nonmoving_sanity)
*** unexpected failure for dynCompileExpr(nonmoving_sanity)
*** unexpected failure for T11938(nonmoving_sanity)
*** unexpected failure for TargetContents(nonmoving_sanity)
*** unexpected failure for T3372(nonmoving_sanity)
*** unexpected failure for PatTypes(nonmoving_sanity)
*** unexpected failure for HieVdq(nonmoving_sanity)
*** unexpected failure for RecordDotTypes(nonmoving_sanity)
*** unexpected failure for HieQueries(nonmoving_sanity)
*** unexpected failure for T23492(nonmoving_sanity)
*** unexpected failure for T20341(nonmoving_sanity)
*** unexpected failure for SpliceTypes(nonmoving_sanity)
*** unexpected failure for T23540(nonmoving_sanity)
*** unexpected failure for Monoid_ByteArray(nonmoving_sanity)
*** unexpected failure for arith008(nonmoving_sanity)
*** unexpected failure for arith011(nonmoving_sanity)
*** unexpected failure for T8726(nonmoving_sanity)
*** unexpected failure for foundation(nonmoving_sanity)
*** unexpected failure for space_leak_001(nonmoving_sanity)
*** unexpected failure for static-plugins(nonmoving_sanity)
*** unexpected failure for T4334(nonmoving_sanity)
*** unexpected failure for ArithWord16(nonmoving_sanity)
*** unexpected failure for CmpWord16(nonmoving_sanity)
*** unexpected failure for ArithInt16(nonmoving_sanity)
*** unexpected failure for ArithWord32(nonmoving_sanity)
*** unexpected failure for ShrinkSmallMutableArrayB(nonmoving_sanity)
*** unexpected failure for ArithInt32(nonmoving_sanity)
*** unexpected failure for CmpWord32(nonmoving_sanity)
*** unexpected failure for CmpInt8(nonmoving_sanity)
*** unexpected failure for CmpWord8(nonmoving_sanity)
*** unexpected failure for ArithInt8(nonmoving_sanity)
*** unexpected failure for ArithWord8(nonmoving_sanity)
*** unexpected failure for T23071(nonmoving_sanity)
*** unexpected failure for T22488_docHead(nonmoving_sanity)
*** unexpected failure for andre_monad(nonmoving_sanity)
*** unexpected failure for queens(nonmoving_sanity)
*** unexpected failure for 10queens(nonmoving_sanity)
*** unexpected failure for jules_xref(nonmoving_sanity)
*** unexpected failure for jules_xref2(nonmoving_sanity)
*** unexpected failure for jl_defaults(nonmoving_sanity)
*** unexpected failure for andy_cherry(nonmoving_sanity)
*** unexpected failure for barton-mangler-bug(nonmoving_sanity)
*** unexpected failure for seward-space-leak(nonmoving_sanity)
*** unexpected failure for galois_raytrace(nonmoving_sanity)
*** unexpected failure for bug1010(nonmoving_sanity)
*** unexpected failure for T7160(nonmoving_sanity)
*** unexpected failure for stack003(nonmoving_sanity)
*** unexpected failure for stablename001(nonmoving_sanity)
*** unexpected failure for T7636(nonmoving_sanity)
*** unexpected failure for T10904(nonmoving_sanity)
*** unexpected failure for alloccounter1(nonmoving_sanity)
*** unexpected failure for T19381(nonmoving_sanity)
*** unexpected failure for cloneMyStack_retBigStackFrame(nonmoving_sanity)
*** unexpected failure for decodeMyStack_underflowFrames(nonmoving_sanity)
*** unexpected failure for T23400(nonmoving_sanity)
*** unexpected failure for T17574(nonmoving_sanity)
*** unexpected failure for T19481(nonmoving_sanity)
*** unexpected failure for T9646(nonmoving_sanity)
*** unexpected failure for IOManager(nonmoving_sanity)
*** unexpected failure for unicode002(nonmoving_sanity)
*** unexpected failure for inits(nonmoving_sanity)
*** unexpected failure for length001(nonmoving_sanity)
*** unexpected failure for unicode003(nonmoving_sanity)
*** unexpected failure for tup001(nonmoving_sanity)
*** unexpected failure for memo002(nonmoving_sanity)
*** unexpected failure for stableptr001(nonmoving_sanity)
*** unexpected failure for dynamic003(nonmoving_sanity)
*** unexpected failure for stack_misc_closures(nonmoving_sanity)
*** unexpected failure for stableptr003(nonmoving_sanity)
*** unexpected failure for weak001(nonmoving_sanity)
*** unexpected failure for stableptr004(nonmoving_sanity)
*** unexpected failure for dynamic004(nonmoving_sanity)
*** unexpected failure for ioref001(nonmoving_sanity)
*** unexpected failure for WasmControlFlow(nonmoving_sanity)
*** unexpected failure for T7653(nonmoving_sanity)
*** unexpected failure for T13167(nonmoving_sanity)
*** unexpected failure for AtomicModifyIORef(nonmoving_sanity)
*** unexpected failure for finalization001(nonmoving_sanity)
*** unexpected failure for qsem001(nonmoving_sanity)
*** unexpected failure for qsemn001(nonmoving_sanity)
*** unexpected failure for Chan003(nonmoving_sanity)
*** unexpected failure for hGetBuf001(nonmoving_sanity)
*** unexpected failure for encoding001(nonmoving_sanity)
*** unexpected failure for T2411(nonmoving_sanity)
*** unexpected failure for stm050(nonmoving_sanity)
*** unexpected failure for T15136(nonmoving_sanity)
*** unexpected failure for encoding004(nonmoving_sanity)
*** unexpected failure for T16707(nonmoving_sanity)

Test cases failing with internal error: ASSERTION FAILED: file rts/sm/Sanity.c, line 571:

*** unexpected failure for T23120(nonmoving_sanity)

Test cases failing with internal error: checkClosure: stack frame:

*** unexpected failure for openFile008(nonmoving_sanity)

For the most common failure pattern, the full test log includes a stack trace, since I configured with --enable-dwarf-unwind and built with the perf+debug_info flavour:

arith008: internal error: checkClosure: found EVACUATED closure 1107316858
Stack trace:
                  0x2b960d    set_initial_registers (rts/Libdw.c:297.5)
            0x7ff60031fc38    dwfl_thread_getframes (/usr/lib/x86_64-linux-gnu/libdw-0.188.so)
            0x7ff60031ffec    dwfl_getthread_frames (/usr/lib/x86_64-linux-gnu/libdw-0.188.so)
                  0x2b94ef    libdwGetBacktrace (rts/Libdw.c:263.15)
                  0x2bf253    rtsFatalInternalErrorFn (rts/RtsMessages.c:175.22)
                  0x2bee8d    barf (rts/RtsMessages.c:49.3)
                  0x307473    checkClosure (rts/sm/Sanity.c:367.12)
                  0x307d31    checkHeapChain (rts/sm/Sanity.c:601.26)
                  0x308c6f    checkGeneration (rts/sm/Sanity.c:1015.9)
                  0x308d1a    checkFullHeap (rts/sm/Sanity.c:1034.52)
                  0x308d94    checkSanity (rts/sm/Sanity.c:1046.5)
                  0x2f1cc0    GarbageCollect (rts/sm/GC.c:461.3)
                  0x2d6c12    scheduleDoGC (rts/Schedule.c:1915.9)
                  0x2d780e    exitScheduler (rts/Schedule.c:2801.9)
                  0x2bfb4f    hs_exit_ (rts/RtsStartup.c:490.12)
                  0x2bfd39    shutdownHaskellAndExit (rts/RtsStartup.c:678.5)
                  0x2bd89f    ASSIGN_FLT (rts/include/Stg.h:426.62)
                  0x23d901    (null) (/tmp/ghctest-kphe85yc/test   spaces/testsuite/tests/numeric/should_run/arith008.run/arith008)
            0x7ff60039d24a    __libc_start_call_main (../sysdeps/nptl/libc_start_call_main.h:74.3)
            0x7ff60039d305    __libc_start_main@@GLIBC_2.34 (../csu/libc-start.c:128.20)
                  0x23c021    _start (/tmp/ghctest-kphe85yc/test   spaces/testsuite/tests/numeric/should_run/arith008.run/arith008)

    (GHC version 9.11.20240511 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
Aborted (core dumped)
*** unexpected failure for arith008(nonmoving_sanity)

Similar issues were first discovered when I was attempting to test wasm with more test ways, and they also exist on i386. It seems like something is wrong in the sanity-checking logic for the single-threaded nonmoving GC.

Edited by Cheng Shao