i386: Calling convention issue with SIMD vectors
Summary
A bunch of SIMD tests segfault on i386. Some of them crash because of a calling convention issue.
(This was originally reported as part of #25498, but I think it is better to have a separate issue to discuss the fix.)
Steps to reproduce
You can run the testsuite to see the segfault:
$ hadrian/build --flavour=devel2 test --test-root-dirs=testsuite/tests/simd
...
Detected CPU features: ['3dnowext', '3dnowprefetch', 'abm', 'aes', 'apic', 'arat', 'bmi1', 'bmi2', 'clflush', 'clflushopt', 'cmov', 'cmp_legacy', 'constant_tsc', 'cpuid', 'cr8_legacy', 'cx16', 'cx8', 'de', 'extd_apicid', 'fpu', 'fsgsbase', 'fxsr', 'fxsr_opt', 'ht', 'hypervisor', 'invpcid', 'lahf_lm', 'mca', 'mce', 'misalignsse', 'mmx', 'mmxext', 'movbe', 'msr', 'mtrr', 'nonstop_tsc', 'nx', 'pae', 'pat', 'pclmulqdq', 'pge', 'pni', 'popcnt', 'pse', 'pse36', 'rdrand', 'rdrnd', 'rdseed', 'rdtscp', 'rep_good', 'sep', 'sse', 'sse2', 'sse4_1', 'sse4_2', 'sse4a', 'ssse3', 'syscall', 'tsc', 'tsc_known_freq', 'vme', 'vmmcall']
Found CPU features: 3dnowext 3dnowprefetch abm aes apic arat bmi1 bmi2 clflush clflushopt cmov cmp_legacy constant_tsc cpuid cr8_legacy cx16 cx8 de extd_apicid fpu fsgsbase fxsr fxsr_opt ht hypervisor invpcid lahf_lm mca mce misalignsse mmx mmxext movbe msr mtrr nonstop_tsc nx pae pat pclmulqdq pge pni popcnt pse pse36 rdrand rdrnd rdseed rdtscp rep_good sep sse sse2 sse4_1 sse4_2 sse4a ssse3 syscall tsc tsc_known_freq vme vmmcall
...
Wrong exit code for floatx4_arith_baseline(optllvm)(expected 0 , actual 139 )
Stderr comp ( floatx4_arith_baseline ):
[1 of 2] Compiling Main ( floatx4_arith_baseline.hs, floatx4_arith_baseline.o )
[2 of 2] Linking floatx4_arith_baseline
Stderr run ( floatx4_arith_baseline ):
Segmentation fault
*** unexpected failure for floatx4_arith_baseline(optllvm)
...
Unexpected results from:
TEST="T25062_V16 T25169 T25658 T25659 doublex2_arith doublex2_arith_baseline floatx4_arith floatx4_arith_baseline int16x8_arith int16x8_arith_baseline int16x8_shuffle int16x8_shuffle_baseline int32x4_arith int32x4_arith_baseline int32x4_shuffle int32x4_shuffle_baseline int64x2_arith int64x2_arith_baseline int64x2_shuffle int64x2_shuffle_baseline int8x16_arith int8x16_arith_baseline int8x16_shuffle int8x16_shuffle_baseline word16x8_arith word16x8_arith_baseline word32x4_arith word32x4_arith_baseline word64x2_arith word64x2_arith_baseline word8x16_arith word8x16_arith_baseline"
Alternatively, here is a small program that reproduces the problem:
$ cat veccall.hs
{-# LANGUAGE MagicHash, UnboxedTuples, ExtendedLiterals #-}
import GHC.Exts
import GHC.Word
foo :: Word32X4# -> Word32X4# -> Word32X4#
foo x y = plusWord32X4# x y
{-# NOINLINE foo #-}
bar :: (Word32X4# -> Word32X4# -> Word32X4#) -> IO ()
bar f = let a = packWord32X4# (# 0x12c0ffee#Word32, 0xdeadbeef#Word32, 0x12345678#Word32, 0x87654321#Word32 #)
            b = packWord32X4# (# 0x11223344#Word32, 0xaabbccdd#Word32, 0x77665544#Word32, 0x55443322#Word32 #)
            v = f a b
            (# x0, x1, x2, x3 #) = unpackWord32X4# v
        in print (W32# x0, W32# x1, W32# x2, W32# x3)
{-# NOINLINE bar #-}
main :: IO ()
main = bar foo
$ ghc -fforce-recomp veccall.hs
$ ./veccall
Segmentation fault
$ ghc -fforce-recomp -fllvm veccall.hs
$ ./veccall
Segmentation fault
Expected behavior
$ ./veccall
(602092338,2305395660,2308615100,3702093379)
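The expected values can be cross-checked without any SIMD primops. The following is a minimal sketch, assuming that plusWord32X4# adds each lane independently and wraps modulo 2^32, which is how boxed Word32 addition behaves. The helper name laneSums is hypothetical and is not part of the test suite.

```haskell
import Data.Word (Word32)

-- Hypothetical helper: lane-wise addition on boxed Word32 values.
-- Word32 addition wraps modulo 2^32, which is the behaviour assumed
-- here for each lane of plusWord32X4#.
laneSums :: [Word32] -> [Word32] -> [Word32]
laneSums = zipWith (+)

main :: IO ()
main = print (laneSums [0x12c0ffee, 0xdeadbeef, 0x12345678, 0x87654321]
                       [0x11223344, 0xaabbccdd, 0x77665544, 0x55443322])
```

This prints the same four numbers as the expected tuple above, as a list. It uses no vector registers, so it should not be affected by the calling convention issue.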
Environment
- GHC version used: 9.13.20250420 (a00eeaec)
- Operating System: Debian 12 (bookworm)
- System Architecture: i386