Segfaults on Intel Goldmont Plus Atom CPU
Summary
GHC eventually segfaults when compiling non-trivial code on an Intel Celeron J4005 (Goldmont Plus) Atom. The issue is not deterministic, but it happens often enough to render GHC useless on this system. The segfaults occur at similar locations. For example, GHC 8.6.3 has hit this exact segfault three times so far:
Nov 06 20:36:35 kernel: ghc[8021]: segfault at 1 ip 00007ffff099d0df sp 00007fffffff64d8 error 6 in libHSrts_thr-ghc8.6.3.so[7ffff094c000+71000]
Nov 06 20:36:35 kernel: Code: 00 00 00 19 00 00 00 00 00 00 00 48 83 ec 08 48 8d 3d 41 d9 00 00 31 c0 e8 be 57 fd ff 48 83 c4 08 eb e8 48 63 43 0c 48 89 c1 <48> c1 e1 03 48 83 c1 02 48 89 ea 48 29 ca 4c 39 fa 0f 82 8a 00 00
This is extremely weird, because the offending instruction is a register-only shift, which does not access memory:
0000000000000018 4863430c movsxd rax, dword [rbx+0xc]
000000000000001c 4889c1 mov rcx, rax
000000000000001f 48c1e103 shl rcx, 0x3 <--- Segfault happens here?
0000000000000023 4883c102 add rcx, 0x2
0000000000000027 4889ea mov rdx, rbp
000000000000002a 4829ca sub rdx, rcx
000000000000002d 4c39fa cmp rdx, r15
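As a cross-check on the log and disassembly above, here is a minimal sketch that pulls the relevant numbers out of the kernel messages. It is not part of GHC or the kernel; the helper names `parse_segfault` and `faulting_byte_index` are made up for illustration. It assumes the usual Linux x86 log format, where the `error` field is the hexadecimal page-fault error code (bit 0: protection violation rather than not-present page, bit 1: write, bit 2: user mode) and the byte in angle brackets on the `Code:` line is the one at `ip`.

```python
import re

# Kernel messages copied from the report above (timestamps removed).
SEGFAULT_LINE = (
    "ghc[8021]: segfault at 1 ip 00007ffff099d0df sp 00007fffffff64d8 "
    "error 6 in libHSrts_thr-ghc8.6.3.so[7ffff094c000+71000]"
)
CODE_LINE = (
    "Code: 00 00 00 19 00 00 00 00 00 00 00 48 83 ec 08 48 8d 3d 41 d9 "
    "00 00 31 c0 e8 be 57 fd ff 48 83 c4 08 eb e8 48 63 43 0c 48 89 c1 "
    "<48> c1 e1 03 48 83 c1 02 48 89 ea 48 29 ca 4c 39 fa 0f 82 8a 00 00"
)

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract addresses and decode the page-fault error code (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
        "in_mapping": 0 <= ip - base < int(m.group("size"), 16),
        "protection": bool(err & 1),  # False means the page was not present
        "write": bool(err & 2),
        "user": bool(err & 4),
    }


def faulting_byte_index(code_line):
    """Return (index, byte) of the <..>-marked byte on a 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return i, int(tok[1:-1], 16)
    raise ValueError("no marked byte found")


info = parse_segfault(SEGFAULT_LINE)
index, byte = faulting_byte_index(CODE_LINE)

if __name__ == "__main__":
    print(f"{info['object']} + {info['offset']:#x}")
    print(f"fault address {info['fault_addr']:#x}, write={info['write']}, "
          f"user={info['user']}, protection={info['protection']}")
    print(f"marked byte {byte:#04x} at index {index} of the dump")
```

Under these assumptions the kernel reported a user-mode write to a non-present page at address 1, with `ip` pointing at the `48 c1 e1 03` bytes. A write fault is not something a `shl rcx, 0x3` can produce, which is consistent with the suspicion that the CPU itself is involved.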
I initially thought this system had bad RAM, but:
- I see no other software crashing on this system,
- I ran memtest86 overnight without any errors.
I also see no segfaults with the same software stack on the three Intel Core systems I have available (Broadwell, Skylake, Whiskey Lake). The only important difference seems to be the CPU.
I would appreciate it if someone with a modern Atom CPU could try to reproduce this issue.
Steps to reproduce
The easiest way to hit the problem is to compile GHC itself. On NixOS, this can be achieved by:
% git clone https://github.com/NixOS/nixpkgs.git
% cd nixpkgs
# We have to avoid fetching a cached copy, but don't want to rebuild all the dependencies
# as well. So the easiest way is to change the derivation in a meaningless way. Add a newline in preConfigure:
% $EDITOR pkgs/development/compilers/ghc/8.8.1.nix
% nix-build -A haskell.compiler.ghc881
[...]
"/nix/store/vkz7mjh3frlc3rl8wlyqddcn0356dicv-ghc-8.6.3-binary/bin/ghc" [...] -no-user-package-db -rtsopts
-outputdir compiler/stage1/build -c compiler/main/Ar.hs -o compiler/stage1/build/Ar.o
make[1]: *** [compiler/ghc.mk:444: compiler/stage1/build/Ar.o] Segmentation fault (core dumped)
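Because the crash is intermittent, it may help to rerun a single failing command many times and count how often it dies. The following is a minimal sketch, assuming Python 3 on Linux; `count_segfaults` is a hypothetical helper, and the command to pass in is whatever GHC invocation crashed (the full command line is elided above, so it is not reproduced here).

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=20):
    """Run cmd `runs` times and return how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    runs = 20
    crashes = count_segfaults(sys.argv[1:], runs=runs)
    print(f"{crashes} of {runs} runs ended with SIGSEGV")
```

The command has to be run directly rather than through `make` or a shell, since a wrapper would report the crash as an ordinary non-zero exit status instead of a signal.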
Expected behavior
GHC does not crash.
Environment
- GHC version used: 8.6.3 (but also 8.4.4)
Optional:
- Operating System: Linux (NixOS 19.09), both on Linux 4.19.81 and 5.3.7
- System Architecture: x86_64