Building GHC fails for x86 with BMI2: the code generator emits illegal instructions that cannot be assembled (the PDEP and PEXT instructions can't take 16-bit operands).
Summary
I don't really know for sure if this is a bug or just a typical manifestation of my stupidity. I preface every bug report I ever make with something like the preceding statement and it usually turns out to be a good idea when it becomes plain that I have no idea what I'm doing or talking about.
The problem
For some reason GHC produces assembly code with a non-existent (x86_64, BMI2) instruction: pdepw. The pdep instruction requires its operands to be either 32 or 64 bits wide (i.e. either DWORDs or QWORDs). GHC uses the awful GAS AT&T assembler syntax (please implement an Intel syntax variant? pretty please?), which requires size suffixes on instructions and therefore plainly announces its intention to use 16-bit operands, which is illegal. The same instruction appears twice in the assembly listing.
ghczmprim_GHCziPrimopWrappers_pdep8zh_info:
.Lc34s:
.Lc34u:
movzbl %r14b,%r14d
movzbl %sil,%esi
pdepw %si,%r14w,%ax
movzwl %ax,%eax
movq %rax,%rbx
jmp *(%rbp)
and
ghczmprim_GHCziPrimopWrappers_pdep16zh_info:
.Lc34H:
.Lc34J:
pdepw %si,%r14w,%ax
movzwl %ax,%eax
movq %rax,%rbx
jmp *(%rbp)
Given the function names involved, perhaps this illegal assembler code was hand written?
Because of this faulty code GHC fails to compile. Removing the option -mbmi2 allows compilation to proceed, though, obviously, without BMI2 support. In particular, stage-1 of the bootstrap process is built successfully. The next stage is the one that always fails.
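To illustrate why the missing 8- and 16-bit forms are not a real obstacle, here is a minimal sketch of a bit-by-bit software model of PDEP and PEXT. It is not GHC code: the names pdep32, pext32, pdep16 and pext16 are made up for this example. The point it tries to show is that a 16-bit result can be obtained by zero-extending both operands, running the 32-bit operation, and truncating, because a zero-extended mask has no bits set above bit 15.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word16, Word32)

-- | Reference model of 32-bit PDEP: the low bits of 'src' are deposited,
-- in order, at the positions where 'mask' has a 1 bit.
pdep32 :: Word32 -> Word32 -> Word32
pdep32 src mask = go 0 0 0
  where
    -- i walks the mask bits, k counts how many source bits were consumed.
    go :: Int -> Int -> Word32 -> Word32
    go i k acc
      | i >= 32        = acc
      | testBit mask i = go (i + 1) (k + 1)
                            (if testBit src k then setBit acc i else acc)
      | otherwise      = go (i + 1) k acc

-- | Reference model of 32-bit PEXT: the bits of 'src' at the positions
-- where 'mask' has a 1 bit are gathered into the low bits of the result.
pext32 :: Word32 -> Word32 -> Word32
pext32 src mask = go 0 0 0
  where
    go :: Int -> Int -> Word32 -> Word32
    go i k acc
      | i >= 32        = acc
      | testBit mask i = go (i + 1) (k + 1)
                            (if testBit src i then setBit acc k else acc)
      | otherwise      = go (i + 1) k acc

-- | 16-bit variants built only from the 32-bit operation (hypothetical
-- helpers): zero-extend both operands, operate, then truncate.
pdep16, pext16 :: Word16 -> Word16 -> Word16
pdep16 src mask = fromIntegral (pdep32 (fromIntegral src) (fromIntegral mask))
pext16 src mask = fromIntegral (pext32 (fromIntegral src) (fromIntegral mask))
```

For instance, pdep16 0x0005 0x00F0 deposits source bits 0 and 2 at mask positions 4 and 6, and pext16 0x00A0 0x00F0 gathers bits 5 and 7 into positions 1 and 3. The same widening argument applies to 8-bit operands.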
Steps to reproduce
This is where it all falls down. It's not obvious how I should explain how to reproduce this. I'm compiling GHC on Gentoo (GNU/Linux) which tends to muck with build systems a bit to honor the various options its package manager supports. As far as I can tell it's not doing anything particularly untoward in this case. The -march=foo C flags are passed through, and whatever the user has set their HCFLAGS variable to is forcibly added to build.mk before building. Bootstrapping is done by downloading a binary and using it to make stage-1. Nothing surprising there.
It's possible that the stage-1 binary isn't meant to have any flags given to it (particularly -mbmi2). In that case, building GHC with those flags might produce the problem. This is how Gentoo does it:
# We also need to use the GHC_FLAGS flags when building ghc itself
echo "SRC_HC_OPTS+=${HCFLAGS} ${GHC_FLAGS}" >> mk/build.mk
echo "SRC_CC_OPTS+=${CFLAGS}" >> mk/build.mk
echo "SRC_LD_OPTS+=${LDFLAGS}" >> mk/build.mk
Alternatively you could just try to build it in Gentoo with the same flags enabled as I have?
-mbmi -mbmi2 -msse -msse2 -msse3 -msse4 -msse4.2 -mavx -mavx2 -mavx512cd -mavx512f -O2
There were a few others during my last attempt but the build still fails without them being there. The only C flag to get through is -march=native, which is to say -march=rocketlake.
I'm sorry if this is grossly insufficient. I'll attach the build log and try to supply more info if I can.
Expected behavior
Compilation should succeed. Instead, it fails with this message:
"inplace/bin/ghc-stage1" -hisuf hi -osuf o -hcsuf hc -static -H32m -O -mbmi -mbmi2 -msse -msse2 -msse3 -msse4 -msse4.2 -mavx -mavx2 -mavx512cd -mavx512f -O2 -fllvm-pass-vectors-in-regs -fsolve-constant-dicts -fcross-module-specialise -flate-specialise -fmax-simplifier-iterations=6 -rdynamic -H128m -optc-mtune=native -opta-mtune=native -optc-march=native -opta-march=native -optl-Wl,-O1 -optl-Wl,--as-needed -Wall -this-unit-id ghc-prim-0.7.0 -hide-all-packages -i -ilibraries/ghc-prim/. -ilibraries/ghc-prim/dist-install/build -Ilibraries/ghc-prim/dist-install/build -ilibraries/ghc-prim/dist-install/build/./autogen -Ilibraries/ghc-prim/dist-install/build/./autogen -Ilibraries/ghc-prim/. -optP-include -optPlibraries/ghc-prim/dist-install/build/./autogen/cabal_macros.h -package-id rts -this-unit-id ghc-prim -XHaskell2010 -O2 -haddock -no-user-package-db -rtsopts -Wno-trustworthy-safe -Wno-deprecated-flags -Wnoncanonical-monad-instances -outputdir libraries/ghc-prim/dist-install/build -split-sections -dynamic-too -c libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.hs -o libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.o -dyno libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.dyn_o
/var/tmp/portage/dev-lang/ghc-9.0.1/temp/ghc22685_0/ghc_1.s: Assembler messages:
/var/tmp/portage/dev-lang/ghc-9.0.1/temp/ghc22685_0/ghc_1.s:11574:0: error:
Error: invalid instruction suffix for `pext'
|
11574 | pextw %si,%r14w,%ax
| ^
/var/tmp/portage/dev-lang/ghc-9.0.1/temp/ghc22685_0/ghc_1.s:11600:0: error:
Error: invalid instruction suffix for `pext'
|
11600 | pextw %si,%r14w,%ax
| ^
/var/tmp/portage/dev-lang/ghc-9.0.1/temp/ghc22685_0/ghc_1.s:11693:0: error:
Error: invalid instruction suffix for `pdep'
|
11693 | pdepw %si,%r14w,%ax
| ^
/var/tmp/portage/dev-lang/ghc-9.0.1/temp/ghc22685_0/ghc_1.s:11719:0: error:
Error: invalid instruction suffix for `pdep'
|
11719 | pdepw %si,%r14w,%ax
| ^
`x86_64-pc-linux-gnu-gcc' failed in phase `Assembler'. (Exit code: 1)
It also fails for version 8.10.4.
Environment
- GHC version used: Building either 8.10.4 or 9.0.1, stage-1 bootstrap
Optional:
- Operating System: Gentoo GNU/Linux, kernel v5.12.4
- System Architecture: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz, i.e. x86_64
Here is the log. build.log
EDIT
Not sure if I should actually edit the above or just tack on more information. Guess I'll do the latter for now. I've dug into it a tiny bit and it seems that someone hardcoded the code generator to output the non-existent instruction. I surely can't be the first to notice this? Other people must build GHC with BMI2 enabled. It must have been tested. How could this code work for anyone?
The culprit seems to be in the file compiler/GHC/CmmToAsm/X86/CodeGen.hs, at least in version 9.0.1. That file isn't there for 8.10.4. Must have been moved. Can't be bothered to find it. Here is the perp itself (I think):
(if width == W8 then
     -- The PEXT instruction doesn't take a r/m8
     unitOL (MOVZxL II8 (OpReg src_r ) (OpReg src_r )) `appOL`
     unitOL (MOVZxL II8 (OpReg mask_r) (OpReg mask_r)) `appOL`
     unitOL (PEXT II16 (OpReg mask_r) (OpReg src_r) dst_r)
 else
     unitOL (PEXT format (OpReg mask_r) (OpReg src_r) dst_r)) `appOL`
Note both that only 8-bit operands are special-cased and that they are widened to just 16 bits. Neither is valid: only 32- and 64-bit operands are allowed, so the 16-bit case falls through to the else branch and is emitted as-is. I wrote an awful patch just to test this and it did at least allow it to compile. Disclaimer: I've never written a single word of Haskell in my life before and don't know what I'm doing. I just thought I'd share the info.
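For what it's worth, the shape of a possible fix can be sketched outside GHC. The following is only an illustration under loud assumptions: Width, Format, Reg, Instr, intFormat, pextCode and usesLegalPext are simplified stand-ins invented for this example, not GHC's real definitions, and this is neither my test patch nor necessarily what GHC should adopt. The idea is to treat W8 and W16 alike: zero-extend both operands from their own width and always emit the 32-bit form.

```haskell
-- Hypothetical, simplified stand-ins for the code generator's types.
data Width  = W8 | W16 | W32 | W64     deriving (Eq, Show)
data Format = II8 | II16 | II32 | II64 deriving (Eq, Show)

type Reg = String

data Instr
  = MOVZxL Format Reg Reg    -- zero-extend from the given format (src, dst)
  | PEXT Format Reg Reg Reg  -- PEXT format mask src dst
  deriving (Eq, Show)

-- | Map an operand width to the matching integer format.
intFormat :: Width -> Format
intFormat W8  = II8
intFormat W16 = II16
intFormat W32 = II32
intFormat W64 = II64

-- | Instruction selection sketch: sub-32-bit widths are zero-extended in
-- place and then handled by the 32-bit instruction; 32- and 64-bit widths
-- use their native format.
pextCode :: Width -> Reg -> Reg -> Reg -> [Instr]
pextCode width mask src dst = case width of
  W8  -> widen II8
  W16 -> widen II16
  _   -> [PEXT (intFormat width) mask src dst]
  where
    widen fmt =
      [ MOVZxL fmt src  src
      , MOVZxL fmt mask mask
      , PEXT II32 mask src dst
      ]

-- | Check that every PEXT in a sequence uses a format the hardware accepts.
usesLegalPext :: [Instr] -> Bool
usesLegalPext = all ok
  where
    ok (PEXT fmt _ _ _) = fmt `elem` [II32, II64]
    ok _                = True
```

With this shape, pextCode W16 "m" "s" "d" yields two zero-extensions followed by a PEXT II32, so no pextw (or pextb) can ever reach the assembler. PDEP would presumably need the same treatment.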