LLVM backend on x86 doesn't use popcnt instruction even if -msse4.2 is set
Summary
The LLVM backend doesn't use the popcnt instruction even if -msse4.2 is set.
Steps to reproduce
First, #25019 needs to be fixed; here we assume "+sse4.2" is already in place in compiler/GHC/Driver/Pipeline/Execute.hs.
Then run the following:
$ cat popcounttest.hs
import Data.Bits
{-# NOINLINE foo #-}
foo :: Int -> Int
foo x = 1 + popCount x
main = print (foo 42)
$ ghc -fforce-recomp -S -O2 -msse4.2 popcounttest.hs
$ grep popcnt popcounttest.s
popcnt %r14,%rax
-> OK, GHC emitted popcnt instruction
$ _build/stage1/bin/ghc -fforce-recomp -S -fllvm -O2 -msse4.2 popcounttest.hs
$ grep popcnt popcounttest.s
-> No match. LLVM backend didn't use popcnt instruction
Expected behavior
LLVM should emit the popcnt instruction if -msse4.2 is set.
Observations
Although many compilers (including GCC, Clang, and the GHC NCG) treat popcnt as part of SSE4.2, popcnt has its own CPUID feature flag (CPUID.01H:ECX bit 23), separate from the SSE4.2 flag (bit 20).
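To make the distinction concrete, here is a small Haskell sketch that decodes the two flags from the ECX value returned by CPUID leaf 1. The helper names and the sample ECX value are hypothetical; the code only interprets a raw register value and does not execute CPUID itself.

```haskell
import Data.Bits (testBit)
import Data.Word (Word32)

-- CPUID leaf 1 reports feature flags in ECX. SSE4.2 and POPCNT
-- occupy different bits, so they are independent features.
hasSSE42 :: Word32 -> Bool
hasSSE42 ecx = testBit ecx 20

hasPopcnt :: Word32 -> Bool
hasPopcnt ecx = testBit ecx 23

main :: IO ()
main = do
  -- Hypothetical ECX value with only bit 20 (SSE4.2) set.
  let onlySse42 = 0x00100000 :: Word32
  print (hasSSE42 onlySse42, hasPopcnt onlySse42)  -- (True,False)
```

A target description that only says "SSE4.2" therefore says nothing, by itself, about popcnt.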
LLVM's attribute system honors this difference and defines a separate attribute for popcnt (see X86.td).
So we should pass +popcnt to LLVM in addition to +sse4.2.
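A minimal sketch of the proposed change, assuming the driver collects LLVM target attributes as a list of strings and joins them into the comma-separated form accepted by the -mattr option of llc and opt. llvmX86Features and mattrString are hypothetical names, not functions from GHC's Execute.hs.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not GHC's actual driver code): the LLVM
-- target attributes implied by -msse4.2. LLVM models popcnt as a
-- separate feature, so it is listed explicitly alongside sse4.2.
llvmX86Features :: Bool -> [String]
llvmX86Features sse42Enabled
  | sse42Enabled = ["+sse4.2", "+popcnt"]
  | otherwise    = []

-- Render the attributes as a comma-separated -mattr value.
mattrString :: [String] -> String
mattrString = intercalate ","

main :: IO ()
main = do
  putStrLn (mattrString (llvmX86Features True))   -- +sse4.2,+popcnt
  putStrLn (mattrString (llvmX86Features False))  -- (empty line)
```

Tying +popcnt to +sse4.2 mirrors what GCC, Clang, and the NCG already assume, so -msse4.2 would mean the same thing under both backends.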
Environment
- GHC version used: 9.13.20241006 (c9590ba0) with a patch to fix #25019
- Operating System: Linux
- System Architecture: x86_64