LLVM backend on x86 doesn't use popcnt instruction even if -msse4.2 is set

Summary

The LLVM backend doesn't use the popcnt instruction even if -msse4.2 is set.

Steps to reproduce

First, #25019 (closed) needs to be fixed. The steps below assume that "+sse4.2" is present in compiler/GHC/Driver/Pipeline/Execute.hs.

Then run the following:

$ cat popcounttest.hs
import Data.Bits

{-# NOINLINE foo #-}
foo :: Int -> Int
foo x = 1 + popCount x

main = print (foo 42)
$ ghc -fforce-recomp -S -O2 -msse4.2 popcounttest.hs
$ grep popcnt popcounttest.s
        popcnt %r14,%rax
-> OK, GHC emitted popcnt instruction
$ _build/stage1/bin/ghc -fforce-recomp -S -fllvm -O2 -msse4.2 popcounttest.hs
$ grep popcnt popcounttest.s
-> No match. LLVM backend didn't use popcnt instruction

Expected behavior

The LLVM backend should emit the popcnt instruction if -msse4.2 is set.

Observations

Although many compilers (including GCC, Clang, and the GHC NCG) treat popcnt as part of SSE4.2, popcnt has its own CPUID feature flag, separate from the SSE4.2 one. LLVM's attribute system honors this distinction and defines a separate attribute for popcnt (see X86.td). So GHC should pass +popcnt to LLVM in addition to +sse4.2.
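A minimal sketch of the proposed mapping, to make the idea concrete. It is not GHC's actual code: the names SseLevel, sseAttr, llvmAttrs and mattrFlag are hypothetical and chosen only for illustration. The feature strings and the -mattr= option are the ones LLVM's llc accepts.

```haskell
import Data.List (intercalate)

-- Hypothetical model of the SSE level selected by an -msse* flag.
-- GHC's real representation may differ; this is for illustration only.
data SseLevel = Sse2 | Sse3 | Sse41 | Sse42
  deriving (Eq, Ord, Show)

-- LLVM feature string for each SSE level.
sseAttr :: SseLevel -> String
sseAttr Sse2  = "+sse2"
sseAttr Sse3  = "+sse3"
sseAttr Sse41 = "+sse4.1"
sseAttr Sse42 = "+sse4.2"

-- Attributes to hand to LLVM. Because LLVM tracks popcnt separately
-- from SSE4.2, +popcnt is added explicitly whenever SSE4.2 is requested.
llvmAttrs :: SseLevel -> [String]
llvmAttrs lvl = sseAttr lvl : [ "+popcnt" | lvl >= Sse42 ]

-- Render the attribute list as a single -mattr= argument.
mattrFlag :: [String] -> String
mattrFlag attrs = "-mattr=" ++ intercalate "," attrs
```

Loaded in GHCi, mattrFlag (llvmAttrs Sse42) evaluates to "-mattr=+sse4.2,+popcnt", while lower levels yield only their SSE attribute.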

Environment

  • GHC version used: 9.13.20241006 (c9590ba0) with a patch to fix #25019 (closed)
  • Operating System: Linux
  • System Architecture: x86_64