GHC should produce FMA instructions even if `-mfma` is not set on AArch64
Motivation
Currently, GHC emits FMA instructions only when `-mfma` is set.
This flag is necessary on x86 because only newer CPUs (Haswell/Ryzen or later) support FMA instructions.
However, AArch64 has had FMA instructions from the beginning, so the flag is redundant there.
Here is an example to test the effect of `-mfma`:
$ cat TwoProdFMA.hs
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
module TwoProdFMA where
import GHC.Exts
twoProductFloat# :: Float# -> Float# -> (# Float#, Float# #)
twoProductFloat# x y = let !r = x `timesFloat#` y
                       in (# r, fmsubFloat# x y r #)
$ ghc -S -fforce-recomp -O2 TwoProdFMA.hs
$ grep -E "fn?m[as]" TwoProdFMA.s
bl _fmaf
$ ghc -S -fforce-recomp -O2 -mfma TwoProdFMA.hs
$ grep -E "fn?m[as]" TwoProdFMA.s
fnmsub s9, s8, s9, s31
Tested with GHC 9.8.1 and GHC 9.9.20240105 (90ea574e) on AArch64 macOS.
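To sanity-check what the second component of `twoProductFloat#` should contain, independently of whether GHC emits `fnmsub` or calls `fmaf`, here is a minimal sketch of a reference computation. It is not part of the original report: `twoProductRef` is a hypothetical helper name, and it assumes the product neither overflows nor underflows in `Float`. It relies on the fact that the product of two `Float` values is exactly representable as a `Double`.

```haskell
import GHC.Float (double2Float, float2Double)

-- Hypothetical reference helper (an assumption, not from the original report).
-- Returns (r, e), where r is the Float-rounded product and e = x*y - r,
-- i.e. the value that fmsubFloat# x y r is expected to produce.
-- Assumes no overflow or underflow in Float.
twoProductRef :: Float -> Float -> (Float, Float)
twoProductRef x y =
  let r = x * y                             -- product rounded to Float
      p = float2Double x * float2Double y   -- exact: 24 + 24 significand bits fit in 53
      e = double2Float (p - float2Double r) -- rounding error of r
  in (r, e)

main :: IO ()
main = do
  let x = 1 + 2 ^^ (-13) :: Float
      (r, e) = twoProductRef x x
  -- x*x = 1 + 2^-12 + 2^-26; the last term is lost when rounding to Float.
  print (r == 1 + 2 ^^ (-12))
  print (e == 2 ^^ (-26))
```

Both lines should print `True`. Comparing this against a boxed wrapper around `twoProductFloat#` is one way to confirm that the results agree with and without `-mfma`.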
Proposal
On AArch64, GHC should emit FMA instructions whether or not `-mfma` is set.
I believe the same goes for PowerPC, but I don't have an environment to test it.