Add fused multiply-add instructions
This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:

  - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

fmadd x y z is x * y + z, computed with a single rounding step.

This patch implements code generation for these primops in the
following backends:

  - X86, AArch64 and PowerPC NCG,
  - LLVM,
  - C.

WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.

The following constant folding rules are also provided:

  - compute a * b + c when a, b, c are all literals,
  - x * y + 0 ==> x * y,
  - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.

NB: the constant folding rules incorrectly handle signed zero.
This is a known limitation with GHC's floating-point constant
folding rules (#21227), which we hope to resolve in the future.
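To illustrate what "a single rounding step" means in practice, here is a minimal sketch that compares the fused primop with the ordinary two-step computation. It assumes a GHC that includes this patch and exports `fmaddDouble#` from `GHC.Exts`. The names `fmadd` and `fmaddRef` are hypothetical helpers introduced only for this example, and `fmaddRef` is a reference model built on exact `Rational` arithmetic, not the way any backend implements the primop.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Double (D#), fmaddDouble#)

-- | Boxed wrapper around the new primop (hypothetical helper name).
fmadd :: Double -> Double -> Double -> Double
fmadd (D# x) (D# y) (D# z) = D# (fmaddDouble# x y z)

-- | Reference model of "single rounding": compute x * y + z exactly
-- over the rationals, then round to Double once. Only meaningful for
-- finite inputs; it is an illustration, not the backend implementation.
fmaddRef :: Double -> Double -> Double -> Double
fmaddRef x y z = fromRational (toRational x * toRational y + toRational z)

main :: IO ()
main = do
  let x = 1 + 2 ^^ (-27) :: Double
      y = 1 - 2 ^^ (-27) :: Double
      z = -1             :: Double
      -- The exact product is 1 - 2^-54, which is not representable as a
      -- Double; rounding it to nearest-even gives 1, so the sum is 0.
      unfused = x * y + z
      -- The fused form keeps the exact product, so the result is -2^-54.
      fused   = fmadd x y z
  print (unfused == 0)
  print (fused == negate (2 ^^ (-54)))
  print (fused == fmaddRef x y z)
```

On a platform with correct IEEE 754 double arithmetic, all three comparisons should print `True`: the unfused expression loses the low-order bit of the product, while the fused one preserves it.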
Showing
- compiler/GHC/Builtin/primops.txt.pp: 69 additions, 0 deletions
- compiler/GHC/Cmm/MachOp.hs: 38 additions, 0 deletions
- compiler/GHC/Cmm/Parser.y: 6 additions, 1 deletion
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs: 39 additions, 2 deletions
- compiler/GHC/CmmToAsm/AArch64/Instr.hs: 19 additions, 0 deletions
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs: 7 additions, 0 deletions
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs: 34 additions, 1 deletion
- compiler/GHC/CmmToAsm/PPC/Instr.hs: 11 additions, 0 deletions
- compiler/GHC/CmmToAsm/PPC/Ppr.hs: 18 additions, 0 deletions
- compiler/GHC/CmmToAsm/Wasm/FromCmm.hs: 3 additions, 1 deletion
- compiler/GHC/CmmToAsm/X86/CodeGen.hs: 79 additions, 13 deletions
- compiler/GHC/CmmToAsm/X86/Instr.hs: 21 additions, 1 deletion
- compiler/GHC/CmmToAsm/X86/Ppr.hs: 23 additions, 0 deletions
- compiler/GHC/CmmToC.hs: 21 additions, 1 deletion
- compiler/GHC/CmmToLlvm/CodeGen.hs: 26 additions, 2 deletions
- compiler/GHC/Core/Opt/ConstantFold.hs: 156 additions, 0 deletions
- compiler/GHC/Driver/Config/StgToCmm.hs: 22 additions, 0 deletions
- compiler/GHC/Driver/Pipeline/Execute.hs: 1 addition, 0 deletions
- compiler/GHC/Driver/Session.hs: 10 additions, 1 deletion
- compiler/GHC/Llvm/Ppr.hs: 8 additions, 0 deletions