Add SIMD support to the X86 NCG
This MR adds SIMD support to the X86 NCG.
Main changes:
- Introduction of vector formats (`GHC.CmmToAsm.Format`).
- Introduction of 128-bit virtual register (`GHC.Platform.Reg`), and removal of unused `Float` virtual register.
- Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector` (for registers that can be used for scalar floating point values as well as vectors).
- Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track of which format each register is used at, so that the register allocator can know if it needs to spill the entire vector register or just the lower 64 bits.
- Modify spill/load/reg-2-reg code to account for vector registers (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
- Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate the format we are storing in any given register, for instance changing `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
- Add logic to lower vector `MachOp`s to X86 assembly (see `GHC.CmmToAsm.X86.CodeGen`).
- Change the way we generate the Cmm in `AutoApply.cmm`, properly allocating vector registers and doing the appropriate load/stores. This fixes the segfaults reported in #25062 (closed).
- Minor cleanups to `genprimopcode`, to remove the `llvm_only` attribute which is no longer applicable.
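To illustrate why the register allocator needs the format alongside the register, here is a minimal, simplified model of the idea. It is not GHC's actual code: the constructors of `Format` and `ScalarFormat` are a toy subset, and the field names of `RegFormat` as well as the helpers `classOfFormat` and `spillBytes` are hypothetical names chosen for this sketch.

```haskell
-- Simplified model for illustration only; NOT GHC's actual definitions.

-- | The two register classes described in the list above.
data RegClass = RcInteger | RcFloatOrVector
  deriving (Eq, Show)

-- | Element types of a vector (toy subset, assumed for this sketch).
data ScalarFormat = FmtFloat | FmtDouble
  deriving (Eq, Show)

-- | A toy subset of formats: one integer, two scalar floats, and vectors.
data Format
  = II64                        -- 64-bit integer
  | FF32                        -- 32-bit scalar float
  | FF64                        -- 64-bit scalar float
  | VecFormat Int ScalarFormat  -- element count and element type
  deriving (Eq, Show)

-- | A register identified by a number (stand-in for a real register type).
newtype Reg = Reg Int
  deriving (Eq, Show)

-- | A register paired with the format it is used at.
-- Field names are hypothetical.
data RegFormat = RegFormat
  { rfReg    :: Reg
  , rfFormat :: Format
  } deriving (Eq, Show)

-- | Which register class a value of the given format lives in.
classOfFormat :: Format -> RegClass
classOfFormat II64 = RcInteger
classOfFormat _    = RcFloatOrVector

-- | Size in bytes of one vector element.
scalarBytes :: ScalarFormat -> Int
scalarBytes FmtFloat  = 4
scalarBytes FmtDouble = 8

-- | How many bytes a spill must save for this register use:
-- the whole vector for vector formats, otherwise only the lower 64 bits.
spillBytes :: RegFormat -> Int
spillBytes (RegFormat _ fmt) = case fmt of
  VecFormat n s -> n * scalarBytes s
  _             -> 8
```

In this model, `spillBytes (RegFormat (Reg 0) (VecFormat 4 FmtFloat))` is `16`, while `spillBytes (RegFormat (Reg 0) FF64)` is `8`, even though both uses share the `RcFloatOrVector` class. The register class alone cannot distinguish the two cases, which is why the format has to be propagated.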
It also adds a new family of vector shuffle primops `shuffle{Ty}X{N}#` (e.g. `shuffleDoubleX2#`, `shuffleFloatX4#`), and vector FMA primops (such as `fmaddFloatX4#`), adding support for them to the LLVM backend and to the X86 NCG.
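As a rough guide to what these primops compute, the following sketch models them on plain lists instead of unboxed vectors. It assumes that shuffle indices select from the concatenation of the two input vectors and that `fmadd` computes `x * y + z` per lane. The MR description does not spell out either point, and the names `shuffleModel` and `fmaddModel` are hypothetical.

```haskell
-- List-based model of the primop semantics; illustration only.
-- The real primops work on unboxed vector types such as FloatX4#.

-- | Model of a shuffle: each index picks an element from the
-- concatenation of the two inputs (assumed semantics).
-- Indices must lie in [0, length xs + length ys).
shuffleModel :: [a] -> [a] -> [Int] -> [a]
shuffleModel xs ys idxs = map (concatenated !!) idxs
  where
    concatenated = xs ++ ys

-- | Model of a lane-wise multiply-add: x * y + z for each lane.
-- A hardware FMA rounds once; this model rounds after the multiply
-- and again after the add, so floating-point results may differ
-- in the last bit.
fmaddModel :: Num a => [a] -> [a] -> [a] -> [a]
fmaddModel = zipWith3 (\x y z -> x * y + z)
```

For example, `shuffleModel [1,2,3,4] [5,6,7,8] [0,4,1,5]` evaluates to `[1,5,2,6]`, interleaving the low halves of the two inputs, and `fmaddModel [1,2] [3,4] [5,6]` evaluates to `[8,14]`.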
Further items of work to be tackled after this MR lands are tracked in #25030.
TODO:
- Benchmark the compiler to ensure we are not regressing in runtime.

The register allocator regresses in compile-time allocations by up to 5% in certain tests, because it now needs to track the format each register is used at in addition to the register itself. This information is critical for generating correct spill/unspill code: for a 128-bit-wide register, we need to know whether to spill/load the whole register or only the lower 64 bits.
Edited by sheaf