Add SIMD support to the X86 NCG

sheaf requested to merge wip/ncg-simd into master

This MR adds SIMD support to the X86 NCG.

Main changes:

  • Introduction of vector formats (GHC.CmmToAsm.Format).
  • Introduction of a 128-bit virtual register (GHC.Platform.Reg), and removal of the unused Float virtual register.
  • Refactor of GHC.Platform.Reg.Class.RegClass: it now only contains two classes, RcInteger (for general purpose registers) and RcFloatOrVector (for registers that can be used for scalar floating point values as well as vectors).
  • Modify GHC.CmmToAsm.X86.Instr.regUsageOfInstr to keep track of the format at which each register is used, so that the register allocator knows whether it needs to spill the entire vector register or just the lower 64 bits.
  • Modify spill/load/reg-2-reg code to account for vector registers (GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}).
  • Modify the register allocator code (GHC.CmmToAsm.Reg.*) to propagate the format stored in any given register, for instance by changing Reg to RegFormat or GlobalReg to GlobalRegUse.
  • Add logic to lower vector MachOps to X86 assembly (see GHC.CmmToAsm.X86.CodeGen).
  • Minor cleanups to genprimopcode, to remove the llvm_only attribute which is no longer applicable.
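As a rough illustration of the first few items, here is a minimal sketch of how vector formats and the two register classes could relate. Only the names RcInteger and RcFloatOrVector come from this MR; the ScalarFormat and Format types, their constructors, and the helper functions are simplified assumptions and do not reproduce the actual definitions in GHC.CmmToAsm.Format.

```haskell
-- Hypothetical, simplified scalar element formats (not GHC's actual type).
data ScalarFormat = I8 | I16 | I32 | I64 | F32 | F64
  deriving (Eq, Show)

-- A format is either a scalar or a vector with a given number of lanes.
data Format
  = Scalar ScalarFormat
  | Vec Int ScalarFormat   -- lane count, element format
  deriving (Eq, Show)

-- The two register classes named in the description.
data RegClass = RcInteger | RcFloatOrVector
  deriving (Eq, Show)

-- Width of one scalar element, in bytes.
scalarBytes :: ScalarFormat -> Int
scalarBytes I8  = 1
scalarBytes I16 = 2
scalarBytes I32 = 4
scalarBytes I64 = 8
scalarBytes F32 = 4
scalarBytes F64 = 8

-- Total width of a value of the given format, in bytes.
formatBytes :: Format -> Int
formatBytes (Scalar s) = scalarBytes s
formatBytes (Vec n s)  = n * scalarBytes s

-- Scalar floats and all vectors share one class; scalar integers use the other.
classOfFormat :: Format -> RegClass
classOfFormat (Scalar F32) = RcFloatOrVector
classOfFormat (Scalar F64) = RcFloatOrVector
classOfFormat (Scalar _)   = RcInteger
classOfFormat (Vec _ _)    = RcFloatOrVector
```

In this model, classOfFormat (Vec 4 F32) and classOfFormat (Scalar F64) both give RcFloatOrVector, which is why the class alone cannot say how many bytes of the register are live.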

It also adds a new family of vector shuffle primops shuffle{Ty}X{N}# (e.g. shuffleDoubleX2#, shuffleFloatX4#) and vector FMA primops (such as fmaddFloatX4#), with support for them in both the LLVM backend and the X86 NCG.
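To make the intended behaviour of these primops concrete, here is a list-based reference model. It is an assumption-laden sketch, not the primops themselves: shuffleModel and fmaddModel are hypothetical names, the index convention (indices select from the concatenation of the two input vectors) is assumed by analogy with LLVM's shufflevector, and the FMA model assumes the x * y + z operand order.

```haskell
-- Reference model of a two-input shuffle on lists (assumed semantics):
-- index i selects element i of the concatenation v1 ++ v2.
-- Partial: an out-of-range index raises an error.
shuffleModel :: [a] -> [a] -> [Int] -> [a]
shuffleModel v1 v2 = map ((v1 ++ v2) !!)

-- Reference model of a lane-wise multiply-add: x * y + z per lane.
-- Note: a hardware FMA rounds once, whereas this model rounds twice
-- for floating-point types, so results may differ in the last bit.
fmaddModel :: Num a => [a] -> [a] -> [a] -> [a]
fmaddModel = zipWith3 (\x y z -> x * y + z)
```

For example, under these assumptions shuffleModel [1, 2] [3, 4] [0, 3] picks the first lane of the first vector and the second lane of the second vector.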

Further items of work to be tackled after this MR lands are tracked in #25030.

TODO:

  • Benchmark the compiler to ensure we are not regressing in runtime. The register allocator regresses in compile-time allocations by up to 5% in certain tests because, on top of the register itself, it needs to track the format at which each register is used. This information is critical for generating correct spill/unspill code: for a 128-bit-wide register, we need to know whether to spill/load the whole register or only the lower 64 bits.
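The following sketch shows why the format has to travel with the register. RegFormat is the name used in this MR, but its fields here, the Format constructors, and the chosen mnemonics are illustrative assumptions; the instructions GHC actually emits may differ.

```haskell
-- Hypothetical formats a float-or-vector register might hold.
data Format = FF32 | FF64 | VecF32x4 | VecF64x2
  deriving (Eq, Show)

-- A register paired with the format it is used at (fields are assumed).
data RegFormat = RegFormat
  { regName   :: String
  , regFormat :: Format
  } deriving (Eq, Show)

-- Number of bytes that must be saved when spilling at this format.
spillBytes :: RegFormat -> Int
spillBytes rf = case regFormat rf of
  FF32     -> 4
  FF64     -> 8
  VecF32x4 -> 16
  VecF64x2 -> 16

-- Illustrative x86 store mnemonic for the spill (an assumption).
spillMnemonic :: RegFormat -> String
spillMnemonic rf = case regFormat rf of
  FF32     -> "movss"   -- stores the low 32 bits
  FF64     -> "movsd"   -- stores the low 64 bits
  VecF32x4 -> "movups"  -- stores all 128 bits
  VecF64x2 -> "movups"  -- stores all 128 bits
```

The same physical register, say xmm1, needs an 8-byte spill when it holds a Double but a 16-byte spill when it holds a DoubleX2#; spilling only the lower 64 bits in the second case would silently lose the upper lane.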