
Draft: Add SIMD support to the X86 NCG

sheaf requested to merge wip/ncg-simd into master

This draft MR tracks work in progress on adding support for SIMD to the NCG.

The main work consists of fixing the register allocator so that it handles vector registers correctly (e.g. spilling a 128-bit wide vector to two consecutive stack slots).
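To illustrate the spilling arithmetic, here is a minimal sketch of how many consecutive slots a value would occupy. It is a simplified model, not the allocator code from this MR: the `Format` type, `slotBytes`, `slotsNeeded` and `allocSpill` are all hypothetical names, and the 8-byte slot size is an assumption chosen so that a 128-bit vector takes two slots, as described above.

```haskell
-- Simplified model of spill-slot accounting; all names are hypothetical
-- and are not taken from GHC's register allocator.

-- | Assumed width of one spill slot, in bytes (one 64-bit word).
slotBytes :: Int
slotBytes = 8

-- | Hypothetical description of a value that may need to be spilled.
data Format
  = Scalar Int      -- ^ scalar of the given width in bytes
  | Vector Int Int  -- ^ vector: lane count, then lane width in bytes
  deriving (Eq, Show)

-- | Total size of a value in bytes.
formatBytes :: Format -> Int
formatBytes (Scalar w)   = w
formatBytes (Vector n w) = n * w

-- | Number of consecutive slots needed, rounding up to whole slots.
slotsNeeded :: Format -> Int
slotsNeeded fmt = (formatBytes fmt + slotBytes - 1) `div` slotBytes

-- | Reserve consecutive slots starting at the next free slot.
-- Returns the first slot used and the updated next free slot.
allocSpill :: Int -> Format -> (Int, Int)
allocSpill next fmt = (next, next + slotsNeeded fmt)
```

Under these assumptions, `slotsNeeded (Vector 2 8)` (a DoubleX2-sized value) is 2, whereas a scalar 64-bit value needs a single slot.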

It also adds a new family of vector shuffle primops shuffle{Ty}X{N}# (e.g. shuffleDoubleX2#, shuffleFloatX4#), with support for them in both the LLVM backend and the X86 NCG.
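As a rough guide to what a two-input shuffle computes, the following is a reference model over plain lists. It is a sketch under assumptions, not the primops themselves: `shuffleModel` is a hypothetical helper, and it assumes the indices select from the concatenation of the two input vectors, in the style of LLVM's `shufflevector` instruction. The exact types and index conventions of the shuffle{Ty}X{N}# primops are defined by the MR, not by this model.

```haskell
-- Reference model of a two-input vector shuffle over lists.
-- 'shuffleModel' is a hypothetical name used only for illustration.

-- | Select lanes from the concatenation of two equal-length vectors.
-- Returns Nothing if the inputs differ in length, if the number of
-- indices differs from the lane count, or if an index is out of range.
shuffleModel :: [a] -> [a] -> [Int] -> Maybe [a]
shuffleModel xs ys idxs
  | length xs /= length ys   = Nothing
  | length idxs /= length xs = Nothing
  | otherwise                = traverse pick idxs
  where
    -- Assumed convention: index 0 is the first lane of the first
    -- vector, index N is the first lane of the second vector.
    both = xs ++ ys
    pick i
      | i >= 0 && i < length both = Just (both !! i)
      | otherwise                 = Nothing
```

For example, with two 2-lane inputs `[1.0, 2.0]` and `[3.0, 4.0]`, the index list `[1, 2]` picks the second lane of the first vector and the first lane of the second. A model like this can serve as an oracle when testing a backend's shuffle lowering.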

The SIMD NCG wiki page contains further details about the work and general plan.

TODO:

  • Clean up the commits and remaining TODOs in the code.
  • Cherry-pick !12857.
  • Ensure the documentation and error messages accurately reflect the availability of vector primops on different architectures.
  • Fix regressions in compile-time allocations in T12707 (+1.6%), T3294 (+3.5%), T4801 (+4.5%).
  • Get the test-suite passing.

Things that will not be tackled in this MR (but should not present any particular difficulty after the current work is complete):

  • 256-bit and 512-bit wide X86 operations (with AVX2/AVX512).
  • AArch64 support for 128-bit vectors using the NEON instruction set, and PowerPC support for 128-bit vectors using the VMX/AltiVec instruction set.
Edited by sheaf
