Add bit deposit and bit extraction primops
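Modern CPUs (on x86, Haswell and newer, as part of the BMI2 instruction set extension) provide a PDEP instruction for efficient bit deposit and a PEXT instruction for efficient bit extraction. These instructions are useful building blocks for bit-level data structures, for example select queries in succinct data structures.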
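To pin down the intended semantics, here is a minimal reference model of the two operations on 64-bit words, written in plain Haskell over `Data.Word.Word64`. It assumes the source operand comes first and the mask second, matching the x86 instructions. `pextRef` and `pdepRef` are hypothetical names used only for illustration; they are not proposed primops and make no attempt to be fast.

```haskell
import Data.Bits (setBit, testBit, zeroBits)
import Data.Word (Word64)

-- | Reference semantics of PEXT: gather the bits of src that sit at the
-- positions where mask has a 1, and pack them into the low bits of the result.
pextRef :: Word64 -> Word64 -> Word64
pextRef src mask = go 0 0 zeroBits
  where
    -- i: current bit position in src/mask; k: next free position in the result
    go :: Int -> Int -> Word64 -> Word64
    go i k acc
      | i >= 64 = acc
      | testBit mask i =
          let acc' = if testBit src i then setBit acc k else acc
          in go (i + 1) (k + 1) acc'
      | otherwise = go (i + 1) k acc

-- | Reference semantics of PDEP: scatter the low bits of src, in order,
-- to the positions where mask has a 1; every other result bit is 0.
pdepRef :: Word64 -> Word64 -> Word64
pdepRef src mask = go 0 0 zeroBits
  where
    -- i: current bit position in mask/result; k: next unused bit of src
    go :: Int -> Int -> Word64 -> Word64
    go i k acc
      | i >= 64 = acc
      | testBit mask i =
          let acc' = if testBit src k then setBit acc i else acc
          in go (i + 1) (k + 1) acc'
      | otherwise = go (i + 1) k acc
```

For instance, `pextRef 0xB6 0xF0` gathers bits 4 to 7 of `0xB6` and returns `0x0B`, and `pdepRef 0x0B 0xF0` puts them back, giving `0xB0`. Depositing into the mask `0x5555555555555555` spreads the low 32 bits of a word into the even bit positions, a common step when building Morton (Z-order) keys.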
I propose we add the following set of primops:
pdep8# :: Word# -> Word# -> Word#
pdep16# :: Word# -> Word# -> Word#
pdep32# :: Word# -> Word# -> Word#
pdep64# :: Word64# -> Word64# -> Word64#
pdep# :: Word# -> Word# -> Word#
pext8# :: Word# -> Word# -> Word#
pext16# :: Word# -> Word# -> Word#
pext32# :: Word# -> Word# -> Word#
pext64# :: Word64# -> Word64# -> Word64#
pext# :: Word# -> Word# -> Word#
Each primop compiles either into a single PDEP/PEXT instruction, when the target supports BMI2, or into a call to a fallback function implemented in C.
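As an illustration of what the fallback path has to compute, the following sketch is written in Haskell rather than C to keep a single language here. Instead of testing all 64 bit positions, it visits only the set bits of the mask, clearing the lowest one with `m .&. (m - 1)` at each step, so the work is proportional to the number of mask bits. The boxed functions `pdep64`, `pext64`, `pdepN`, `pextN`, `lowBits` and the width-suffixed wrappers are hypothetical stand-ins, and the assumption that the 8/16/32-bit variants consider only the low N bits of their operands (returning a zero-extended result) is not stated in the proposal.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- | Portable PDEP fallback (hypothetical). Walks the set bits of the mask
-- from least to most significant, depositing one source bit per mask bit.
pdep64 :: Word64 -> Word64 -> Word64
pdep64 src0 mask0 = go src0 mask0 0
  where
    go :: Word64 -> Word64 -> Word64 -> Word64
    go s m acc
      | m == 0 = acc
      | otherwise =
          let low  = m .&. negate m  -- lowest set bit of the remaining mask
              acc' = if s .&. 1 /= 0 then acc .|. low else acc
          -- move to the next source bit and clear the mask bit just used
          in go (s `shiftR` 1) (m .&. (m - 1)) acc'

-- | Portable PEXT fallback (hypothetical). For each set bit of the mask,
-- copies the corresponding source bit into the next free low bit of the result.
pext64 :: Word64 -> Word64 -> Word64
pext64 src mask0 = go mask0 0 0
  where
    go :: Word64 -> Int -> Word64 -> Word64
    go m k acc
      | m == 0 = acc
      | otherwise =
          let low  = m .&. negate m
              acc' = if src .&. low /= 0 then acc .|. (1 `shiftL` k) else acc
          in go (m .&. (m - 1)) (k + 1) acc'

-- | All ones in the low n bits.
lowBits :: Int -> Word64
lowBits n
  | n >= 64   = maxBound
  | otherwise = (1 `shiftL` n) - 1

-- | Width-restricted variants, assuming an N-bit primop only looks at the low
-- N bits of its operands: restricting the mask is enough, since then at most
-- N source bits are consumed and at most N result bits can be set.
pdepN, pextN :: Int -> Word -> Word -> Word
pdepN n src mask = fromIntegral (pdep64 (fromIntegral src) (fromIntegral mask .&. lowBits n))
pextN n src mask = fromIntegral (pext64 (fromIntegral src) (fromIntegral mask .&. lowBits n))

pdep8, pdep16, pdep32, pext8, pext16, pext32 :: Word -> Word -> Word
pdep8  = pdepN 8
pdep16 = pdepN 16
pdep32 = pdepN 32
pext8  = pextN 8
pext16 = pextN 16
pext32 = pextN 32
```

For example, `pdep8 0xF 0x1F0` ignores mask bit 8 and yields `0xF0`. A real fallback in C would follow the same loop structure over `uint64_t` values.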
For reference, see the following library, which implements FFI wrappers for the 32-bit and 64-bit variants of these instructions: