[#24956] Aarch64 reduce primop calls

Alex Mason requested to merge Axman6/ghc:aarch64-reduce-primop-calls into master

#24956 - Adds several new instructions to avoid making calls to primops.


  • floating point sqrt via the fsqrt instruction
  • bswap/byteSwap{,32,16}# using the REV (32- and 64-bit) and REV16 instructions
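
For context, here is a small sketch of the user-facing functions that sit on top of these primops. It assumes the usual `base` definitions, where `byteSwap16`/`byteSwap32`/`byteSwap64` from `Data.Word` wrap the `byteSwap…#` primops and `sqrt` on `Double` wraps `sqrtDouble#`. The binding names are made up for illustration, and whether a given call ends up as a single instruction depends on the backend and optimisation settings.

```haskell
import Data.Word (Word16, Word32, Word64, byteSwap16, byteSwap32, byteSwap64)

-- Hypothetical example values; only the library functions are real.
swapped16 :: Word16
swapped16 = byteSwap16 0x1234              -- 0x3412

swapped32 :: Word32
swapped32 = byteSwap32 0x12345678          -- 0x78563412

swapped64 :: Word64
swapped64 = byteSwap64 0x0102030405060708  -- 0x0807060504030201

-- Floating point square root; 4.0 is chosen so the result is exact.
root :: Double
root = sqrt 4.0                            -- 2.0
```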

The BREV32 instruction is currently commented out, as it is only defined on 64-bit inputs and reverses the order of the bytes in each 32-bit section of the input. It might be nice to expose these as primops so that algorithms which need to reverse the byte order of 16/32-bit values have a more efficient implementation.
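
To make the "each 32-bit section" behaviour concrete, here is a pure Haskell reference model of that operation. It is only a sketch of the semantics described above, not code from this patch; `rev32Model` is a hypothetical name, and no such primop currently exists.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word64, byteSwap32)

-- Hypothetical reference model: reverse the bytes within each 32-bit
-- half of a 64-bit word, leaving the two halves in their original positions.
rev32Model :: Word64 -> Word64
rev32Model w = (fromIntegral hi `shiftL` 32) .|. fromIntegral lo
  where
    hi, lo :: Word32
    hi = byteSwap32 (fromIntegral (w `shiftR` 32))  -- upper half, bytes reversed
    lo = byteSwap32 (fromIntegral w)                -- lower half (truncated), bytes reversed

-- A full 64-bit byte swap, by contrast, also exchanges the halves:
--   rev32Model 0x0102030405060708 == 0x0403020108070605
--   byteSwap64 0x0102030405060708 == 0x0807060504030201
```

This is why the instruction cannot simply stand in for a 32-bit byte swap unless the upper half of the register is known to be irrelevant.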

