Add AArch64 clz, ctz and brev primops
Adds assembly implementations of the count leading zeros, count trailing zeros and bit reverse primops for AArch64 for W8-W64 sizes. The code produced here appears to be better than what the C compiler produces for `hs_clz8` and `hs_clz16`, which relies on conditional moves. I'll make a PR changing the implementations in `clz.c` etc. so they can be a little faster on all platforms that use them too.
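
For illustration, here is a minimal C sketch of one way sub-word results can be derived from a 32-bit operation without a branch or conditional move. It is not the code from this MR or from `clz.c`: every `sketch_` name is hypothetical, the sentinel-bit trick is just one possible approach, and `__builtin_clz`/`__builtin_ctz` are GCC/Clang builtins, so a compatible compiler is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the functions in clz.c. */

/* Leading zeros of an 8-bit value via a 32-bit count. The value is moved to
 * the top byte and a sentinel bit is set just below it, so the builtin never
 * sees zero and a zero input yields 8. */
static unsigned sketch_clz8(uint8_t x)
{
    return (unsigned)__builtin_clz(((uint32_t)x << 24) | (UINT32_C(1) << 23));
}

/* Same idea for 16 bits: a zero input yields 16. */
static unsigned sketch_clz16(uint16_t x)
{
    return (unsigned)__builtin_clz(((uint32_t)x << 16) | (UINT32_C(1) << 15));
}

/* Trailing zeros: the sentinel sits just above the value instead. */
static unsigned sketch_ctz8(uint8_t x)
{
    return (unsigned)__builtin_ctz((uint32_t)x | (UINT32_C(1) << 8));
}

static unsigned sketch_ctz16(uint16_t x)
{
    return (unsigned)__builtin_ctz((uint32_t)x | (UINT32_C(1) << 16));
}

/* Portable 32-bit bit reversal by swapping progressively larger groups. */
static uint32_t sketch_brev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* 8-bit reversal: reverse all 32 bits, then shift the result back down. */
static uint8_t sketch_brev8(uint8_t x)
{
    return (uint8_t)(sketch_brev32(x) >> 24);
}

/* Small self-check covering the zero and boundary cases. */
static void sketch_self_check(void)
{
    assert(sketch_clz8(0) == 8 && sketch_clz8(0x80) == 0);
    assert(sketch_clz16(0) == 16 && sketch_clz16(1) == 15);
    assert(sketch_ctz8(0) == 8 && sketch_ctz8(0x80) == 7);
    assert(sketch_ctz16(0) == 16 && sketch_ctz16(1) == 0);
    assert(sketch_brev8(0x01) == 0x80 && sketch_brev8(0x12) == 0x48);
}
```

Calling `sketch_self_check()` from a `main` exercises the zero-input cases, which are the ones that usually force a branch or conditional move in a naive implementation.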
I also made some of the case statements in `compiler/GHC/CmmToAsm/AArch64/Instr.hs` explicitly match all constructors, so editor tools can flag when an instruction is added but the necessary cases aren't handled. This has bitten me a few times already.
Please take a few moments to address the following points:
- if your MR may break existing programs (e.g. touches `base` or causes the compiler to reject programs), please describe the expected breakage and add the user-facing label. This will run ghc/head.hackage> to characterise the effect of your change on Hackage.
- ensure that your commits are either individually buildable or squashed
- ensure that your commit messages describe what they do (referring to tickets using `#NNNN` syntax when appropriate)
- have added source comments describing your change. For larger changes you likely should add a [Note][notes] and cross-reference it from the relevant places.
- add a [testcase to the testsuite][adding test].
- updates the users guide if applicable
- mentions new features in the release notes for the next release
Edited by Alex Mason