Add AArch64 clz, ctz and brev primops

Alex Mason requested to merge Axman6/ghc:wip/aarch64-clz-ctz into master

Adds assembly implementations of the count leading zeros (clz), count trailing zeros (ctz) and bit reverse (brev) primops for AArch64, covering the W8 to W64 sizes. The code generated here appears to be better than what the C compiler produces for hs_clz8 and hs_clz16, where the compiled output uses conditional moves. I'll make a follow-up PR changing the implementations in clz.c etc. so they can be a little faster on every platform that uses them too.
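To illustrate the kind of branch-free sub-word implementation meant here, below is a minimal sketch in C. It is not the code from this MR or from clz.c: the `sketch_*` names are hypothetical, and it assumes GCC or Clang for `__builtin_clz` and `__builtin_ctz`. The idea is to widen the narrow value into a 32-bit word and plant a sentinel bit just outside it, so the operand is never zero and a zero input naturally yields the operand width, with no conditional needed.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are NOT GHC's
 * hs_clz8 / hs_clz16 / hs_ctz8. Assumes GCC or Clang builtins. */

/* Count leading zeros of an 8-bit value. The byte is moved to the top
 * of a 32-bit word and the bit just below it is set as a sentinel, so
 * the builtin never sees 0 and an all-zero byte gives 8. */
static unsigned sketch_clz8(uint8_t x) {
    uint32_t widened = ((uint32_t)x << 24) | 0x00800000u;
    return (unsigned)__builtin_clz(widened);
}

/* Same trick for 16-bit values: a zero input gives 16. */
static unsigned sketch_clz16(uint16_t x) {
    uint32_t widened = ((uint32_t)x << 16) | 0x00008000u;
    return (unsigned)__builtin_clz(widened);
}

/* Count trailing zeros of an 8-bit value. Setting bit 8 as a sentinel
 * makes a zero byte give 8 without a branch. */
static unsigned sketch_ctz8(uint8_t x) {
    return (unsigned)__builtin_ctz((uint32_t)x | 0x100u);
}
```

For example, `sketch_clz8(1)` is 7, while `sketch_clz8(0)` and `sketch_ctz8(0)` are both 8. Whether a given compiler emits a single clz instruction for this on a particular target is something to check in the generated assembly rather than assume.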

I also made some of the case statements in compiler/GHC/CmmToAsm/AArch64/Instr.hs explicitly match all constructors, so that editor tools can flag it when an instruction is added but the necessary cases aren't handled. That has bitten me a few times already.

