Use wasm clz/ctz/popcnt in wasm NCG
Currently, the wasm NCG lowers MO_Clz/MO_Ctz/MO_PopCnt to calls to cbits defined in ghc-prim. This was a temporary fix for #22470 (closed), mainly to get correct behavior when the operand size is W8/W16. However, it adds unnecessary performance and code size overhead for W32/W64 operands, where a single clz/ctz/popcnt instruction could be emitted instead.
This issue is a feature request to use wasm clz/ctz/popcnt instructions in the wasm NCG, at least for W32/W64 operands.
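
For reference, here is a rough sketch of why W8/W16 need extra care while W32/W64 do not. It models, in plain Haskell on `Word32`, how a narrow clz/ctz/popcnt could be derived from the 32-bit operation. The function names (`clzNarrow`, `ctzNarrow`, `popcntNarrow`, `mask`) are hypothetical and are not part of the NCG. The sketch also assumes the narrow operand sits in the low bits of a 32-bit value with possibly unspecified upper bits, which may not match what the NCG actually guarantees.

```haskell
import Data.Bits (countLeadingZeros, countTrailingZeros, popCount, shiftL, (.&.), (.|.))
import Data.Word (Word32)

-- Hypothetical helper: mask selecting the low w bits (w is 8 or 16).
mask :: Int -> Word32
mask w = (1 `shiftL` w) - 1

-- clz on a w-bit operand: clear the upper bits, run the 32-bit clz,
-- then subtract the (32 - w) leading zeros that the widening added.
clzNarrow :: Int -> Word32 -> Int
clzNarrow w x = countLeadingZeros (x .&. mask w) - (32 - w)

-- ctz on a w-bit operand: set a sentinel bit at position w so that
-- an all-zero operand yields w rather than 32.
ctzNarrow :: Int -> Word32 -> Int
ctzNarrow w x = countTrailingZeros (x .|. (1 `shiftL` w))

-- popcnt on a w-bit operand: only the low w bits may be counted.
popcntNarrow :: Int -> Word32 -> Int
popcntNarrow w x = popCount (x .&. mask w)
```

For W32/W64 no such adjustment is needed, since the operand width matches the i32/i64 instruction width.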