Better Codegen for ctz16
The codegen for ctz on a 16-bit word is:
W16 -> toOL
  [ TZCNT II16 (OpReg src_r) dst_r
  , MOVZxL II16 (OpReg dst_r) (OpReg dst_r)
  ]
This will generate an instruction sequence that looks like this (Intel syntax, assuming src is r15w and dest is r14w):
tzcnt r14w, r15w
movzx r14d, r14w
The second step, zeroing the upper 48 bits, is necessary because x86's 16-bit tzcnt leaves the upper bits alone. This is, however, not the best way to accomplish it. Register renaming for partial registers is uncommon, so on most microarchitectures the tzcnt instruction incurs a false dependency on the previous value of r14w. A better instruction sequence would start by zeroing the destination:
xor r14, r14
tzcnt r14w, r15w
This eliminates the false dependency and produces the same result.
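
To sanity-check the "same result" claim, here is a minimal sketch that models the two sequences on 64-bit register values. It is not GHC code: the names tzcnt16, movzx16, currentSequence and proposedSequence are hypothetical, and the model only assumes that a 16-bit tzcnt replaces the low 16 bits of the destination and that the zero-extending move clears everything above bit 15.

```haskell
import Data.Bits (countTrailingZeros, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Hypothetical model of a 16-bit tzcnt: only the low 16 bits of the
-- destination are replaced, the upper 48 bits keep their old value.
-- countTrailingZeros on a zero Word16 yields 16, matching tzcnt.
tzcnt16 :: Word64 -> Word64 -> Word64
tzcnt16 oldDst src = (oldDst .&. 0xFFFFFFFFFFFF0000) .|. fromIntegral count
  where
    count = countTrailingZeros (fromIntegral src :: Word16)

-- Hypothetical model of the zero-extending move: keep the low 16 bits,
-- clear everything above them.
movzx16 :: Word64 -> Word64
movzx16 r = r .&. 0xFFFF

-- Current sequence: tzcnt into the stale destination, then zero-extend.
currentSequence :: Word64 -> Word64 -> Word64
currentSequence oldDst src = movzx16 (tzcnt16 oldDst src)

-- Proposed sequence: zero the destination first (xor), then tzcnt.
-- The old destination value is ignored, which is the point of the change.
proposedSequence :: Word64 -> Word64 -> Word64
proposedSequence _oldDst src = tzcnt16 0 src
```

Under this model, currentSequence old src and proposedSequence old src agree for any old and src, because the count never exceeds 16 and so always fits in the low 16 bits. The model says nothing about timing; it only checks that the final register value is the same.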