Better Codegen for ctz16

The codegen for ctz on a 16-bit word is:

    W16 -> toOL
      [ TZCNT  II16 (OpReg src_r) dst_r
      , MOVZxL II16 (OpReg dst_r) (OpReg dst_r)
      ]

This generates an instruction sequence that looks like this (Intel syntax, assuming src is r15w and dest is r14w):

tzcnt r14w, r15w
movzx r14d, r14w

The second step, zeroing the upper 48 bits, is necessary because x86's 16-bit tzcnt leaves the upper bits of the destination register unchanged. This is not, however, the best way to accomplish it. Register renaming for partial registers is uncommon, so on most microarchitectures the tzcnt instruction incurs a false dependency on the previous value of r14. A better instruction sequence would start by zeroing the destination:

xor r14, r14
tzcnt r14w, r15w

This eliminates the false dependency and produces the same result.
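
The equivalence can be checked with a small model. The sketch below is not GHC code: it treats a register as a plain 64-bit word, and the names `tzcnt16`, `movzx16`, `currentSeq` and `proposedSeq` are made up for illustration. It assumes the 16-bit tzcnt writes only the low 16 bits of the destination and returns 16 for a zero source, and that source and destination are different registers (if they were the same, zeroing the destination first would destroy the input).

```haskell
import Data.Bits (complement, countTrailingZeros, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Hypothetical model: a 64-bit general-purpose register.
type Reg = Word64

-- 16-bit tzcnt: counts trailing zeros of the low 16 bits of src and
-- writes the result to the low 16 bits of dst, keeping dst's upper 48 bits.
tzcnt16 :: Reg -> Reg -> Reg
tzcnt16 dst src = (dst .&. complement 0xFFFF) .|. fromIntegral count
  where
    -- countTrailingZeros on Word16 yields 16 for a zero input
    count = countTrailingZeros (fromIntegral src :: Word16)

-- Zero-extending move from the low 16 bits: clears the upper 48 bits.
movzx16 :: Reg -> Reg
movzx16 r = r .&. 0xFFFF

-- Current sequence: tzcnt, then zero-extend the result.
currentSeq :: Reg -> Reg -> Reg
currentSeq dst src = movzx16 (tzcnt16 dst src)

-- Proposed sequence: zero the destination (xor), then tzcnt.
-- The old value of dst is ignored, which is the point of the change.
proposedSeq :: Reg -> Reg -> Reg
proposedSeq _dst src = tzcnt16 0 src

-- Both sequences should leave the same value in the destination.
agree :: Reg -> Reg -> Bool
agree dst src = currentSeq dst src == proposedSeq dst src
```

In GHCi, `all (agree 0xFFFFFFFFFFFFFFFF) [0 .. 0xFFFF]` exercises every 16-bit source against a destination with all upper bits set. The model only speaks to the resulting value, not to the dependency-chain behaviour, which depends on the microarchitecture.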