-
rwbarton authored
Summary: The 'popcnt r16, r/m16' instruction only writes the low 16 bits of the destination register, so we have to zero-extend the result to a full word as popCnt16# is supposed to return a Word#. For popCnt8# we could instead zero-extend the input to 32 bits and then do a 32-bit popcnt, and not have to zero-extend the result. LLVM produces the 16-bit popcnt sequence with two zero extensions, though, and who am I to argue? Test Plan: - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42" - then ran again adding "WAY=optasm", and verified that the popcnt sequences we generate match the ones produced by LLVM for its @llvm.ctpop.* intrinsics Reviewers: austin, hvr, tibbe Reviewed By: austin, hvr, tibbe Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D147 GHC Trac Issues: #9435
64151913