    x86: zero extend the result of 16-bit popcnt instructions (#9435) · 64151913
    rwbarton authored
    Summary:
    The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
    the destination register, so we have to zero-extend the result to
    a full word as popCnt16# is supposed to return a Word#.
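
To make the partial-register write concrete, here is a minimal C model of the behaviour described above. It is only a sketch, not GHC's code generator: the helper names (popcount16, popcnt_r16, zero_extend16, popcnt16_word) are hypothetical, and a 64-bit register is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Software population count of a 16-bit value (illustrative helper). */
static unsigned popcount16(uint16_t x) {
    unsigned n = 0;
    while (x) {
        x &= (uint16_t)(x - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Model of 'popcnt r16, r/m16': only the low 16 bits of the destination
 * register are replaced; bits 16..63 keep whatever was there before. */
static uint64_t popcnt_r16(uint64_t dst, uint16_t src) {
    return (dst & ~UINT64_C(0xFFFF)) | popcount16(src);
}

/* Model of a 16-bit zero extension: clears bits 16..63. */
static uint64_t zero_extend16(uint64_t reg) {
    return reg & UINT64_C(0xFFFF);
}

/* What popCnt16# must return: the count as a full machine word. */
static uint64_t popcnt16_word(uint64_t dst, uint16_t src) {
    return zero_extend16(popcnt_r16(dst, src));
}

/* Small self-check: stale upper bits survive the 16-bit popcnt
 * and are removed only by the zero extension. */
void popcnt16_demo(void) {
    uint64_t stale = UINT64_C(0xDEADBEEF00000000);
    assert(popcnt_r16(stale, 0xFFFF) != 16);    /* garbage in the high bits */
    assert(popcnt16_word(stale, 0xFFFF) == 16); /* correct full-word result */
}
```

Without the zero extension, the caller would see the stale high bits of the destination register as part of the count.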
    
    For popCnt8# we could instead zero-extend the input to 32 bits
    and then do a 32-bit popcnt, and not have to zero-extend the result.
    LLVM produces the 16-bit popcnt sequence with two zero extensions,
    though, and who am I to argue?
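
The two popCnt8# strategies compared above can be sketched in C as follows. This is an illustrative model under stated assumptions (64-bit registers, hypothetical function names), not the instruction sequence GHC or LLVM actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Software population count (illustrative helper). */
static unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Strategy A: zero-extend the input byte to 32 bits, then use a 32-bit
 * popcnt. The 32-bit result already fills the word with zeros above it
 * (a 32-bit register write on x86-64 clears the upper 32 bits), so no
 * extension of the result is needed. */
static uint64_t popcnt8_via32(uint64_t arg) {
    uint32_t widened = (uint32_t)(arg & 0xFF); /* single zero extension */
    return (uint64_t)popcount32(widened);
}

/* Strategy B: zero-extend the input byte to 16 bits, use a 16-bit popcnt
 * (which leaves bits 16..63 of the destination untouched), then
 * zero-extend the result. 'stale' models the destination's old contents. */
static uint64_t popcnt8_via16(uint64_t arg, uint64_t stale) {
    uint16_t widened = (uint16_t)(arg & 0xFF);         /* first zero extension */
    uint64_t raw = (stale & ~UINT64_C(0xFFFF)) | popcount32(widened);
    return raw & UINT64_C(0xFFFF);                     /* second zero extension */
}

/* Self-check: both strategies agree for every byte value, even when the
 * argument and destination registers hold garbage in their high bits. */
void popcnt8_demo(void) {
    for (unsigned b = 0; b < 256; b++) {
        uint64_t arg = UINT64_C(0xABCDEF0000000000) | b;
        assert(popcnt8_via32(arg) == popcnt8_via16(arg, UINT64_C(0xFFFFFFFFFFFF0000)));
    }
}
```

Both routes return the same word; they differ only in how many zero extensions are needed.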
    
    Test Plan:
     - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
     - then ran again adding "WAY=optasm", and verified that
       the popcnt sequences we generate match the ones produced
       by LLVM for its @llvm.ctpop.* intrinsics
    
    Reviewers: austin, hvr, tibbe
    
    Reviewed By: austin, hvr, tibbe
    
    Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
    
    Differential Revision: https://phabricator.haskell.org/D147
    
    GHC Trac Issues: #9435