Remove false dependency on the destination of the popcnt instruction
fryguybob writes in D3539:
Some Intel processors appear to have a false dependency on the destination of the popcnt instruction: the instruction waits for the previous value of its destination register even though it only writes to that register. This could lead to poor performance, for example when consecutive popcnt instructions that share a destination register are serialized. A simple way to prevent this is to clear the destination register immediately before the popcnt instruction. So far I have not been able to get GHC to produce code in which the source and destination registers differ (perhaps someone is interested in producing a test case that does). I'm putting this here in case anyone is interested in investigating further.
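A possible starting point for such a test case is sketched below. It is only a guess at what might push the register allocator towards distinct source and destination registers: the argument stays live after the popCount call, so the result cannot simply overwrite it. The module layout and the name `popKeepLive` are hypothetical and not taken from D3539, and whether the generated code really uses two different registers has to be checked by hand, for example by compiling with `ghc -O2 -msse4.2 -ddump-asm` on x86-64.

```haskell
module Main (main) where

import Data.Bits (popCount)
import Data.Word (Word64)

-- Hypothetical test function (not from D3539).
-- The argument x is used again after popCount, so it must stay live
-- across the popcnt instruction. The intent is that the result then
-- needs a register other than the one holding x. This is an assumption
-- about the register allocator, not a guarantee.
popKeepLive :: Word64 -> Word64
popKeepLive x = fromIntegral (popCount x) + x
{-# NOINLINE popKeepLive #-}

main :: IO ()
main = print (popKeepLive 0xFF)  -- 8 set bits + 255 = 263
```

If the dumped assembly shows popcnt with differing operands, the workaround described above would amount to clearing the destination (for instance with an xor of the register with itself) directly before that instruction.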
I'm opening this ticket so I can bump the diff out of the review queue, in hopes that someone might some day pick it up.