I am seeing this fail under the NCG as well. Specifically, I have this program:
    test(bits32 buffer) {
        bits64 ret;
        (ret) = prim %bswap64(%neg(%zx64(bits16[buffer + (12 :: bits32)]))); // bad
        return (ret);
    }

driven by this Haskell program:

    {-# LANGUAGE NegativeLiterals #-}
    {-# LANGUAGE UnboxedTuples #-}
    {-# LANGUAGE MagicHash #-}
    {-# LANGUAGE ForeignFunctionInterface #-}
    {-# LANGUAGE GHCForeignImportPrim #-}
    {-# LANGUAGE UnliftedFFITypes #-}

    import Numeric
    import Data.Bits
    import GHC.Prim
    import GHC.Word
    import GHC.Int
    import GHC.IO
    import GHC.Ptr
    import Data.List
    import qualified Data.ByteString as BS

    foreign import prim "test" c_test :: Addr# -> State# RealWorld -> (# State# RealWorld, Word64# #)

    main :: IO ()
    main = do
        let bs = BS.pack $ take 100000 [ fromIntegral i | i <- [(1 :: Int) ..] ]
        n <- BS.useAsCString bs $ \(Ptr addr) -> IO $ \s ->
          case c_test addr s of (# s', n #) -> (# s', W64# n #)
        print $ showHex n ""

which produces 0xf3f1fffff3f1ffff; note the repetition of the 32-bit bit pattern.
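For comparison, here is a minimal sketch of what I would expect the result to be, computed with ordinary `Word64` arithmetic from `base`. It assumes a little-endian target, so that the 16-bit load at offset 12 of the buffer `[1, 2, 3, ...]` yields `0x0e0d`; the names `loaded` and `expected` are only for illustration.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word64, byteSwap64)
import Numeric (showHex)

-- Bytes 12 and 13 of the buffer [1, 2, 3, ...] are 13 (0x0d) and 14 (0x0e).
-- Assumption: a little-endian target, so the 16-bit load yields 0x0e0d.
loaded :: Word16
loaded = (0x0e `shiftL` 8) .|. 0x0d

-- Mirrors %bswap64(%neg(%zx64(x))): zero-extend, negate, byte-swap.
expected :: Word16 -> Word64
expected x = byteSwap64 (negate (fromIntegral x))

main :: IO ()
main = putStrLn (showHex (expected loaded) "")
```

Under those assumptions this prints `f3f1ffffffffffff`, so only the low 32 bits of the observed result differ from the expected value.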
I suspect that the problem here is register allocation. Specifically, the register liveness pass produces:
Note that at point (1) we are clobbering H2 (which contains the negated high word) prior to byteswapping. The culpable movl is generated by GHC.CmmToAsm.X86.CodeGen.genByteSwap, although it is writing to the destination of the byteswap operation, as one would expect.
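To illustrate why a clobbered high word would give exactly this output, here is a small model of a 64-bit byteswap over a pair of 32-bit halves. This is only a sketch and not the code generator's actual instruction sequence: `Pair`, `split`, `combine`, `bswapPair` and `bswapClobbered` are hypothetical names, and the assumption that the high word is overwritten with the low word is just one reading that is consistent with the observed result.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word64, byteSwap32)
import Numeric (showHex)

-- A 64-bit value held as a (high, low) pair of 32-bit registers,
-- as on a 32-bit x86 target.
type Pair = (Word32, Word32)

split :: Word64 -> Pair
split w = (fromIntegral (w `shiftR` 32), fromIntegral w)

combine :: Pair -> Word64
combine (hi, lo) = (fromIntegral hi `shiftL` 32) .|. fromIntegral lo

-- Intended behaviour: byte-swap each half and exchange the halves.
bswapPair :: Pair -> Pair
bswapPair (hi, lo) = (byteSwap32 lo, byteSwap32 hi)

-- Hypothetical faulty sequence: the high word is overwritten with the
-- low word before the swap gets to read it.
bswapClobbered :: Pair -> Pair
bswapClobbered (_hi, lo) = bswapPair (lo, lo)

main :: IO ()
main = do
  -- 0xfffffffffffff1f3 is the negation of the zero-extended 0x0e0d.
  let negated = split 0xfffffffffffff1f3
  putStrLn (showHex (combine (bswapPair negated)) "")
  putStrLn (showHex (combine (bswapClobbered negated)) "")
```

The first line printed is `f3f1ffffffffffff` and the second is `f3f1fffff3f1ffff`, the latter matching the repeated 32-bit pattern seen above.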
The problem appears to be that both MO_S_Neg and MO_BSwap are treating H2 as (part of) their destination. It appears that H2 came into being as a result of a getNewReg64 in Neg. Consequently, it's rather perplexing that genByteSwap seemingly considers H2 to be both part of its source and destination.