AArch64 ncg: `signExtendReg` and `truncateReg` seem suspicious.
This is half a note to myself, half a call to fix or improve the situation around these functions. I haven't (yet) found anything going wrong with them, but if nothing else they are confusing.
`signExtendReg` looks like this:

    -- | Instructions to sign-extend the value in the given register from width @w@
    -- up to width @w'@.
    signExtendReg :: Width -> Width -> Reg -> NatM (Reg, OrdList Instr)
    signExtendReg w w' r =
        case w of
          W64 -> noop
          W32
            | w' == W32 -> noop
            | otherwise -> extend SXTH
          W16           -> extend SXTH
          W8            -> extend SXTB
          _             -> panic "intOp"
      where
        noop = return (r, nilOL)
        extend instr = do
            r' <- getNewRegNat II64
            return (r', unitOL $ instr (OpReg w' r') (OpReg w' r))

What does "up to width `w'`" really mean here? Looking at `SXTB`, for example:

> Signed Extend Byte extracts an 8-bit value from a register, sign-extends it to the size of the register, and writes the result to the destination register.

But if we have `signExtendReg W8 W16 r`, we get `SXTB w0, w0`, which sign-extends `w0` up to 32 bits, not 16. At best this is confusing; at worst it is a bug in waiting. But it's probably fine to just change the docs to say:

> Instructions to sign-extend the value in the given register from width @w@ up to width @opRegWidth w'@.

And maybe mention that it doesn't uphold the invariant laid out in Note [Signed arithmetic on AArch64]:

> ... For simplicity we maintain the invariant that a register containing a sub-word-size value always contains the zero-extended form of that value in between operations. ...

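
To make the `SXTB` point concrete, here is a minimal value-level sketch in plain Haskell. It is not GHC code: `sxtbW` and `sextByteTo16` are hypothetical names I made up, and the model only assumes the `SXTB` semantics quoted above on a 32-bit `w` register.

```haskell
import Data.Bits ((.&.))
import Data.Int (Int8)
import Data.Word (Word32)

-- | Model of @SXTB Wd, Wn@ on a 32-bit (W) register: keep the low 8 bits
-- of the source and sign-extend them across the whole 32-bit register.
sxtbW :: Word32 -> Word32
sxtbW src = fromIntegral (fromIntegral src :: Int8)

-- | Hypothetical "sign-extend from 8 bits up to exactly 16 bits" that
-- keeps bits 16..31 clear, i.e. what a literal reading of the doc
-- comment for @w' = W16@ might suggest. Not something the NCG emits.
sextByteTo16 :: Word32 -> Word32
sextByteTo16 src = sxtbW src .&. 0xFFFF
```

In this model `sxtbW 0x80` is `0xFFFFFF80`, while `sextByteTo16 0x80` is `0x0000FF80`. The first is what `SXTB w0, w0` leaves in the register, and it is not the zero-extended form of a 16-bit value.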
`truncateReg` has a different, more correctness-relevant issue.

    -- | Instructions to truncate the value in the given register from width @w@
    -- down to width @w'@.
    truncateReg :: Width -> Width -> Reg -> OrdList Instr
    truncateReg w w' r =
        case w of
          W64 -> nilOL
          W32
            | w' == W32 -> nilOL
          _   -> unitOL $ UBFM (OpReg w r)
                               (OpReg w r)
                               (OpImm (ImmInt 0))
                               (OpImm $ ImmInt $ widthInBits w' - 1)

If we have `x0 = 0xff...ff` at 64-bit width and try to truncate to `W8`, then to uphold the invariant for sub-word values we should clear the high bits. Instead we simply do nothing, leaving the high bits as they are.
That just seems wrong. Luckily, I believe we currently always truncate from 32 bits when dealing with sub-word values, so things still work out. But I haven't verified this, and it's definitely a bug in waiting.
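
A rough value-level model of the concern, again in standalone Haskell rather than GHC code. The `Width` type here is a stand-in for GHC's, `truncateModel` and `lowBits` are hypothetical names, and I am assuming that `UBFM r, r, #0, #(n-1)` behaves as "keep the low `n` bits, zero the rest".

```haskell
import Data.Bits (bit, (.&.))
import Data.Word (Word64)

-- Stand-in for GHC's Width type, only for this sketch.
data Width = W8 | W16 | W32 | W64 deriving (Eq, Show)

widthInBits :: Width -> Int
widthInBits W8  = 8
widthInBits W16 = 16
widthInBits W32 = 32
widthInBits W64 = 64

-- | Keep the low @n@ bits of a 64-bit register image and clear the rest
-- (the assumed effect of @UBFM r, r, #0, #(n-1)@).
lowBits :: Int -> Word64 -> Word64
lowBits n x = x .&. (bit n - 1)

-- | What the quoted 'truncateReg' does to the register contents:
-- nothing for W64, nothing for W32 -> W32, a bit-field extract otherwise.
truncateModel :: Width -> Width -> Word64 -> Word64
truncateModel w w' x =
  case w of
    W64             -> x
    W32 | w' == W32 -> x
    _               -> lowBits (widthInBits w') x
```

With `x = maxBound`, `truncateModel W64 W8 x` is still `maxBound`, whereas the zero-extended form the invariant asks for is `lowBits 8 x`, i.e. `0xFF`. Truncating from `W32` instead (`truncateModel W32 W8 x`) does give `0xFF`, which matches the "things still work out" case above.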