Implement operations on sized boxed types with sizes <= 32 bits using 32-bit ops instead.

Currently we have this kind of implementation:

instance Num Int8 where
    (I8# x#) + (I8# y#)    = I8# (x# `plusInt8#` y#)
    (I8# x#) - (I8# y#)    = I8# (x# `subInt8#` y#)
    (I8# x#) * (I8# y#)    = I8# (x# `timesInt8#` y#)
    negate (I8# x#)        = I8# (negateInt8# x#)
    abs x | x >= 0         = x
          | otherwise      = negate x
    signum x | x > 0       = 1
    signum 0               = 0
    signum _               = -1
    fromInteger i          = I8# (intToInt8# (integerToInt# i))

However, in #20405 we realized that this kind of implementation can cause partial register stalls, and we reverted most of the ops back to using sized primOps. See !8519 (merged).

However, we seem to have missed the Num instances (at the very least). Furthermore, on x86 it would be better to use 32-bit operations rather than 64-bit operations, e.g. use plusInt32# instead of plusInt# for anything sized <= 32 bits. I don't know what the situation is on ARM in that respect.
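As a rough sketch of what "use 32-bit ops" could look like for one operation, here is Int8 addition routed through plusInt32#. The name plusInt8Via32 and the local helper widen are made up for illustration and are not existing GHC code. The widening goes via Int# because those are the conversion primops I am sure exist; whether the code generator turns this into a single 32-bit add is an assumption that would need checking in the generated assembly.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (int8ToInt#, intToInt8#, intToInt32#, int32ToInt#, plusInt32#)
import GHC.Int (Int8 (I8#))

-- Hypothetical helper (not part of base): add two Int8 values by
-- widening to Int32#, adding with the 32-bit primop, and narrowing
-- the result back to Int8#.
plusInt8Via32 :: Int8 -> Int8 -> Int8
plusInt8Via32 (I8# x#) (I8# y#) =
    I8# (intToInt8# (int32ToInt# (widen x# `plusInt32#` widen y#)))
  where
    -- Widen via Int#; this detour is part of the sketch, not a claim
    -- about how the final implementation should convert.
    widen v# = intToInt32# (int8ToInt# v#)

main :: IO ()
main = do
    -- Ordinary addition agrees with the Num Int8 instance.
    print (plusInt8Via32 100 27 == 100 + 27)
    -- Overflow still wraps at 8 bits because the result is narrowed.
    print (plusInt8Via32 maxBound 1 == minBound)
```

Both lines should print True if the narrowing behaves as described; the same pattern would presumably apply to -, * and negate, and to Int16/Word8/Word16.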

It might also be annoying (and perhaps inefficient) to implement this without #22219.

I don't think there is much pressure to act on this immediately, but I think it's worth looking into in the medium term.
