Lower-level memcpy primop
I'm doing some really low-level buffer stuff, and it would be quite helpful if there were lower-level memcpy primitives.
I'm thinking something like:
    copyBytes# :: Addr# -> Addr# -> Int# -> State# s -> State# s
    copyBytes# dst src size _ = ...
I would expect it to compile, on x86, into "rep movsb", which is frequently optimal on Ivy Bridge and Haswell.
The other memcpy primitives use Array# or MutableArray#, and if all you have is an Addr#, it's not clear how to get the corresponding Array# or MutableArray#. (Can you just cast?)
It may be appropriate to have corresponding 16-bit and 32-bit operations: "rep movsw" and "rep movsd", respectively.
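In the meantime, here is a rough sketch of how the proposed shape could be emulated on top of Foreign.Marshal.Utils.copyBytes from base. The names copyBytesAddr, copyPtr and roundTrip are hypothetical and exist only for this illustration. Note that it is fixed to State# RealWorld rather than a polymorphic State# s because it goes through IO, and that it ends up in the C memcpy, so nothing here guarantees "rep movsb".

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)
import GHC.Exts (Addr#, Int (I#), Int#, Ptr (Ptr), RealWorld, State#)
import GHC.IO (IO (IO))

-- Hypothetical stand-in for the proposed primop (not a real GHC primop).
-- Argument order follows the proposal: destination, source, size in bytes.
copyBytesAddr :: Addr# -> Addr# -> Int# -> State# RealWorld -> State# RealWorld
copyBytesAddr dst src n s0 =
  case copyBytes (Ptr dst :: Ptr Word8) (Ptr src) (I# n) of
    IO f -> case f s0 of
      (# s1, () #) -> s1

-- Boxed convenience wrapper, so callers do not have to unpack Ptr by hand.
copyPtr :: Ptr Word8 -> Ptr Word8 -> Int -> IO ()
copyPtr (Ptr d) (Ptr s) (I# n) = IO (\st -> (# copyBytesAddr d s n st, () #))

-- Fill a 4-byte source buffer, copy it, and read the destination back.
roundTrip :: IO [Word8]
roundTrip =
  allocaBytes 4 $ \src ->
    allocaBytes 4 $ \dst -> do
      mapM_ (\i -> pokeByteOff src i (fromIntegral i + 10 :: Word8)) [0 .. 3]
      copyPtr dst src 4
      mapM (peekByteOff dst) [0 .. 3]

main :: IO ()
main = roundTrip >>= print  -- prints [10,11,12,13]
```

This only shows that an Addr#-to-Addr# copy is expressible today by wrapping the address in a Ptr. A real primop would still be useful to avoid the IO round trip and to let the code generator pick the instruction sequence.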