Fix memset unroll for small bytearrays
Fixes #16052
- The DWORD alignment requirement for unrolling is removed.
- When the offset in `setByteArray#` is statically known, we provide better alignment guarantees than just 1 byte.
- Also, memset itself can now use 64-bit wide MOVs.
So `setByteArray# s 0# 24# 1#` will be nicely inlined as
movq $72340172838076673,%rcx
movq %rcx,0(%rbx)
movq %rcx,8(%rbx)
movq %rcx,16(%rbx)
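The immediate in the first `movq` is the fill byte `1` repeated in all eight bytes of a 64-bit word (`0x0101010101010101`). A minimal sketch of that arithmetic follows; `splatByte` is a hypothetical helper name used only for illustration, not a function from GHC's code generator.

```haskell
import Data.Word (Word8, Word64)

-- | Replicate a fill byte into every byte of a 64-bit word.
-- Hypothetical helper for illustration; not part of GHC.
-- Multiplying by 0x0101010101010101 copies the byte into each of the
-- eight byte lanes, with no carries between lanes since b <= 0xFF.
splatByte :: Word8 -> Word64
splatByte b = fromIntegral b * 0x0101010101010101
```

For the fill byte `1#` used above, `splatByte 1` evaluates to `72340172838076673`, which is the constant loaded into `%rcx`.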
P.S. The current memset intrinsic is not optimal and can be improved for the case where we know that we are dealing with
`(baseAddress at known alignment) + offset`.
For instance, on 64-bit,
`setByteArray# s 1# 23# 0#`,
given that the bytearray is 8-byte aligned, could be unrolled into
movb, movw, movl, movq, movq
but currently it is (sadly)
movb x23
since an alignment of 1 is all the info we can embed into the `MO_Memset` op.
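The difference between the two unrollings can be sketched as a greedy choice of store widths. This is only an illustration under stated assumptions: `storeWidths` and its parameters are hypothetical names, and the greedy rule (pick the widest power-of-two store that fits the remaining length and is aligned at the current offset) is an assumed strategy, not GHC's actual implementation.

```haskell
-- | Store widths (in bytes) for filling `len` bytes starting at byte
-- offset `off` from a base address aligned to `baseAlign` bytes.
-- `maxW` is the widest store available (8 on 64-bit) and must be >= 1,
-- as must `baseAlign`.
-- Hypothetical sketch of a greedy strategy; not GHC's code generator.
storeWidths :: Int -> Int -> Int -> Int -> [Int]
storeWidths baseAlign maxW = go
  where
    go off len
      | len <= 0  = []
      | otherwise = w : go (off + w) (len - w)
      where
        -- Widest power of two that fits in the remaining length, does not
        -- exceed the known base alignment, and divides the current offset.
        -- Width 1 always qualifies, so the list is never empty.
        w = last [ c
                 | c <- takeWhile (<= maxW) (iterate (* 2) 1)
                 , c <= len
                 , c <= baseAlign
                 , off `mod` c == 0
                 ]
```

With an 8-byte-aligned base, `storeWidths 8 8 1 23` gives `[1,2,4,8,8]`, matching the `movb, movw, movl, movq, movq` sequence. With only 1-byte alignment known, `storeWidths 1 8 1 23` gives 23 one-byte stores, which is the `movb x23` case.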
I tried to do this here but stopped when I figured out that I'd need to embed both `baseAddrAlign` and `offset` into `MO_Memset` to do things correctly. It looked like it would require a change in Cmm parsing, which is a bit too hairy for me now.