Fix memset unroll for small bytearrays

Artem Pyanykh requested to merge artempyanykh/ghc:T16052-ba-memset-unroll into master

Fixes #16052 (closed)

  • DWORD alignment requirement for unrolling is removed.
  • When the offset in setByteArray# is statically known, we provide better alignment guarantees than just 1 byte.
  • Also, memset itself can now use 64-bit wide MOVs.

So setByteArray# s 0# 24# 1# will be nicely inlined as

movq $72340172838076673,%rcx
movq %rcx,0(%rbx)
movq %rcx,8(%rbx)
movq %rcx,16(%rbx)
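
The immediate in the first movq is just the fill byte 1 repeated in all eight bytes of a 64-bit word (0x0101010101010101). A minimal sketch of that arithmetic, where splatByte is a hypothetical helper name used only for illustration and not a function from GHC's code generator:

```haskell
import Data.Word (Word8, Word64)

-- | Replicate one byte into all eight bytes of a 64-bit word.
-- Hypothetical helper for illustration; not taken from GHC's sources.
splatByte :: Word8 -> Word64
splatByte b = fromIntegral b * 0x0101010101010101
```

Here splatByte 1 evaluates to 72340172838076673, the constant loaded into %rcx above.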

P.S. The current memset intrinsic is not optimal, and can be improved for the case when we know that we deal with

(baseAddress at known alignment) + offset

For instance, on 64-bit

setByteArray# s 1# 23# 0#

given that the bytearray is 8-byte aligned, could be unrolled into

movb, movw, movl, movq, movq

but currently it is (sadly)

movb x23

since an alignment of 1 is all the info we can embed into the MO_Memset op.
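
To make the difference concrete, here is a rough sketch of how store widths could be picked if both the base alignment and the offset were known. The name storeWidths and the greedy strategy are assumptions for illustration only; this is not how GHC's backend is written, and it assumes baseAlign is at least 1.

```haskell
-- | Widths (in bytes) of the stores used to fill @len@ bytes starting at
-- byte offset @off@ from a base address aligned to @baseAlign@ bytes.
-- Hypothetical sketch: greedily pick the widest naturally aligned store.
storeWidths :: Int -> Int -> Int -> [Int]
storeWidths baseAlign off len
  | len <= 0  = []
  | otherwise = w : storeWidths baseAlign (off + w) (len - w)
  where
    -- A width is usable if the base alignment guarantees it, it fits in
    -- the remaining length, and the current offset is a multiple of it.
    -- Width 1 always qualifies, so the list is never empty.
    w = maximum [ c | c <- [1, 2, 4, 8]
                    , c <= baseAlign, c <= len, off `mod` c == 0 ]
```

With this sketch, storeWidths 8 1 23 gives [1,2,4,8,8] (movb, movw, movl, movq, movq), whereas storeWidths 1 1 23 gives 23 one-byte stores, which corresponds to the current output.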

I tried to do this here but stopped when I figured out that I'd need to embed both baseAddrAlign and offset into MO_Memset to do things correctly. It looked like it would require a change in Cmm parsing, which is a bit too hairy for me right now.
