Avoid calling memcpy/memset for MO_Memcpy/MO_Memset in the wasm backend
Currently, the wasm backend always lowers a MO_Memcpy/MO_Memset instruction to a ccall to memcpy/memset. The -fmax-inline-memcpy-insns=⟨n⟩/-fmax-inline-memset-insns=⟨n⟩ GHC flags are ignored completely.
We can do something similar to the X86 NCG though: check these flags, and unroll small MO_Memcpy/MO_Memset instructions into inline memory opcodes, avoiding the need for a ccall. When the size exceeds the threshold, we can emit a single memory.copy/memory.fill opcode and still avoid the ccall, since we already rely on the wasm bulk-memory extension.
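A rough sketch of that decision for the memcpy case, not the actual backend code. The types and names here (Lowering, Chunk, lowerMemcpy) are hypothetical, and the cost model (one load plus one store per chunk, compared against the flag value) is an assumption made for illustration. It also assumes that a size which is not a compile-time constant always takes the bulk path.

```haskell
-- Hypothetical result of lowering one MO_Memcpy; not a real GHC type.
data Lowering
  = Unrolled [Chunk] -- one load/store pair per chunk
  | BulkCopy         -- a single memory.copy opcode
  deriving (Eq, Show)

-- A piece of the copy: byte offset from the base pointers, and width in bytes.
data Chunk = Chunk { chunkOffset :: Integer, chunkWidth :: Integer }
  deriving (Eq, Show)

-- Split n bytes into chunks, widest first (8, 4, 2, 1 bytes).
chunks :: Integer -> [Chunk]
chunks = go 0
  where
    go off rest
      | rest <= 0 = []
      | otherwise = Chunk off w : go (off + w) (rest - w)
      where
        w | rest >= 8 = 8
          | rest >= 4 = 4
          | rest >= 2 = 2
          | otherwise = 1

-- maxInsns stands in for the -fmax-inline-memcpy-insns value.
-- The size is Nothing when it is not known at compile time.
lowerMemcpy :: Int -> Maybe Integer -> Lowering
lowerMemcpy maxInsns (Just n)
  | n >= 0, insns <= toInteger maxInsns = Unrolled cs
  where
    cs = chunks n
    -- Assumed cost model: one load and one store per chunk.
    insns = 2 * toInteger (length cs)
lowerMemcpy _ _ = BulkCopy
```

For example, with a limit of 32, a 13-byte copy would unroll into 8-, 4- and 1-byte chunks, while a 64-byte copy with a limit of 4 would fall through to memory.copy. MO_Memset could follow the same shape, with stores of a replicated byte instead of load/store pairs and memory.fill as the fallback.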
The implementation is straightforward. The tricky part is coming up with an appropriate threshold value for wasm. memory.copy/memory.fill may be slower for small sizes, due to implementation details of V8 and other runtimes, but we don't want to come up with an arbitrary number here. This issue should be revisited only after we have the infra to run benchmark suites like nofib.