Improve performance of a few functions in Foreign.Marshal.*
A number of functions in Foreign.Marshal.* are relatively slow, for two reasons:
- The size of a memory block in words is computed with division and multiplication operations (bit shifts should be used instead).
- The functions are not inlined, so computations that depend on the data type in question are not optimized away.
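The two points above can be illustrated with a minimal sketch. This is not the code from the patches: the names `wordBytes`, `wordShift`, `bytesToWordsDiv`, `bytesToWordsShift` and `allocaSketch` are hypothetical and exist only for this example. It assumes the machine word size is a power of two and that byte counts are non-negative, in which case rounding up to whole words with a right shift gives the same result as division. The `INLINE` pragma on the wrapper is meant to let GHC see the concrete `Storable` instance at the call site, so that `sizeOf` can reduce to a constant.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

import Data.Bits (countTrailingZeros, shiftR)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek, poke, sizeOf)

-- Size of a machine word in bytes (assumed to be a power of two).
wordBytes :: Int
wordBytes = sizeOf (0 :: Int)

-- log2 of the word size, valid under the power-of-two assumption.
wordShift :: Int
wordShift = countTrailingZeros wordBytes

-- Round a byte count up to whole words using division.
bytesToWordsDiv :: Int -> Int
bytesToWordsDiv n = (n + wordBytes - 1) `div` wordBytes

-- The same rounding using a right shift (for non-negative n).
bytesToWordsShift :: Int -> Int
bytesToWordsShift n = (n + wordBytes - 1) `shiftR` wordShift

-- Hypothetical alloca-like wrapper. With INLINE, a call at a known type
-- exposes sizeOf for that type to the optimizer at the call site.
allocaSketch :: forall a b. Storable a => (Ptr a -> IO b) -> IO b
allocaSketch = allocaBytes (sizeOf (undefined :: a))
{-# INLINE allocaSketch #-}

main :: IO ()
main = do
  -- Both roundings agree on a range of non-negative byte counts.
  print (all (\n -> bytesToWordsDiv n == bytesToWordsShift n) [0 .. 4096])
  -- Round-trip a value through the temporary buffer.
  r <- allocaSketch $ \p -> poke p (42 :: Int) >> peek p
  print r
```

Whether the shift or the inlining actually pays off depends on the compiler version and optimization flags, so any change of this kind should be confirmed with a benchmark.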
Two patches fix at least some of these performance issues. With both applied, a basic benchmark on the non-threaded RTS gives the following results:
TEST NAME                BEFORE        AFTER
withCString:             146.391 ns    133.646 ns
alloca:                   51.424 ns     15.208 ns
allocaBytes:              31.872 ns     14.501 ns
mallocForeignPointer:     34.630 ns     17.498 ns
bytestring:               94.872 ns     58.938 ns
mvar:                     61.473 ns     54.806 ns
alloca+advancePtr:        54.480 ns     14.687 ns
new/finalizerFree:        61.172 ns     44.144 ns
with:                     69.096 ns     14.600 ns
Could someone please review the patches and merge them into the repository?
One of them is for the runtime system (Cmm definitions); the other is for Foreign.Marshal.*.