adding locality levels to prefetch# and friends
Currently in HEAD / 7.7, the prefetch primop only does the equivalent of __builtin_prefetch(ptr,0,3) (0 here denoting a read prefetch, 3 denoting "high locality, keep this in cache for a while").
On modern hardware, we can do better! Many natural use cases of prefetch in Haskell are streaming workloads, so we want the prefetch to also hint that "once we've worked on a piece of memory, there's no need to keep it around for any period of time, so don't pollute the cache with it".
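For illustration only, here is what the streaming case looks like at the C level with the GCC/Clang builtin. This is a minimal sketch, not part of the patch: sum_streaming and PREFETCH_AHEAD are hypothetical names, and the prefetch distance of 64 elements is an arbitrary assumption that would need tuning per machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements. An assumption, not a tuned value. */
#define PREFETCH_AHEAD 64

/* Walk the array once, front to back. Each element is read exactly once,
 * so we ask for locality 0 ("no temporal locality") instead of 3. */
uint64_t sum_streaming(const uint32_t *xs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* 2nd arg 0 = read prefetch, 3rd arg 0 = lowest locality level. */
            __builtin_prefetch(&xs[i + PREFETCH_AHEAD], 0, 0);
        }
        total += xs[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same whatever locality level is passed; only cache behaviour can differ.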
The attached patch takes each prefetchBlah# operation (where Blah = ByteArray, MutableByteArray, or Addr), replaces it with prefetchBlah0# through prefetchBlah3#, and passes the integer locality level through to the code generators.
To make the engineering reasonable, I enriched MO_Prefetch_Data with an Int parameter (which must be a value in the range 0-3). There's probably a better way to model the locality parameter, but this maps directly to how it's used in the LLVM code gen.
This patch does not include a test case yet. (Interestingly, the test suite doesn't currently have any tests for the prefetch primops at all!) So that needs to be added.
Also, there's no good reason for the prefetch ops to be LLVM only.
Worst case, we could just treat them as no-ops and drop them. But at least for x86, it should be easy to add support (though if anyone wants to add support for ppc and sparc, or help me do that, that'd be awesome too!). I'll look into doing that in a few days.
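As a C-level analogy for the "treat them as no-ops" fallback (GHC's code generators are not structured like this, so take it only as a sketch of the idea): since a prefetch is just a hint, a target without support can drop it without changing results. PREFETCH_READ and count_nonzero are hypothetical names made up for this example.

```c
#include <stddef.h>

/* Use the compiler builtin where it exists; otherwise compile to nothing.
 * The locality argument must be a compile-time constant in the range 0-3. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p, locality) __builtin_prefetch((p), 0, (locality))
#else
#define PREFETCH_READ(p, locality) ((void)(p))
#endif

/* Count non-zero bytes, prefetching one hypothetical 64-byte line ahead.
 * The answer does not depend on whether the prefetch is emitted. */
size_t count_nonzero(const unsigned char *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 64 == 0 && i + 64 < n) {
            PREFETCH_READ(&buf[i + 64], 0);
        }
        if (buf[i] != 0) {
            count++;
        }
    }
    return count;
}
```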
There are probably some other things I'm overlooking.
Anyway, I'll attach a preliminary patch for feedback now.