Exploring calling convention changes and related engineering for 7.10
I'm creating this as a master ticket for systematically exploring and benchmarking (possibly breaking) changes to the GHC calling convention. This work will initially focus on x86/x86_64, but may grow to a larger scope. The goals:
- If possible, systematically improve performance for code run on recent CPU microarchitecture generations (e.g. Sandy Bridge, Haswell, and future generations such as Knights Landing and Skylake), ideally without pessimizing code on other or older x86_64 microarchitectures.
- Try to bring the native and LLVM codegens closer to feature parity (or, at the very least, do not widen their capability gap).
- A few other pieces too; I will amend this ticket as ideas and plans become clearer.