Merge branch 'wip/simd'
This merge revises and extends the current SIMD support in GHC. Notable features: * Support for AVX, AVX2, and AVX-512. Support for AVX-512 is untested. * SIMD primops are currently LLVM-only and documented in compiler/prelude/primops.txt.pp. * By default only 128-bit wide SIMD vectors are passed in registers, and then only on the X86_64 architecture. There is a "hidden" flag, -fllvm-pass-vectors-in-regs, that causes GHC to generate LLVM code that assumes all vectors are passed in registers by LLVM. This can be used with a suitably patched version of LLVM, and if we get LLVM 3.4 patched, we can consider turning it on by default for LLVM 3.4+. This would mean that we couldn't mix LLVM <3.4-compiled object files with LLVM >=3.4-compiled object files, but I don't see that as much of a problem. * utils/genprimcode has been hacked up to allow us to write vector operations once and have them instantiated at multiple vector types. I'm not thrilled with this solution, but after discussing with Simon PJ, what I've implemented seems to be the minimal reasonable solution to the problem of exploding primop boilerplate. The changes are documented in compiler/prelude/primops.txt.pp. * Error handling is sub-optimal. My patch checks to make sure that vector primops can be compiled efficiently based on the current set of dynamic flags. For example, if -mavx is not specified and the user tries to use a primop that adds together two 256-bit wide vectors of double-precision elements, the user will see an error message like: ghc-stage2: sorry! (unimplemented feature or known bug) (GHC version 7.7.20130916 for x86_64-unknown-linux): 256-bit wide floating point SIMD vector instructions require at least -mavx.