    Implement SSE2 floating-point support in the x86 native code generator (#594) · 335b9f36
    Simon Marlow authored
    The new flag -msse2 enables code generation for SSE2 on x86.  It
    results in substantially faster floating-point performance; the main
    reason for doing this was that our x87 code generation is appallingly
    bad, and since we plan to drop -fvia-C soon, we need a way to generate
    half-decent floating-point code.

    The catch is that SSE2 is only available on CPUs that support it (P4+,
    AMD K8+).  We'll have to think hard about whether we should enable it
    by default for the libraries we ship.  In the meantime, at least
    -msse2 should be an acceptable replacement for "-fvia-C
    -optc-ffast-math -fexcess-precision".

    SSE2 also has the advantage of performing all operations at the
    correct precision, so floating-point results are consistent with other
    platforms.
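
    To illustrate the precision point, here is a small C sketch (not GHC code).
    It assumes float and double as stand-ins for the 64-bit double and the
    x87's 80-bit extended registers: add_sub_narrow rounds every intermediate
    to its declared type, as SSE2 does, while add_sub_wide keeps the
    intermediate at higher precision and rounds only at the end, roughly as
    x87 temporaries do. The function names are hypothetical, and the
    float/double pairing is an analogy rather than the real 64/80-bit case.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Narrow model: every intermediate is rounded to float, the way SSE2
   rounds each operation to its declared type. The volatile store forces
   the value through memory, so it is rounded to float even where the
   compiler would otherwise keep x + y in a wider register (for example
   on x87, where FLT_EVAL_METHOD == 2). */
float add_sub_narrow(float x, float y)
{
    volatile float sum = x + y;   /* rounded to float here */
    return sum - x;
}

/* Wide model: the intermediate is kept in double, loosely like an x87
   temporary held in an 80-bit register, and rounding to float happens
   only once, at the end. */
float add_sub_wide(float x, float y)
{
    double sum = (double)x + (double)y;
    return (float)(sum - (double)x);
}

/* Print both answers side by side; only meaningful for finite inputs. */
void show_precision_gap(float x, float y)
{
    assert(isfinite(x) && isfinite(y));
    printf("narrow: %g  wide: %g\n",
           add_sub_narrow(x, y), add_sub_wide(x, y));
}
```

    With x = 1e8f and y = 1.0f, add_sub_narrow returns 0 because 100000001
    is not representable as a float (neighbouring floats near 1e8 are 8
    apart), while add_sub_wide returns 1. A result that depends on whether
    an intermediate stayed in a wide register is exactly the kind of
    inconsistency that performing every operation at the correct precision
    avoids.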

    I also tweaked the x87 code generation a bit while I was here; it's
    now slightly less bad than before.