GHC generates poor code for repeated uses of min/max

Consider the following module, which intends to implement a branchless ray-AABB intersection test:

module SimpleGeom where

data Vec3 = Vec3
    {-# UNPACK #-} !Double
    {-# UNPACK #-} !Double
    {-# UNPACK #-} !Double
data Ray = Ray !Vec3 !Vec3 !Vec3
data AABB = AABB !Vec3 !Vec3

testRayAABBIntersection :: Ray -> AABB -> Bool
testRayAABBIntersection (Ray (Vec3 ox oy oz) _ (Vec3 invDx invDy invDz))
        (AABB (Vec3 minX minY minZ) (Vec3 maxX maxY maxZ)) =
    let tx1 = (minX - ox) * invDx
        tx2 = (maxX - ox) * invDx
        ty1 = (minY - oy) * invDy
        ty2 = (maxY - oy) * invDy
        tz1 = (minZ - oz) * invDz
        tz2 = (maxZ - oz) * invDz
        tmin = min tx1 tx2 `max` min ty1 ty2 `max` min tz1 tz2
        tmax = max tx1 tx2 `min` max ty1 ty2 `min` max tz1 tz2
    in tmax >= max 0 tmin

Everything is strict primitive operations, so GHC should generate very simple, fast code, right? But upon compiling with ghc -O -ddump-simpl -ddump-to-file SimpleGeom, I found a mess of nested local lambdas and similar performance-killing expression forms. (See the attached output file.)

There are two possible issues I can see here.

GHC is trying to expand out all of the branches recursively (I would presume via case-of-cases transformation), which is a bad idea in this instance compared to just performing the cases sequentially and storing their results.
GHC is generating branches for floating-point min/max. Instruction sets like SSE2 include non-branching floating-point min/max instructions, which is exactly what this algorithm was designed to exploit, but GHC does not generate code that could take advantage of these instructions.

Trac metadata

Trac field	Value
Version	7.8.2
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system	Windows
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information

Admin message

GHC generates poor code for repeated uses of min/max