Skip to content

Potential optimization when quotRem divisor is a power of 2

Reported by @cocreature on Twitter: https://twitter.com/cocreature/status/1158643061121015808

Here's a smaller reproducer: https://gist.github.com/osa1/391be0dfdf0f291468f750a0d2859e50

Comment out the quotRem line and uncomment the next two lines. Results:

-- quotRem:
benchmarking mangling new
time                 918.8 ns   (916.1 ns .. 922.3 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 919.6 ns   (916.8 ns .. 923.1 ns)
std dev              10.39 ns   (7.976 ns .. 13.56 ns)

-- shift + mask
benchmarking mangling new
time                 405.3 ns   (404.0 ns .. 406.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 405.7 ns   (404.4 ns .. 407.1 ns)
std dev              4.365 ns   (3.488 ns .. 5.456 ns)

So the shift + mask implementation is twice as fast.

As far as I can see we don't have any optimizations when quotRem divisor is a power of 2, and the implementation on Int uses a primop, which is lowered to a div instruction on x86.

It may worth adding a rewrite rule for this case and generate a shift + mask.

Environment

  • GHC version used: 8.6.5
Edited by Ömer Sinan Ağacan
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information