stimes and stimesMonoid are too slow
Summary
stimes
and stimesMonoid
do not specialize, ruining performance.
Steps to reproduce
Define a Semigroup
instance for a type and use stimes
/stimesMonoid
.
Here's an example:
{-# LANGUAGE BangPatterns #-}
import Criterion.Main
import qualified Data.Semigroup as S
main :: IO ()
main = defaultMain
[ bench "S.stimes" $ whnf (S.stimes exponent) (Modulo 42)
, bench "stimesDefault" $ whnf (stimesDefault exponent) (Modulo 42)
]
where
!exponent = 10^9 :: Int
modulus :: Int
modulus = 10^9 + 7
newtype Modulo = Modulo Int
instance Semigroup Modulo where
Modulo x <> Modulo y = Modulo (x * y `mod` modulus)
-- From base/Data/Semigroup/Internal.hs
stimesDefault :: (Integral b, Semigroup a) => b -> a -> a
stimesDefault y0 x0
| y0 <= 0 = errorWithoutStackTrace "stimes: positive multiplier expected"
| otherwise = f x0 y0
where
f x y
| even y = f (x <> x) (y `quot` 2)
| y == 1 = x
| otherwise = g (x <> x) (y `quot` 2) x -- See Note [Half of y - 1]
g x y z
| even y = g (x <> x) (y `quot` 2) z
| y == 1 = x <> z
| otherwise = g (x <> x) (y `quot` 2) (x <> z) -- See Note [Half of y - 1]
On GHC 9.2.5 with -O2:
benchmarking S.stimes
time 1.942 μs (1.937 μs .. 1.949 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.946 μs (1.942 μs .. 1.950 μs)
std dev 14.23 ns (9.975 ns .. 21.48 ns)
benchmarking stimesDefault
time 270.6 ns (270.5 ns .. 270.8 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 270.7 ns (270.6 ns .. 270.8 ns)
std dev 461.5 ps (323.6 ps .. 844.1 ps)
Which makes it ~7x slower.
Expected behavior
stimes
and stimesMonoid
should be fast.
Environment
- GHC version used: 9.2.5