Skip to content

stimes and stimesMonoid are too slow

Summary

stimes and stimesMonoid do not specialize, ruining performance.

Steps to reproduce

Define a Semigroup instance for a type and use stimes/stimesMonoid.
Here's an example:

{-# LANGUAGE BangPatterns #-}
import Criterion.Main
import qualified Data.Semigroup as S

main :: IO ()
main = defaultMain
  [ bench "S.stimes" $ whnf (S.stimes exponent) (Modulo 42)
  , bench "stimesDefault" $ whnf (stimesDefault exponent) (Modulo 42)
  ]
  where
    !exponent = 10^9 :: Int

modulus :: Int
modulus = 10^9 + 7

newtype Modulo = Modulo Int

instance Semigroup Modulo where
  Modulo x <> Modulo y = Modulo (x * y `mod` modulus)

-- From base/Data/Semigroup/Internal.hs
stimesDefault :: (Integral b, Semigroup a) => b -> a -> a
stimesDefault y0 x0
  | y0 <= 0   = errorWithoutStackTrace "stimes: positive multiplier expected"
  | otherwise = f x0 y0
  where
    f x y
      | even y = f (x <> x) (y `quot` 2)
      | y == 1 = x
      | otherwise = g (x <> x) (y `quot` 2) x        -- See Note [Half of y - 1]
    g x y z
      | even y = g (x <> x) (y `quot` 2) z
      | y == 1 = x <> z
      | otherwise = g (x <> x) (y `quot` 2) (x <> z) -- See Note [Half of y - 1]

On GHC 9.2.5 with -O2:

benchmarking S.stimes
time                 1.942 μs   (1.937 μs .. 1.949 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.946 μs   (1.942 μs .. 1.950 μs)
std dev              14.23 ns   (9.975 ns .. 21.48 ns)

benchmarking stimesDefault
time                 270.6 ns   (270.5 ns .. 270.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 270.7 ns   (270.6 ns .. 270.8 ns)
std dev              461.5 ps   (323.6 ps .. 844.1 ps)

Which makes it ~7x slower.

Expected behavior

stimes and stimesMonoid should be fast.

Environment

  • GHC version used: 9.2.5
Edited by meooow
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information