Perf improvement for Enum instances
Fixes
Fix #15185 (closed) by ensuring that we export stable (unoptimised) unfoldings for `Enum` list producers like `enumFromTo` in the interface file so we can do list fusion at usage sites. See below for benchmark results.
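To illustrate why an exported unfolding matters for fusion, here is a minimal sketch. The names `upTo` and `sumUpTo` are hypothetical and are not part of this patch. It assumes a producer written with `build` from `GHC.Exts` and marked `INLINABLE`, so that its unoptimised right-hand side is kept in the interface file. Whether the `foldr`/`build` rule actually fires depends on the optimisation level and GHC version.

```haskell
module Main where

import GHC.Exts (build)

-- Hypothetical list producer (not from base), expressed via 'build' so that
-- it is a "good producer" for foldr/build fusion.
-- INLINABLE asks GHC to keep the original right-hand side as the unfolding
-- in the interface file, so client modules can still see the 'build' form.
upTo :: Int -> Int -> [Int]
upTo from to = build (\cons nil ->
  let go i
        | i > to    = nil
        | otherwise = i `cons` go (i + 1)
  in  go from)
{-# INLINABLE upTo #-}

-- Hypothetical consumer defined with 'foldr'. With optimisation enabled,
-- GHC may fuse it with 'upTo' and avoid building the intermediate list.
sumUpTo :: Int -> Int -> Int
sumUpTo from to = foldr (+) 0 (upTo from to)

main :: IO ()
main = print (sumUpTo 1 100)  -- prints 5050
```

If the producer were exported only in an already optimised form, the `build` call might no longer be visible to importing modules, and the consumer would have to traverse a real list.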
Agreeable side effects:
- Partially fixes #8763 (closed), namely for `[from..to]` with `IntN` & `WordN` where `N` in 8, 16, 32, 64.
- Partially fixes (don't close this, GitLab) #18178, namely for `enumFromTo` and `enumFromThenTo` (but not `enumFrom` and `enumFromThen`).
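For reference, the four methods named above correspond to the four forms of range syntax. The sketch below only demonstrates that correspondence on fixed-width types; the concrete values are chosen for illustration and say nothing about performance.

```haskell
module Main where

import Data.Int (Int8)
import Data.Word (Word8)

main :: IO ()
main = do
  -- [x..y] is sugar for enumFromTo x y
  print ([1..5] == enumFromTo (1 :: Int8) 5)
  -- [x,y..z] is sugar for enumFromThenTo x y z
  print ([1,3..9] == enumFromThenTo (1 :: Word8) 3 9)
  -- [x..] is sugar for enumFrom x; for bounded types it stops at maxBound
  print ([250..] == enumFrom (250 :: Word8))
  -- [x,y..] is sugar for enumFromThen x y
  print ([250,252..] == enumFromThen (250 :: Word8) 252)
```

Each line prints `True`.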
Change Summary
- Add `INLINABLE` pragmas to `enumFrom`, `enumFromThen`, `enumFromTo`, `enumFromThenTo`, `boundedEnumFrom` and `boundedEnumFromThen`.
- Use more performant implementations in `instance Enum Word64`.
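As a rough illustration of what such pragmas look like in an instance, here is a sketch on a hypothetical user-defined type `Level`. It is not the code changed in `base`; it only shows `INLINABLE` attached to `Enum` methods that are defined via `boundedEnumFrom` and `boundedEnumFromThen` from `GHC.Enum`.

```haskell
module Main where

import GHC.Enum (boundedEnumFrom, boundedEnumFromThen)

-- Hypothetical small bounded type, for illustration only.
data Level = Low | Mid | High
  deriving (Show, Eq, Bounded)

instance Enum Level where
  fromEnum Low  = 0
  fromEnum Mid  = 1
  fromEnum High = 2

  toEnum 0 = Low
  toEnum 1 = Mid
  toEnum 2 = High
  toEnum _ = error "Level.toEnum: bad argument"

  -- The helpers from GHC.Enum stop at maxBound (or minBound when counting
  -- down). INLINABLE keeps the method's unfolding available to importers.
  enumFrom = boundedEnumFrom
  {-# INLINABLE enumFrom #-}

  enumFromThen = boundedEnumFromThen
  {-# INLINABLE enumFromThen #-}

main :: IO ()
main = do
  print [Mid ..]        -- prints [Mid,High]
  print [Low, High ..]  -- prints [Low,High]
```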
Compiler Performance
`nofib` shows unchanged allocations and binary sizes. Allocations when compiling Cabal are also unchanged.
Benchmark
benchmarking fact 20/fact @Int
time 25.31 ns (23.67 ns .. 27.03 ns)
0.974 R² (0.962 R² .. 0.988 R²)
mean 23.79 ns (22.82 ns .. 25.16 ns)
std dev 3.528 ns (2.745 ns .. 5.157 ns)
variance introduced by outliers: 96% (severely inflated)
benchmarking fact 20/fact @Int64
time 22.79 ns (21.28 ns .. 24.19 ns)
0.984 R² (0.978 R² .. 0.995 R²)
mean 21.89 ns (21.41 ns .. 22.62 ns)
std dev 1.934 ns (1.534 ns .. 2.698 ns)
variance introduced by outliers: 90% (severely inflated)
benchmarking fact 20/fact @Int32
time 23.66 ns (23.06 ns .. 24.34 ns)
0.994 R² (0.991 R² .. 0.996 R²)
mean 25.12 ns (24.46 ns .. 25.76 ns)
std dev 2.263 ns (1.961 ns .. 2.637 ns)
variance introduced by outliers: 90% (severely inflated)
benchmarking fact 20/fact @Int16
time 21.07 ns (19.90 ns .. 21.99 ns)
0.986 R² (0.981 R² .. 0.991 R²)
mean 20.56 ns (19.95 ns .. 21.24 ns)
std dev 2.272 ns (2.019 ns .. 2.580 ns)
variance introduced by outliers: 93% (severely inflated)
benchmarking fact 20/fact @Int8
time 24.68 ns (23.87 ns .. 25.63 ns)
0.992 R² (0.988 R² .. 0.996 R²)
mean 25.44 ns (24.64 ns .. 26.08 ns)
std dev 2.220 ns (1.888 ns .. 2.714 ns)
variance introduced by outliers: 89% (severely inflated)
benchmarking fact 20/fact @Word
time 21.93 ns (21.40 ns .. 22.63 ns)
0.992 R² (0.988 R² .. 0.995 R²)
mean 23.92 ns (23.15 ns .. 24.74 ns)
std dev 2.877 ns (2.340 ns .. 3.707 ns)
variance introduced by outliers: 94% (severely inflated)
benchmarking fact 20/fact @Word64
time 220.8 ns (213.9 ns .. 229.6 ns) -- Edit: fixed now, see below
0.990 R² (0.985 R² .. 0.995 R²)
mean 232.6 ns (226.2 ns .. 240.5 ns)
std dev 24.42 ns (20.00 ns .. 31.41 ns)
variance introduced by outliers: 91% (severely inflated)
benchmarking fact 20/fact @Word32
time 25.81 ns (24.91 ns .. 26.63 ns)
0.993 R² (0.990 R² .. 0.997 R²)
mean 26.20 ns (25.55 ns .. 26.77 ns)
std dev 2.062 ns (1.765 ns .. 2.466 ns)
variance introduced by outliers: 87% (severely inflated)
benchmarking fact 20/fact @Word16
time 25.00 ns (24.36 ns .. 25.54 ns)
0.996 R² (0.995 R² .. 0.997 R²)
mean 24.30 ns (23.81 ns .. 24.82 ns)
std dev 1.623 ns (1.379 ns .. 1.917 ns)
variance introduced by outliers: 83% (severely inflated)
benchmarking fact 20/fact @Word8
time 24.89 ns (23.13 ns .. 27.63 ns)
0.969 R² (0.946 R² .. 0.997 R²)
mean 24.17 ns (23.60 ns .. 25.36 ns)
std dev 2.597 ns (1.488 ns .. 4.933 ns)
variance introduced by outliers: 93% (severely inflated)
{-# LANGUAGE TypeApplications #-}
module Main where

import Criterion.Main
import Data.Int
import Data.Word

-- Factorial via a list range, so the cost is dominated by enumFromTo
-- at the chosen type.
fact :: Integral t => t -> t
fact n = product [1..n]

-- Run the same benchmark at every fixed-width Int and Word type.
main :: IO ()
main = defaultMain
  [ bgroup "fact 20"
    [ bench "fact @Int"    $ whnf (fact @Int)    20
    , bench "fact @Int64"  $ whnf (fact @Int64)  20
    , bench "fact @Int32"  $ whnf (fact @Int32)  20
    , bench "fact @Int16"  $ whnf (fact @Int16)  20
    , bench "fact @Int8"   $ whnf (fact @Int8)   20
    , bench "fact @Word"   $ whnf (fact @Word)   20
    , bench "fact @Word64" $ whnf (fact @Word64) 20
    , bench "fact @Word32" $ whnf (fact @Word32) 20
    , bench "fact @Word16" $ whnf (fact @Word16) 20
    , bench "fact @Word8"  $ whnf (fact @Word8)  20
    ]
  ]