Enum instance for IntX / WordX are inefficient
The following factorial benchmark show high discrepancy in performance between Int
and Int64
(and similarly between Word
and Word64
) when using product [1..n]
to compute factorial:
{-# LANGUAGE TypeApplications #-}
import Criterion.Main
import Data.Int
import Data.Word
import Numeric.Natural
fact :: Integral t => t -> t
fact n = product [1..n]
fact2 :: Integral t => t -> t
fact2 n = go 1 n
where go acc 1 = acc
go acc n = go (acc * n) (n - 1)
main :: IO ()
main = defaultMain [
bgroup "fact 20" [
bench "Int" $ whnf (fact @Int) 20
, bench "Word" $ whnf (fact @Word) 20
, bench "Int64" $ whnf (fact @Int64) 20
, bench "Word64" $ whnf (fact @Word64) 20
, bench "Integer" $ whnf (fact @Integer) 20
, bench "Natural" $ whnf (fact @Natural) 20
]
,
bgroup "fact2 20" [
bench "Int" $ whnf (fact @Int) 20
, bench "Word" $ whnf (fact @Word) 20
, bench "Int64" $ whnf (fact @Int64) 20
, bench "Word64" $ whnf (fact @Word64) 20
, bench "Integer" $ whnf (fact @Integer) 20
, bench "Natural" $ whnf (fact @Natural) 20
]
]
benchmarking fact 20/Int
time 17.33 ns (17.22 ns .. 17.41 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.21 ns (17.16 ns .. 17.29 ns)
std dev 222.4 ps (184.5 ps .. 282.3 ps)
variance introduced by outliers: 15% (moderately inflated)
benchmarking fact 20/Word
time 17.25 ns (17.01 ns .. 17.63 ns)
0.999 R² (0.997 R² .. 1.000 R²)
mean 17.19 ns (17.11 ns .. 17.36 ns)
std dev 358.6 ps (199.4 ps .. 674.3 ps)
variance introduced by outliers: 32% (moderately inflated)
benchmarking fact 20/Int64
time 166.7 ns (165.4 ns .. 168.5 ns)
0.999 R² (0.997 R² .. 1.000 R²)
mean 166.8 ns (166.0 ns .. 168.5 ns)
std dev 4.077 ns (1.854 ns .. 6.592 ns)
variance introduced by outliers: 35% (moderately inflated)
benchmarking fact 20/Word64
time 560.5 ns (550.6 ns .. 575.0 ns)
0.995 R² (0.989 R² .. 1.000 R²)
mean 556.5 ns (549.1 ns .. 575.9 ns)
std dev 36.08 ns (9.195 ns .. 60.69 ns)
variance introduced by outliers: 78% (severely inflated)
benchmarking fact 20/Integer
time 421.9 ns (419.8 ns .. 425.3 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 427.7 ns (425.5 ns .. 431.3 ns)
std dev 9.238 ns (6.311 ns .. 16.19 ns)
variance introduced by outliers: 28% (moderately inflated)
benchmarking fact 20/Natural
time 534.5 ns (532.6 ns .. 536.3 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 533.7 ns (532.6 ns .. 535.3 ns)
std dev 4.535 ns (3.652 ns .. 6.226 ns)
benchmarking fact2 20/Int
time 17.14 ns (17.07 ns .. 17.23 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.15 ns (17.10 ns .. 17.24 ns)
std dev 227.1 ps (167.4 ps .. 286.9 ps)
variance introduced by outliers: 16% (moderately inflated)
benchmarking fact2 20/Word
time 16.93 ns (16.86 ns .. 17.02 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.90 ns (16.86 ns .. 16.95 ns)
std dev 149.8 ps (107.4 ps .. 192.5 ps)
benchmarking fact2 20/Int64
time 165.1 ns (164.6 ns .. 165.7 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 164.8 ns (164.5 ns .. 165.1 ns)
std dev 1.033 ns (778.6 ps .. 1.363 ns)
benchmarking fact2 20/Word64
time 545.3 ns (542.8 ns .. 547.9 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 543.6 ns (542.4 ns .. 545.2 ns)
std dev 4.741 ns (3.736 ns .. 5.824 ns)
benchmarking fact2 20/Integer
time 419.1 ns (417.2 ns .. 421.1 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 418.2 ns (416.8 ns .. 420.2 ns)
std dev 5.415 ns (4.094 ns .. 7.147 ns)
variance introduced by outliers: 12% (moderately inflated)
benchmarking fact2 20/Natural
time 533.8 ns (532.2 ns .. 536.1 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 533.2 ns (532.1 ns .. 534.8 ns)
std dev 4.428 ns (3.217 ns .. 5.741 ns)
In fact
, Int
and Word
are efficient, Int64
and Word64
are respectively 10x and 20x slowers, when they should be identical on 64 bit platform. Word64
is even slower than Integer
!
Replacing fact
by a handwritten recursion, in fact2
gives something more coherent where all the 64 bits types have roughly the same efficiency.
I observed in the source code that Int
and Int64
are not implemented as type synonym.
Using -ddump-rule-firings
, I observed that a rule fold/build
fired and removed the fold
for the Int
case, and not for the Int64
, I don't understand much ;(
This is not dramatic because most people will use Int
instead of Int64
, but it is a surprising behavior. I also observed that smaller integral (such as Int8
, Int16
, Int32
) behaves as badly as Int64
) and this is much more important because there is no other default and more efficient type synonym.
Note that I'll be interested to provide a patch myself if mentored a bit. Especially, I'm thinking about merging the Int
and Int64
implementation (with respect to the architecture default word size).
Tested on GHC 8.4.2
Trac metadata
Trac field | Value |
---|---|
Version | 8.4.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Core Libraries |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |