Performance regression from 8.8.4 to 8.10.2
Summary
I wrote a small cellular automaton as a stencil array computation using the massiv library here.
The core of the logic is:
s1 :: Massiv.Array U Ix2 Position -> Int
s1 = countOccupied . (updateUntilEq update)

countOccupied ::
  Massiv.Array U Ix2 Position -> Int
countOccupied =
  Massiv.foldlS
    (\acc pos -> acc + (fromEnum (pos == Occupied)))
    0

updateUntilEq
  :: (Massiv.Array U Ix2 Position -> Massiv.Array U Ix2 Position)
  -> Massiv.Array U Ix2 Position
  -> Massiv.Array U Ix2 Position
updateUntilEq upFun =
  Massiv.iterateUntil
    (const (==))
    (const upFun)

update :: Massiv.Array U Ix2 Position -> Massiv.Array U Ix2 Position
update =
  Massiv.compute . Massiv.mapStencil border updateStencil

border :: Massiv.Border Position
border = Massiv.Fill Floor

updateStencil :: Massiv.Stencil Ix2 Position Position
updateStencil =
  Massiv.makeStencilDef Floor (Sz (3 :. 3)) (1 :. 1) $ \ get ->
    let
      chair = get (0 :. 0)
      isOcc c = fromEnum . (== Occupied) <$> (get c)
      occ :: Massiv.Value Int
      !occ =
        ( isOcc (-1 :. -1) + isOcc (-1 :. 0) + isOcc (-1 :. 1) +
          isOcc ( 0 :. -1)                   + isOcc ( 0 :. 1) +
          isOcc ( 1 :. -1) + isOcc ( 1 :. 0) + isOcc ( 1 :. 1)
        )
    in
      if (unsafeCoerce chair) == Floor
        then pure Floor
        else liftA2 threshold chair occ
{-# inline updateStencil #-}

threshold :: Position -> Int -> Position
threshold !p !n | n == 0    = Occupied
                | n >= 4    = Empty
                | otherwise = p
{-# inline threshold #-}
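For anyone reading without massiv to hand, the update rule can be sketched on plain lists. This is only an illustration of the logic, not the code that was benchmarked: the `Position` declaration is a guess at the repo's type, and `Grid`, `cellAt`, `occupiedNeighbours`, `step` and `untilStable` are names made up for this sketch.

```haskell
-- Assumed stand-in for the repo's Position type (hypothetical definition).
data Position = Floor | Empty | Occupied
  deriving (Eq, Show)

-- Hypothetical list-of-rows grid, used here instead of a massiv array.
type Grid = [[Position]]

-- Look up a cell; anything outside the grid counts as Floor,
-- which mirrors the role of `Massiv.Fill Floor` in the real code.
cellAt :: Grid -> Int -> Int -> Position
cellAt g r c
  | r < 0 || c < 0 = Floor
  | otherwise =
      case drop r g of
        (row : _) ->
          case drop c row of
            (p : _) -> p
            []      -> Floor
        [] -> Floor

-- Number of Occupied cells among the eight neighbours of (r, c).
occupiedNeighbours :: Grid -> Int -> Int -> Int
occupiedNeighbours g r c =
  length
    [ ()
    | dr <- [-1, 0, 1]
    , dc <- [-1, 0, 1]
    , (dr, dc) /= (0, 0)
    , cellAt g (r + dr) (c + dc) == Occupied
    ]

-- Same rule as `threshold` in the issue.
threshold :: Position -> Int -> Position
threshold p n
  | n == 0    = Occupied
  | n >= 4    = Empty
  | otherwise = p

-- One generation: Floor stays Floor, every other cell goes through threshold.
step :: Grid -> Grid
step g =
  [ [ if p == Floor then Floor else threshold p (occupiedNeighbours g r c)
    | (c, p) <- zip [0 ..] row
    ]
  | (r, row) <- zip [0 ..] g
  ]

-- Iterate until two successive generations are equal.
untilStable :: Grid -> Grid
untilStable g
  | g' == g   = g
  | otherwise = untilStable g'
  where
    g' = step g

-- Count the Occupied cells in the final grid.
countOccupied :: Grid -> Int
countOccupied = length . filter (== Occupied) . concat
```

On a 3x3 grid of `Empty` cells, `countOccupied (untilStable (replicate 3 (replicate 3 Empty)))` settles with only the four corners occupied. The list version is far slower than the stencil and is meant only to make the rule easy to check by hand.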
The massiv code above has the following performance across 8.8.4 and 8.10.2 (as measured with criterion):
- ghc-8.8.4: llvm
  benchmarking massiv-example/run
  time                 11.48 ms   (11.40 ms .. 11.56 ms)
                       0.999 R²   (0.999 R² .. 1.000 R²)
  mean                 11.57 ms   (11.52 ms .. 11.63 ms)
  std dev              140.5 μs   (98.34 μs .. 200.3 μs)
- ghc-8.8.4: ncg
  benchmarking massiv-example/run
  time                 27.48 ms   (26.73 ms .. 28.18 ms)
                       0.998 R²   (0.997 R² .. 1.000 R²)
  mean                 26.90 ms   (26.69 ms .. 27.19 ms)
  std dev              507.9 μs   (358.3 μs .. 715.2 μs)
- ghc-8.10.2: llvm
  benchmarking massiv-example/run
  time                 77.22 ms   (74.31 ms .. 79.26 ms)
                       0.998 R²   (0.996 R² .. 1.000 R²)
  mean                 74.52 ms   (73.65 ms .. 75.69 ms)
  std dev              1.824 ms   (1.269 ms .. 2.846 ms)
- ghc-8.10.2: ncg
  benchmarking massiv-example/run
  time                 82.77 ms   (79.52 ms .. 85.60 ms)
                       0.998 R²   (0.997 R² .. 1.000 R²)
  mean                 81.42 ms   (80.50 ms .. 82.61 ms)
  std dev              1.723 ms   (1.321 ms .. 2.185 ms)
[It is also striking how much better the LLVM backend does than the NCG on 8.8.4: about 11.5 ms versus 27.5 ms.]
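To put a number on the regression, the "time" estimates above can be turned into slowdown factors. The snippet below is just that arithmetic; `timings`, `slowdown` and `report` are names invented for this sketch, and the figures are copied from the criterion output in this report.

```haskell
import Text.Printf (printf)

-- (backend, 8.8.4 time in ms, 8.10.2 time in ms), copied from the
-- "time" lines of the criterion output in this report.
timings :: [(String, Double, Double)]
timings =
  [ ("llvm", 11.48, 77.22)
  , ("ncg",  27.48, 82.77)
  ]

-- How many times slower the new measurement is than the old one.
slowdown :: Double -> Double -> Double
slowdown old new = new / old

-- One human-readable line per backend.
report :: [String]
report =
  [ printf "%s: %.1fx slower on 8.10.2" backend (slowdown old new)
  | (backend, old, new) <- timings
  ]
```

Running `mapM_ putStrLn report` in GHCi gives roughly 6.7x for the LLVM backend and 3.0x for the NCG, so the LLVM build loses much more than the NCG build does.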
The repo also includes the Core output for each version here. I have not tried running this code with 9.0, as that proved beyond my abilities with cabal.
Steps to reproduce
- Clone the repo
- Run:
  cabal configure --with-compiler=[8.8.4 | 8.10.2]
  cabal build
  cabal run bench
Expected behavior
Performance should be roughly similar for 8.8.4 and 8.10.2.