Perf improvement for Enum instances

Vilem-Benjamin Liepelt requested to merge wip/buggymcbugfix/15185-enum-int into master

Fixes

Fix #15185 (closed) by ensuring that we export stable (unoptimised) unfoldings for Enum list producers like enumFromTo in the interface file so we can do list fusion at usage sites.

See below for benchmark results.

Agreeable side effects:

  • Partially fixes #8763 (closed), namely for [from..to] with IntN and WordN, where N is one of 8, 16, 32, 64.
  • Partially fixes (don't close this, GitLab) #18178, namely for enumFromTo and enumFromThenTo (but not enumFrom and enumFromThen).
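
For readers less familiar with how range syntax relates to the functions named above: the Haskell Report defines arithmetic sequences as sugar for the Enum class methods, so [from..to] is enumFromTo and [from,then..to] is enumFromThenTo. The small program below is only an illustration of that correspondence; the concrete types and values are arbitrary choices, not taken from the merge request.

```haskell
module Main where

import Data.Int (Int8)
import Data.Word (Word16)

-- Each line compares a range expression with the Enum method it desugars to.
-- The types and bounds here are arbitrary illustrative choices.
main :: IO ()
main = do
  print ([1 .. 5] == enumFromTo (1 :: Int8) 5)
  print ([1, 3 .. 9] == enumFromThenTo (1 :: Word16) 3 9)
  print (take 3 [250 ..] == take 3 (enumFrom (250 :: Word16)))
  print (take 3 [0, 10 ..] == take 3 (enumFromThen (0 :: Int8) 10))
```

All four comparisons print True. The first two forms are the ones this change is described as improving; the last two correspond to the enumFrom and enumFromThen cases that remain open.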

Change Summary

  1. Add INLINABLE pragmas to

    • enumFrom
    • enumFromThen
    • enumFromTo
    • enumFromThenTo
    • boundedEnumFrom
    • boundedEnumFromThen
  2. Use more performant implementations in instance Enum Word64.
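
As a rough illustration of why an INLINABLE pragma on a list producer matters, here is a minimal sketch. It is not the code from base: rangeTo is a hypothetical producer written with build from GHC.Exts, and the pragma asks GHC to keep its unfolding in the interface file. Assuming optimisation is enabled, this gives a consumer in another module the chance to specialise the producer and fuse it with foldr via the foldr/build rule. Whether fusion actually fires depends on the GHC version and flags.

```haskell
module Main where

import GHC.Exts (build)

-- Hypothetical list producer (an assumption for illustration, not base's
-- enumFromTo). It is expressed via 'build' so that it can take part in
-- foldr/build fusion.
rangeTo :: (Ord a, Num a) => a -> a -> [a]
rangeTo from to = build (\cons nil ->
  let go x
        | x > to    = nil
        | otherwise = x `cons` go (x + 1)
  in go from)
-- Ask GHC to expose the unfolding in the interface file so that call sites
-- in other modules can inline or specialise it.
{-# INLINABLE rangeTo #-}

-- A consumer written with foldr, which the foldr/build rule can fuse with
-- the producer above when optimisation is on.
fact :: (Ord a, Num a) => a -> a
fact n = foldr (*) 1 (rangeTo 1 n)

main :: IO ()
main = print (fact (20 :: Int))
```

One way to check whether the intermediate list disappears is to compile with -O2 -ddump-simpl and look for a cons-free loop in the Core for fact.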

Compiler Performance

nofib shows unchanged allocations and binary sizes. Allocations when compiling Cabal are also unchanged.

Benchmark

benchmarking fact 20/fact @Int
time                 25.31 ns   (23.67 ns .. 27.03 ns)
                     0.974 R²   (0.962 R² .. 0.988 R²)
mean                 23.79 ns   (22.82 ns .. 25.16 ns)
std dev              3.528 ns   (2.745 ns .. 5.157 ns)
variance introduced by outliers: 96% (severely inflated)

benchmarking fact 20/fact @Int64
time                 22.79 ns   (21.28 ns .. 24.19 ns)
                     0.984 R²   (0.978 R² .. 0.995 R²)
mean                 21.89 ns   (21.41 ns .. 22.62 ns)
std dev              1.934 ns   (1.534 ns .. 2.698 ns)
variance introduced by outliers: 90% (severely inflated)

benchmarking fact 20/fact @Int32
time                 23.66 ns   (23.06 ns .. 24.34 ns)
                     0.994 R²   (0.991 R² .. 0.996 R²)
mean                 25.12 ns   (24.46 ns .. 25.76 ns)
std dev              2.263 ns   (1.961 ns .. 2.637 ns)
variance introduced by outliers: 90% (severely inflated)

benchmarking fact 20/fact @Int16
time                 21.07 ns   (19.90 ns .. 21.99 ns)
                     0.986 R²   (0.981 R² .. 0.991 R²)
mean                 20.56 ns   (19.95 ns .. 21.24 ns)
std dev              2.272 ns   (2.019 ns .. 2.580 ns)
variance introduced by outliers: 93% (severely inflated)

benchmarking fact 20/fact @Int8
time                 24.68 ns   (23.87 ns .. 25.63 ns)
                     0.992 R²   (0.988 R² .. 0.996 R²)
mean                 25.44 ns   (24.64 ns .. 26.08 ns)
std dev              2.220 ns   (1.888 ns .. 2.714 ns)
variance introduced by outliers: 89% (severely inflated)

benchmarking fact 20/fact @Word
time                 21.93 ns   (21.40 ns .. 22.63 ns)
                     0.992 R²   (0.988 R² .. 0.995 R²)
mean                 23.92 ns   (23.15 ns .. 24.74 ns)
std dev              2.877 ns   (2.340 ns .. 3.707 ns)
variance introduced by outliers: 94% (severely inflated)

benchmarking fact 20/fact @Word64
time                 220.8 ns   (213.9 ns .. 229.6 ns)   -- Edit: fixed now, see below
                     0.990 R²   (0.985 R² .. 0.995 R²)
mean                 232.6 ns   (226.2 ns .. 240.5 ns)
std dev              24.42 ns   (20.00 ns .. 31.41 ns)
variance introduced by outliers: 91% (severely inflated)

benchmarking fact 20/fact @Word32
time                 25.81 ns   (24.91 ns .. 26.63 ns)
                     0.993 R²   (0.990 R² .. 0.997 R²)
mean                 26.20 ns   (25.55 ns .. 26.77 ns)
std dev              2.062 ns   (1.765 ns .. 2.466 ns)
variance introduced by outliers: 87% (severely inflated)

benchmarking fact 20/fact @Word16
time                 25.00 ns   (24.36 ns .. 25.54 ns)
                     0.996 R²   (0.995 R² .. 0.997 R²)
mean                 24.30 ns   (23.81 ns .. 24.82 ns)
std dev              1.623 ns   (1.379 ns .. 1.917 ns)
variance introduced by outliers: 83% (severely inflated)

benchmarking fact 20/fact @Word8
time                 24.89 ns   (23.13 ns .. 27.63 ns)
                     0.969 R²   (0.946 R² .. 0.997 R²)
mean                 24.17 ns   (23.60 ns .. 25.36 ns)
std dev              2.597 ns   (1.488 ns .. 4.933 ns)
variance introduced by outliers: 93% (severely inflated)

Benchmark source:

{-# LANGUAGE TypeApplications #-}

module Main where

import Criterion.Main
import Data.Int
import Data.Word

fact :: Integral t => t -> t
fact n = product [1..n]

main :: IO ()
main = defaultMain
  [ bgroup "fact 20"
      [ bench "fact @Int" $ whnf (fact @Int) 20
      , bench "fact @Int64" $ whnf (fact @Int64) 20
      , bench "fact @Int32" $ whnf (fact @Int32) 20
      , bench "fact @Int16" $ whnf (fact @Int16) 20
      , bench "fact @Int8" $ whnf (fact @Int8) 20
      , bench "fact @Word" $ whnf (fact @Word) 20
      , bench "fact @Word64" $ whnf (fact @Word64) 20
      , bench "fact @Word32" $ whnf (fact @Word32) 20
      , bench "fact @Word16" $ whnf (fact @Word16) 20
      , bench "fact @Word8" $ whnf (fact @Word8) 20
      ]
  ]
Edited by Andreas Klebinger