Unboxing-Related Performance Issue with Polymorphic Functions
My team has observed a 2x performance degradation in code that makes use of StateT
that appears to be related to strictness and unboxing, even when built with -O2
. Our code makes heavy use of the state monad, and when GHC fails to optimise this usage, the performance becomes untenably slow.
We've managed to minimise the behaviour to the following reproducer (that ignores state entirely), consisting of two files. It depends on criterion
Lib.hs
:
{-# LANGUAGE BangPatterns #-}
module Lib where
-- A type to take the place of state
data X a = X { runX :: !a }
test1 :: Int -> Int
test1 = \(!i) -> go i where
go = \(!i) -> if i > 0
then go $! i - 1
else i
{-# NOINLINE test1 #-}
test2 :: Int -> Int
test2 = \(!i) -> runX (go i) where
go = \(!i) -> if i > 0
then go $! i - 1
else X i
{-# NOINLINE test2 #-}
Main.hs
:
{-# LANGUAGE Strict #-}
module Main where
import Lib
import Criterion
import Criterion.Main
main :: IO ()
main = defaultMain
[ bgroup "main"
[ bgroup "small"
[ bench "without state" $ whnf test1 100000000
, bench "with state" $ whnf test2 100000000
]
]
]
Run as above, the code takes twice as long to execute test2
as it does test1
. However, when the signature for runX
is changed to runX :: !Int
, the code in test2
exhibits identical performance to test1
.
It has been reproduced across multiple linux64 machines, but not tested on any other architecture or operating system.
Please find the full (stack - yes, I know) project as an attachment. You can simply stack run
to observe the issue. If you have any further queries please let me know.