Pure syntax transformation affects performance.

Hi! Let's consider the following code (compiled with -O2, mtl and criterion needed):

{-# LANGUAGE FlexibleContexts #-}

module Main where

import Prelude
import Criterion.Main
import Control.Monad.State.Strict
import Control.Monad.Identity

repeatM :: Monad m => m a -> Int -> m ()
repeatM f = go where
    go 0 = pure ()
    go i = f >> go (i - 1)
{-# INLINE repeatM #-}

incState :: MonadState Int m => m ()
incState = modify' (1+) ; {-# INLINE incState #-}

test1, test2 :: Int -> Int
test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> get)) n ; {-# INLINE test1 #-}
test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}

main :: IO ()
main = do
    defaultMain
        [ bgroup "monad transformers overhead"
            [ bench "test1"     $ nf test1 100000000
            , bench "test2"     $ nf test2 100000000
            ]
        ]

Functions test1 and test2 differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for test1 and test2 is practically identical (there is an additional alias created for test1: test1 = lvl1_rhor 'cast' ...).

The problem is that test1 runs **3 times faster** than test2.

As a side note - if we add more state transformers to test1, it optimizes them all away, while test2 runs slower with each new transformer applied.

Edited Mar 10, 2019 by Ben Gamari

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information