Skip to content

GHC 8.4 performance regressions when using newtypes

Here is some serious performance regression in the following code:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts #-}

module Main where

import Prelude
import qualified Foreign.Storable as Storable
import qualified Control.Monad.State.Strict as S
import Control.Monad.IO.Class
import Foreign.Marshal.Alloc       (mallocBytes)
import Criterion.Main

newtype Foo a = Foo a

intSize :: Int
intSize = Storable.sizeOf (undefined :: Int)

slow :: Int -> IO ()
slow i = do
    ptr <- mallocBytes $ 2 * intSize
    Storable.pokeByteOff ptr intSize (0 :: Int)
    let go 0 = pure ()
        go j = do
            Foo (!_, !off) <- S.get
            !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off
            liftIO $ Storable.pokeByteOff ptr off $! (x + 1)
            go (j - 1)
    S.evalStateT (go i) (Foo ((0::Int),(intSize::Int)))


fast :: Int -> IO ()
fast i = do
    ptr <- mallocBytes $ 2 * intSize
    Storable.pokeByteOff ptr intSize (0 :: Int)
    let go 0 = pure ()
        go j = do
            (!_, !off) <- S.get
            !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off
            liftIO $ Storable.pokeByteOff ptr off $! (x + 1)
            go (j - 1)
    S.evalStateT (go i) ((0::Int),(intSize::Int))


main :: IO ()
main = defaultMain
    [ bgroup "slow"
        $ (\(i :: Int) -> bench ("10e" <> show i)
        $ perRunEnv (return ())
        $ \v -> slow (10 ^ i))  <$> [7..8]
    
    , bgroup "fast"
        $ (\(i :: Int) -> bench ("10e" <> show i)
        $ perRunEnv (return ())
        $ \v -> fast (10 ^ i))  <$> [7..8]
    ]

Compiled with flags: -threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -fexcess-precision -fexpose-all-unfoldings -flate-dmd-anal -fspec-constr-keen -fspecialise-aggressively -fstatic-argument-transformation -fmax-worker-args=200

The slow function executes 2 times slower than the fast one. The only difference is that the state is wrapped in a newtype. It was working properly in GHC 8.2 (both functions were equally fast - as fast as the current fast function).

Edited by Ben Gamari
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information