Skip to content

CPR related performance issue

By default CRP analysis can be too aggressive in trying to pass as much as possible in unboxed tuples, in general it's not a problem but when one big datatype is passed to several consumers it might end up pushed to stack several times instead of once - to heap, things are getting worse when there are sufficient fields to cause stack overflow which otherwise is possible to avoid - in our codebase adding one field with ExistentialQuantification (unused, but that prevents ghc from doing CRP transformation) reduces number of stack overflow by a factor of 1000 and increases overall performance by 10%.

In provided example performance for both A and B should be identical and yet B is consistently faster by 3-5%

It's possible to increase this performance gap by adding more and more fields.

I was able to replicate this issue in ghc 7.8.3 and 7.10,1rc2

{-# LANGUAGE ExistentialQuantification #-}

module Blah where

import Criterion
import Criterion.Main
import Data.Typeable

data A = A ()
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int

data B = forall rep. (Typeable rep) => B rep
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int

a :: A
a = A () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

b :: B
b = B () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

{-# NOINLINE a1 #-}
a1 :: A -> Int
a1 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1

{-# NOINLINE a2 #-}
a2 :: A -> Int
a2 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2

{-# NOINLINE a3 #-}
a3 :: A -> Int
a3 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3

{-# NOINLINE a4 #-}
a4 :: A -> Int
a4 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4

{-# NOINLINE b1 #-}
b1 :: B -> Int
b1 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1

{-# NOINLINE b2 #-}
b2 :: B -> Int
b2 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2

{-# NOINLINE b3 #-}
b3 :: B -> Int
b3 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3

{-# NOINLINE b4 #-}
b4 :: B -> Int
b4 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4

{-# NOINLINE fa #-}
fa :: A -> Int
fa a = a1 a + a2 a + a3 a + a4 a

{-# NOINLINE fb #-}
fb :: B -> Int
fb b = b1 b + b2 b + b3 b + b4 b

main :: IO ()
main = defaultMain [
   bgroup "single call" [
     bench "A" $ whnf fa a
   , bench "B" $ whnf fb b
   ]
   ]

Trac metadata
Trac field Value
Version 7.10.1-rc2
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information