GHC 7.0.4 Performance Regression (Possibly Vector)

I noticed a ~100% performance degradation in my code when I switched from GHC 6.12.3 to 7.0.4. This might be related to the vector performance ticket 5623, but that ticket was filed for a regression in 7.2.1 relative to 7.0.4, and 6.12.1 vs 7.0.4 performance was reported there as OK. So I am filing this as a new bug report.

I am attaching an edited version of my code below, which reproduces the issue. It is taken from production code used for a db driver. Relevant performance benchmarks (~95% degradation):

GHC 6.12.3 MUT Time: 0.48s

GHC 7.0.4 MUT Time: 0.95s

In the actual code, performance degrades by ~100%: the run time goes from ~1.3s to ~2.6s. So I can't move from 6.12.3 to 7.0.4 or 7.4+ if I want to keep the performance :(

The code is below. The comment block at the end shows how to compile it and reproduce the issue. I will be happy to provide more information to help fix this:

{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Storable as SV
import qualified Data.Vector.Storable.Mutable as MSV
import qualified Data.Vector as V
import Foreign (sizeOf)
import Foreign.C.Types (CChar)
import GHC.Int
import System.IO.Unsafe (unsafePerformIO)
import Control.Exception (evaluate)


data Elems = IV {-# UNPACK #-} !(SV.Vector Int32)
             | SV {-# UNPACK #-} !Int {-# UNPACK #-} !(SV.Vector CChar) -- Int stores the number of null-terminated C Strings
             | T {-# UNPACK #-} !Int {-# UNPACK #-} !(V.Vector Elems) -- Int stores total bytes needed to copy vectors to ByteString
             | L {-# UNPACK #-} !(V.Vector Elems) -- General list of elements
                deriving (Show)

-- | Function to return total size in bytes taken to store the data from Elems
size :: Elems -> Int
size (IV x) = 6 + sizeOf (undefined :: Int32) * SV.length x
size (SV _ x) = 6 + sizeOf (undefined :: CChar) * SV.length x
size (T n _) = n
size (L x) = V.foldl' (\acc y -> acc + size y) 6 x
{-# INLINE size #-}

fillS :: [[CChar]] -> Elems
fillS x = let (x',y') = createS x
            in SV x' y'
{-# INLINE fillS #-}

createS :: [[CChar]] -> (Int, SV.Vector CChar)
createS cl = unsafePerformIO $ do
            v <- MSV.new (Prelude.length . Prelude.concat $ cl)
            fill v 0 $ Prelude.concat cl
            SV.unsafeFreeze v >>= \x -> return (Prelude.length cl,x)
          where
            fill _ _ [] = return ()
            fill v i (x:xs) = MSV.unsafeWrite v i x >> fill v (i + 1) xs
{-# INLINE createS #-}

-- | Constructor for T - a db table - we must always build it using this function
fillT :: V.Vector Elems -> Elems
fillT !xs = T (V.foldl' (\acc y -> acc + size y) 3 xs) xs -- 2 bytes for table header + 1 additional byte for dict type header => 3 bytes of additional overhead
{-# INLINE fillT #-}

main :: IO ()
main = do
  let il1 = IV $ SV.enumFromN 1 50000000
      il2 = IV $ SV.enumFromN 1 50000000
      il3 = IV $ SV.enumFromN 1 50000000
      l1 = L (V.fromList [il1,il2,il3])
      sl1 = fillS [[97,0],[98,0],[99,0]]
  -- Force the T constructor (and with it the size fold); the result itself is not needed
  _ <- evaluate $ fillT (V.fromList [sl1,l1])
  return ()

{-- GHC 6.12.3:

 $ ghc -O2 --make test.hs -fforce-recomp -rtsopts -fasm && ./test +RTS -s
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
./test +RTS -s
     600,843,536 bytes allocated in the heap
           8,336 bytes copied during GC
     200,002,504 bytes maximum residency (2 sample(s))
         793,936 bytes maximum slop
             574 MB total memory in use (9 MB lost due to fragmentation)

  Generation 0:     2 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.48s  (  0.97s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.48s  (  0.97s elapsed)

  %GC time       0.2%  (0.1% elapsed)

  Alloc rate    1,259,822,857 bytes per MUT second

  Productivity  99.6% of total user, 49.0% of total elapsed

-----
GHC 7.0.4:

 $ ghc -O2 --make test.hs -fforce-recomp -rtsopts -fasm && ./test +RTS -s
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
./test +RTS -s
     600,836,872 bytes allocated in the heap
           7,664 bytes copied during GC
     200,002,224 bytes maximum residency (2 sample(s))
         794,216 bytes maximum slop
             574 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     2 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     2 collections,     0 parallel,  0.11s,  0.11s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.94s  (  1.01s elapsed)
  GC    time    0.11s  (  0.11s elapsed)
  EXIT  time    0.00s  (  0.11s elapsed)
  Total time    1.05s  (  1.12s elapsed)

  %GC time      10.2%  (9.7% elapsed)

  Alloc rate    638,055,951 bytes per MUT second

  Productivity  89.4% of total user, 83.4% of total elapsed
--}
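
As a sanity check on what the reproduction computes, here is a minimal sketch (not part of the original code) that re-derives the Int that fillT stores in T, using only the size equations above and without allocating any vectors. The helper names ivSize, svSize, lSize and expectedT are hypothetical and exist only for this illustration. It assumes sizeOf gives 4 for Int32 and 1 for CChar.

```haskell
import Data.Int (Int32)
import Data.List (foldl')
import Foreign.C.Types (CChar)
import Foreign.Storable (sizeOf)

-- Mirrors: size (IV x) = 6 + sizeOf Int32 * length
ivSize :: Int -> Int
ivSize n = 6 + sizeOf (undefined :: Int32) * n

-- Mirrors: size (SV _ x) = 6 + sizeOf CChar * length
svSize :: Int -> Int
svSize n = 6 + sizeOf (undefined :: CChar) * n

-- Mirrors: size (L x) = foldl' over the element sizes, starting from 6
lSize :: [Int] -> Int
lSize = foldl' (+) 6

-- Mirrors fillT applied to [sl1, l1]: 3 bytes of overhead, plus the
-- string vector (3 strings of 2 CChars each = 6 CChars), plus the list
-- of three Int32 vectors with n elements each.
expectedT :: Int -> Int
expectedT n = 3 + svSize 6 + lSize (replicate 3 (ivSize n))

main :: IO ()
main = print (expectedT 50000000)  -- prints 600000039
```

The three Int32 vectors alone account for 3 * 200,000,000 bytes, which is in line with the roughly 600 MB of heap allocation reported in both logs above. Since the allocation figures are nearly identical for 6.12.3 and 7.0.4, the difference appears to be in mutator time rather than in the amount of work allocated.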
Trac metadata

| Trac field | Value |
| --- | --- |
| Version | 7.0.4 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |