Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
GHC
GHC
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,324
    • Issues 4,324
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 373
    • Merge Requests 373
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
  • Security & Compliance
    • Security & Compliance
    • Dependency List
    • License Compliance
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #14013

Closed
Open
Opened Jul 22, 2017 by danilo2@trac-danilo2

Bad monads performance

Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):

module Main where

import Prelude
import Criterion.Main
import qualified Control.Monad.State.Strict as Strict
import qualified Control.Monad.State.Class  as State
import Control.DeepSeq (NFData, rnf, force)
import GHC.IO          (evaluate)
import Data.Monoid


-----------------------------
-- === Criterion utils === --
-----------------------------

eval :: NFData a => a -> IO a
eval = evaluate . force ; {-# INLINE eval #-}

liftExp :: (Int -> a) -> (Int -> a)
liftExp f = f . (10^) ; {-# INLINE liftExp #-}

expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
expCodeGen f i = do
    putStrLn $ "generating input code (10e" <> show i <> " chars)"
    out <- eval $ liftExp f i
    putStrLn "code generated sucessfully"
    return out
{-# INLINE expCodeGen #-}

expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark
expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}


-------------------------------
-- === (a*) list parsing === --
-------------------------------

genList_a :: Int -> [Char]
genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}

pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
    'a':s -> pureListParser_a s
    []    -> True
    _     -> False
{-# INLINE pureListParser_a #-}

mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
    'a':s -> State.put s >> mtlStateListParser_a
    []    -> return True
    _     -> return False
{-# INLINE mtlStateListParser_a #-}

mtlStateListParser_a_typed :: Strict.State [Char] Bool
mtlStateListParser_a_typed = State.get >>= \case
    'a':s -> State.put s >> mtlStateListParser_a_typed
    []    -> return True
    _     -> return False
{-# INLINE mtlStateListParser_a_typed #-}

mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
    go = Strict.get >>= \case
        'a':s -> Strict.put s >> go
        []    -> return True
        _     -> return False
{-# INLINE mtlStateListParser_a_let #-}


{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}


main = do
    defaultMain
        [ bgroup "a*" $
            [ bgroup "pure"                    $ expCodeGenBench genList_a pureListParser_a                              <$> [6..6]
            , bgroup "mtl.State.Strict"        $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a)       <$> [6..6]
            , bgroup "mtl.State.Strict typed"  $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
            , bgroup "mtl.State.Strict let"    $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let)   <$> [6..6]
            ]
        ]

The code was compiled with following options (and many other variations): -threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-factor=1000 -flate-dmd-anal

Everything in this code has INLINE pragma. The important part we should focus on are these two functions:

pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
    'a':s -> pureListParser_a s
    []    -> True
    _     -> False
{-# INLINE pureListParser_a #-}

mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
    'a':s -> State.put s >> mtlStateListParser_a
    []    -> return True
    _     -> return False
{-# INLINE mtlStateListParser_a #-}

Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses State to keep the remaining input. The following list contains performance related observations:

  1. For the rest of the points, let's call the performance of pureListParser_a a "good" one and everything worse a "bad" one.
  2. The performance of mtlStateListParser_a is bad, it runs 10 times slower than pureListParser_a. Inspecting CORE we can observe that GHC jumps between (# a,b #) and (a,b) representations all the time.
  3. If we add a specialize pragma {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}, the performance of mtlStateListParser_a is good (exactly the same as pureListParser_a).
  4. If we do NOT use specialize pragma, but we use explicite, non-polymorphic type signature mtlStateListParser_a_typed :: Strict.State [Char] Bool, the performance is bad (!), identical to the polymorphic version without specialization.
  5. If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH mtlStateListParser_a_typed :: Strict.State [Char] Bool AND {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-} we get the good performance.
  6. If we transform pureListParser_a to
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
    go = Strict.get >>= \case
        'a':s -> Strict.put s >> go
        []    -> return True
        _     -> return False
{-# INLINE mtlStateListParser_a_let #-}

we again get the good performance without the need to use SPECIALIZE pragmas.

  1. The performance of all the functions that are not optimized as good as pureListParser_a is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
  2. The not-yet documented flag -fspecialise-aggressively does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463).
  3. If you do NOT use INLINE pragma on functions mtlStateListParser_a and mtlStateListParser_a_typed their performance is good (so INLINE pragma makes it bad until we provide explicit specialization). Moreover, if we use INLINABLE pragma instead of INLINE on these functions (which logically makes more sense, because they are recursive), performance of the polymorphic one mtlStateListParser_a is good, while performance of the explicitly typed mtlStateListParser_a_typed is bad until we provide explicite specialization.

The above points raise the following questions:

  1. Why GHC does not optimize mtlStateListParser_a the same way as pureListParser_a and where the jumping between (# a,b #) and (a,b) comes from?
  2. Is there any way to tell GHC to automatically insert SPECIALIZE pragmas, especially in performance critical code?
  3. Why providing very-explicite type signature mtlStateListParser_a_typed :: Strict.State [Char] Bool does not solve the problem unless we use SPECIALIZE pragma that tells the same as the signature? (GHC even warns: SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’ but it affects performance.)
  4. Why the trick to alias the body of recursive function to a local variable go affects the performance in any way, especially when it does NOT bring any variable to the local let scope?

We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.

Edited Mar 10, 2019 by danilo2
Assignee
Assign to
9.2.1
Milestone
9.2.1
Assign milestone
Time tracking
None
Due date
None
Reference: ghc/ghc#14013