GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues
Updated: 2019-07-07T18:17:25Z

https://gitlab.haskell.org/ghc/ghc/-/issues/14336
ghci leaks memory
Updated: 2019-07-07T18:17:25Z
Author: Neil Mitchell

The following script spawns ghci, and that spawned ghci then goes on to leak memory:
```hs
import Control.Concurrent
import Control.Monad
import System.IO
import System.Process
main = do
  (Just hin, Nothing, Nothing, pid) <- createProcess (proc "ghci" ["+RTS","-S"]){std_in=CreatePipe}
  forever $ do
    threadDelay 100000 -- 0.1s
    hPutStrLn hin "\"this is a test of outputting stuff\""
    hFlush hin
```
This script just writes a string to GHCi, which then echoes it back. The `+RTS -S` is useful to watch the live memory tick up in real time, but ghci leaks without it too, and the leak can be seen in Process Explorer (87 MB to 700 MB over about 30 minutes).
While repeatedly writing commands may not be a standard usage of ghci, it is standard when ghci is driven by tools such as ghcid (https://hackage.haskell.org/package/ghcid) and in other IDE-like uses.
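For comparing runs it can help to have a variant of the script that terminates. The following is a minimal sketch, not part of the original report: `driveRepl` and `runBounded` are hypothetical names, and the count of 600 sends is an arbitrary assumption. It sends the same line a fixed number of times, closes the child's stdin (ghci exits on end of input), and waits for the child, so the final `+RTS -S` figures can be read once the process has finished.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (replicateM_)
import System.Exit (ExitCode (..))
import System.IO (hClose, hFlush, hPutStrLn)
import System.Process
  ( CreateProcess (..), StdStream (..)
  , createProcess, proc, waitForProcess )

-- | Start @exe args@ with a piped stdin, write @line@ to it @count@ times
-- with @delayMicros@ microseconds between writes, then close stdin and
-- wait for the child to exit. (Hypothetical helper, not from the report.)
driveRepl :: FilePath -> [String] -> Int -> Int -> String -> IO ExitCode
driveRepl exe args count delayMicros line = do
  (mIn, _, _, pid) <- createProcess (proc exe args) { std_in = CreatePipe }
  case mIn of
    Nothing  -> ioError (userError "child stdin was not a pipe")
    Just hin -> do
      replicateM_ count $ do
        threadDelay delayMicros
        hPutStrLn hin line
        hFlush hin
      hClose hin  -- end of input; ghci leaves its prompt loop on EOF
  waitForProcess pid

-- | Same command and delay as the report, but bounded.
-- 600 sends at 0.1s each is roughly one minute (assumed value).
runBounded :: IO ExitCode
runBounded =
  driveRepl "ghci" ["+RTS", "-S"] 600 100000
    "\"this is a test of outputting stuff\""
```

Running `runBounded` with different counts gives a rough picture of how live memory scales with the number of commands sent.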
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | GHCi |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.8.1

https://gitlab.haskell.org/ghc/ghc/-/issues/12525
Internal identifiers creeping into :show bindings
Updated: 2019-07-07T18:26:14Z
Author: mniip

When binding variables the "new" way, or defining typeclasses, some things that are better left unseen manage to creep into the `:show bindings` list.
```html
<pre class="wiki">
GHCi, version 8.1.20160725: http://www.haskell.org/ghc/ :? for help
> :show bindings
> x = ()
> :show bindings
<b>$trModule :: GHC.Types.Module = _</b>
x :: () = _
> class Foo a
> :show bindings
x :: () = _
class Foo a
<b>$tcFoo :: GHC.Types.TyCon = _
$tc'C:Foo :: GHC.Types.TyCon = _
$trModule :: GHC.Types.Module = _</b>
</pre>
```
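A tool that wraps GHCi could hide these entries on its side until the listing itself is fixed. The sketch below is only an illustration of that idea and says nothing about how GHC should or does filter them: the names `internalPrefixes`, `isInternalBinding` and `hideInternal` are hypothetical, and the assumption that every unwanted line starts with `$tc` or `$tr` is taken solely from the transcript above.

```haskell
import Data.List (isPrefixOf)

-- | Prefixes of the generated names seen in the transcript.
-- Treating exactly these as "internal" is an assumption.
internalPrefixes :: [String]
internalPrefixes = ["$tc", "$tr"]

-- | True for a line of ":show bindings" output that starts with one of
-- the assumed internal prefixes.
isInternalBinding :: String -> Bool
isInternalBinding l = any (`isPrefixOf` l) internalPrefixes

-- | Drop the lines that look like compiler-generated bindings.
hideInternal :: [String] -> [String]
hideInternal = filter (not . isInternalBinding)
```

Applied to the last listing in the transcript (with the `<b>` markup removed), `hideInternal` keeps only the `x` and `class Foo a` lines.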
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | low |
| Resolution | Unresolved |
| Component | GHCi |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.8.1
Assignee: Roland Senn

https://gitlab.haskell.org/ghc/ghc/-/issues/10069
CPR related performance issue
Updated: 2019-07-07T18:37:40Z
Author: pacak

By default, CPR analysis can be too aggressive in trying to pass as much as possible in unboxed tuples. In general this is not a problem, but when one big datatype is passed to several consumers, it might end up being pushed to the stack several times instead of being written to the heap once. Things get worse when there are enough fields to cause a stack overflow that could otherwise be avoided. In our codebase, adding one field with ExistentialQuantification (unused, but it prevents GHC from doing the CPR transformation) reduces the number of stack overflows by a factor of 1000 and increases overall performance by 10%.
In the provided example, performance for A and B should be identical, and yet B is consistently faster by 3-5%.
It's possible to increase this performance gap by adding more and more fields.
I was able to replicate this issue in GHC 7.8.3 and 7.10.1-rc2.
```hs
{-# LANGUAGE ExistentialQuantification #-}
module Blah where
import Criterion
import Criterion.Main
import Data.Typeable
data A = A ()
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
data B = forall rep. (Typeable rep) => B rep
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
    !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int
a :: A
a = A () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
b :: B
b = B () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
{-# NOINLINE a1 #-}
a1 :: A -> Int
a1 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1
{-# NOINLINE a2 #-}
a2 :: A -> Int
a2 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2
{-# NOINLINE a3 #-}
a3 :: A -> Int
a3 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3
{-# NOINLINE a4 #-}
a4 :: A -> Int
a4 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4
{-# NOINLINE b1 #-}
b1 :: B -> Int
b1 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1
{-# NOINLINE b2 #-}
b2 :: B -> Int
b2 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2
{-# NOINLINE b3 #-}
b3 :: B -> Int
b3 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3
{-# NOINLINE b4 #-}
b4 :: B -> Int
b4 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4
{-# NOINLINE fa #-}
fa :: A -> Int
fa a = a1 a + a2 a + a3 a + a4 a
{-# NOINLINE fb #-}
fb :: B -> Int
fb b = b1 b + b2 b + b3 b + b4 b
main :: IO ()
main = defaultMain [
    bgroup "single call" [
        bench "A" $ whnf fa a
      , bench "B" $ whnf fb b
      ]
  ]
```
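To make the cost described above more concrete, here is a hand-written sketch of a worker/wrapper split in which a wrapper unpacks a strict record and passes every field to its worker as a separate argument. It is an illustration only, not GHC's actual output for the benchmark: the three-field type `T` and all function names are hypothetical stand-ins for the 33-field `A`, and the Core that GHC really generates can be inspected with `-ddump-simpl`.

```haskell
-- | Hypothetical small record standing in for the 33-field A above.
data T = T !Int !Int !Int

-- | Wrapper: takes the boxed record, unpacks it, and calls the worker
-- with one argument per field.
sumFields :: T -> Int
sumFields (T a b c) = sumFieldsWorker a b c

-- | Worker: receives the fields individually.
{-# NOINLINE sumFieldsWorker #-}
sumFieldsWorker :: Int -> Int -> Int -> Int
sumFieldsWorker a b c = a + b + c

-- | A second consumer, split the same way.
weighted :: T -> Int
weighted (T a b c) = weightedWorker a b c

{-# NOINLINE weightedWorker #-}
weightedWorker :: Int -> Int -> Int -> Int
weightedWorker a b c = a + 2 * b + 3 * c

-- | Both consumers receive the same record, so after the wrappers are
-- inlined every field is passed twice, once per worker call, instead of
-- one pointer to the record being passed twice.
consume :: T -> Int
consume t = sumFields t + weighted t
```

With three fields the extra argument passing is negligible; the description's point is that the same shape with dozens of fields and several consumers multiplies the number of values moved per call.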
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.10.1-rc2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.8.1

https://gitlab.haskell.org/ghc/ghc/-/issues/8763
forM_ [1..N] does not get fused (allocates 50% more)
Updated: 2022-08-23T12:37:25Z
Author: Niklas Hambüchen (mail@nh2.me)

Apparently idiomatic code like `forM_ [1.._N]` does not get fused away.
This can give serious performance problems when unnoticed.
```hs
-- Slow:
forM_ [0.._N-1] $ \i -> do ...
-- Around 10 times faster:
loop _N $ \i -> do ...
{-# INLINE loop #-}
loop :: (Monad m) => Int -> (Int -> m ()) -> m ()
loop bex f = go 0
  where
    go !n | n == bex  = return ()
          | otherwise = f n >> go (n+1)
```
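For a quick local comparison, the two styles can be put side by side with a concrete loop body. This is a minimal sketch separate from the report's own code: `sumWithForM` and `sumWithLoop` are hypothetical names, the `IORef` accumulator is an assumed stand-in for the elided `do ...` bodies, and `loop` is copied from the snippet above. No timings are claimed here; allocation can be compared by building with `-O2 -rtsopts` and running with `+RTS -s`.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hand-written counting loop from the report; runs f on 0 .. bex-1.
-- Assumes bex >= 0, since the counter only stops when it equals bex.
{-# INLINE loop #-}
loop :: (Monad m) => Int -> (Int -> m ()) -> m ()
loop bex f = go 0
  where
    go !n | n == bex  = return ()
          | otherwise = f n >> go (n + 1)

-- | Sum 0 .. n-1 by iterating over an enumerated list.
sumWithForM :: Int -> IO Int
sumWithForM n = do
  ref <- newIORef 0
  forM_ [0 .. n - 1] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

-- | Sum 0 .. n-1 with the explicit loop.
sumWithLoop :: Int -> IO Int
sumWithLoop n = do
  ref <- newIORef 0
  loop n $ \i -> modifyIORef' ref (+ i)
  readIORef ref
```

Both functions compute the same result, so any difference in the runtime statistics comes from how the iteration itself is compiled.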
Full code example: https://gist.github.com/nh2/8905997 - the relevant alternatives are commented.

Milestone: 8.8.1