A simple loop in ST takes much longer to complete with 9.4.4 than 9.4.3 (NCG)
Consider the following simple program:
st_loop.hs:
module Main where
import Control.Monad.ST
import Data.STRef
programST :: STRef s Integer -> ST s Integer
programST ref = do
n <- readSTRef ref
if n <= 0
then pure n
else do
writeSTRef ref $! n - 1
programST ref
countdownST :: Integer -> (Integer, Integer)
countdownST n = runST $ do
ref <- newSTRef n
a <- programST ref
s <- readSTRef ref
pure (a, s)
main :: IO ()
main = putStrLn . show $ countdownST 100000000
Now:
unknown@electronics haskell $ /usr/local/haskell/ghc-9.4.3/bin/ghc -fforce-recomp -O --make st_loop.hs
[1 of 2] Compiling Main ( st_loop.hs, st_loop.o )
[2 of 2] Linking st_loop [Objects changed]
unknown@electronics haskell $ ./st_loop +RTS -s
(0,0)
1,600,050,840 bytes allocated in the heap
25,120 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 380 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.333s ( 0.333s elapsed)
GC time 0.001s ( 0.001s elapsed)
EXIT time 0.000s ( 0.006s elapsed)
Total time 0.334s ( 0.340s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 4,804,291,700 bytes per MUT second
Productivity 99.7% of total user, 97.9% of total elapsed
unknown@electronics haskell $ /usr/local/haskell/ghc-9.4.4/bin/ghc -fforce-recomp -O --make st_loop.hs
[1 of 2] Compiling Main ( st_loop.hs, st_loop.o )
[2 of 2] Linking st_loop [Objects changed]
unknown@electronics haskell $ ./st_loop +RTS -s
(0,0)
1,600,050,840 bytes allocated in the heap
25,120 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 380 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 1.843s ( 1.843s elapsed)
GC time 0.001s ( 0.001s elapsed)
EXIT time 0.000s ( 0.006s elapsed)
Total time 1.844s ( 1.850s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 868,062,260 bytes per MUT second
Productivity 99.9% of total user, 99.6% of total elapsed
It takes 6 times longer to complete with 9.4.4 than with 9.4.3.
However, with LLVM there is almost no difference:
unknown@electronics haskell $ /usr/local/haskell/ghc-9.4.3/bin/ghc -fforce-recomp -O -fllvm --make st_loop.hs
[1 of 2] Compiling Main ( st_loop.hs, st_loop.o )
You are using an unsupported version of LLVM!
Currently only 10 up to 14 (non inclusive) is supported. System LLVM version: 14.0.6
We will try though...
[2 of 2] Linking st_loop [Objects changed]
unknown@electronics haskell $ ./st_loop +RTS -s
(0,0)
1,600,050,840 bytes allocated in the heap
25,120 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 380 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.355s ( 0.355s elapsed)
GC time 0.001s ( 0.001s elapsed)
EXIT time 0.000s ( 0.004s elapsed)
Total time 0.356s ( 0.360s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 4,502,712,001 bytes per MUT second
Productivity 99.7% of total user, 98.7% of total elapsed
unknown@electronics haskell $ /usr/local/haskell/ghc-9.4.4/bin/ghc -fforce-recomp -O -fllvm --make st_loop.hs
[1 of 2] Compiling Main ( st_loop.hs, st_loop.o )
You are using an unsupported version of LLVM!
Currently only 10 up to 14 (non inclusive) is supported. System LLVM version: 14.0.6
We will try though...
[2 of 2] Linking st_loop [Objects changed]
unknown@electronics haskell $ ./st_loop +RTS -s
(0,0)
1,600,050,840 bytes allocated in the heap
25,120 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 380 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.396s ( 0.396s elapsed)
GC time 0.001s ( 0.001s elapsed)
EXIT time 0.000s ( 0.003s elapsed)
Total time 0.397s ( 0.400s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 4,043,468,789 bytes per MUT second
Productivity 99.7% of total user, 98.9% of total elapsed
Generated Core and STG seem to be the same for both versions, something weird must be going on in the NCG.
Edited by Andrzej Rybczak