A benchmark compiled with ghc-9.5.20220830 is 25% slower and allocates 87% more than with 9.2.4 (and 9.0.2)
This is another bonus benchmark that appeared when splitting #21715 (closed), with up to date repro instructions and known causes ruled out. In particular, this uses HEAD ghc-9.5.20220830 plus !7847 (closed) (and copious amounts of manual SPECIALIZE pragmas, despite -fexpose-all-unfoldings -fspecialise-aggressively
) to rule out simple specialization problems.
To reproduce:
- git clone git@github.com:Mikolaj/horde-ad.git
- git checkout 9.4-vs-HEAD-alloc-slow
- cabal bench longProdBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
- cabal bench longProdBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
- compare
Let me also attach the results I'm getting:
~/r/horde-ad$ cabal bench longProdBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
35,844,466,832 bytes allocated in the heap
30,562,600,528 bytes copied during GC
9,429,216,480 bytes maximum residency (6 sample(s))
215,469,824 bytes maximum slop
17520 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 34 colls, 0 par 6.720s 6.724s 0.1978s 0.7041s
Gen 1 6 colls, 0 par 8.194s 8.199s 1.3665s 4.3635s
INIT time 0.007s ( 0.007s elapsed)
MUT time 7.757s ( 7.761s elapsed)
GC time 14.914s ( 14.923s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 22.677s ( 22.691s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 4,621,008,995 bytes per MUT second
Productivity 34.2% of total user, 34.2% of total elapsed
Benchmark longProdBench: FINISH
~/r/horde-ad$ cabal bench longProdBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Linking /home/mikolaj/r/horde-ad/dist-newstyle/build/x86_64-linux/ghc-9.2.4/horde-ad-0.1.0.0/b/longProdBench/build/longProdBench/longProdBench ...
Running 1 benchmarks...
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
35,844,471,080 bytes allocated in the heap
30,562,597,960 bytes copied during GC
9,429,215,048 bytes maximum residency (6 sample(s))
215,462,048 bytes maximum slop
17521 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 34 colls, 0 par 6.924s 6.928s 0.2038s 0.7221s
Gen 1 6 colls, 0 par 8.243s 8.249s 1.3749s 4.3277s
INIT time 0.008s ( 0.008s elapsed)
MUT time 7.520s ( 7.524s elapsed)
GC time 15.167s ( 15.177s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 22.695s ( 22.710s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 4,766,633,282 bytes per MUT second
Productivity 33.1% of total user, 33.1% of total elapsed
Benchmark longProdBench: FINISH
~/r/horde-ad$ cabal bench longProdBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
67,217,913,640 bytes allocated in the heap
36,966,538,056 bytes copied during GC
10,423,577,544 bytes maximum residency (7 sample(s))
315,996,120 bytes maximum slop
20504 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 72 colls, 0 par 7.635s 7.642s 0.1061s 0.6795s
Gen 1 7 colls, 0 par 10.905s 10.915s 1.5592s 4.4384s
INIT time 0.007s ( 0.007s elapsed)
MUT time 9.688s ( 9.683s elapsed)
GC time 18.540s ( 18.557s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 28.236s ( 28.247s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,938,350,873 bytes per MUT second
Productivity 34.3% of total user, 34.3% of total elapsed
Benchmark longProdBench: FINISH
~/r/horde-ad$ cabal bench longProdBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
67,217,779,480 bytes allocated in the heap
36,705,264,480 bytes copied during GC
10,477,856,456 bytes maximum residency (7 sample(s))
307,244,408 bytes maximum slop
20535 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 72 colls, 0 par 7.681s 7.686s 0.1067s 0.6958s
Gen 1 7 colls, 0 par 10.724s 10.740s 1.5342s 4.5879s
INIT time 0.008s ( 0.008s elapsed)
MUT time 9.912s ( 9.908s elapsed)
GC time 18.405s ( 18.425s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 28.325s ( 28.341s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,781,208,825 bytes per MUT second
Productivity 35.0% of total user, 35.0% of total elapsed
Benchmark longProdBench: FINISH