A benchmark compiled with ghc-9.5.20220830 is 10% slower and allocates 16 times more than with 9.2.4
This issue splits off the first half of #21715, with updated reproduction instructions and with known causes ruled out. In particular, it uses HEAD (ghc-9.5.20220830) plus !7847, as well as many manual SPECIALIZE pragmas (even though `-fexpose-all-unfoldings` and `-fspecialise-aggressively` are enabled), to rule out simple specialization problems. The code has also been simplified so that it does not depend on advanced GHC optimizations, as discussed in #21736 and in !7847 (comment 446575).
To reproduce:
- `git clone git@github.com:Mikolaj/horde-ad.git`
- `cd horde-ad`
- `git checkout big-alloc-9.4.2-and-HEAD`
- `cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'`
- `cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'` (where `~/r/ghc` is the HEAD build with !7847)
- compare the `+RTS -s` statistics printed by the two runs (bytes allocated in the heap, MUT time, total time)
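To make the comparison step less error-prone, each run's output can be saved with `tee` and the headline `+RTS -s` figures pulled out with a small helper. This is a minimal sketch, not part of horde-ad: the function name `rts_summary` and the log file names are hypothetical, and it assumes the statistics end up in the captured output as in the runs shown in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of horde-ad): print only the headline
# +RTS -s figures from a saved benchmark log, i.e. total heap allocation,
# mutator (MUT) time and total time, so that runs made with different
# compilers can be compared side by side.
rts_summary() {
  local log="$1"
  grep -E 'bytes allocated in the heap|^ *(MUT|Total) +time' "$log"
}

# Example usage (log file names are arbitrary; 2>&1 also captures the
# RTS statistics if they are printed on stderr):
#   cabal bench longMnistBench -w ghc-9.2.4 <same options as above> 2>&1 | tee bench-9.2.4.log
#   rts_summary bench-9.2.4.log
```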
Here are the results I get with ghc-9.0.2, ghc-9.2.4, ghc-9.4.2 and HEAD:
```
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.0.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  92,143,606,480 bytes allocated in the heap
       6,037,760 bytes copied during GC
      74,995,488 bytes maximum residency (2 sample(s))
       1,829,088 bytes maximum slop
            1674 MiB total memory in use (15 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        60 colls,     0 par    0.032s   0.032s     0.0005s    0.0057s
  Gen  1         2 colls,     0 par    0.006s   0.006s     0.0031s    0.0034s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time   20.295s  ( 20.307s elapsed)
  GC      time    0.038s  (  0.038s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   20.340s  ( 20.353s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,540,243,877 bytes per MUT second

  Productivity  99.8% of total user, 99.8% of total elapsed

Benchmark longMnistBench: FINISH
```
```
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.2.4 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
   2,026,518,112 bytes allocated in the heap
         646,496 bytes copied during GC
          83,280 bytes maximum residency (1 sample(s))
          47,792 bytes maximum slop
            1705 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.006s   0.006s     0.0056s    0.0056s
  Gen  1         1 colls,     0 par    0.003s   0.003s     0.0032s    0.0032s

  INIT    time    0.007s  (  0.008s elapsed)
  MUT     time    6.535s  (  6.538s elapsed)
  GC      time    0.009s  (  0.009s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    6.551s  (  6.555s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    310,108,511 bytes per MUT second

  Productivity  99.8% of total user, 99.8% of total elapsed

Benchmark longMnistBench: FINISH
```
```
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.4.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  32,054,564,784 bytes allocated in the heap
       5,376,688 bytes copied during GC
      75,004,272 bytes maximum residency (2 sample(s))
       1,828,496 bytes maximum slop
            1672 MiB total memory in use (15 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        20 colls,     0 par    0.032s   0.032s     0.0016s    0.0056s
  Gen  1         2 colls,     0 par    0.006s   0.006s     0.0032s    0.0035s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time    7.099s  (  7.099s elapsed)
  GC      time    0.039s  (  0.039s elapsed)
  EXIT    time    0.000s  (  0.004s elapsed)
  Total   time    7.146s  (  7.150s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,515,420,597 bytes per MUT second

  Productivity  99.3% of total user, 99.3% of total elapsed

Benchmark longMnistBench: FINISH
```
```
~/r/horde-ad$ cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.5.20220830 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  32,049,456,152 bytes allocated in the heap
       5,440,240 bytes copied during GC
      75,006,528 bytes maximum residency (2 sample(s))
       1,830,336 bytes maximum slop
            1672 MiB total memory in use (15 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        20 colls,     0 par    0.031s   0.031s     0.0015s    0.0058s
  Gen  1         2 colls,     0 par    0.007s   0.007s     0.0033s    0.0035s

  INIT    time    0.007s  (  0.007s elapsed)
  MUT     time    7.150s  (  7.151s elapsed)
  GC      time    0.037s  (  0.037s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time    7.195s  (  7.200s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,482,414,055 bytes per MUT second

  Productivity  99.4% of total user, 99.3% of total elapsed

Benchmark longMnistBench: FINISH
```
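For reference, the headline numbers in the title can be checked against the ghc-9.2.4 and HEAD runs above: heap allocation grows by a factor of about 15.8 (rounded to 16 in the title), total time by about 9.8%, and MUT time alone by about 9.4%.

```math
\frac{32{,}049{,}456{,}152\ \text{bytes}}{2{,}026{,}518{,}112\ \text{bytes}} \approx 15.8
\qquad
\frac{7.195\ \text{s (total)}}{6.551\ \text{s (total)}} \approx 1.098
\qquad
\frac{7.150\ \text{s (MUT)}}{6.535\ \text{s (MUT)}} \approx 1.094
```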