A benchmark compiled with ghc-9.5.20220830 is about 23% slower and allocates about 32% more than with 9.4.2 (which, in turn, is much faster than 9.2.4 and 9.0.2)
This is a bonus benchmark that appeared when splitting #21715 (closed), with up-to-date repro instructions and known causes ruled out. In particular, this uses HEAD ghc-9.5.20220830 plus !7847 (closed), along with copious manual SPECIALIZE pragmas (despite -fexpose-all-unfoldings -fspecialise-aggressively), to rule out simple specialization problems.
To reproduce:
- git clone git@github.com:Mikolaj/horde-ad.git
- cd horde-ad
- git checkout 9.4-vs-HEAD-alloc-slow
- cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
- cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
- compare the +RTS -s summaries of the two runs
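The two benchmark invocations differ only in the compiler passed to -w, so they can be wrapped in a small helper that also keeps each run's output for later comparison. This is only a sketch: the function names `log_name` and `run_bench` and the `bench-*.log` file names are made up for illustration and are not part of horde-ad.

```shell
#!/bin/sh
# Sketch: run the same benchmark under a given compiler and keep the log.
# Helper names and log file names are assumptions, not part of horde-ad.

BENCH_OPTS='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'

# log_name COMPILER: derive a log file name from the compiler name or path,
# replacing '/' and '~' with '_' so the result is a plain file name.
log_name() {
    printf 'bench-%s.log\n' "$(printf '%s' "$1" | tr '/~' '__')"
}

# run_bench COMPILER: run the benchmark with that compiler and capture
# stdout and stderr (the +RTS -s summary is written to stderr).
run_bench() {
    cabal bench longMnistBench -w "$1" --enable-optimization \
        --constraint "vector < 0.13" --allow-newer \
        --benchmark-options="$BENCH_OPTS" 2>&1 | tee "$(log_name "$1")"
}

# Usage, from inside the horde-ad checkout:
#   run_bench ghc-9.4.2
#   run_bench "$HOME/r/ghc/_build/stage1/bin/ghc"
```

The usage lines are left commented out so that sourcing the file does not start a build by accident.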
Let me also attach the results I'm getting:
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.0.2 -O1
In order, the following will be built (use -v for more details):
- horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
261,545,705,080 bytes allocated in the heap
5,464,347,736 bytes copied during GC
142,367,832 bytes maximum residency (2 sample(s))
2,769,832 bytes maximum slop
1786 MiB total memory in use (50 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 171 colls, 0 par 3.084s 3.086s 0.0180s 0.0344s
Gen 1 2 colls, 0 par 0.047s 0.047s 0.0236s 0.0412s
INIT time 0.008s ( 0.008s elapsed)
MUT time 50.058s ( 50.087s elapsed)
GC time 3.131s ( 3.133s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 53.197s ( 53.228s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 5,224,892,914 bytes per MUT second
Productivity 94.1% of total user, 94.1% of total elapsed
Benchmark longMnistBench: FINISH
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.2.4 -O1
In order, the following will be built (use -v for more details):
- horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
261,482,315,704 bytes allocated in the heap
5,717,353,736 bytes copied during GC
135,071,544 bytes maximum residency (2 sample(s))
2,767,048 bytes maximum slop
1772 MiB total memory in use (30 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 172 colls, 0 par 3.264s 3.266s 0.0190s 0.0344s
Gen 1 2 colls, 0 par 0.041s 0.041s 0.0203s 0.0353s
INIT time 0.007s ( 0.007s elapsed)
MUT time 50.009s ( 50.038s elapsed)
GC time 3.304s ( 3.306s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 53.320s ( 53.351s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 5,228,725,969 bytes per MUT second
Productivity 93.8% of total user, 93.8% of total elapsed
Benchmark longMnistBench: FINISH
~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.4.2 -O1
In order, the following will be built (use -v for more details):
- horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
261,016,793,440 bytes allocated in the heap
5,828,274,000 bytes copied during GC
138,711,264 bytes maximum residency (2 sample(s))
2,772,768 bytes maximum slop
1775 MiB total memory in use (30 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 172 colls, 0 par 3.298s 3.300s 0.0192s 0.0343s
Gen 1 2 colls, 0 par 0.042s 0.042s 0.0209s 0.0365s
INIT time 0.008s ( 0.008s elapsed)
MUT time 38.294s ( 38.300s elapsed)
GC time 3.339s ( 3.342s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 41.641s ( 41.650s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,816,085,810 bytes per MUT second
Productivity 92.0% of total user, 92.0% of total elapsed
Benchmark longMnistBench: FINISH
~/r/horde-ad$ cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.5.20220830 -O1
In order, the following will be built (use -v for more details):
- horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
344,738,025,920 bytes allocated in the heap
7,200,223,128 bytes copied during GC
149,138,568 bytes maximum residency (2 sample(s))
2,794,360 bytes maximum slop
1772 MiB total memory in use (19 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 232 colls, 0 par 4.206s 4.208s 0.0181s 0.0371s
Gen 1 2 colls, 0 par 0.049s 0.049s 0.0245s 0.0439s
INIT time 0.008s ( 0.008s elapsed)
MUT time 47.095s ( 47.097s elapsed)
GC time 4.255s ( 4.257s elapsed)
EXIT time 0.000s ( 0.009s elapsed)
Total time 51.358s ( 51.370s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 7,319,993,667 bytes per MUT second
Productivity 91.7% of total user, 91.7% of total elapsed
Benchmark longMnistBench: FINISH
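The comparison between the 9.4.2 and 9.5.20220830 runs can be recomputed from the +RTS -s figures above. The snippet below is a minimal sketch; `pct_change` is a hypothetical helper name, and the numbers are copied by hand from the two logs.

```shell
#!/bin/sh
# pct_change OLD NEW: percentage change from OLD to NEW, one decimal place.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# ghc-9.4.2 figure first, ghc-9.5.20220830 figure second.
pct_change 261016793440 344738025920   # bytes allocated in the heap -> 32.1
pct_change 38.294 47.095               # MUT time in seconds         -> 23.0
pct_change 41.641 51.358               # Total time in seconds       -> 23.3
```

On these single runs (-n 1), HEAD allocates about 32% more and takes about 23% longer than 9.4.2.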