A benchmark compiled with ghc-9.5.20220830 is 14% slower and allocates 1% more than with 9.0.2
This is the second half of #21715, split off into its own issue with updated reproduction instructions and with known causes ruled out. In particular, it uses HEAD ghc-9.5.20220830 plus !7847 (and some manual SPECIALIZE pragmas, despite `-fexpose-all-unfoldings -fspecialise-aggressively`) to rule out simple specialization problems.
IMHO, this is the least actionable of the three issues stemming from #21715. The time span between 9.0.2 and now is long, the regression is small, and I can't reproduce it on any other branch of my project. The other regressions, by contrast, crop up often, in various variants and magnitudes.
To reproduce:
- `git clone git@github.com:Mikolaj/horde-ad.git`
- `cd horde-ad`
- `git checkout ghc-report-specialize`
- `cabal bench mnist -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'`
- `cabal bench mnist -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'` (where `~/r/ghc/_build/stage1/bin/ghc` is my local build of HEAD plus !7847)
- compare the `+RTS -s` statistics printed by the two runs
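The steps above can be scripted. The following is a sketch and not part of the original setup; all function names are hypothetical. `run_mnist` wraps the exact `cabal bench` command from the steps and saves everything it prints to a log file. This assumes the `+RTS -s` statistics, which go to stderr, reach the terminal through cabal, so `2>&1 | tee` captures them. The remaining helpers extract MUT, GC and total CPU time plus heap allocation from two such logs and print the relative change.

```shell
# Hypothetical helpers for the reproduction steps; source this file, then call them.

# Run the benchmark with compiler $1, saving all output to log file $2.
# Assumption: the "+RTS -s" statistics (printed on stderr) are passed through by
# cabal, so 2>&1 | tee captures them together with cabal's own messages.
run_mnist() {
  cabal bench mnist -w "$1" --enable-optimization --constraint "vector < 0.13" --allow-newer \
    --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s' \
    2>&1 | tee "$2"
}

# Print the "bytes allocated in the heap" figure from log $1, without thousands separators.
rts_alloc() {
  awk '/bytes allocated in the heap/ { gsub(",", "", $1); print $1 }' "$1"
}

# Print the CPU seconds on the "<key> time ..." line of log $1 (key: MUT, GC, Total, ...).
# The "%GC time" line is not matched, because its first field is "%GC".
rts_time() {
  awk -v key="$2" '$1 == key && $2 == "time" { sub("s$", "", $3); print $3 }' "$1"
}

# Print the relative change from baseline $1 to candidate $2, in percent (one decimal).
pct_change() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f%%\n", (cand - base) / base * 100 }'
}

# Compare two saved logs: baseline first, candidate second.
compare_rts() {
  for key in MUT GC Total; do
    printf '%-6s %s\n' "$key" "$(pct_change "$(rts_time "$1" "$key")" "$(rts_time "$2" "$key")")"
  done
  printf '%-6s %s\n' alloc "$(pct_change "$(rts_alloc "$1")" "$(rts_alloc "$2")")"
}
```

For example: `run_mnist ghc-9.0.2 base.log`, then `run_mnist ~/r/ghc/_build/stage1/bin/ghc head.log`, then `compare_rts base.log head.log`. The log file names are arbitrary. Parsing by whitespace-separated fields makes the helpers indifferent to how the RTS aligns its columns.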
Here are the results I get, including runs with ghc-9.2.4 and ghc-9.4.2 for comparison:
```
~/r/horde-ad$ cabal bench mnist -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.0.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:mnist) (first run)
Preprocessing benchmark 'mnist' for horde-ad-0.1.0.0..
Building benchmark 'mnist' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark mnist: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 500/500|150 s469160 v0 m0=469160
  29,993,613,568 bytes allocated in the heap
  36,535,841,440 bytes copied during GC
     119,503,400 bytes maximum residency (253 sample(s))
       2,890,432 bytes maximum slop
             268 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     22918 colls,     0 par   11.906s  11.920s     0.0005s    0.0017s
  Gen  1       253 colls,     0 par    1.966s   1.967s     0.0078s    0.0306s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    7.949s  (  7.947s elapsed)
  GC      time   13.872s  ( 13.887s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   21.821s  ( 21.834s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,773,309,905 bytes per MUT second

  Productivity  36.4% of total user, 36.4% of total elapsed

Benchmark mnist: FINISH
```
```
~/r/horde-ad$ cabal bench mnist -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.2.4 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:mnist) (first run)
Preprocessing benchmark 'mnist' for horde-ad-0.1.0.0..
Building benchmark 'mnist' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark mnist: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 500/500|150 s469160 v0 m0=469160
  73,220,505,408 bytes allocated in the heap
  36,244,304,376 bytes copied during GC
     119,595,768 bytes maximum residency (198 sample(s))
       2,917,136 bytes maximum slop
             277 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     16083 colls,     0 par   14.166s  14.186s     0.0009s    0.0026s
  Gen  1       198 colls,     0 par    3.142s   3.145s     0.0159s    0.0313s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   26.487s  ( 26.497s elapsed)
  GC      time   17.309s  ( 17.331s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   43.796s  ( 43.828s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,764,370,213 bytes per MUT second

  Productivity  60.5% of total user, 60.5% of total elapsed

Benchmark mnist: FINISH
```
```
~/r/horde-ad$ cabal bench mnist -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.4.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:mnist) (first run)
Preprocessing benchmark 'mnist' for horde-ad-0.1.0.0..
Building benchmark 'mnist' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark mnist: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 500/500|150 s469160 v0 m0=469160
  30,633,303,424 bytes allocated in the heap
  34,157,584,608 bytes copied during GC
     115,291,200 bytes maximum residency (176 sample(s))
       2,864,464 bytes maximum slop
             268 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5609 colls,     0 par   12.459s  12.471s     0.0022s    0.0062s
  Gen  1       176 colls,     0 par    3.195s   3.197s     0.0182s    0.0294s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    7.686s  (  7.674s elapsed)
  GC      time   15.654s  ( 15.668s elapsed)
  EXIT    time    0.000s  (  0.008s elapsed)
  Total   time   23.340s  ( 23.350s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,985,560,410 bytes per MUT second

  Productivity  32.9% of total user, 32.9% of total elapsed

Benchmark mnist: FINISH
```
```
~/r/horde-ad$ cabal bench mnist -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 500/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.5.20220830 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:mnist) (first run)
Preprocessing benchmark 'mnist' for horde-ad-0.1.0.0..
Building benchmark 'mnist' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark mnist: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 500/500|150 s469160 v0 m0=469160
  30,345,381,232 bytes allocated in the heap
  36,181,972,552 bytes copied during GC
     115,757,168 bytes maximum residency (176 sample(s))
       2,859,000 bytes maximum slop
             267 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5609 colls,     0 par   13.610s  13.623s     0.0024s    0.0063s
  Gen  1       176 colls,     0 par    3.459s   3.465s     0.0197s    0.0311s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    7.862s  (  7.850s elapsed)
  GC      time   17.069s  ( 17.088s elapsed)
  EXIT    time    0.000s  (  0.002s elapsed)
  Total   time   24.931s  ( 24.940s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,859,837,479 bytes per MUT second

  Productivity  31.5% of total user, 31.5% of total elapsed

Benchmark mnist: FINISH
```
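As a cross-check of the percentages in the title, the relevant ratios can be recomputed directly from the ghc-9.0.2 (baseline) and ghc-9.5.20220830 logs above. This is plain arithmetic on the reported CPU times (not elapsed times) and on heap allocation. `headline_deltas` is only an illustrative name.

```shell
# Hypothetical helper: relative changes from the ghc-9.0.2 run to the ghc-9.5.20220830 run,
# using CPU times and allocation figures copied verbatim from the two "+RTS -s" logs.
headline_deltas() {
  awk 'BEGIN {
    printf "Total time: %.1f%%\n", (24.931 - 21.821) / 21.821 * 100
    printf "MUT time:   %.1f%%\n", (7.862 - 7.949) / 7.949 * 100
    printf "GC time:    %.1f%%\n", (17.069 - 13.872) / 13.872 * 100
    printf "Allocated:  %.1f%%\n", (30345381232 - 29993613568) / 29993613568 * 100
  }'
}

headline_deltas
```

This prints 14.3% for total time and 1.2% for allocation, which match the rounded 14% and 1% in the title. It also prints -1.1% for MUT time and 23.0% for GC time for this pair of single runs.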