
A benchmark compiled with ghc-9.5.20220830 is over 20% slower and allocates over 30% more than with 9.4.2 (which, in turn, is much faster than 9.2.4 and 9.0.2)

This is a bonus benchmark that appeared when splitting #21715 (closed), with up-to-date repro instructions and known causes ruled out. In particular, it uses HEAD (ghc-9.5.20220830) plus !7847 (closed), along with copious manual SPECIALIZE pragmas (despite -fexpose-all-unfoldings -fspecialise-aggressively), to rule out simple specialization problems.

To reproduce:

  1. git clone git@github.com:Mikolaj/horde-ad.git
  2. cd horde-ad && git checkout 9.4-vs-HEAD-alloc-slow
  3. cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
  4. cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
  5. Compare the +RTS -s summaries of the two runs (bytes allocated in the heap, MUT time, Total time).
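
For the comparison step, here is a rough sketch (not part of horde-ad) of how the headline numbers could be pulled out of two saved `+RTS -s` summaries. The function names `parse_rts_summary` and `relative_change` are made up for illustration, and the regular expressions assume the summary layout shown in the logs in this report.

```python
import re


def parse_rts_summary(text):
    """Extract a few headline numbers from the text printed by `+RTS -s`."""

    def grab(pattern):
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return match.group(1)

    return {
        # e.g. " 261,016,793,440 bytes allocated in the heap"
        "allocated_bytes": int(
            grab(r"([\d,]+) bytes allocated in the heap").replace(",", "")
        ),
        # e.g. "  MUT     time   38.294s  ( 38.300s elapsed)"
        "mut_seconds": float(grab(r"MUT\s+time\s+([\d.]+)s")),
        # e.g. "  Total   time   41.641s  ( 41.650s elapsed)"
        "total_seconds": float(grab(r"Total\s+time\s+([\d.]+)s")),
    }


def relative_change(baseline, candidate):
    """Per metric, how much larger the candidate is (0.25 means 25% more)."""
    return {key: candidate[key] / baseline[key] - 1.0 for key in baseline}
```

Assuming each run's output was saved to a text file, reading both files, passing their contents through `parse_rts_summary`, and calling `relative_change(baseline, candidate)` gives the fractional differences directly.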

Let me also attach the results I'm getting:

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.0.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
 261,545,705,080 bytes allocated in the heap
   5,464,347,736 bytes copied during GC
     142,367,832 bytes maximum residency (2 sample(s))
       2,769,832 bytes maximum slop
            1786 MiB total memory in use (50 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       171 colls,     0 par    3.084s   3.086s     0.0180s    0.0344s
  Gen  1         2 colls,     0 par    0.047s   0.047s     0.0236s    0.0412s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time   50.058s  ( 50.087s elapsed)
  GC      time    3.131s  (  3.133s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   53.197s  ( 53.228s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    5,224,892,914 bytes per MUT second

  Productivity  94.1% of total user, 94.1% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.2.4 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
 261,482,315,704 bytes allocated in the heap
   5,717,353,736 bytes copied during GC
     135,071,544 bytes maximum residency (2 sample(s))
       2,767,048 bytes maximum slop
            1772 MiB total memory in use (30 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       172 colls,     0 par    3.264s   3.266s     0.0190s    0.0344s
  Gen  1         2 colls,     0 par    0.041s   0.041s     0.0203s    0.0353s

  INIT    time    0.007s  (  0.007s elapsed)
  MUT     time   50.009s  ( 50.038s elapsed)
  GC      time    3.304s  (  3.306s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   53.320s  ( 53.351s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    5,228,725,969 bytes per MUT second

  Productivity  93.8% of total user, 93.8% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.4.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
 261,016,793,440 bytes allocated in the heap
   5,828,274,000 bytes copied during GC
     138,711,264 bytes maximum residency (2 sample(s))
       2,772,768 bytes maximum slop
            1775 MiB total memory in use (30 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       172 colls,     0 par    3.298s   3.300s     0.0192s    0.0343s
  Gen  1         2 colls,     0 par    0.042s   0.042s     0.0209s    0.0365s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time   38.294s  ( 38.300s elapsed)
  GC      time    3.339s  (  3.342s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   41.641s  ( 41.650s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    6,816,085,810 bytes per MUT second

  Productivity  92.0% of total user, 92.0% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 1 -m prefix "2-hidden-layer MNIST nn with samples: 400/500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.5.20220830 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/500|150 s469160 v0 m0=469160
 344,738,025,920 bytes allocated in the heap
   7,200,223,128 bytes copied during GC
     149,138,568 bytes maximum residency (2 sample(s))
       2,794,360 bytes maximum slop
            1772 MiB total memory in use (19 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       232 colls,     0 par    4.206s   4.208s     0.0181s    0.0371s
  Gen  1         2 colls,     0 par    0.049s   0.049s     0.0245s    0.0439s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time   47.095s  ( 47.097s elapsed)
  GC      time    4.255s  (  4.257s elapsed)
  EXIT    time    0.000s  (  0.009s elapsed)
  Total   time   51.358s  ( 51.370s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    7,319,993,667 bytes per MUT second

  Productivity  91.7% of total user, 91.7% of total elapsed

Benchmark longMnistBench: FINISH
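
To make the four runs easier to compare, the arithmetic can be written out as a small script. This is only a sketch: the figures are copied by hand from the summaries above, and `RUNS` and `percent_change` are illustrative names, not anything from the repository.

```python
# Figures copied from the `+RTS -s` summaries above, keyed by compiler version.
RUNS = {
    "9.0.2": {"allocated_bytes": 261_545_705_080, "mut_seconds": 50.058, "total_seconds": 53.197},
    "9.2.4": {"allocated_bytes": 261_482_315_704, "mut_seconds": 50.009, "total_seconds": 53.320},
    "9.4.2": {"allocated_bytes": 261_016_793_440, "mut_seconds": 38.294, "total_seconds": 41.641},
    "9.5.20220830": {"allocated_bytes": 344_738_025_920, "mut_seconds": 47.095, "total_seconds": 51.358},
}


def percent_change(baseline, candidate, metric):
    """Percentage by which `candidate` exceeds `baseline` for one metric."""
    old = RUNS[baseline][metric]
    new = RUNS[candidate][metric]
    return 100.0 * (new - old) / old


# Print the regression of HEAD relative to 9.4.2 for each metric.
for metric in ("allocated_bytes", "mut_seconds", "total_seconds"):
    change = percent_change("9.4.2", "9.5.20220830", metric)
    print(f"{metric:16s} 9.4.2 -> HEAD: {change:+.1f}%")
```

On these figures, HEAD allocates about 32% more than 9.4.2 and spends about 23% more time, both in the mutator and in total. In the other direction, 9.4.2 spends about 23% less mutator time than 9.2.4 with nearly identical allocation.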