A benchmark compiled with ghc-9.5.20220830 is 10% slower and allocates 16 times more than with 9.2.4

This is the first half of #21715 (closed), split off with updated repro instructions and with known causes ruled out. In particular, it uses HEAD (ghc-9.5.20220830) plus !7847 (closed), together with copious manual SPECIALIZE pragmas (added despite -fexpose-all-unfoldings -fspecialise-aggressively), to rule out simple specialization problems. The code has also been simplified so that it no longer depends on fancy GHC optimizations, as captured in #21736 and at !7847 (comment 446575).

To reproduce:

  1. git clone git@github.com:Mikolaj/horde-ad.git
  2. cd horde-ad && git checkout big-alloc-9.4.2-and-HEAD
  3. cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
  4. cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
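  5. Compare the +RTS -s statistics of the two runs, in particular "bytes allocated in the heap" and "MUT time".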
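
The comparison in step 5 can be done by eye, but a small helper makes it less error-prone. The following is only a sketch: it assumes the +RTS -s output of each run has been saved to a file, and the function names (alloc_bytes, mut_seconds, ratio) and the log file names in the comments are hypothetical, not part of horde-ad or GHC.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two `+RTS -s` reports.

# Read an RTS report on stdin and print the "bytes allocated in the heap"
# figure with the thousands separators removed.
alloc_bytes() {
  awk '/bytes allocated in the heap/ { gsub(",", "", $1); print $1; exit }'
}

# Read an RTS report on stdin and print the MUT user time in seconds
# (the first figure on the "MUT time" line, without the trailing "s").
mut_seconds() {
  awk '$1 == "MUT" && $2 == "time" { sub("s$", "", $3); print $3; exit }'
}

# ratio NEW OLD: print NEW / OLD rounded to two decimal places.
ratio() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

# Example use (the log file names are assumptions):
#   ratio "$(alloc_bytes < head.log)" "$(alloc_bytes < ghc-9.2.4.log)"
#   ratio "$(mut_seconds < head.log)" "$(mut_seconds < ghc-9.2.4.log)"
```

Applied to the figures reported below, the allocation ratio of HEAD to 9.2.4 comes out at about 15.8 and the MUT time ratio at about 1.09, which is where the "16 times" and "10%" in the title come from.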

Here are the results I'm getting, with ghc-9.0.2 and ghc-9.4.2 included for reference:

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.0.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  92,143,606,480 bytes allocated in the heap
       6,037,760 bytes copied during GC
      74,995,488 bytes maximum residency (2 sample(s))
       1,829,088 bytes maximum slop
            1674 MiB total memory in use (15 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        60 colls,     0 par    0.032s   0.032s     0.0005s    0.0057s
  Gen  1         2 colls,     0 par    0.006s   0.006s     0.0031s    0.0034s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time   20.295s  ( 20.307s elapsed)
  GC      time    0.038s  (  0.038s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   20.340s  ( 20.353s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,540,243,877 bytes per MUT second

  Productivity  99.8% of total user, 99.8% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.2.4 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
   2,026,518,112 bytes allocated in the heap
         646,496 bytes copied during GC
          83,280 bytes maximum residency (1 sample(s))
          47,792 bytes maximum slop
            1705 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.006s   0.006s     0.0056s    0.0056s
  Gen  1         1 colls,     0 par    0.003s   0.003s     0.0032s    0.0032s

  INIT    time    0.007s  (  0.008s elapsed)
  MUT     time    6.535s  (  6.538s elapsed)
  GC      time    0.009s  (  0.009s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    6.551s  (  6.555s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    310,108,511 bytes per MUT second

  Productivity  99.8% of total user, 99.8% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.4.2 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  32,054,564,784 bytes allocated in the heap
       5,376,688 bytes copied during GC
      75,004,272 bytes maximum residency (2 sample(s))
       1,828,496 bytes maximum slop
            1672 MiB total memory in use (15 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        20 colls,     0 par    0.032s   0.032s     0.0016s    0.0056s
  Gen  1         2 colls,     0 par    0.006s   0.006s     0.0032s    0.0035s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time    7.099s  (  7.099s elapsed)
  GC      time    0.039s  (  0.039s elapsed)
  EXIT    time    0.000s  (  0.004s elapsed)
  Total   time    7.146s  (  7.150s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,515,420,597 bytes per MUT second

  Productivity  99.3% of total user, 99.3% of total elapsed

Benchmark longMnistBench: FINISH

~/r/horde-ad$ cabal bench longMnistBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n 10 -m prefix "2-hidden-layer MNIST nn with samples: 400/test 500" +RTS -s'
Resolving dependencies...
Build profile: -w ghc-9.5.20220830 -O1
In order, the following will be built (use -v for more details):
 - horde-ad-0.1.0.0 (bench:longMnistBench) (first run)
Preprocessing benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Building benchmark 'longMnistBench' for horde-ad-0.1.0.0..
Running 1 benchmarks...
Benchmark longMnistBench: RUNNING...
benchmarking 2-hidden-layer MNIST nn with samples: 400/test 500|150 s469160 v0 m0=469160
  32,049,456,152 bytes allocated in the heap
       5,440,240 bytes copied during GC
      75,006,528 bytes maximum residency (2 sample(s))
       1,830,336 bytes maximum slop
            1672 MiB total memory in use (15 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        20 colls,     0 par    0.031s   0.031s     0.0015s    0.0058s
  Gen  1         2 colls,     0 par    0.007s   0.007s     0.0033s    0.0035s

  INIT    time    0.007s  (  0.007s elapsed)
  MUT     time    7.150s  (  7.151s elapsed)
  GC      time    0.037s  (  0.037s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time    7.195s  (  7.200s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,482,414,055 bytes per MUT second

  Productivity  99.4% of total user, 99.3% of total elapsed

Benchmark longMnistBench: FINISH