A benchmark compiled with ghc-9.5.20220830 is 25% slower and allocates 87% more than with 9.2.4 (and 9.0.2)

This is another bonus benchmark that surfaced while splitting #21715 (closed), with up-to-date repro instructions and known causes ruled out. In particular, it uses GHC HEAD (ghc-9.5.20220830) plus !7847 (closed), along with copious manual SPECIALIZE pragmas (despite -fexpose-all-unfoldings -fspecialise-aggressively), to rule out simple specialization problems.

To reproduce:

  1. git clone git@github.com:Mikolaj/horde-ad.git
  2. cd horde-ad && git checkout 9.4-vs-HEAD-alloc-slow
  3. cabal bench longProdBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
  4. cabal bench longProdBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
  5. compare the +RTS -s statistics from the two runs
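
For the comparison step, a small script can pull the headline figures out of each `+RTS -s` report instead of reading them by eye. The following is only a sketch: the function name `parse_rts_stats` and the dictionary keys are made up for illustration, and it assumes the report text has been captured into a string in the layout shown in the logs in this issue.

```python
import re

# Hypothetical helper (not part of horde-ad or GHC): extracts a few
# headline numbers from the text of a `+RTS -s` report.
_PATTERNS = {
    "alloc_bytes": r"^\s*([\d,]+) bytes allocated in the heap",
    "mut_s": r"^\s*MUT\s+time\s+([\d.]+)s",
    "gc_s": r"^\s*GC\s+time\s+([\d.]+)s",
    "total_s": r"^\s*Total\s+time\s+([\d.]+)s",
}


def parse_rts_stats(report: str) -> dict:
    """Return allocation (bytes) and MUT/GC/Total CPU times (seconds)."""
    stats = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, report, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"could not find {key} in the report")
        raw = match.group(1).replace(",", "")
        # Allocation is an integer byte count; the times are decimal seconds.
        stats[key] = int(raw) if key == "alloc_bytes" else float(raw)
    return stats
```

Calling `parse_rts_stats` on the report from each compiler gives two dictionaries that can be compared field by field.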

Let me also attach the results I'm getting:

~/r/horde-ad$ cabal bench longProdBench -w ghc-9.0.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
  35,844,466,832 bytes allocated in the heap
  30,562,600,528 bytes copied during GC
   9,429,216,480 bytes maximum residency (6 sample(s))
     215,469,824 bytes maximum slop
           17520 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        34 colls,     0 par    6.720s   6.724s     0.1978s    0.7041s
  Gen  1         6 colls,     0 par    8.194s   8.199s     1.3665s    4.3635s

  INIT    time    0.007s  (  0.007s elapsed)
  MUT     time    7.757s  (  7.761s elapsed)
  GC      time   14.914s  ( 14.923s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   22.677s  ( 22.691s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,621,008,995 bytes per MUT second

  Productivity  34.2% of total user, 34.2% of total elapsed

Benchmark longProdBench: FINISH
~/r/horde-ad$ cabal bench longProdBench -w ghc-9.2.4 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Linking /home/mikolaj/r/horde-ad/dist-newstyle/build/x86_64-linux/ghc-9.2.4/horde-ad-0.1.0.0/b/longProdBench/build/longProdBench/longProdBench ...
Running 1 benchmarks...
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
  35,844,471,080 bytes allocated in the heap
  30,562,597,960 bytes copied during GC
   9,429,215,048 bytes maximum residency (6 sample(s))
     215,462,048 bytes maximum slop
           17521 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        34 colls,     0 par    6.924s   6.928s     0.2038s    0.7221s
  Gen  1         6 colls,     0 par    8.243s   8.249s     1.3749s    4.3277s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time    7.520s  (  7.524s elapsed)
  GC      time   15.167s  ( 15.177s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   22.695s  ( 22.710s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,766,633,282 bytes per MUT second

  Productivity  33.1% of total user, 33.1% of total elapsed

Benchmark longProdBench: FINISH
~/r/horde-ad$ cabal bench longProdBench -w ghc-9.4.2 --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s'
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
  67,217,913,640 bytes allocated in the heap
  36,966,538,056 bytes copied during GC
  10,423,577,544 bytes maximum residency (7 sample(s))
     315,996,120 bytes maximum slop
           20504 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        72 colls,     0 par    7.635s   7.642s     0.1061s    0.6795s
  Gen  1         7 colls,     0 par   10.905s  10.915s     1.5592s    4.4384s

  INIT    time    0.007s  (  0.007s elapsed)
  MUT     time    9.688s  (  9.683s elapsed)
  GC      time   18.540s  ( 18.557s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   28.236s  ( 28.247s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    6,938,350,873 bytes per MUT second

  Productivity  34.3% of total user, 34.3% of total elapsed

Benchmark longProdBench: FINISH

~/r/horde-ad$ cabal bench longProdBench -w ~/r/ghc/_build/stage1/bin/ghc --enable-optimization --constraint "vector < 0.13" --allow-newer --benchmark-options='-n1 -m glob "5e7/grad_vec" +RTS -s' 
Benchmark longProdBench: RUNNING...
benchmarking 5e7/grad_vec
  67,217,779,480 bytes allocated in the heap
  36,705,264,480 bytes copied during GC
  10,477,856,456 bytes maximum residency (7 sample(s))
     307,244,408 bytes maximum slop
           20535 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        72 colls,     0 par    7.681s   7.686s     0.1067s    0.6958s
  Gen  1         7 colls,     0 par   10.724s  10.740s     1.5342s    4.5879s

  INIT    time    0.008s  (  0.008s elapsed)
  MUT     time    9.912s  (  9.908s elapsed)
  GC      time   18.405s  ( 18.425s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   28.325s  ( 28.341s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    6,781,208,825 bytes per MUT second

  Productivity  35.0% of total user, 35.0% of total elapsed

Benchmark longProdBench: FINISH
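
To make the figures in the title easy to check, here is a small sketch that recomputes the relative changes from the numbers in the reports above, taking ghc-9.2.4 as the baseline. The names `RUNS`, `BASELINE` and `relative_change` are illustrative only; the values are copied from the logs, with times taken from the CPU-time column rather than the elapsed column.

```python
# Figures copied from the +RTS -s reports above (bytes and CPU seconds).
RUNS = {
    "ghc-9.0.2": {"alloc": 35_844_466_832, "mut": 7.757, "gc": 14.914, "total": 22.677},
    "ghc-9.2.4": {"alloc": 35_844_471_080, "mut": 7.520, "gc": 15.167, "total": 22.695},
    "ghc-9.4.2": {"alloc": 67_217_913_640, "mut": 9.688, "gc": 18.540, "total": 28.236},
    "ghc-9.5.20220830": {"alloc": 67_217_779_480, "mut": 9.912, "gc": 18.405, "total": 28.325},
}
BASELINE = "ghc-9.2.4"
METRICS = ("alloc", "mut", "gc", "total")


def relative_change(compiler: str, metric: str) -> float:
    """Percentage change of `metric` for `compiler` relative to the baseline."""
    base = RUNS[BASELINE][metric]
    return (RUNS[compiler][metric] - base) / base * 100.0


if __name__ == "__main__":
    for compiler in RUNS:
        changes = ", ".join(
            f"{metric} {relative_change(compiler, metric):+.1f}%" for metric in METRICS
        )
        print(f"{compiler:>17}: {changes}")
```

For HEAD this gives roughly +87.5% allocation and +24.8% total time, which matches the rounded 87% and 25% in the title. ghc-9.4.2 shows almost the same increase, while 9.0.2 and 9.2.4 are nearly identical.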