Reproducer for #14062 is 11% slower with GHC 9.0.1 than with 8.10.4
$ ghc-8.10.4 -O2 T14062.hs && ./T14062
Loaded package environment from /home/simon/.ghc/x86_64-linux-8.10.4/environments/default
[1 of 1] Compiling Main ( T14062.hs, T14062.o )
Linking T14062 ...
benchmarking monad transformers overhead/test1
time 45.52 ms (45.49 ms .. 45.57 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 45.60 ms (45.54 ms .. 45.76 ms)
std dev 200.0 μs (47.44 μs .. 352.6 μs)
benchmarking monad transformers overhead/test2
time 45.54 ms (45.49 ms .. 45.61 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 45.55 ms (45.52 ms .. 45.57 ms)
std dev 49.30 μs (40.24 μs .. 60.50 μs)
$ ghc-9.0.1 -O2 T14062.hs && ./T14062
Loaded package environment from /home/simon/.ghc/x86_64-linux-9.0.1/environments/default
[1 of 1] Compiling Main ( T14062.hs, T14062.o )
Linking T14062 ...
benchmarking monad transformers overhead/test1
time 50.57 ms (50.54 ms .. 50.61 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 50.66 ms (50.61 ms .. 50.90 ms)
std dev 177.3 μs (44.23 μs .. 318.0 μs)
benchmarking monad transformers overhead/test2
time 50.58 ms (50.55 ms .. 50.61 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 50.63 ms (50.59 ms .. 50.74 ms)
std dev 121.8 μs (36.08 μs .. 211.5 μs)
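As a check on the 11% in the title, the relative slowdown can be computed from the `time` point estimates reported above. This uses only the numbers in the two runs:

```latex
\begin{aligned}
\text{slowdown} &= \frac{t_{\text{9.0.1}} - t_{\text{8.10.4}}}{t_{\text{8.10.4}}} \\
\text{test1:}\quad &\frac{50.57 - 45.52}{45.52} = \frac{5.05}{45.52} \approx 0.111 \\
\text{test2:}\quad &\frac{50.58 - 45.54}{45.54} = \frac{5.04}{45.54} \approx 0.111
\end{aligned}
```

Both benchmarks slow down by about 11.1%. The bootstrap intervals for `time` do not overlap between the two compilers (45.49 to 45.61 ms versus 50.54 to 50.61 ms), so the gap is not noise.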
I'm using mtl-2.2.2, transformers-0.5.6.2 and criterion-1.5.9.0 in both cases.
Adding -fproc-alignment=64 and +RTS -A32m doesn't make the difference go away.
(-fproc-alignment=64 actually results in a linker warning:
/usr/bin/ld.gold: warning: T14062.o: section .rodata.str contains incorrectly aligned strings; the alignment of those strings won't be preserved
)
Edited by Simon Jakobi