Some bytestring benchmarks got slower in 9.2.0.20210331
Reproduction
- Check out bytestring:
git clone https://github.com/haskell/bytestring
# git checkout ba3046d9f5bf5cac1246c39f519b8be7ac11f49c
Current master is very close to the released bytestring-0.11.1.0, which is bundled with GHC 9.2, but includes some benchmark fixes.
- Run benchmarks with GHC 9.0.1 (it will take around five minutes):
cabal bench -w ghc-9.0.1 --benchmark-options '--csv 9.0.1.csv --stdev 2 --hide-successes'
- Compare against benchmarks with GHC 9.2.0.20210331 (it will take around five minutes):
cabal bench -w ghc-9.2.0.20210331 --allow-newer='tagged:template-haskell,splitmix:base' --benchmark-options '--baseline 9.0.1.csv --csv 9.2.0.csv --stdev 2 --fail-if-slower 10 --hide-successes'
Thanks to a featherlight dependency graph, tasty-bench-based benchmark suites can already be compiled by GHC 9.2.
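For readers unfamiliar with the "N% slower than baseline" figures, here is a rough sketch of the arithmetic behind a threshold such as --fail-if-slower 10. The helper names slowdown_percent and exceeds_threshold are hypothetical and the input numbers are made up for illustration. The sketch ignores the standard deviation and is not meant to mirror exactly how tasty-bench makes its decision.

```shell
# Hypothetical helper: percentage by which the current time exceeds the
# baseline time. Both arguments must use the same time unit.
slowdown_percent() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.0f\n", (cur - base) / base * 100 }'
}

# Hypothetical helper: exits with status 0 when the slowdown is strictly
# greater than the threshold (in percent), and with status 1 otherwise.
exceeds_threshold() {
  awk -v base="$1" -v cur="$2" -v limit="$3" \
    'BEGIN { exit !((cur - base) / base * 100 > limit) }'
}

# Illustrative numbers only (not taken from the benchmark run):
slowdown_percent 100 112                                   # prints 12
exceeds_threshold 100 112 10 && echo "would be reported as a failure"
```

Under these assumptions, a benchmark that took 100 ns with the baseline compiler and 112 ns with the new one counts as 12% slower and crosses a 10% threshold, while one that took 105 ns does not.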
Expected results
No benchmarks became slower.
Actual results
All
Data.ByteString.Builder
Small payload
intHost 1: FAIL (4.99s)
540 ns ± 12 ns, 42% slower than baseline
UTF-8 String (naive): FAIL (5.25s)
577 ns ± 22 ns, 26% slower than baseline
UTF-8 String: FAIL (38.86s)
568 ns ± 6.8 ns, 22% slower than baseline
String (naive): FAIL (5.34s)
590 ns ± 24 ns, 27% slower than baseline
String: FAIL (5.22s)
579 ns ± 17 ns, 30% slower than baseline
Data.ByteString.Builder.Prim.ASCII
int32Dec (10000): FAIL (0.65s)
149 μs ± 4.0 μs, 12% slower than baseline
int8HexFixed (10000): FAIL (1.11s)
16 μs ± 584 ns, 12% slower than baseline
words
lorem ipsum: FAIL (0.92s)
3.4 μs ± 62 ns, 13% slower than baseline
folds
foldl'
1: FAIL (1.67s)
24 ns ± 226 ps, 12% slower than baseline
2: FAIL (1.11s)
32 ns ± 690 ps, 11% slower than baseline
4: FAIL (0.43s)
48 ns ± 1.4 ns, 15% slower than baseline
8: FAIL (0.74s)
84 ns ± 1.3 ns, 15% slower than baseline
64: FAIL (0.62s)
568 ns ± 12 ns, 13% slower than baseline
128: FAIL (0.59s)
1.1 μs ± 37 ns, 13% slower than baseline
256: FAIL (0.60s)
2.2 μs ± 65 ns, 14% slower than baseline
512: FAIL (1.17s)
4.3 μs ± 69 ns, 14% slower than baseline
1024: FAIL (0.32s)
8.9 μs ± 334 ns, 16% slower than baseline
2048: FAIL (0.60s)
18 μs ± 426 ns, 16% slower than baseline
4096: FAIL (0.59s)
35 μs ± 900 ns, 14% slower than baseline
8192: FAIL (0.59s)
70 μs ± 2.5 μs, 15% slower than baseline
16384: FAIL (0.60s)
142 μs ± 5.3 μs, 16% slower than baseline
32768: FAIL (0.59s)
282 μs ± 7.1 μs, 15% slower than baseline
65536: FAIL (4.71s)
571 μs ± 15 μs, 17% slower than baseline
foldr'
2: FAIL (0.57s)
32 ns ± 808 ps, 11% slower than baseline
4: FAIL (3.36s)
48 ns ± 934 ps, 13% slower than baseline
8: FAIL (11.36s)
84 ns ± 1.1 ns, 18% slower than baseline
64: FAIL (0.33s)
567 ns ± 21 ns, 12% slower than baseline
128: FAIL (1.18s)
1.1 μs ± 20 ns, 13% slower than baseline
256: FAIL (0.32s)
2.2 μs ± 85 ns, 13% slower than baseline
512: FAIL (0.61s)
4.4 μs ± 101 ns, 12% slower than baseline
1024: FAIL (0.60s)
8.8 μs ± 211 ns, 11% slower than baseline
2048: FAIL (0.60s)
18 μs ± 366 ns, 14% slower than baseline
4096: FAIL (0.61s)
36 μs ± 978 ns, 16% slower than baseline
8192: FAIL (0.32s)
71 μs ± 2.7 μs, 17% slower than baseline
16384: FAIL (1.19s)
142 μs ± 3.7 μs, 17% slower than baseline
32768: FAIL (1.18s)
281 μs ± 7.2 μs, 15% slower than baseline
65536: FAIL (0.60s)
569 μs ± 22 μs, 16% slower than baseline
37 out of 263 tests failed (371.06s)
On the brighter side, many other benchmarks got 15-20% faster, which is great!
CC @sjakobi