replace quadratic nub to fight byte code gen perf explosion
Despite this code having been present in the core-to-bytecode implementation, I have observed it in the wild starting with 9.2, causing enormous slowdown in certain situations.
My test case produces the following profiles:
Before:
total time = 559.77 secs (559766 ticks @ 1000 us, 1 processor)
total alloc = 513,985,665,640 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
elem_by Data.OldList libraries/base/Data/OldList.hs:429:1-7 67.6 92.9 378282 477447404296
$c>>= GHC.Data.IOEnv <no location info> 6.9 0.6 38475 3020371232
After:
total time = 89.83 secs (89833 ticks @ 1000 us, 1 processor)
total alloc = 39,365,306,360 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
$c>>= GHC.Data.IOEnv <no location info> 43.6 7.7 39156 3020403424
doCase GHC.StgToByteCode compiler/GHC/StgToByteCode.hs:(805,1)-(1054,53) 2.5 7.4 2246 2920777088
Native:
total time = 65.99 secs (65994 ticks @ 1000 us, 1 processor)
total alloc = 24,728,309,600 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
$c>>= GHC.Data.IOEnv <no location info> 59.1 12.2 38981 3019962176
I have a reproducer and some stats in this branch: https://gitlab.haskell.org/torsten.schmits/ghc/-/tree/torsten.schmits/compile-perf-regression-1
Please advise:
- Should I add some comments?
- Is it fine that I rewrote the surrounding code slightly?
- Should I add a test, and what should it look like? My reproducer is runnable from the
perf/compiler
suite in the other branch, not sure whether that's worth anything.
Edited by Torsten Schmits