Skip to content

replace quadratic nub to fight byte code gen perf explosion

Despite this code having been present in the core-to-bytecode implementation, I have observed it in the wild starting with 9.2, causing enormous slowdown in certain situations.

My test case produces the following profiles:

Before:

total time  =      559.77 secs   (559766 ticks @ 1000 us, 1 processor)
total alloc = 513,985,665,640 bytes  (excludes profiling overheads)

COST CENTRE MODULE         SRC                                         %time %alloc  ticks     bytes

elem_by     Data.OldList   libraries/base/Data/OldList.hs:429:1-7       67.6   92.9  378282 477447404296
$c>>=       GHC.Data.IOEnv <no location info>                            6.9    0.6  38475 3020371232

After:

total time  =       89.83 secs   (89833 ticks @ 1000 us, 1 processor)
total alloc = 39,365,306,360 bytes  (excludes profiling overheads)

COST CENTRE           MODULE                SRC                                                                  %time %alloc  ticks     bytes

$c>>=                 GHC.Data.IOEnv        <no location info>                                                    43.6    7.7  39156 3020403424
doCase                GHC.StgToByteCode     compiler/GHC/StgToByteCode.hs:(805,1)-(1054,53)                        2.5    7.4   2246 2920777088

Native:

total time  =       65.99 secs   (65994 ticks @ 1000 us, 1 processor)
total alloc = 24,728,309,600 bytes  (excludes profiling overheads)

COST CENTRE           MODULE                  SRC                                       %time %alloc  ticks     bytes

$c>>=                 GHC.Data.IOEnv          <no location info>                         59.1   12.2  38981 3019962176

I have a reproducer and some stats in this branch: https://gitlab.haskell.org/torsten.schmits/ghc/-/tree/torsten.schmits/compile-perf-regression-1

Please advise:

  • Should I add some comments?
  • Is it fine that I rewrote the surrounding code slightly?
  • Should I add a test, and what should it look like? My reproducer is runnable from the perf/compiler suite in the other branch, not sure whether that's worth anything.
Edited by Torsten Schmits

Merge request reports