Draft: StgToByteCode: Update case expr BCO bitmap incrementally
As a followup to !8399 (closed), this change avoids the replication of previous work when generating the BCO bitmap for case expressions while descending into the call graph.
Whenever a new stack offset is added to the state as previously, the new implementation shifts the incremental bitmap as well, causing significant performance increase by avoiding the regeneration of the bitmap from the offset Map for each BCO.
See the haddock comment on BcEnv
for details.
My new implementation is a bit obscure, so I'm a bit sceptical about its correctness. I also speculated a lot when writing the documentation. I would appreciate very much if someone were to spend some time validating my assumptions!
One thing I haven't been able to reproduce is the case in which the "sequel" is nonzero. Other than that, the testsuite runs fine and I have verified that the bitmap is identical to the old implementation for all of the ghci tests.
The branch https://gitlab.haskell.org/torsten.schmits/ghc/-/tree/torsten.schmits/stg-to-bytecode-refactor contains my debugging setup that can toggle between the old and new implementation by setting the env var ENV_BITMAP
to nonzero.
The bench.sh
script prints compile times for both, primarily using the same test case as in the previous MR.
Some excerpts, using a recursive instance tree of depth N and width M, each step using C equality constraints:
M = 2; N = 2; C = 12
- old: 0.137s
- new: 0.124s
M = 8; N = 2; C = 12
- old: 6.827s
- new: 2.620s
M = 4; N = 4; C = 12
- old: 2.561s
- new: 1.421s
M = 5; N = 4; C = 12
- old: 12.813s
- new: 4.137s
- ghc 9.0: 4.735s
M = 5; N = 5; C = 6 (C = 12 causes stack depth overflow)
- old: 3m12.627s
- new: 50.288s
- ghc 9.0: 57.577s
This work is sponsored by Tweag