Improve interaction of exitification with spec constr
Summary
The exitification pass runs before the spec-constr pass. It moves exit-path code out of a recursive function. However, this reduces the effectiveness of the later spec-constr pass.
See https://github.com/composewell/streamly/issues/703 for details.
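As a rough source-level illustration of what exitification does, the sketch below shows the same loop twice: once with the exit-path work inside the recursive function, and once with it moved into a separate non-recursive function. The names lastDoubledInline, lastDoubledExitified and exit are hypothetical and do not come from streamly or GHC. The real transformation happens on Core join points, and GHC is free to optimise this source differently, so this only models the shape.

```haskell
module Main (main) where

import Data.Word (Word8)

-- Before exitification (model): the exit-path work sits inside the
-- recursive function, so the loop body itself scrutinises `w`.
lastDoubledInline :: [Word8] -> Int
lastDoubledInline = go 0
  where
    go :: Word8 -> [Word8] -> Int
    go w []       = fromIntegral w * 2   -- exit path: examines w
    go _ (x : xs) = go x xs              -- hot path: only passes the value on

-- After exitification (hand-written model): the exit-path work lives in a
-- separate, non-recursive function. `go` now only forwards `w` and never
-- examines it.
lastDoubledExitified :: [Word8] -> Int
lastDoubledExitified = go 0
  where
    exit :: Word8 -> Int
    exit w = fromIntegral w * 2          -- the only place w is examined

    go :: Word8 -> [Word8] -> Int
    go w []       = exit w
    go _ (x : xs) = go x xs

main :: IO ()
main = do
  print (lastDoubledInline [1, 2, 3])    -- prints 6
  print (lastDoubledExitified [1, 2, 3]) -- prints 6
```

Both functions compute the same result; the difference is only where the value carried by the loop is taken apart.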
Steps to reproduce
- git clone https://github.com/composewell/streamly.git
- git checkout ghc-spec-constr-keen
- cabal build streamly
- ghc -ddump-simpl -ddump-to-file -dsuppress-all -O2 -fmax-worker-args=16 -fspec-constr-recursive=16 -funfolding-use-threshold=600 fold-bench.hs
- time ./fold-bench +RTS -s
- Examine fold-bench.dump-simpl and look for W8#.
We can see a W8# being passed to the join point $wstep3_s5F2:
jump $wstep3_s5F2
sc9_s5W3
sc8_s5W4
(PlainPtr sc7_s5W5)
sc6_s5W6
()
(W8# ipv8_a4F5)
sc_s5Wc
This join point then passes it to an exit point:
1# ->
jump exit1_X19
ww_s5EQ ww1_s5EV ww2_s5EW ww3_s5EX ww4_s5F0 w_s5EG w1_s5EH
The exit point examines this W8#:
exit1_X19 ww_s5EQ
ww1_s5EV
ww2_s5EW
ww3_s5EX
ww4_s5F0
w_s5EG
w1_s5EH
= case w_s5EG of { W8# x_a4tc ->
The W8# constructor is not removed by the spec-constr pass because it is not examined by $wstep3_s5F2. The code examining the W8# has been moved to the exit point by the exitification pass, which runs before the spec-constr pass. The scrutiny is therefore invisible to the spec-constr pass, and the constructor is not removed.
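To make the missed opportunity concrete, here is a hand-written model of what constructor specialisation would produce for a loop of this shape. The type Elem stands in for the W8# box, and all names here (Elem, lastSquareBoxed, lastSquareSpecialised) are hypothetical. This is a sketch of the general idea, not the output of GHC on the benchmark.

```haskell
module Main (main) where

-- Hypothetical box standing in for W8#.
data Elem = Elem Int

-- Unspecialised shape: every iteration builds an Elem, but only the exit
-- path takes it apart. The recursive call has the pattern `go (Elem _) _`.
lastSquareBoxed :: [Int] -> Int
lastSquareBoxed = go (Elem 0)
  where
    go :: Elem -> [Int] -> Int
    go w []       = case w of Elem x -> x + 1   -- exit path: examines w
    go _ (y : ys) = go (Elem (y * y)) ys        -- hot path: boxes each value

-- Specialised shape (hand-written): the loop carries the constructor's
-- field directly, so no box is built on the hot path.
lastSquareSpecialised :: [Int] -> Int
lastSquareSpecialised = go 0
  where
    go :: Int -> [Int] -> Int
    go x []       = x + 1
    go _ (y : ys) = go (y * y) ys

main :: IO ()
main = do
  print (lastSquareBoxed [1, 2, 3])        -- prints 10
  print (lastSquareSpecialised [1, 2, 3])  -- prints 10
```

In the first version the hot path allocates a box per element; in the second it does not. The second is the shape the spec-constr pass is expected to reach when it can see the argument being scrutinised.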
We can use the -fspec-constr-keen option in the compilation step above and see that it removes the W8#. However, using that option causes regressions in other benchmarks, so it is not always beneficial.
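One way to limit the reach of the flag while experimenting is to enable it per module with an OPTIONS_GHC pragma instead of on the command line. The sketch below assumes a hypothetical module containing only the hot loop; lastByte is an illustrative name, not a streamly function. Whether this avoids the regressions seen elsewhere is untested here. The flag only has an effect when spec-constr runs at all, for example with -O2.

```haskell
{-# OPTIONS_GHC -fspec-constr-keen #-}
-- The pragma applies the flag to this module only; other modules keep the
-- default spec-constr behaviour.
module Main (main) where

import Data.Word (Word8)

-- Hypothetical hot loop: the carried Word8 is passed along on every
-- iteration and only returned at the end.
lastByte :: [Int] -> Word8
lastByte = go 0
  where
    go :: Word8 -> [Int] -> Word8
    go w []       = w
    go _ (n : ns) = go (fromIntegral n) ns

main :: IO ()
main = print (lastByte [1, 2, 258])  -- prints 2 (258 wraps modulo 256)
```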
Another option is to run the exitification pass after the spec-constr pass. See this note. We would not have this issue with option D suggested in that note, i.e. putting exitification before the final simplifier pass. However, this is also not the best possible option, because exitify before spec-constr combined with -fspec-constr-keen generates much better code and performs 2x better.
We can check out the branch ghc-spec-constr-before-exitify and repeat steps 3-6 above to see that the W8# constructor is gone when exitify is done after spec-constr.
Expected behavior
The point of this issue is to explore whether we can keep exitify before spec-constr but do what -fspec-constr-keen does automatically, without enabling it everywhere, only surgically in cases like this. Can we make spec-constr keen through exit join points only?
The effect of removing the W8# in the above example is a 4x improvement because it is in the fast path, called for every element of the stream. Depending on the use case, we can get very significant gains if spec-constr is applied smartly.
Environment
- GHC version used: 8.10.2 branch
Optional:
- Operating System: macOS
- System Architecture: x86_64