Performance regression due to stream fusion issue in GHC-9
Summary
See #19790 for background. The root cause here is likely different, so I am raising a separate issue for it.
I updated the repository (https://github.com/composewell/streamly-ghc9-regression) with reproduction code for another regression. This time the regression is in the postscan operation.
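For reference, a postscan emits the accumulated fold state after each input element, without the initial accumulator that scanl also emits. The sketch below is a simplified list-level model of that behaviour. It assumes the fold can be represented as a plain step function plus an initial accumulator rather than streamly's Fold type. postscanList is a hypothetical name, not part of streamly.

```haskell
-- Hypothetical list-level model of postscan (not streamly's API).
-- Assumption: the fold is a plain step function plus an initial accumulator.
-- scanl emits the initial accumulator first; dropping it leaves one
-- output per input element, i.e. the state *after* each element.
postscanList :: (b -> a -> b) -> b -> [a] -> [b]
postscanList step z xs = drop 1 (scanl step z xs)
```

For example, in GHCi, postscanList (+) 0 [1, 2, 3] evaluates to [1, 3, 6].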
Steps to reproduce
You can pull the repository (master or postscan branch) and build and run it with the following commands:
$ ghc --make -O2 -fspec-constr-recursive=4 -ddump-to-file -ddump-simpl Main.hs
$ ./Main +RTS -s
If we build the code with ghc-8.10 and with ghc-9.3+!5658, we can see that the ghc-8.10 build allocates much less, because the code fuses. In the Core generated by ghc-9.3, the Yield/Skip/Stop constructors are still present, because the code does not fuse.
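The Yield/Skip/Stop constructors mentioned above come from the step-based stream representation that stream fusion relies on. The following is a minimal, hypothetical sketch of that representation. It is not streamly's actual code and not the reproduction from the repository, and the names Step, Stream, fromList, toList, postscan, foldlS and sumOfRunningSums are illustrative. When GHC inlines the step functions and specializes the consuming loop (for example with SpecConstr, which the -fspec-constr-recursive flag in the build command tunes), case-of-known-constructor can eliminate the Step values. When fusion fails, a Step value is allocated on every iteration and the constructors remain visible in the Core.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}
-- Hypothetical, simplified stream-fusion model (not streamly's definitions).

-- One step of a stream: emit a value, skip, or finish.
data Step s a = Yield a s | Skip s | Stop

-- A stream is a step function over a hidden state type s.
data Stream a = forall s. Stream (s -> Step s a) s

-- Produce a stream from a list; the list itself is the state.
fromList :: [a] -> Stream a
fromList = Stream next
  where
    next []       = Stop
    next (x : xs) = Yield x xs
{-# INLINE fromList #-}

-- Collect a stream back into a list (used for checking results).
toList :: Stream a -> [a]
toList (Stream next s0) = go s0
  where
    go s = case next s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Stop       -> []
{-# INLINE toList #-}

-- Emit the running accumulator after each input element.
-- The state pairs the inner stream's state with the accumulator.
postscan :: (b -> a -> b) -> b -> Stream a -> Stream b
postscan f z (Stream next s0) = Stream next' (s0, z)
  where
    next' (s, !acc) = case next s of
      Yield x s' -> let !acc' = f acc x in Yield acc' (s', acc')
      Skip s'    -> Skip (s', acc)
      Stop       -> Stop
{-# INLINE postscan #-}

-- Consume a stream with a strict left fold.
foldlS :: (b -> a -> b) -> b -> Stream a -> b
foldlS f z (Stream next s0) = go z s0
  where
    go !acc s = case next s of
      Yield x s' -> go (f acc x) s'
      Skip s'    -> go acc s'
      Stop       -> acc
{-# INLINE foldlS #-}

-- A small pipeline: sum of the running sums of the input.
-- If fusion succeeds, no Step values need to be allocated in the loop.
sumOfRunningSums :: [Int] -> Int
sumOfRunningSums = foldlS (+) 0 . postscan (+) 0 . fromList
```

For example, sumOfRunningSums [1, 2, 3] is 1 + 3 + 6 = 10. Compiling such a module with -O2 -ddump-simpl and searching the dumped Core for Yield or Skip is one way to check whether a given pipeline fused.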
Expected behavior
GHC-9 should produce code that is as efficient as the code produced by GHC-8.
Environment
- GHC version used: ghc-9.3+!5658