Runtime Perf: Allocations increased by 15x in GHC-9 vs GHC-8
Summary
See #19790 (closed) and #19861 (closed) for background. @simonpj looked at those issues.
I updated the repo (https://github.com/composewell/streamly-ghc9-regression) with reproduction code for another regression. This time it is in the splitOnSeq operation, and the repro code is a bit more complicated than the previous ones.
Steps to reproduce
You can pull the repo (the splitOnSeq branch, which is currently identical to the master branch) and use the following command to build it:
$ cabal build --with-compiler <compiler>
This will build the Main executable and also produce .dump-simpl files, which can be found in the dist-newstyle directory tree.
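If it helps, the dump files can be listed with find instead of being located by hand. This is only a small sketch: the helper name list_core_dumps is made up for illustration and is not part of the repo, and it assumes the default dist-newstyle build directory.

```shell
#!/bin/sh
# list_core_dumps [DIR]: print every .dump-simpl file below DIR.
# DIR defaults to dist-newstyle, the default cabal build directory.
# (Hypothetical helper, not part of the repro repo.)
list_core_dumps() {
    find "${1:-dist-newstyle}" -type f -name '*.dump-simpl'
}

# Usage, from the repo root after a cabal build:
#   list_core_dumps
```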
After this step you can also build it directly using ghc, but make sure that you pass the same optimization options as in the cabal file. For example:
$ ghc --make -O2 -ddump-to-file -ddump-simpl Main.hs
I have uploaded the core generated by ghc-8 and ghc-9 for quick review.
To run the program, first generate a 100MB input file by running this script:
$ ./mkInput.sh
Then run it:
$ time ./Main +RTS -s
The allocations with GHC-8 are 123MB, while with GHC-9 they are around 2GB. The code processes each byte in the file in a loop to find the given substring. I have not yet been able to figure out what is being allocated with ghc-9; perhaps some parameter in the loop is being passed boxed in ghc-9?
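To compare only the allocation figure between the two builds, the "bytes allocated in the heap" line can be pulled out of the RTS summary. The following is a rough sketch, not part of the repo: heap_alloc_bytes is a hypothetical helper name, and it assumes the summary is going to stderr, which is where +RTS -s writes it when no output file is given.

```shell
#!/bin/sh
# heap_alloc_bytes: read GHC RTS "-s" output on stdin and print the total
# heap allocation as a plain integer (thousands separators removed).
# (Hypothetical helper, not part of the repro repo.)
heap_alloc_bytes() {
    sed -n 's/^ *\([0-9,]*\) bytes allocated in the heap.*$/\1/p' | tr -d ','
}

# Usage, assuming ./Main was built as described above:
#   ./Main +RTS -s 2>&1 | heap_alloc_bytes
```

Running this once per compiler gives two integers that can be compared directly.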
Let me know if I can help in any further investigation.
Expected behavior
GHC-9 should produce code as efficient as GHC-8.
Environment
- GHC version used: ghc-9.3+!5658 (closed)