Profiled program runs 2.5x faster than non-profiled
I was looking at the benchmarks game program fasta (attached as fasta.ghc-2.hs). I have found that with the flags given there, this program on GHC 8.2.1 runs in about a second with -prof -fprof-auto and in about 2.5 seconds without!
To run without profiling:
ghc --make -Wall -fforce-recomp -fllvm -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -s -RTS 250000 > /dev/null
To run the same program with profiling:
ghc --make -fforce-recomp -fllvm -prof -fprof-auto -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings fasta.ghc-2.hs -o fasta.ghc-2.ghc_run && ./fasta.ghc-2.ghc_run +RTS -N4 -p -s -RTS 250000 > /dev/null
I also attach the Core output for both the profiled and unprofiled versions.
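For anyone who wants to regenerate Core locally and compare it, the following is a minimal sketch. It is not necessarily how the attached dumps were produced. The directory names core-noprof and core-prof are made up for illustration. The flags -ddump-simpl, -ddump-to-file, -dsuppress-all and -dumpdir are standard GHC flags, and the dump files should end up in the given directories with a .dump-simpl suffix.

```shell
# Hypothetical output directories, one per build, so the dumps do not overwrite each other.
mkdir -p core-noprof core-prof

# Unprofiled build, dumping the final simplified Core to core-noprof/.
ghc --make -fforce-recomp -fllvm -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings \
    -ddump-simpl -ddump-to-file -dsuppress-all -dumpdir core-noprof/ \
    fasta.ghc-2.hs -o fasta.ghc-2.ghc_run

# Profiled build with the same optimisation flags, dumping Core to core-prof/.
ghc --make -fforce-recomp -fllvm -prof -fprof-auto -O2 -XBangPatterns -threaded -rtsopts -XOverloadedStrings \
    -ddump-simpl -ddump-to-file -dsuppress-all -dumpdir core-prof/ \
    fasta.ghc-2.hs -o fasta.ghc-2.ghc_run

# Compare the two dumps; cost-centre annotations will account for part of the difference.
diff -r core-noprof core-prof | less
```

-dsuppress-all strips most type and coercion noise, which makes the diff easier to read, at the cost of hiding some detail.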
To me this seems very strange: the profiled version is somehow faster. Perhaps what's worse is that this suggests GHC performs some optimisation when profiling is off that makes the program a lot slower than it could be!
This program is not minimised.