Performance drops 60x in the non-profiling binary
I have a rather large application that I have spent quite some time tuning, because its performance is just too bad. I have reached a point where the profiler reports about 14 seconds on a particular example, which is about 5x slower than I would like.
However, when I build the same binary without profiling support, the example takes 14 minutes. If I just touch a file and rebuild with profiling information, but run it without any profiling-related RTS options, it takes about 30 seconds (not 14 seconds, but that is probably because the profiling overhead is still there).
How can performance drop 60x when I basically just relink it?
From what I can see using "top" and memory profiling, memory consumption is quite stable over time.