app runs 10 times faster when compiled with profiling information than without it
Hello, I've found an interesting piece of application code which, when compiled with
-O2 -threaded --make
runs 10 times slower than the same code compiled with
-O2 -threaded --make -prof -auto-all -caf-all -fforce-recomp
The application is compiled by GHC 7.8.2 on the i386-solaris2 platform. The profiled build is then run as a normal application, i.e. without +RTS -p. The application itself is my crude beginner Haskell code, which takes a Wikipedia dump (an XML file) and tries to import the wiki pages from it into XWiki using the REST API. It uses the Data.Text package to load the Wikipedia data lazily; the whole logic for selecting page/title/content is done on Data.Text, and the result is later saved into an XPage, which uses ByteString. The application also uses the Async/async/wait code copied from Simon Marlow's Parallel and Concurrent Programming in Haskell, http://chimera.labs.oreilly.com/books/1230000000929. App attached.
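For readers without the attachment, the Async/async/wait pattern mentioned above looks roughly like the following. This is only a minimal sketch in the spirit of the book's MVar-based version, not the attached code: `uploadPage` is a hypothetical stand-in for the real REST upload, and the page numbers are made up for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, throwIO, try)

-- Handle to a computation running in its own thread.
-- The MVar is filled exactly once, with either an exception or a result.
newtype Async a = Async (MVar (Either SomeException a))

-- Start an action in a new thread and return a handle to its result.
async :: IO a -> IO (Async a)
async action = do
  var <- newEmptyMVar
  _ <- forkIO (try action >>= putMVar var)
  return (Async var)

-- Block until the action finishes; rethrow its exception, if any.
wait :: Async a -> IO a
wait (Async var) = readMVar var >>= either throwIO return

-- Hypothetical stand-in for uploading one page over REST (assumption:
-- the real application performs an HTTP request here instead).
uploadPage :: Int -> IO Int
uploadPage n = return (n * 2)

main :: IO ()
main = do
  handles <- mapM (async . uploadPage) [1 .. 5]  -- start all uploads
  results <- mapM wait handles                   -- collect in input order
  print results
```

Since the slowdown only shows up in the -threaded build without profiling, a small program of this shape might be a useful starting point for reducing the attached app to a test case.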
My question is: what can I do to help find the issue? I assume it is an issue when the profiled app is that much faster than the normally optimized app. For your information, the optimized app takes 9m50s to upload 1000 pages, while the profiled optimized app takes only 50s to upload the same number of pages on my machine, equipped with an E5-2620 (2 GHz, 6-core/12-thread Xeon). The difference in time is also so big that it cannot be caused by noise in the import of pages on the XWiki side. I've tested several times, of course.