GHCi 2x slower without -keep-tmp-files
In D3562, I observed that -keep-tmp-files makes :load 3x faster on my test case.
I can't share that test case, but I've found a way to approximate it with the MultiLayerModules test
that I just added in D3575.
Here are the steps:
# in ghc top dir
$ mkdir tmp
$ cd tmp
$ cp ../testsuite/tests/perf/compiler/genMultiLayerModules .
# edit genMultiLayerModules to say DEPTH=0, WIDTH=5000
$ ./genMultiLayerModules
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive +RTS -s
11,132,224,952 bytes allocated in the heap
1,004,238,408 bytes copied during GC
185,091,216 bytes maximum residency (14 sample(s))
2,813,504 bytes maximum slop
365 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 706 colls, 0 par 0.907s 0.906s 0.0013s 0.0125s
Gen 1 14 colls, 0 par 0.607s 0.606s 0.0433s 0.2244s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.000s elapsed)
MUT time 20.219s ( 20.493s elapsed)
GC time 1.514s ( 1.513s elapsed)
EXIT time 0.000s ( 0.005s elapsed)
Total time 21.733s ( 22.010s elapsed)
Alloc rate 550,585,275 bytes per MUT second
Productivity 93.0% of total user, 93.1% of total elapsed
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive -keep-tmp-files +RTS -s
4,603,831,672 bytes allocated in the heap
971,623,904 bytes copied during GC
184,019,808 bytes maximum residency (14 sample(s))
2,262,680 bytes maximum slop
365 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 448 colls, 0 par 0.724s 0.723s 0.0016s 0.0321s
Gen 1 14 colls, 0 par 0.621s 0.620s 0.0443s 0.2242s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.000s elapsed)
MUT time 7.966s ( 8.202s elapsed)
GC time 1.345s ( 1.344s elapsed)
EXIT time 0.000s ( 0.004s elapsed)
Total time 9.312s ( 9.550s elapsed)
Alloc rate 577,938,762 bytes per MUT second
Productivity 85.5% of total user, 85.9% of total elapsed
So without -keep-tmp-files it's about 2.3x slower (21.7s vs 9.3s total, 20.2s vs 8.0s MUT) and allocates about 2.4x more (11.1 GB vs 4.6 GB).
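The ratios can be checked directly from the two `+RTS -s` summaries. The snippet below is only a small sketch of that arithmetic: the `ratio` helper is a hypothetical name introduced here, and the numbers are copied by hand from the output above rather than parsed from it.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to two decimals.
# (Hypothetical helper for illustration; awk does the floating-point division.)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Figures copied from the two runs above:
# first argument is the run without -keep-tmp-files, second is the run with it.
echo "total time: $(ratio 21.733 9.312)x"
echo "MUT time:   $(ratio 20.219 7.966)x"
echo "allocation: $(ratio 11132224952 4603831672)x"
```

Note that GC time is nearly identical in both runs (1.514s vs 1.345s), so almost all of the difference is in mutator time.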
Profiling pointed to https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/main/SysTools.hs;8bf50d5026f92eb5a6768eb2ac38479802da1411$1074
We're creating dont_delete_set a lot.
Looks like this was improved in D3111 recently.
Trac metadata
Trac field | Value |
---|---|
Version | 8.3 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | GHCi |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | dfeuer |
Operating system | |
Architecture | |