GHCi 2x slower without -keep-tmp-files

In D3562, I've observed that -keep-tmp-files makes :load 3x faster on my test case. I can't share my test case, but I've found a way to approximate it with MultiLayerModules I just added in D3575.

Here are the steps:

# in ghc top dir
$ mkdir tmp
$ cd tmp
$ cp ../testsuite/tests/perf/compiler/genMultiLayerModules .
# edit genMultiLayerModules to say DEPTH=0, WIDTH=5000
$ ./genMultiLayerModules 
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive +RTS -s
  11,132,224,952 bytes allocated in the heap
   1,004,238,408 bytes copied during GC
     185,091,216 bytes maximum residency (14 sample(s))
       2,813,504 bytes maximum slop
             365 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       706 colls,     0 par    0.907s   0.906s     0.0013s    0.0125s
  Gen  1        14 colls,     0 par    0.607s   0.606s     0.0433s    0.2244s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time   20.219s  ( 20.493s elapsed)
  GC      time    1.514s  (  1.513s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time   21.733s  ( 22.010s elapsed)

  Alloc rate    550,585,275 bytes per MUT second

  Productivity  93.0% of total user, 93.1% of total elapsed
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive -keep-tmp-files +RTS -s
   4,603,831,672 bytes allocated in the heap
     971,623,904 bytes copied during GC
     184,019,808 bytes maximum residency (14 sample(s))
       2,262,680 bytes maximum slop
             365 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       448 colls,     0 par    0.724s   0.723s     0.0016s    0.0321s
  Gen  1        14 colls,     0 par    0.621s   0.620s     0.0443s    0.2242s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    7.966s  (  8.202s elapsed)
  GC      time    1.345s  (  1.344s elapsed)
  EXIT    time    0.000s  (  0.004s elapsed)
  Total   time    9.312s  (  9.550s elapsed)

  Alloc rate    577,938,762 bytes per MUT second

  Productivity  85.5% of total user, 85.9% of total elapsed

So without -keep-tmp-files it's roughly 2.3x slower (21.7s vs 9.3s total time) and allocates about 2.4x more (11.1 GB vs 4.6 GB).

Profiling pointed to https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/main/SysTools.hs;8bf50d5026f92eb5a6768eb2ac38479802da1411$1074. We're creating dont_delete_set a lot.
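
To illustrate why creating dont_delete_set repeatedly could be expensive, here is a minimal sketch. It is not the SysTools.hs code. It assumes that a cleanup step receives the files to keep as a list and turns that list into a set on every call. The function names splitRebuilding and splitWithSet and the file names are hypothetical, made up for this example.

```haskell
import qualified Data.Set as Set
import Data.List (partition)

-- Hypothetical stand-in for a cleanup step: splits the temp files into
-- (files to keep, files to delete). The set is rebuilt from the list on
-- every call, which costs O(n log n) in the number of files to keep.
splitRebuilding :: [FilePath] -> [FilePath] -> ([FilePath], [FilePath])
splitRebuilding dontDelete files =
  partition (`Set.member` dontDeleteSet) files
  where
    dontDeleteSet = Set.fromList dontDelete

-- Alternative shape: the caller builds the set once and passes it in,
-- so each call only pays for the membership tests.
splitWithSet :: Set.Set FilePath -> [FilePath] -> ([FilePath], [FilePath])
splitWithSet dontDeleteSet = partition (`Set.member` dontDeleteSet)

main :: IO ()
main = do
  let keep    = ["M" ++ show i ++ ".o" | i <- [1 .. 5000 :: Int]]
      temps   = ["tmp1.s", "M42.o", "tmp2.c"]
      keepSet = Set.fromList keep
  -- Both calls produce the same split; only the per-call cost differs.
  print (splitRebuilding keep temps)
  print (splitWithSet keepSet temps)
```

If something like splitRebuilding ran once per module with a keep list that grows with the number of modules, the total work would grow roughly quadratically with the module count. That would be consistent with the gap showing up at WIDTH=5000, but it is an assumption of this sketch, not something the profile output above establishes.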

Looks like this was improved in D3111 recently.

Trac metadata
| Trac field | Value |
| --- | --- |
| Version | 8.3 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | GHCi |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | dfeuer |
| Operating system | |
| Architecture | |