Long compile time/high memory usage for modules with many deriving clauses

I've seen many people complain about this in various contexts. One of the most common cases I see personally is Persistent used with a large number of entities, which results in multi-minute build times for that single module. These reports usually come from private repositories, which makes a reproduction difficult, and the use of Template Haskell in those modules tends to confuse the issue.

So I'd like to report this issue from a completely separate, public project, on the assumption that it represents a problem affecting many other users. Steps to reproduce:

  • cabal unpack fpco-api-1.2.0.1
  • cd fpco-api-1.2.0.1
  • cabal install
  • ghc --make -isrc/library src/library/FP/API/Types.hs
  • touch src/library/FP/API/Types.hs

Then, to see the time and memory usage of compiling just the one module, run:

  • time ghc -O2 --make -isrc/library src/library/FP/API/Types.hs +RTS -s

On my system (16GB RAM, quadcore i7), the results are:

$ time ghc -O2 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types     ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
  51,846,533,568 bytes allocated in the heap
   6,799,246,288 bytes copied during GC
     389,297,584 bytes maximum residency (22 sample(s))
      11,330,472 bytes maximum slop
            1041 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2041 colls,     0 par    9.19s    9.19s     0.0045s    0.0998s
  Gen  1        22 colls,     0 par    4.33s    4.33s     0.1969s    0.5436s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   34.21s  ( 37.74s elapsed)
  GC      time   13.52s  ( 13.53s elapsed)
  EXIT    time    0.07s  (  0.07s elapsed)
  Total   time   47.81s  ( 51.34s elapsed)

  Alloc rate    1,515,662,937 bytes per MUT second

  Productivity  71.7% of total user, 66.8% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

real    0m51.375s
user    0m50.864s
sys     0m0.456s

By contrast, with optimizations turned off:

$ time ghc -O0 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types     ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
  12,767,593,936 bytes allocated in the heap
   1,078,202,664 bytes copied during GC
     179,551,768 bytes maximum residency (13 sample(s))
       6,684,544 bytes maximum slop
             439 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       623 colls,     0 par    1.24s    1.24s     0.0020s    0.0651s
  Gen  1        13 colls,     0 par    0.76s    0.76s     0.0586s    0.2397s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    8.28s  (  9.23s elapsed)
  GC      time    2.00s  (  2.00s elapsed)
  EXIT    time    0.05s  (  0.05s elapsed)
  Total   time   10.34s  ( 11.28s elapsed)

  Alloc rate    1,542,443,897 bytes per MUT second

  Productivity  80.6% of total user, 73.9% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

real    0m11.304s
user    0m11.108s
sys     0m0.172s

and -O1:

$ time ghc -O1 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types     ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
  45,550,443,664 bytes allocated in the heap
   5,721,700,512 bytes copied during GC
     358,036,456 bytes maximum residency (21 sample(s))
       9,167,176 bytes maximum slop
             906 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1642 colls,     0 par    7.70s    7.70s     0.0047s    0.1031s
  Gen  1        21 colls,     0 par    3.68s    3.69s     0.1756s    0.4968s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   30.21s  ( 33.09s elapsed)
  GC      time   11.38s  ( 11.39s elapsed)
  EXIT    time    0.07s  (  0.07s elapsed)
  Total   time   41.68s  ( 44.55s elapsed)

  Alloc rate    1,507,571,106 bytes per MUT second

  Productivity  72.7% of total user, 68.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

real    0m44.577s
user    0m44.164s
sys     0m0.368s

Note that this module is essentially just 900 lines of data declarations, with a large number of derived classes, and a few manually written helper functions and instances.
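To give a sense of the shape of the module (the actual FP.API.Types is in the unpacked package; the names below are hypothetical, illustrative only), it is essentially many small declarations like this, each with a long deriving clause:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric      #-}
-- Hypothetical sketch of a module shaped like FP.API.Types: many
-- small types, each deriving a handful of classes. The compile-time
-- cost appears to scale with (number of types) x (number of derived
-- classes), since each derived instance generates code that the
-- optimizer must then process.
module Main where

import Data.Data     (Data)
import Data.Typeable (Typeable)
import GHC.Generics  (Generic)

newtype ProjectId = ProjectId Int
  deriving (Show, Read, Eq, Ord, Data, Typeable, Generic)

data Project = Project
  { projectId    :: ProjectId
  , projectName  :: String
  , projectOwner :: String
  } deriving (Show, Read, Eq, Ord, Data, Typeable, Generic)

-- ...imagine roughly 900 lines of declarations like the above...

main :: IO ()
main = print (Project (ProjectId 1) "demo" "alice")
```

Each derived instance here is individually cheap; it is the sheer number of them in one module that drives the allocation figures above.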

This high memory usage hasn't just resulted in user complaints: automated Jenkins and Travis jobs often fail unless optimizations are disabled, which is problematic for proper testing and production builds. In the case of fpco-api, I've worked around this by adding -O0 to the cabal file, but it would be much nicer not to have to do that.
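The workaround amounts to a ghc-options field in the library stanza. A sketch of what that looks like (the exact stanza in the real fpco-api.cabal has more modules and fields than shown here):

```
library
  hs-source-dirs:  src/library
  exposed-modules: FP.API.Types
  ghc-options:     -O0
```

The obvious downside is that this disables optimization for the whole library, not just the deriving-heavy module, so users pay a runtime cost for a compile-time workaround.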

I believe there are a few other public threads discussing complaints along these lines, which I can link if it would be helpful. I did my testing with GHC 7.8.3 64-bit on Ubuntu 12.04.

Trac metadata

Trac field              Value
Version                 7.8.3
Type                    Bug
TypeOfFailure           OtherFailure
Priority                normal
Resolution              Unresolved
Component               Compiler
Test case               (none)
Differential revisions  (none)
BlockedBy               (none)
Related                 (none)
Blocking                (none)
CC                      (none)
Operating system        (none)
Architecture            (none)