Skip to content

Compiling a function with a lot of alternatives bottlenecks on insertIntHeap

I have a module that looks like this:

module A10000 where

data A = A
  | A00001
  | A00002
...
  | A10000

f :: A -> Int
f A00001 = 19900001
f A00002 = 19900002
...
f A10000 = 19910000

Compiling with -s gives:

[1 of 1] Compiling A10000           ( A10000.hs, A10000.o )                                   
                                               
A10000.hs:10004:1: warning:                                                                   
    Pattern match checker exceeded (2000000) iterations in                                    
    an equation for ‘f’. (Use -fmax-pmcheck-iterations=n                                      
    to set the maximun number of iterations to n)                                                                                                                                            
      |
10004 | f A00001 = 19900001
      | ^^^^^^^^^^^^^^^^^^^...
  95,068,628,968 bytes allocated in the heap
  14,166,421,680 bytes copied during GC
     312,247,488 bytes maximum residency (19 sample(s))
       3,267,024 bytes maximum slop
             857 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     35163 colls,     0 par    5.191s   5.170s     0.0001s    0.0724s
  Gen  1        19 colls,     0 par    1.240s   1.236s     0.0650s    0.1872s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   23.100s  ( 23.341s elapsed)
  GC      time    6.431s  (  6.405s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   29.532s  ( 29.748s elapsed)

With profiling enabled I get:

  total time  =       22.67 secs   (22673 ticks @ 1000 us, 1 processor)                                                                                                                      
  total alloc = 36,143,653,320 bytes  (excludes profiling overheads)                                                                                                                         
                                                                                                                                                                                             
COST CENTRE     MODULE         SRC                                                 %time %alloc  ticks     bytes                                                                             
                                                                                                                                                                                             
insertIntHeap   Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(450,1)-(454,38)      73.9   77.5  16746 28001680176                                                                           
SimplTopBinds   SimplCore      compiler/simplCore/SimplCore.hs:770:39-74             9.4    1.8   2136 647116224                                                                             
deSugar         HscMain        compiler/main/HscMain.hs:511:7-44                     2.6    4.2    599 1530056552                                                                            
pprNativeCode   AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    2.3    2.9    524 1030493864                                                                            
StgCmm          HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)          1.3    1.4    288 497401376                                                                             
RegAlloc-linear AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    0.9    1.1    201 383789200                                                                             
tc_rn_src_decls TcRnDriver     compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7)      0.8    1.2    176 441973152                                                                             
Parser          HscMain        compiler/main/HscMain.hs:(316,5)-(384,20)             0.5    1.1    113 411448880                                                                             

After a local patch that basically reverts https://phabricator.haskell.org/rGHC5a1a2633553 I get:

[1 of 1] Compiling A10000           ( A10000.hs, A10000.o )                                   
                                               
A10000.hs:10004:1: warning:                                                                   
    Pattern match checker exceeded (2000000) iterations in                                    
    an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
    to set the maximun number of iterations to n)
      |
10004 | f A00001 = 19900001
      | ^^^^^^^^^^^^^^^^^^^...
  15,162,268,144 bytes allocated in the heap
   4,870,184,600 bytes copied during GC
     323,794,936 bytes maximum residency (19 sample(s))
       3,074,056 bytes maximum slop
             886 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       811 colls,     0 par    1.828s   1.821s     0.0022s    0.0770s
  Gen  1        19 colls,     0 par    1.217s   1.213s     0.0638s    0.1820s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    5.842s  (  6.144s elapsed)
  GC      time    3.046s  (  3.034s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    8.888s  (  9.179s elapsed)

For a file of smaller size with 1000 constructors, it still gives a 10% win.

This example is artificial, but it looks like something that someone could write for a sparse enum that looks like this in C++:

enum access_t { read = 1, write = 2, exec = 4 };
Trac metadata
Trac field Value
Version
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information