Compiling a function with a lot of alternatives bottlenecks on insertIntHeap
I have a module that looks like this:
module A10000 where
data A = A
| A00001
| A00002
...
| A10000
f :: A -> Int
f A00001 = 19900001
f A00002 = 19900002
...
f A10000 = 19910000
Compiling with -s
gives:
[1 of 1] Compiling A10000 ( A10000.hs, A10000.o )
A10000.hs:10004:1: warning:
Pattern match checker exceeded (2000000) iterations in
an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
to set the maximun number of iterations to n)
|
10004 | f A00001 = 19900001
| ^^^^^^^^^^^^^^^^^^^...
95,068,628,968 bytes allocated in the heap
14,166,421,680 bytes copied during GC
312,247,488 bytes maximum residency (19 sample(s))
3,267,024 bytes maximum slop
857 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 35163 colls, 0 par 5.191s 5.170s 0.0001s 0.0724s
Gen 1 19 colls, 0 par 1.240s 1.236s 0.0650s 0.1872s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 23.100s ( 23.341s elapsed)
GC time 6.431s ( 6.405s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 29.532s ( 29.748s elapsed)
With profiling enabled I get:
total time = 22.67 secs (22673 ticks @ 1000 us, 1 processor)
total alloc = 36,143,653,320 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
insertIntHeap Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(450,1)-(454,38) 73.9 77.5 16746 28001680176
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 9.4 1.8 2136 647116224
deSugar HscMain compiler/main/HscMain.hs:511:7-44 2.6 4.2 599 1530056552
pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 2.3 2.9 524 1030493864
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 1.3 1.4 288 497401376
RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 0.9 1.1 201 383789200
tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 0.8 1.2 176 441973152
Parser HscMain compiler/main/HscMain.hs:(316,5)-(384,20) 0.5 1.1 113 411448880
After a local patch that basically reverts https://phabricator.haskell.org/rGHC5a1a2633553 I get:
[1 of 1] Compiling A10000 ( A10000.hs, A10000.o )
A10000.hs:10004:1: warning:
Pattern match checker exceeded (2000000) iterations in
an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
to set the maximun number of iterations to n)
|
10004 | f A00001 = 19900001
| ^^^^^^^^^^^^^^^^^^^...
15,162,268,144 bytes allocated in the heap
4,870,184,600 bytes copied during GC
323,794,936 bytes maximum residency (19 sample(s))
3,074,056 bytes maximum slop
886 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 811 colls, 0 par 1.828s 1.821s 0.0022s 0.0770s
Gen 1 19 colls, 0 par 1.217s 1.213s 0.0638s 0.1820s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 5.842s ( 6.144s elapsed)
GC time 3.046s ( 3.034s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 8.888s ( 9.179s elapsed)
For a file of smaller size with 1000 constructors, it still gives a 10% win.
This example is artificial, but it looks like something that someone could write for a sparse enum that looks like this in C++:
enum access_t { read = 1, write = 2, exec = 4 };
Trac metadata
Trac field | Value |
---|---|
Version | |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |