GHC issueshttps://gitlab.haskell.org/ghc/ghc/-/issues2019-07-07T18:16:05Zhttps://gitlab.haskell.org/ghc/ghc/-/issues/14667Compiling a function with a lot of alternatives bottlenecks on insertIntHeap2019-07-07T18:16:05ZniteriaCompiling a function with a lot of alternatives bottlenecks on insertIntHeapI have a module that looks like this:
```
module A10000 where
data A = A
| A00001
| A00002
...
| A10000
f :: A -> Int
f A00001 = 19900001
f A00002 = 19900002
...
f A10000 = 19910000
```
Compiling with `-s` gives:
```
[1 of 1] ...I have a module that looks like this:
```
module A10000 where
data A = A
| A00001
| A00002
...
| A10000
f :: A -> Int
f A00001 = 19900001
f A00002 = 19900002
...
f A10000 = 19910000
```
Compiling with `-s` gives:
```
[1 of 1] Compiling A10000 ( A10000.hs, A10000.o )
A10000.hs:10004:1: warning:
Pattern match checker exceeded (2000000) iterations in
an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
to set the maximun number of iterations to n)
|
10004 | f A00001 = 19900001
| ^^^^^^^^^^^^^^^^^^^...
95,068,628,968 bytes allocated in the heap
14,166,421,680 bytes copied during GC
312,247,488 bytes maximum residency (19 sample(s))
3,267,024 bytes maximum slop
857 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 35163 colls, 0 par 5.191s 5.170s 0.0001s 0.0724s
Gen 1 19 colls, 0 par 1.240s 1.236s 0.0650s 0.1872s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 23.100s ( 23.341s elapsed)
GC time 6.431s ( 6.405s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 29.532s ( 29.748s elapsed)
```
With profiling enabled I get:
```
total time = 22.67 secs (22673 ticks @ 1000 us, 1 processor)
total alloc = 36,143,653,320 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
insertIntHeap Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(450,1)-(454,38) 73.9 77.5 16746 28001680176
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 9.4 1.8 2136 647116224
deSugar HscMain compiler/main/HscMain.hs:511:7-44 2.6 4.2 599 1530056552
pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 2.3 2.9 524 1030493864
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 1.3 1.4 288 497401376
RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 0.9 1.1 201 383789200
tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 0.8 1.2 176 441973152
Parser HscMain compiler/main/HscMain.hs:(316,5)-(384,20) 0.5 1.1 113 411448880
```
After a local patch that basically reverts https://phabricator.haskell.org/rGHC5a1a2633553 I get:
```
[1 of 1] Compiling A10000 ( A10000.hs, A10000.o )
A10000.hs:10004:1: warning:
Pattern match checker exceeded (2000000) iterations in
an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
to set the maximun number of iterations to n)
|
10004 | f A00001 = 19900001
| ^^^^^^^^^^^^^^^^^^^...
15,162,268,144 bytes allocated in the heap
4,870,184,600 bytes copied during GC
323,794,936 bytes maximum residency (19 sample(s))
3,074,056 bytes maximum slop
886 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 811 colls, 0 par 1.828s 1.821s 0.0022s 0.0770s
Gen 1 19 colls, 0 par 1.217s 1.213s 0.0638s 0.1820s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 5.842s ( 6.144s elapsed)
GC time 3.046s ( 3.034s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 8.888s ( 9.179s elapsed)
```
For a file of smaller size with 1000 constructors, it still gives a 10% win.
This example is artificial, but it looks like something that someone could write for a sparse enum that looks like this in C++:
```
enum access_t { read = 1, write = 2, exec = 4 };
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Compiling a function with a lot of alternatives bottlenecks on insertIntHeap","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"I have a module that looks like this:\r\n{{{\r\nmodule A10000 where\r\n\r\ndata A = A\r\n | A00001\r\n | A00002\r\n...\r\n | A10000\r\n\r\nf :: A -> Int\r\nf A00001 = 19900001\r\nf A00002 = 19900002\r\n...\r\nf A10000 = 19910000\r\n}}}\r\n\r\nCompiling with `-s` gives:\r\n{{{\r\n[1 of 1] Compiling A10000 ( A10000.hs, A10000.o ) \r\n \r\nA10000.hs:10004:1: warning: \r\n Pattern match checker exceeded (2000000) iterations in \r\n an equation for ‘f’. (Use -fmax-pmcheck-iterations=n \r\n to set the maximun number of iterations to n) \r\n |\r\n10004 | f A00001 = 19900001\r\n | ^^^^^^^^^^^^^^^^^^^...\r\n 95,068,628,968 bytes allocated in the heap\r\n 14,166,421,680 bytes copied during GC\r\n 312,247,488 bytes maximum residency (19 sample(s))\r\n 3,267,024 bytes maximum slop\r\n 857 MB total memory in use (0 MB lost due to fragmentation)\r\n\r\n Tot time (elapsed) Avg pause Max pause\r\n Gen 0 35163 colls, 0 par 5.191s 5.170s 0.0001s 0.0724s\r\n Gen 1 19 colls, 0 par 1.240s 1.236s 0.0650s 0.1872s\r\n\r\n TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)\r\n\r\n SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)\r\n\r\n INIT time 0.000s ( 0.000s elapsed)\r\n MUT time 23.100s ( 23.341s elapsed)\r\n GC time 6.431s ( 6.405s elapsed)\r\n RP time 0.000s ( 0.000s elapsed)\r\n PROF time 0.000s ( 0.000s elapsed)\r\n EXIT time 0.000s ( 0.001s elapsed)\r\n Total time 29.532s ( 29.748s elapsed)\r\n}}}\r\n\r\nWith profiling enabled I get:\r\n{{{\r\n total time = 22.67 secs (22673 ticks @ 1000 us, 1 processor) \r\n total alloc = 36,143,653,320 bytes (excludes profiling overheads) \r\n \r\nCOST CENTRE MODULE SRC %time %alloc ticks bytes \r\n \r\ninsertIntHeap Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(450,1)-(454,38) 73.9 77.5 16746 28001680176 \r\nSimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 9.4 1.8 2136 647116224 \r\ndeSugar HscMain compiler/main/HscMain.hs:511:7-44 2.6 4.2 599 1530056552 \r\npprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 2.3 2.9 524 1030493864 \r\nStgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 1.3 1.4 288 497401376 \r\nRegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 0.9 1.1 201 383789200 \r\ntc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 0.8 1.2 176 441973152 \r\nParser HscMain compiler/main/HscMain.hs:(316,5)-(384,20) 0.5 1.1 113 411448880 \r\n}}}\r\n\r\n\r\nAfter a local patch that basically reverts https://phabricator.haskell.org/rGHC5a1a2633553 I get:\r\n{{{\r\n[1 of 1] Compiling A10000 ( A10000.hs, A10000.o ) \r\n \r\nA10000.hs:10004:1: warning: \r\n Pattern match checker exceeded (2000000) iterations in \r\n an equation for ‘f’. (Use -fmax-pmcheck-iterations=n\r\n to set the maximun number of iterations to n)\r\n |\r\n10004 | f A00001 = 19900001\r\n | ^^^^^^^^^^^^^^^^^^^...\r\n 15,162,268,144 bytes allocated in the heap\r\n 4,870,184,600 bytes copied during GC\r\n 323,794,936 bytes maximum residency (19 sample(s))\r\n 3,074,056 bytes maximum slop\r\n 886 MB total memory in use (0 MB lost due to fragmentation)\r\n\r\n Tot time (elapsed) Avg pause Max pause\r\n Gen 0 811 colls, 0 par 1.828s 1.821s 0.0022s 0.0770s\r\n Gen 1 19 colls, 0 par 1.217s 1.213s 0.0638s 0.1820s\r\n\r\n TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)\r\n\r\n SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)\r\n\r\n INIT time 0.000s ( 0.000s elapsed)\r\n MUT time 5.842s ( 6.144s elapsed)\r\n GC time 3.046s ( 3.034s elapsed)\r\n RP time 0.000s ( 0.000s elapsed)\r\n PROF time 0.000s ( 0.000s elapsed)\r\n EXIT time 0.000s ( 0.001s elapsed)\r\n Total time 8.888s ( 9.179s elapsed)\r\n}}}\r\n\r\nFor a file of smaller size with 1000 constructors, it still gives a 10% win.\r\n\r\nThis example is artificial, but it looks like something that someone could write for a sparse enum that looks like this in C++:\r\n{{{\r\nenum access_t { read = 1, write = 2, exec = 4 };\r\n}}}","type_of_failure":"OtherFailure","blocking":[]} -->8.6.1niterianiteria