Consider generating Core directly when deriving `Generic` instances
Deriving `Generic` instances is often slow to compile. There are many open issues about this, for example #5642, #9557, #16577, and #19204.
One interesting approach to speeding this up is to generate Core directly. Doing so would bypass the renamer, typechecker, and desugarer. Is this an avenue worth exploring?
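For context, here is a small sketch of the kind of code that `deriving Generic` has to produce. The `Manual` instance below is hand-written to have roughly the same shape as a derived one. The type names, the `Sel` synonym, and the `"main"` package string are placeholders for illustration, and the code GHC actually generates may differ in detail.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module Main where

import GHC.Generics

-- What users write today: GHC generates the instance for them.
data Derived = Derived Int Bool
  deriving (Eq, Show, Generic)

-- The same shape of type, with the instance written by hand below.
data Manual = Manual Int Bool
  deriving (Eq, Show)

-- Selector metadata for an unnamed, lazy field without pragmas.
-- (Hypothetical helper synonym, only here to keep the instance short.)
type Sel =
  'MetaSel 'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy

instance Generic Manual where
  -- Representation: datatype metadata, one constructor, two fields.
  type Rep Manual =
    D1 ('MetaData "Manual" "Main" "main" 'False)
      (C1 ('MetaCons "Manual" 'PrefixI 'False)
        (S1 Sel (Rec0 Int) :*: S1 Sel (Rec0 Bool)))
  from (Manual a b) = M1 (M1 (M1 (K1 a) :*: M1 (K1 b)))
  to (M1 (M1 (M1 (K1 a) :*: M1 (K1 b)))) = Manual a b

main :: IO ()
main = do
  -- Both instances round-trip through their generic representation.
  print (to (from (Derived 1 True)) :: Derived)
  print (to (from (Manual 2 False)) :: Manual)
```

The representation type and the `from`/`to` bodies grow with the number of fields and constructors, so the renamer, typechecker, and desugarer all have more generated code to process for larger types.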
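I made !15818 as a proof of concept for this approach. The results are promising, especially for types with many constructors: at 1000 constructors, compile time drops by roughly a third (12.012s to 7.928s at `-O0`, 17.387s to 12.080s at `-O2`). For small types the two paths take about the same time.
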
| N | Type | -O | Alloc (MB, old) | Alloc (MB, new) | Time (s, old) | Time (s, new) |
|---|---|---|---|---|---|---|
| 1 | fields | 0 | 36 | 34 | 0.121s | 0.122s |
| 10 | fields | 0 | 46 | 42 | 0.127s | 0.127s |
| 100 | fields | 0 | 168 | 151 | 0.180s | 0.169s |
| 1000 | fields | 0 | 3,547 | 3,387 | 1.482s | 1.389s |
| 1 | fields | 2 | 42 | 41 | 0.119s | 0.125s |
| 10 | fields | 2 | 65 | 63 | 0.132s | 0.124s |
| 100 | fields | 2 | 356 | 342 | 0.244s | 0.238s |
| 1000 | fields | 2 | 7,058 | 6,913 | 2.398s | 2.287s |
| 1 | ctors | 0 | 36 | 34 | 0.123s | 0.118s |
| 10 | ctors | 0 | 54 | 50 | 0.132s | 0.125s |
| 100 | ctors | 0 | 313 | 277 | 0.337s | 0.261s |
| 1000 | ctors | 0 | 8,212 | 8,071 | 12.012s | 7.928s |
| 1 | ctors | 2 | 41 | 40 | 0.123s | 0.116s |
| 10 | ctors | 2 | 86 | 82 | 0.133s | 0.139s |
| 100 | ctors | 2 | 477 | 444 | 0.435s | 0.360s |
| 1000 | ctors | 2 | 9,332 | 9,285 | 17.387s | 12.080s |

These benchmarks compare the traditional HsSyn derivation path ("old") against the direct Core generation path ("new", enabled with `-fdirect-core-generic-deriving`). Each benchmark compiles a single module containing one data type with N fields or N constructors that derives `Generic`. The `-O` column is the optimization level.
All times are elapsed (wall clock) and were measured on AArch64 Linux with a stage1 GHC. The exact benchmark is here: https://gist.github.com/tfausak/3a17bb415a836612ca8f070b146785bf.
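
For readers who do not open the gist, benchmark inputs of the shape described above could be generated along these lines. This is only a sketch and not the script from the gist: the module names, the type name `T`, the constructor names, and the choice of `Int` fields are all assumptions.

```haskell
module Main where

import Data.List (intercalate)

-- | Source for a module with one constructor that has @n@ fields.
-- (Hypothetical generator; all names in the output are illustrative.)
fieldsModule :: Int -> String
fieldsModule n = unlines
  [ "{-# LANGUAGE DeriveGeneric #-}"
  , "module Fields where"
  , "import GHC.Generics (Generic)"
  , "data T = C " ++ unwords (replicate n "Int")
  , "  deriving Generic"
  ]

-- | Source for a module with @n@ nullary constructors (needs @n >= 1@).
ctorsModule :: Int -> String
ctorsModule n = unlines
  [ "{-# LANGUAGE DeriveGeneric #-}"
  , "module Ctors where"
  , "import GHC.Generics (Generic)"
  , "data T = " ++ intercalate " | " ["C" ++ show i | i <- [1 .. n]]
  , "  deriving Generic"
  ]

main :: IO ()
main = do
  -- Print two small inputs; a real run would use N up to 1000.
  putStr (fieldsModule 3)
  putStr (ctorsModule 3)
```

Each generated module would then be compiled once with and once without `-fdirect-core-generic-deriving` at the chosen optimization level.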