Improve compile-time performance by wiring in constraint tuples
While inspecting the -ddump-if-trace output for the T12150 perf-related test case recently, I noticed something unusual: it spends quite a bit of time loading constraint tuples from interface files:
Starting fork { Type synonym HasCallStack
Need decl for IP
Considering whether to load GHC.Classes {- SYSTEM -}
Reading interface for ghc-prim-0.7.0:GHC.Classes;
reason: Need decl for IP
readIFace /home/rgscott/Software/ghc5/libraries/ghc-prim/dist-install/build/GHC/Classes.hi
lookup_orig GHC.Classes C:(%%)
lookup_orig GHC.Classes C:(%%)
lookup_orig GHC.Classes C:(%,%)
lookup_orig GHC.Classes C:(%,%)
lookup_orig GHC.Classes $p1(%,%)
lookup_orig GHC.Classes $p2(%,%)
lookup_orig GHC.Classes C:(%,,%)
lookup_orig GHC.Classes C:(%,,%)
lookup_orig GHC.Classes $p1(%,,%)
lookup_orig GHC.Classes $p2(%,,%)
lookup_orig GHC.Classes $p3(%,,%)
lookup_orig GHC.Classes C:(%,,,%)
lookup_orig GHC.Classes C:(%,,,%)
lookup_orig GHC.Classes $p1(%,,,%)
lookup_orig GHC.Classes $p2(%,,,%)
lookup_orig GHC.Classes $p3(%,,,%)
lookup_orig GHC.Classes $p4(%,,,%)
lookup_orig GHC.Classes C:(%,,,,%)
lookup_orig GHC.Classes C:(%,,,,%)
lookup_orig GHC.Classes $p1(%,,,,%)
lookup_orig GHC.Classes $p2(%,,,,%)
lookup_orig GHC.Classes $p3(%,,,,%)
lookup_orig GHC.Classes $p4(%,,,,%)
lookup_orig GHC.Classes $p5(%,,,,%)
lookup_orig GHC.Classes C:(%,,,,,%)
lookup_orig GHC.Classes C:(%,,,,,%)
lookup_orig GHC.Classes $p1(%,,,,,%)
lookup_orig GHC.Classes $p2(%,,,,,%)
lookup_orig GHC.Classes $p3(%,,,,,%)
lookup_orig GHC.Classes $p4(%,,,,,%)
lookup_orig GHC.Classes $p5(%,,,,,%)
lookup_orig GHC.Classes $p6(%,,,,,%)
...
This goes on and on until all constraint tuples (up to size 63!) have been loaded from GHC.Classes. What's especially silly about this is that T12150 doesn't use constraint tuples in any shape or form. Indeed, the only reason it loads GHC.Classes in the first place is that it needs HasCallStack (which undefined uses), and HasCallStack is defined in terms of the IP class from GHC.Classes (hence the "Need decl for IP" in the trace).
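To put a rough number on "on and on": judging only from the lines shown in the trace, the nullary constraint tuple (%%) costs two lookup_orig lines, and each constraint tuple of arity k >= 2 costs two lookups of its class constructor C:(...) plus k lookups of its superclass selectors $p1 through $pk. No arity-1 tuple appears. The trace is elided after the 6-tuple, so the sketch below assumes that pattern continues unchanged up to arity 63; the total is an extrapolation, not a measurement, and the helper names are hypothetical.

```haskell
-- Hypothetical helpers: count the lookup_orig lines that loading the
-- constraint tuples would produce, extrapolating from the trace pattern.

-- Lookups for a single constraint tuple of the given arity.
--   Arity 0:      two lookups of the class constructor C:(%%).
--   Arity k >= 2: two constructor lookups plus k superclass selectors.
lookupsForArity :: Int -> Int
lookupsForArity 0 = 2
lookupsForArity k
  | k >= 2    = 2 + k
  | otherwise = 0  -- no arity-1 constraint tuple appears in the trace

-- Total lookups for every constraint tuple up to and including maxArity.
lookupsUpTo :: Int -> Int
lookupsUpTo maxArity = sum (map lookupsForArity [0 .. maxArity])
```

Loaded into GHCi, lookupsUpTo 6 evaluates to 32, matching the 32 lookup_orig lines shown in the trace, and lookupsUpTo 63 evaluates to 2141.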
Why do we spend so much time loading constraint tuples and not, say, tuple data types? The difference is that, unlike constraint tuples, tuple data types are wired directly into GHC, so there is never any need to load their definitions from an interface file. (The same goes for unboxed tuples and sums.) Constraint tuples, in contrast, are ordinary¹ class definitions in GHC.Classes, so they must be read from an interface file before they can be used.
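For readers who have not used them directly: a constraint tuple is what you get when you group several constraints in parentheses and treat the group as a single constraint, for instance in a constraint synonym enabled by the ConstraintKinds extension. In the sketch below (ShowEq and describePair are made-up names), the synonym's right-hand side (Show a, Eq a) is a two-element constraint tuple, i.e. one of the classes from GHC.Classes discussed above.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- With ConstraintKinds, a parenthesised group of constraints is a
-- first-class constraint tuple and can be given a name.
type ShowEq a = (Show a, Eq a)

-- Uses both components of the constraint tuple: Eq for (==), Show for show.
describePair :: ShowEq a => a -> a -> String
describePair x y
  | x == y    = show x ++ " == " ++ show y
  | otherwise = show x ++ " /= " ++ show y
```

For example, describePair (1 :: Int) 1 yields "1 == 1" and describePair 'a' 'b' yields "'a' /= 'b'".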
This led me to wonder: what would happen if we wired in constraint tuples? Besides making constraint tuples consistent with every other form of tuple type, this could save compile time, since GHC would no longer have to read all 63 constraint tuples every time GHC.Classes is loaded. And indeed, an initial experiment in which I wired constraint tuples into GHC proved promising. Here are the highlights:
Performance Metrics (test environment: "x86_64-linux-deb9-hadrian"):
Conversions(normal) runtime/bytes allocated 106936.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 106936.000 [unchanged, 0.0%]
DeriveNull(normal) runtime/bytes allocated 112050664.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 112050664.000 [unchanged, 0.0%]
InlineArrayAlloc(normal) runtime/bytes allocated 1600040824.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1600040824.000 [unchanged, 0.0%]
InlineByteArrayAlloc(normal) runtime/bytes allocated 1440040824.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1440040824.000 [unchanged, 0.0%]
InlineCloneArrayAlloc(normal) runtime/bytes allocated 1600040984.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1600040984.000 [unchanged, 0.0%]
ManyAlternatives(normal) compile_time/bytes allocated 814495736.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 814123824.000 [unchanged, 0.0%]
ManyConstructors(normal) compile_time/bytes allocated 4508765656.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 4509342966.000 [unchanged, -0.0%]
MethSharing(normal) runtime/bytes allocated 480097840.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 480097840.000 [unchanged, 0.0%]
MethSharing(normal) runtime/peak_megabytes_allocated 2.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2.000 [unchanged, 0.0%]
MultiLayerModules(normal) compile_time/bytes allocated 6040538408.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 6040307600.000 [unchanged, 0.0%]
Naperian(optasm) compile_time/bytes allocated 59667928.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 59661824.000 [unchanged, 0.0%]
PmSeriesG(normal) compile_time/bytes allocated 58765080.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 58537226.000 [unchanged, 0.4%]
PmSeriesS(normal) compile_time/bytes allocated 69952368.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 69729004.000 [unchanged, 0.3%]
PmSeriesT(normal) compile_time/bytes allocated 105161112.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 104938918.000 [unchanged, 0.2%]
PmSeriesV(normal) compile_time/bytes allocated 69577912.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 69356350.000 [unchanged, 0.3%]
T10359(normal) runtime/bytes allocated 386624.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 386624.000 [unchanged, 0.0%]
T10370(optasm) compile_time/max_bytes_used 37280544.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 37194228.000 [unchanged, 0.2%]
T10370(optasm) compile_time/peak_megabytes_allocated 95.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 95.000 [unchanged, 0.0%]
T10421(normal) compile_time/bytes allocated 131901496.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 139094078.000 [decreased, -5.2%]
T10421a(normal) compile_time/bytes allocated 93464592.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 93243070.000 [unchanged, 0.2%]
T10547(normal) compile_time/bytes allocated 35712024.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 35393536.000 [unchanged, 0.9%]
T10678(normal) runtime/bytes allocated 56041248.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 56041248.000 [unchanged, 0.0%]
T10858(normal) compile_time/bytes allocated 210712880.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 220915994.000 [unchanged, -4.6%]
T11195(normal) compile_time/bytes allocated 309500224.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 316596360.000 [unchanged, -2.2%]
T11276(normal) compile_time/bytes allocated 143694688.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 150763332.000 [unchanged, -4.7%]
T11303b(normal) compile_time/bytes allocated 52843264.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 52610232.000 [unchanged, 0.4%]
T11374(normal) compile_time/bytes allocated 240620784.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 250419846.000 [unchanged, -3.9%]
T11822(normal) compile_time/bytes allocated 163413360.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 173521824.000 [unchanged, -5.8%]
T12150(optasm) compile_time/bytes allocated 94472576.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 104170290.000 [decreased, -9.3%]
T12227(normal) compile_time/bytes allocated 565702792.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 579630592.000 [decreased, -2.4%]
T12234(optasm) compile_time/bytes allocated 67921648.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 77919432.000 [decreased, -12.8%]
T12425(optasm) compile_time/bytes allocated 120919864.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 130553642.000 [decreased, -7.4%]
T12545(normal) compile_time/bytes allocated 2145932096.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2145618978.000 [unchanged, 0.0%]
T12707(normal) compile_time/bytes allocated 1080980600.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1091269266.000 [unchanged, -0.9%]
T12791(normal) runtime/max_bytes_used 289520.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 289520.000 [unchanged, 0.0%]
T12791(normal) runtime/peak_megabytes_allocated 1.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1.000 [unchanged, 0.0%]
T12990(normal) runtime/bytes allocated 14440840.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 14440840.000 [unchanged, 0.0%]
T12996(normal) runtime/bytes allocated 80368.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 80368.000 [unchanged, 0.0%]
T13001(normal) runtime/bytes allocated 50512.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 50512.000 [unchanged, 0.0%]
T13035(normal) compile_time/bytes allocated 118281496.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 118052926.000 [unchanged, 0.2%]
T13056(optasm) compile_time/bytes allocated 395366840.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 405335860.000 [decreased, -2.5%]
T13191(normal) runtime/bytes allocated 185942520.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 185942520.000 [unchanged, 0.0%]
T13218(normal) runtime/bytes allocated 82005352.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 82005352.000 [unchanged, 0.0%]
T13218(normal) runtime/max_bytes_used 332488.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 332488.000 [unchanged, 0.0%]
T13253(normal) compile_time/bytes allocated 440070432.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 447773280.000 [unchanged, -1.7%]
T13253-spj(normal) compile_time/bytes allocated 174177440.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 184627300.000 [decreased, -5.7%]
T13379(normal) compile_time/bytes allocated 408332776.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 408110848.000 [unchanged, 0.1%]
T13536a(optasm) runtime/bytes allocated 86536.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 86536.000 [unchanged, 0.0%]
T13623(normal) runtime/bytes allocated 50696.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 50696.000 [unchanged, 0.0%]
T13701(normal) compile_time/bytes allocated 2489097376.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2488868578.000 [unchanged, 0.0%]
T13719(normal) compile_time/bytes allocated 4793166352.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 4793088316.000 [unchanged, 0.0%]
T14052(ghci) compile_time/bytes allocated 2233914384.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2228372756.000 [unchanged, 0.2%]
T14683(normal) compile_time/bytes allocated 3518113920.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 3537749020.000 [unchanged, -0.6%]
T14697(normal) compile_time/bytes allocated 370871640.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 370629170.000 [unchanged, 0.1%]
T14936(normal) runtime/bytes allocated 51032.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 51032.000 [unchanged, 0.0%]
T14955(normal) runtime/bytes allocated 48050552.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 48050552.000 [unchanged, 0.0%]
T15164(normal) compile_time/bytes allocated 1906610184.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1906361312.000 [unchanged, 0.0%]
T15185(normal) runtime/bytes allocated 41144.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 41144.000 [unchanged, 0.0%]
T15226(normal) runtime/bytes allocated 40848.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 40848.000 [unchanged, 0.0%]
T15226a(normal) runtime/bytes allocated 40848.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 40848.000 [unchanged, 0.0%]
T15263(normal) runtime/bytes allocated 1062200.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1062200.000 [unchanged, 0.0%]
T15426(normal) runtime/bytes allocated 192041032.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 192041032.000 [unchanged, 0.0%]
T15578(normal) runtime/bytes allocated 800041128.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 800041128.000 [unchanged, 0.0%]
T15630(normal) compile_time/bytes allocated 193766296.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 193548364.000 [unchanged, 0.1%]
T15630(normal) compile_time/max_bytes_used 8958040.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 8835908.000 [unchanged, 1.4%]
T15630(normal) compile_time/peak_megabytes_allocated 22.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 21.750 [unchanged, 1.1%]
T16190(normal) compile_time/bytes allocated 289083120.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 297600232.000 [unchanged, -2.9%]
T17096(normal) compile_time/bytes allocated 335126560.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 345064034.000 [unchanged, -2.9%]
T17499(normal) runtime/bytes allocated 96040840.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 96040840.000 [unchanged, 0.0%]
T17516(normal) compile_time/bytes allocated 1349172888.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1359413668.000 [unchanged, -0.8%]
T17977(normal) compile_time/bytes allocated 66487912.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 66259382.000 [unchanged, 0.3%]
T17977b(normal) compile_time/bytes allocated 49566912.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 49343400.000 [unchanged, 0.5%]
T18140(normal) compile_time/bytes allocated 121711656.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 121495312.000 [unchanged, 0.2%]
T18282(normal) compile_time/bytes allocated 171978032.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 179362804.000 [decreased, -4.1%]
T18304(normal) compile_time/bytes allocated 105305456.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 115209276.000 [decreased, -8.6%]
T18478(normal) compile_time/bytes allocated 1490263872.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1500765604.000 [unchanged, -0.7%]
T1969(normal) compile_time/bytes allocated 867662712.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 867542506.000 [unchanged, 0.0%]
T1969(normal) compile_time/max_bytes_used 20395240.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 19771132.000 [unchanged, 3.2%]
T1969(normal) compile_time/peak_megabytes_allocated 53.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 52.000 [unchanged, 1.9%]
T2762(normal) runtime/peak_megabytes_allocated 2.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2.000 [unchanged, 0.0%]
T3064(normal) compile_time/bytes allocated 212776040.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 212548500.000 [unchanged, 0.1%]
T3064(normal) compile_time/max_bytes_used 17720680.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 17645770.000 [unchanged, 0.4%]
T3064(normal) compile_time/peak_megabytes_allocated 46.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 46.000 [unchanged, 0.0%]
T3294(normal) compile_time/bytes allocated 1790232936.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 1794850360.000 [unchanged, -0.3%]
T3294(normal) compile_time/max_bytes_used 36572504.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 38032538.000 [unchanged, -3.8%]
T3294(normal) compile_time/peak_megabytes_allocated 96.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 100.000 [unchanged, -4.0%]
T3474(normal) runtime/max_bytes_used 44376.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 44376.000 [unchanged, 0.0%]
T3586(normal) runtime/bytes allocated 16101656.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 16101656.000 [unchanged, 0.0%]
T3586(normal) runtime/max_bytes_used 44376.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 44376.000 [unchanged, 0.0%]
T3586(normal) runtime/peak_megabytes_allocated 17.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 17.000 [unchanged, 0.0%]
T3738(normal) runtime/bytes allocated 50368.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 50368.000 [unchanged, 0.0%]
T3738(normal) runtime/peak_megabytes_allocated 2.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 2.000 [unchanged, 0.0%]
T3924(normal) runtime/bytes allocated 50608.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 50608.000 [unchanged, 0.0%]
T4029(ghci) runtime/max_bytes_used 16507984.000
(baseline @ 4517a38215eb72a4824c72d97377b9325059bf55) 18007862.000 [unchanged, -8.3%]
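The bracketed percentages appear to be the relative change from the baseline value to the new value. As a cross-check only, the sketch below recomputes three of them with exact rational arithmetic and rounds to tenths of a percent; the helper names are hypothetical and are not taken from the testsuite driver, and the inputs are copied from the listing above.

```haskell
-- Hypothetical cross-check of the bracketed percentages above.

-- Relative change from baseline to new value, in percent, computed exactly.
pctChange :: Integer -> Integer -> Rational
pctChange baseline new = fromInteger (new - baseline) * 100 / fromInteger baseline

-- Round a percentage to tenths of a percent, e.g. -9.3095... becomes -93.
inTenths :: Rational -> Integer
inTenths p = round (p * 10)

-- (test name, baseline, new) triples copied from the metrics listing.
samples :: [(String, Integer, Integer)]
samples =
  [ ("T12150", 104170290, 94472576)
  , ("T12234", 77919432, 67921648)
  , ("T18304", 115209276, 105305456)
  ]

-- Change for each sample, in tenths of a percent.
changes :: [(String, Integer)]
changes = [ (name, inTenths (pctChange b n)) | (name, b, n) <- samples ]
```

Evaluating changes gives [("T12150",-93),("T12234",-128),("T18304",-86)], i.e. -9.3%, -12.8% and -8.6%, in agreement with the listing.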
That's nine metric decreases! In fact, one of them is T12150:
=====> T12150(optasm) 3743 of 7695 [0, 0, 0]
cd "/builds/RyanGlScott/ghc/tmp/ghctest-io4ap4oc/test spaces/testsuite/tests/perf/compiler/T12150.run" && "/builds/RyanGlScott/ghc/_build/install/bin/ghc" -c T12150.hs -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat -dno-debug-output -O -fasm +RTS -V0 -tT12150.comp.stats --machine-readable -RTS<
compile_time/bytes allocated decreased from "x86_64-linux-deb9-hadrian" baseline @ 4517a38215eb72a4824c72d97377b9325059bf55:
Expected T12150 (optasm) compile_time/bytes allocated: 104170290.0 +/-2%
Lower bound T12150 (optasm) compile_time/bytes allocated: 102086884
Upper bound T12150 (optasm) compile_time/bytes allocated: 106253696
Actual T12150 (optasm) compile_time/bytes allocated: 94472576
Deviation T12150 (optasm) compile_time/bytes allocated: -9.3 %
Not bad!
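The output above also shows how the testsuite classified this result: the expected value carries a +/-2% tolerance, and the actual value falls below the lower bound. A minimal sketch of that window check follows. It assumes the bounds are the baseline scaled by 0.98 and 1.02 and rounded to the nearest byte, which reproduces the printed bounds, but the real driver may compute them differently; the names Verdict, bounds and verdict are hypothetical, and treating a value exactly on a bound as unchanged is also an assumption.

```haskell
-- Hypothetical model of the +/-2% acceptance window shown above.
data Verdict = Decreased | Unchanged | Increased
  deriving (Eq, Show)

-- Lower and upper bounds for a baseline and a tolerance given in percent,
-- rounded to the nearest whole byte.
bounds :: Integer -> Rational -> (Integer, Integer)
bounds baseline tolPct = (scale (1 - t), scale (1 + t))
  where
    t = tolPct / 100
    scale f = round (fromInteger baseline * f)

-- Classify an actual measurement against the window around the baseline.
verdict :: Integer -> Rational -> Integer -> Verdict
verdict baseline tolPct actual
  | actual < lo = Decreased
  | actual > hi = Increased
  | otherwise   = Unchanged
  where
    (lo, hi) = bounds baseline tolPct
```

Here bounds 104170290 2 evaluates to (102086884,106253696), matching the printed lower and upper bounds, and verdict 104170290 2 94472576 is Decreased.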
¹ Ordinary in the sense that GHC.Classes defines class (a, b) => (a, b) et al. just like any other class. Normally, the parser would reject an attempt to steal built-in syntax like this, but the parser carves out a special exception for the constraint tuples in GHC.Classes.
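The footnote's point, that a constraint tuple is just a class whose superclasses are its components, can be mimicked in user code with the familiar class-synonym idiom. A class like this has no methods, so its dictionary holds only superclass fields, which is where superclass selectors such as $p1(%,%) and $p2(%,%) in the trace come from. In the sketch below, ShowAndEq and showIfEqual are hypothetical names; the catch-all instance needs the FlexibleInstances and UndecidableInstances extensions.

```haskell
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- A method-less class whose only content is its superclasses, mirroring
-- the shape of a constraint tuple definition such as (a, b) => (a, b).
class (Show a, Eq a) => ShowAndEq a

-- One catch-all instance: anything that is both Show and Eq qualifies.
instance (Show a, Eq a) => ShowAndEq a

-- Both superclasses are available from the single ShowAndEq constraint.
showIfEqual :: ShowAndEq a => a -> a -> Maybe String
showIfEqual x y
  | x == y    = Just (show x)
  | otherwise = Nothing
```

For example, showIfEqual (3 :: Int) 3 is Just "3" and showIfEqual 'x' 'y' is Nothing.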