Make the occurrence analyser smarter about join points
This MR addresses #22404. There is a big Note [Occurrence analysis for join points] that explains it all.

Significant changes:
- New field `occ_join_points` in `OccEnv`.
- The `NonRec` case of `occAnalBind` splits into two cases: one for existing join points (which does the special magic for Note [Occurrence analysis for join points]), and one for other bindings.
- `mkOneOcc` adds in info from `occ_join_points`.
- All "bring into scope" activity is centralised in the new function `addInScope`.
- I made a local data type `LocalOcc` for use inside the occurrence analyser. It is like `OccInfo`, but lacks `IAmDead` and `IAmALoopBreaker`, which in turn makes computations over it simpler and more efficient.
- I found quite a bit of allocation in `GHC.Core.Rules.getRules`, so I optimised it a bit.
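The bullets about `occ_join_points`, `mkOneOcc` and `LocalOcc` can be illustrated with a toy model. This is a minimal sketch, not GHC's code: `ToyOcc`, `Usage`, `JoinEnv`, `occ` and the tiny `Expr` language are all hypothetical, and the sketch assumes the core idea is that variables used in a join point's right-hand side are credited to each place the join point is jumped to, rather than to its binding site. Like `LocalOcc`, the `ToyOcc` stand-in has no dead or loop-breaker case: a variable missing from the usage map is simply dead.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a LocalOcc-like type. There is no "dead" case
-- (a variable absent from the usage map is dead) and no loop-breaker case,
-- so combining two occurrences needs only a couple of equations.
data ToyOcc
  = Once Int  -- at most once on any path; the Int counts the branches it is in
  | Many      -- possibly more than once on some path
  deriving (Eq, Show)

type Usage = M.Map String ToyOcc

-- Sequential combination: both sides may run, so a shared variable is Many.
andUsage :: Usage -> Usage -> Usage
andUsage = M.unionWith (\_ _ -> Many)

-- Alternative combination (case branches): only one side runs.
orUsage :: Usage -> Usage -> Usage
orUsage = M.unionWith orOcc
  where
    orOcc (Once m) (Once n) = Once (m + n)
    orOcc _ _               = Many

-- A toy expression language with nullary, non-recursive join points.
data Expr
  = Var String
  | App Expr Expr
  | Case Expr [Expr]       -- scrutinee and alternatives (no pattern binders)
  | Join String Expr Expr  -- join j = rhs in body
  | Jump String            -- tail call to a join point

-- Hypothetical analogue of the occ_join_points idea: the usage of the
-- right-hand side of each join point currently in scope.
type JoinEnv = M.Map String Usage

-- aware = True: a join point's RHS usage is added at every jump to it
-- (loosely mirroring "mkOneOcc adds in info from occ_join_points").
-- aware = False: the RHS usage is counted once, at the binding site.
occ :: Bool -> JoinEnv -> Expr -> Usage
occ _ _ (Var v) = M.singleton v (Once 1)
occ aware env (App f a) = occ aware env f `andUsage` occ aware env a
occ aware env (Case s alts) =
  occ aware env s `andUsage` foldr (orUsage . occ aware env) M.empty alts
occ aware env (Join j rhs body)
  | aware     = M.delete j (occ aware (M.insert j rhsUsage env) body)
  | otherwise = M.delete j (rhsUsage `andUsage` occ aware env body)
  where
    rhsUsage = occ aware env rhs
occ _ env (Jump j) =
  M.singleton j (Once 1) `andUsage` M.findWithDefault M.empty j env

-- join j = x in case y of { A -> j; B -> x }
example :: Expr
example = Join "j" (Var "x") (Case (Var "y") [Jump "j", Var "x"])
```

Loaded into GHCi, `occ True M.empty example` evaluates to `fromList [("x",Once 2),("y",Once 1)]`, while `occ False M.empty example` evaluates to `fromList [("x",Many),("y",Once 1)]`: only the join-aware version sees that `x` is used at most once on each path.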
More minor changes:
- Renamed data constructor `WithUsageDetails` to `WUD`, and `WithTailUsageDetails` to `WTUD`.
--------- Compiler perf -----------
I spent quite a lot of time on performance tuning, so even though it does more than before, the occurrence analyser runs slightly faster on average. Here are the compile-time allocation changes over 1%:
| Test | Metric | Baseline | New | Change | Verdict |
|---|---|---:|---:|---:|---|
| CoOpt_Read(normal) | ghc/alloc | 766,003,076 | 748,985,544 | -2.2% | GOOD |
| T10858(normal) | ghc/alloc | 120,782,748 | 118,735,744 | -1.7% | |
| T11545(normal) | ghc/alloc | 79,829,332 | 78,722,128 | -1.4% | |
| T12150(optasm) | ghc/alloc | 73,881,192 | 72,854,208 | -1.4% | |
| T13056(optasm) | ghc/alloc | 294,495,436 | 290,226,600 | -1.4% | |
| T13253(normal) | ghc/alloc | 364,663,144 | 361,043,432 | -1.0% | |
| T13253-spj(normal) | ghc/alloc | 118,248,796 | 59,996,856 | -49.3% | GOOD |
| T15164(normal) | ghc/alloc | 1,102,607,920 | 1,087,375,984 | -1.4% | |
| T15304(normal) | ghc/alloc | 1,196,061,524 | 1,155,296,336 | -3.4% | |
| T15630(normal) | ghc/alloc | 148,707,300 | 147,104,768 | -1.1% | |
| T17516(normal) | ghc/alloc | 1,657,993,132 | 1,626,735,192 | -1.9% | |
| T17836(normal) | ghc/alloc | 395,306,932 | 391,219,640 | -1.0% | |
| T18140(normal) | ghc/alloc | 71,948,496 | 73,206,920 | +1.7% | |
| T18282(normal) | ghc/alloc | 129,090,864 | 131,483,440 | +1.9% | |
| T18698b(normal) | ghc/alloc | 230,313,396 | 233,017,416 | +1.2% | BAD |
| T4801(normal) | ghc/alloc | 247,568,452 | 250,836,624 | +1.3% | |
| T9233(normal) | ghc/alloc | 709,634,020 | 685,363,720 | -3.4% | GOOD |
| T9630(normal) | ghc/alloc | 965,838,132 | 942,010,984 | -2.5% | GOOD |
| T9675(optasm) | ghc/alloc | 444,583,940 | 429,417,416 | -3.4% | GOOD |
| T9961(normal) | ghc/alloc | 303,041,544 | 307,384,192 | +1.4% | BAD |
| WWRec(normal) | ghc/alloc | 503,706,372 | 495,554,224 | -1.6% | |
| geo. mean | | | | -1.0% | |
| minimum | | | | -49.3% | |
| maximum | | | | +1.9% | |
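The Change column is the relative difference between the two allocation figures, and the geo. mean row summarises many such ratios. Below is a minimal sketch of both calculations, with hypothetical helper names `pctChange` and `geoMeanPct`. It assumes the summary is a geometric mean of new/baseline ratios taken over every test compared, not only the rows listed here, so it cannot be reproduced from this table alone.

```haskell
-- Percentage change from a baseline figure to a new one.
pctChange :: Integer -> Integer -> Double
pctChange old new = 100 * fromIntegral (new - old) / fromIntegral old

-- Geometric mean of new/old ratios, expressed as a percentage change.
-- Expects a non-empty list of (baseline, new) pairs.
geoMeanPct :: [(Integer, Integer)] -> Double
geoMeanPct xs = 100 * (exp (sum logs / fromIntegral (length logs)) - 1)
  where
    logs = [log (fromIntegral new / fromIntegral old) | (old, new) <- xs]

-- Three rows copied from the table above: (test, baseline, new).
rows :: [(String, Integer, Integer)]
rows =
  [ ("CoOpt_Read(normal)", 766003076, 748985544)
  , ("T13253-spj(normal)", 118248796, 59996856)
  , ("T18282(normal)", 129090864, 131483440)
  ]
```

Evaluating `[(t, pctChange old new) | (t, old, new) <- rows]` gives roughly -2.22, -49.26 and +1.85, which round to the -2.2%, -49.3% and +1.9% shown in the table. A geometric mean is the natural summary for ratios because a halving and a doubling cancel out.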
The big win on T13253-spj comes because it has a big nest of join points, each occurring twice in the next one. The new occurrence analyser needs only one iteration of the simplifier to do the inlining; the old one took four. Moreover, we get much smaller code with the new one:
New: Result size of Tidy Core = {terms: 429, types: 84, coercions: 0, joins: 14/14}
Old: Result size of Tidy Core = {terms: 2,437, types: 304, coercions: 0, joins: 10/10}
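To make "a nest of join points, each occurring twice in the next one" concrete, here is a small hypothetical Haskell function of that shape. It is not the T13253-spj source; it only shows the pattern. Each local function is called twice, always saturated and in tail position, from the next one out, so each is eligible to become a join point. Compiling it with `-ddump-simpl` shows what GHC actually does with it.

```haskell
-- Hypothetical example (not the T13253-spj test): a nest of local
-- functions, each called twice, saturated and in tail position,
-- from the next level out.
nest :: Int -> Int
nest x =
  let j1 y = y * 2 + x                            -- innermost level
      j2 y = if y > 10 then j1 y else j1 (y + 1)  -- two calls to j1
      j3 y = if even y then j2 y else j2 (y - 1)  -- two calls to j2
  in case compare x 0 of
       LT -> j3 (negate x)                        -- two calls to j3
       _  -> j3 x
```

For example, `nest 5` takes the odd branch of `j3`, then `j2 4` calls `j1 5`, giving 15.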
--------- Runtime perf -----------
No significant changes in nofib results, except a 1% reduction in compiler allocation.