!9614: Make the occurrence analyser smarter about join points · Merge requests · Glasgow Haskell Compiler / GHC

Simon Peyton Jones requested to merge wip/T22404 into master Dec 24, 2022

This MR addresses #22404 (closed). There is a big Note [Occurrence analysis for join points] that explains it all.

Significant changes

New field occ_join_points in OccEnv
The NonRec case of occAnalBind splits into two cases: one for existing join points (which does the special magic for Note [Occurrence analysis for join points]), and one for other bindings.
mkOneOcc adds in info from occ_join_points.
All "bring into scope" activity is centralised in the new function addInScope.
I made a local data type LocalOcc for use inside the occurrence analyser. It is like OccInfo, but lacks IAmDead and IAmALoopBreaker, which in turn makes computations over it simpler and more efficient.
I found quite a bit of allocation in GHC.Core.Rules.getRules so I optimised it a bit.
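
As a rough illustration of why a two-constructor occurrence type is convenient, and of what treating join points specially can buy, here is a toy model. It is a sketch only, not GHC's code: LocalOccS, Usage, Expr, naive and joinAware are hypothetical names, and it assumes that the join-point treatment amounts to counting a join point's free-variable occurrences at each jump site rather than once at the binding. A dead variable is modelled simply as one that is absent from the usage map.

```haskell
import qualified Data.Map.Strict as M

-- Toy stand-in for LocalOcc (hypothetical): either one occurrence in each
-- of n case branches, or many. No dead or loop-breaker constructors.
data LocalOccS = OneL Int | ManyL
  deriving (Eq, Show)

-- Usage of each variable; a variable missing from the map is dead.
type Usage = M.Map String LocalOccS

-- Occurrences in different case alternatives (at most one alternative runs).
orOcc :: LocalOccS -> LocalOccS -> LocalOccS
orOcc (OneL m) (OneL n) = OneL (m + n)
orOcc _        _        = ManyL

-- Occurrences that may both be reached.
andOcc :: LocalOccS -> LocalOccS -> LocalOccS
andOcc _ _ = ManyL

-- A tiny expression language (hypothetical, far simpler than Core).
data Expr
  = Var String
  | Lit
  | Jump String            -- tail call to a join point
  | Case [Expr]            -- alternatives only; scrutinee omitted
  | Join String Expr Expr  -- join j = rhs in body

-- Analyse the join RHS once and add its usage to the body's usage.
naive :: Expr -> Usage
naive (Var v)           = M.singleton v (OneL 1)
naive Lit               = M.empty
naive (Jump _)          = M.empty
naive (Case alts)       = M.unionsWith orOcc (map naive alts)
naive (Join _ rhs body) = M.unionWith andOcc (naive rhs) (naive body)

-- Charge the join RHS's usage at every jump site instead.
joinAware :: M.Map String Usage -> Expr -> Usage
joinAware _   (Var v)     = M.singleton v (OneL 1)
joinAware _   Lit         = M.empty
joinAware env (Jump j)    = M.findWithDefault M.empty j env
joinAware env (Case alts) = M.unionsWith orOcc (map (joinAware env) alts)
joinAware env (Join j rhs body) =
  joinAware (M.insert j (joinAware env rhs) env) body

-- join j = v in case .. of { A -> j; B -> j; C -> v; D -> literal }
example :: Expr
example = Join "j" (Var "v") (Case [Jump "j", Jump "j", Var "v", Lit])
```

Loaded into GHCi, `M.lookup "v" (naive example)` gives `Just ManyL`, whereas `M.lookup "v" (joinAware M.empty example)` gives `Just (OneL 3)`: in this model `v` is seen to occur at most once on any path, which is the kind of information that lets a binding be inlined.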

More minor changes

Renamed data constructor WithUsageDetails to WUD, and WithTailUsageDetails to WTUD

--------- Compiler perf -----------

I spent quite some time on performance tuning, so even though it does more than before, the occurrence analyser runs slightly faster on average. Here are the compile-time allocation changes over 1%:

               Test Metric          Baseline            New Change
 CoOpt_Read(normal) ghc/alloc    766,003,076    748,985,544  -2.2% GOOD
     T10858(normal) ghc/alloc    120,782,748    118,735,744  -1.7%
     T11545(normal) ghc/alloc     79,829,332     78,722,128  -1.4%
     T12150(optasm) ghc/alloc     73,881,192     72,854,208  -1.4%
     T13056(optasm) ghc/alloc    294,495,436    290,226,600  -1.4%
     T13253(normal) ghc/alloc    364,663,144    361,043,432  -1.0%
 T13253-spj(normal) ghc/alloc    118,248,796     59,996,856 -49.3% GOOD
     T15164(normal) ghc/alloc  1,102,607,920  1,087,375,984  -1.4%
     T15304(normal) ghc/alloc  1,196,061,524  1,155,296,336  -3.4%
     T15630(normal) ghc/alloc    148,707,300    147,104,768  -1.1%
     T17516(normal) ghc/alloc  1,657,993,132  1,626,735,192  -1.9%
     T17836(normal) ghc/alloc    395,306,932    391,219,640  -1.0%
     T18140(normal) ghc/alloc     71,948,496     73,206,920  +1.7%
     T18282(normal) ghc/alloc    129,090,864    131,483,440  +1.9%
    T18698b(normal) ghc/alloc    230,313,396    233,017,416  +1.2%  BAD
      T4801(normal) ghc/alloc    247,568,452    250,836,624  +1.3%
      T9233(normal) ghc/alloc    709,634,020    685,363,720  -3.4% GOOD
      T9630(normal) ghc/alloc    965,838,132    942,010,984  -2.5% GOOD
      T9675(optasm) ghc/alloc    444,583,940    429,417,416  -3.4% GOOD
      T9961(normal) ghc/alloc    303,041,544    307,384,192  +1.4%  BAD
      WWRec(normal) ghc/alloc    503,706,372    495,554,224  -1.6%

          geo. mean                                          -1.0%
          minimum                                           -49.3%
          maximum                                            +1.9%
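
For readers checking the figures, the percentage column is consistent with the relative change from the first allocation figure to the second, rounded to one decimal place. A minimal sketch of that arithmetic follows; pctChange and round1 are hypothetical helper names, not part of GHC's test driver.

```haskell
-- Relative change from the first (baseline) figure to the second, in percent.
pctChange :: Integer -> Integer -> Double
pctChange old new = 100 * fromIntegral (new - old) / fromIntegral old

-- Round to one decimal place, matching the precision shown in the table.
round1 :: Double -> Double
round1 x = fromIntegral (round (x * 10) :: Integer) / 10

-- Figures copied from the T13253-spj and T18282 rows.
t13253spj, t18282 :: Double
t13253spj = round1 (pctChange 118248796 59996856)
t18282    = round1 (pctChange 129090864 131483440)
```

Here `t13253spj` evaluates to -49.3 and `t18282` to 1.9, matching the minimum and maximum lines.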

The big win on T13253-spj comes because it has a big nest of join points, each occurring twice in the next one. The new occ-anal takes only one iteration of the simplifier to do the inlining; the old one took four. Moreover, we get much smaller code with the new one:

  New: Result size of Tidy Core
    = {terms: 429, types: 84, coercions: 0, joins: 14/14}

  Old: Result size of Tidy Core
    = {terms: 2,437, types: 304, coercions: 0, joins: 10/10}

--------- Runtime perf -----------

No significant changes in nofib results, except a 1% reduction in compiler allocation.

Edited Jul 26, 2023 by Simon Peyton Jones
