
Draft: Nested CPR Analysis (#18174)

Sebastian Graf requested to merge wip/nested-cpr-2019 into master

Nested CPR analysis (#18174 (closed))

This MR extends CPR analysis to unbox nested constructors. See Note [Nested CPR] for examples.

Unboxing a function's result beyond the first level risks making the function more strict, rendering the transformation unsound. See Note [Nested CPR needs Termination information]. To justify unboxing anyway, Nested CPR interleaves a termination analysis that is like a higher-order exprOkForSpeculation.
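The strictness problem can be illustrated with a small sketch in plain Haskell. It is not GHC's actual worker/wrapper output: the names `f`, `fNested` and `terminates` are hypothetical, and `seq` merely stands in for the evaluation that returning unboxed fields would force. It assumes a function whose second result field may diverge.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical example function: it returns a pair whose second
-- component diverges for the input 0.
f :: Int -> (Int, Int)
f x = (x + 1, if x == 0 then error "boom" else x * 2)

-- Models what unboxing the fields of the result would do without
-- termination information: both fields must be evaluated before the
-- function can return, so the divergence in the second field is no
-- longer delayed.
fNested :: Int -> (Int, Int)
fNested x =
  let a = x + 1
      b = if x == 0 then error "boom" else x * 2
  in a `seq` b `seq` (a, b)

-- Evaluate a value to weak head normal form and report whether that
-- succeeded without raising an exception.
terminates :: a -> IO Bool
terminates v = do
  r <- try (evaluate (v `seq` ())) :: IO (Either SomeException ())
  pure (either (const False) (const True) r)
```

In GHCi, `terminates (fst (f 0))` yields `True`, whereas `terminates (fst (fNested 0))` yields `False`: the caller only asked for the first field, yet the eagerly evaluated second field makes the call fail. For inputs where both fields terminate, such as `1`, the two functions agree.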

The termination analysis makes up the bulk of the complexity in this patch. In principle, its results could be used in many more ways in the future, for example to justify speculative execution.

Although there are quite a few examples in test cases that are now properly optimised (e.g., T1600, T18174, T18894), the results on NoFib are meager:

--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
      cacheprof          -0.3%     -1.4%
      compress2          -1.9%     -0.9%
 fannkuch-redux           0.0%     -1.3%
         gamteb          -1.6%     -0.3%
       nucleic2          -1.2%     -0.6%
          sched          -0.0%     +0.9%
           x2n1          -0.0%     -5.0%
--------------------------------------------------------------------------------
            Min          -1.9%     -5.0%
            Max          +0.1%     +0.9%
 Geometric Mean          -0.1%     -0.1%

Allocation while compiling NoFib increases by 0.5%. Binary sizes on NoFib increase by 0.7%.

This patch manages to fix a few tickets. Fixes #1600 (closed), #18174 (closed), #18109 (closed).

ghc/alloc metrics generally increase (the compiler allocates more). run/alloc metrics improve throughout.

Justifications for metric increases:

  • MultiLayerModules increases due to #19293 (closed).
  • I could reproduce the 2.5% increase on T13701 on Fedora in a -O0 perf-flavoured build. With -fno-code or -O2, this patch is faster. I investigated the -v2 output and found nothing obvious. It's very similar to #19293 (closed), so I'm just going to accept it.
  • The +15% ghc/alloc increase on T15164 in a registerised, validate-flavoured build does not show up under -dshow-passes and has no impact on runtime. See #19311.
  • I verified that T13253 simply does one more round of Simplification after Nested CPR.
  • I looked at heap profiles for the ghc/max_bytes_used increases, which didn't show any obvious offenders.
Edited by Andreas Klebinger