Draft: Nested CPR Analysis (#18174)
This MR extends CPR (Constructed Product Result) analysis to unbox nested constructors. See Note [Nested CPR] for examples.
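To make "nested" concrete, here is a toy model of CPR information as a small lattice, written in Python purely for illustration. The names `Top`, `Con`, `lub` and `truncate` are hypothetical and are not GHC's actual `Cpr` type. In this model, `truncate(sig, 1)` stands for plain CPR, which only knows about the outermost constructor, while the untruncated signature also records that each field is itself an `I#` box.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Top:
    """No CPR information: the result must stay boxed."""


@dataclass(frozen=True)
class Con:
    """The result is built with constructor `name`; `fields` hold nested CPR info."""
    name: str
    fields: Tuple["Cpr", ...] = ()


Cpr = Union[Top, Con]


def lub(a: Cpr, b: Cpr) -> Cpr:
    """Join the CPR info of two branches, e.g. two case alternatives."""
    if (isinstance(a, Con) and isinstance(b, Con)
            and a.name == b.name and len(a.fields) == len(b.fields)):
        return Con(a.name, tuple(lub(x, y) for x, y in zip(a.fields, b.fields)))
    return Top()


def truncate(cpr: Cpr, depth: int) -> Cpr:
    """Forget all CPR info below `depth`; depth 1 models plain (flat) CPR."""
    if depth <= 0 or isinstance(cpr, Top):
        return Top()
    return Con(cpr.name, tuple(truncate(f, depth - 1) for f in cpr.fields))


# Result of something like `f :: Int -> (Int, Int)` returning a pair of I# boxes.
pair_of_ints = Con("(,)", (Con("I#"), Con("I#")))

print(truncate(pair_of_ints, 1))  # Con(name='(,)', fields=(Top(), Top()))
print(lub(pair_of_ints, Con("(,)", (Con("I#"), Top()))))
```

In this model, `lub` falls back to `Top` as soon as two branches disagree on the constructor, and it keeps nested information only for the fields on which both branches agree.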
Unboxing a function's result beyond the first level forces its fields to be evaluated before the function returns. That risks making the function stricter than it was, which would render the transformation unsound. See Note [Nested CPR needs Termination information].
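The soundness problem can be simulated with explicit thunks. In this Python sketch, which is only an illustration and not how GHC implements anything, `h` plays the role of a Haskell function `h b = (1, if b then 2 else undefined)`, and `h_worker`/`h_wrapper` are a hypothetical hand-written nested-CPR worker/wrapper pair. To return the second field unboxed, the worker has to evaluate it, so a caller that only inspects the first field now diverges.

```python
from typing import Callable, Tuple

Thunk = Callable[[], int]


def diverge() -> int:
    """Stands in for `undefined` or a non-terminating computation."""
    raise RuntimeError("diverged")


def h(b: bool) -> Tuple[Thunk, Thunk]:
    """Original function: a lazy pair whose fields are unevaluated thunks."""
    return (lambda: 1, lambda: 2 if b else diverge())


def h_worker(b: bool) -> Tuple[int, int]:
    """Hypothetical nested-CPR worker: returns both fields already evaluated (unboxed)."""
    return (1, 2 if b else diverge())


def h_wrapper(b: bool) -> Tuple[Thunk, Thunk]:
    """Hypothetical wrapper: calls the worker and re-boxes its results."""
    x, y = h_worker(b)
    return (lambda: x, lambda: y)


def first(p: Tuple[Thunk, Thunk]) -> int:
    """Like `fst`: forces only the first field."""
    return p[0]()


print(first(h(False)))  # 1: the second field is never forced
try:
    first(h_wrapper(False))
except RuntimeError as err:
    print("wrapper diverged:", err)  # the transformation made h stricter
```

Plain CPR would only unbox the pair itself and leave both fields as thunks, which is why it does not run into this problem.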
To justify unboxing anyway, Nested CPR interleaves a termination analysis that works like a higher-order exprOkForSpeculation. The termination analysis accounts for the bulk of the complexity in this patch. In principle, its results could be used in many more ways in the future, for example to do speculative execution.
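A heavily simplified sketch of such a termination analysis follows, over a hypothetical toy expression language rather than GHC Core. The "higher-order" aspect is modelled by per-function signatures (`sigs`) that map the termination info of an argument to that of the result. The names `Term`, `analyse`, `unboxable_fields`, `dup` and `loop` are made up for illustration, and unlike exprOkForSpeculation this toy tracks only termination, ignoring cost and side effects. A field counts as safe to unbox only if evaluating it surely terminates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Term:
    """Does evaluation to WHNF surely terminate? `fields` describe a constructed result."""
    terminates: bool
    fields: Tuple["Term", ...] = ()


MAY_DIVERGE = Term(False)

# Hypothetical toy expression language (not GHC Core).
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Error:
    message: str

@dataclass(frozen=True)
class Pair:
    fst: "Expr"
    snd: "Expr"

@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"

@dataclass(frozen=True)
class Call:
    fun: str
    arg: "Expr"

Expr = Union[Lit, Var, Add, Error, Pair, If, Call]
Sig = Callable[[Term], Term]  # termination of the argument -> termination of the result


def both(a: Term, b: Term) -> Term:
    """Conservatively combine the outcomes of two branches."""
    fields: Tuple[Term, ...] = ()
    if len(a.fields) == len(b.fields):
        fields = tuple(both(x, y) for x, y in zip(a.fields, b.fields))
    return Term(a.terminates and b.terminates, fields)


def analyse(e: Expr, evaluated: FrozenSet[str], sigs: Dict[str, Sig]) -> Term:
    def go(sub: Expr) -> Term:
        return analyse(sub, evaluated, sigs)

    if isinstance(e, Lit):
        return Term(True)
    if isinstance(e, Var):  # forcing an unevaluated variable may diverge
        return Term(e.name in evaluated)
    if isinstance(e, Add):
        return Term(go(e.left).terminates and go(e.right).terminates)
    if isinstance(e, Error):
        return MAY_DIVERGE
    if isinstance(e, Pair):  # a lazy pair is already in WHNF; record info on its fields
        return Term(True, (go(e.fst), go(e.snd)))
    if isinstance(e, If):
        branches = both(go(e.then), go(e.orelse))
        return Term(go(e.cond).terminates and branches.terminates, branches.fields)
    if isinstance(e, Call):  # higher-order part: consult the callee's signature
        return sigs[e.fun](go(e.arg))
    raise TypeError(f"unknown expression: {e!r}")


def unboxable_fields(e: Expr, evaluated: FrozenSet[str], sigs: Dict[str, Sig]) -> Tuple[bool, ...]:
    """Fields that could be evaluated eagerly (unboxed) without changing semantics."""
    return tuple(f.terminates for f in analyse(e, evaluated, sigs).fields)


sigs: Dict[str, Sig] = {
    "dup": lambda a: Term(True, (a, a)),  # dup x = (x, x)
    "loop": lambda a: MAY_DIVERGE,        # loop x = loop x
}

# h b = (1, if b then 2 else error "boom")
h_body = Pair(Lit(1), If(Var("b"), Lit(2), Error("boom")))
print(unboxable_fields(h_body, frozenset({"b"}), sigs))  # (True, False)

# f x = (x + 1, x + 2)
f_body = Pair(Add(Var("x"), Lit(1)), Add(Var("x"), Lit(2)))
print(unboxable_fields(f_body, frozenset(), sigs))       # (False, False)
print(unboxable_fields(f_body, frozenset({"x"}), sigs))  # (True, True)
```

In this model, only the first field of `h` may be unboxed, while both fields of `f` become unboxable once `x` is known to be evaluated (for example, because the function is strict in `x`).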
Although quite a few examples in test cases are now properly optimised (e.g., T1600, T18174, T18894), the results on NoFib (changes in allocations and executed instructions) are meager:
--------------------------------------------------------------------------------
Program            Allocs    Instrs
--------------------------------------------------------------------------------
cacheprof           -0.3%     -1.4%
compress2           -1.9%     -0.9%
fannkuch-redux       0.0%     -1.3%
gamteb              -1.6%     -0.3%
nucleic2            -1.2%     -0.6%
sched               -0.0%     +0.9%
x2n1                -0.0%     -5.0%
--------------------------------------------------------------------------------
Min                 -1.9%     -5.0%
Max                 +0.1%     +0.9%
Geometric Mean      -0.1%     -0.1%
Allocation while compiling NoFib increases by 0.5%. Binary sizes on NoFib increase by 0.7%.
This patch manages to fix a few tickets: Fixes #1600, #18174, #18109
ghc/alloc metrics generally increase, while run/alloc metrics improve throughout.
Justifications for metric increases:
- MultiLayerModules increases due to #19293.
- I could reproduce the 2.5% increase on T13701 on Fedora in a -O0 perf-flavoured build. With -fno-code or -O2, this patch is faster. I investigated the -v2 output and found nothing obvious. It's very similar to #19293, so I'm just going to accept it.
- The +15% ghc/alloc increase on T15164 in a registerised, validate-flavoured build does not show up under -dshow-passes and has no impact on runtime (#19311).
- I verified that T13253 simply does one more round of simplification after Nested CPR.
- I looked at heap profiles for the ghc/max_bytes_used increases, which didn't show any obvious offenders.