Skip to content
  • Ben Gamari's avatar
    simplifier: Kill off ufKeenessFactor · 3d2991f8
    Ben Gamari authored and Marge Bot's avatar Marge Bot committed
    We used to have another factor, ufKeenessFactor, which would scale the
    discounts before they were subtracted from the size. This was justified
    with the following comment:
    
      -- We multiple the raw discounts (args_discount and result_discount)
      -- ty opt_UnfoldingKeenessFactor because the former have to do with
      --  *size* whereas the discounts imply that there's some extra
      --  *efficiency* to be gained (e.g. beta reductions, case reductions)
      -- by inlining.
    
    However, this is highly suspect since it means that we subtract a
    *scaled* size from an absolute size, resulting in crazy (e.g. negative)
    scores in some cases (#15304). We consequently killed off
    ufKeenessFactor and bumped up the ufUseThreshold to compensate.
    
    Adjustment of unfolding use threshold
    =====================================
    
    Since this removes a discount from our inlining heuristic, I revisited our
    default choice of -funfolding-use-threshold to minimize the change in
    overall inlining behavior. Specifically, I measured runtime allocations
    and executable size of nofib and the testsuite performance tests built
    using compilers (and core libraries) built with several values of
    -funfolding-use-threshold.
    
    This comes as a result of a quantitative comparison of testsuite
    performance and code size as a function of ufUseThreshold, comparing
    GHC trees using values of 50, 60, 70, 80, 90, and 100. The test set
    consisted of nofib and the testsuite performance tests.
    A full summary of these measurements are found in the description of
    !2608
    
    Comparing executable sizes (relative to the base commit) across all
    nofib tests, we see that sizes are similar to the baseline:
    
                gmean      min      max   median
    thresh
    50         -6.36%   -7.04%   -4.82%   -6.46%
    60         -5.04%   -5.97%   -3.83%   -5.11%
    70         -2.90%   -3.84%   -2.31%   -2.92%
    80         -0.75%   -2.16%   -0.42%   -0.73%
    90         +0.24%   -0.41%   +0.55%   +0.26%
    100        +1.36%   +0.80%   +1.64%   +1.37%
    baseline   +0.00%   +0.00%   +0.00%   +0.00%
    
    Likewise, looking at runtime allocations we see that 80 gives slightly
    better optimisation than the baseline:
    
                gmean      min      max   median
    thresh
    50         +0.16%   -0.16%   +4.43%   +0.00%
    60         +0.09%   -0.00%   +3.10%   +0.00%
    70         +0.04%   -0.09%   +2.29%   +0.00%
    80         +0.02%   -1.17%   +2.29%   +0.00%
    90         -0.02%   -2.59%   +1.86%   +0.00%
    100        +0.00%   -2.59%   +7.51%   -0.00%
    baseline   +0.00%   +0.00%   +0.00%   +0.00%
    
    Finally, I had to add a NOINLINE in T4306 to ensure that `upd` is
    worker-wrappered as the test expects. This makes me wonder whether the
    inlining heuristic is now too liberal as `upd` is quite a large
    function. The same measure was taken in T12600.
    
                 Wall clock time compiling Cabal with -O0
    thresh       50     60     70     80     90      100    baseline
    build-Cabal  93.88  89.58  92.59  90.09  100.26  94.81  89.13
    
    Also, this change happens to avoid the spurious test output in
    `plugin-recomp-change` and `plugin-recomp-change-prof` (see #17308).
    
    Metric Decrease:
        hie002
        T12234
        T13035
        T13719
        T14683
        T4801
        T5631
        T5642
        T9020
        T9872d
        T9961
    Metric Increase:
        T12150
        T12425
        T13701
        T14697
        T15426
        T1969
        T3064
        T5837
        T6048
        T9203
        T9872a
        T9872b
        T9872c
        T9872d
        haddock.Cabal
        haddock.base
        haddock.compiler
    3d2991f8