    More accurate unboxing · 0a82ae0d
    Simon Peyton Jones authored and Marge Bot committed
    This patch implements a fix for #20817.  It ensures that
    
    * The final strictness signature for a function accurately
      reflects the unboxing done by the wrapper
      See Note [Finalising boxity for demand signatures]
      and Note [Finalising boxity for let-bound Ids]
    
    * The budget for how many worker arguments we can have gets
      a much better "layer-at-a-time" implementation
      See Note [Worker argument budget]
    
      Generally this leads to a bit more worker/wrapper generation,
      because instead of aborting entirely if the budget is exceeded
      (and then lying about boxity), we unbox a bit.
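
The "layer-at-a-time" idea can be illustrated with a toy model. This is only a sketch under loud assumptions and is not GHC's implementation: `Shape`, `Slot`, `pass` and `workerArgs` are hypothetical names, an argument is modelled as either an atom or a product of fields, and unboxing a product with n fields is assumed to cost n-1 extra worker arguments. Each pass looks only at the arguments exposed by the previous pass, so every top-level argument gets a chance at the budget before any nested field does.

```haskell
-- Toy model only: hypothetical names, not taken from GHC.

-- | Shape of an argument: not unboxable, or a product of fields.
data Shape = Atom | Prod [Shape]
  deriving (Eq, Show)

-- | A worker argument that is either final, or was exposed by the
-- previous pass and is still a candidate for unboxing.
data Slot = Settled Shape | Fresh Shape

isFresh :: Slot -> Bool
isFresh (Fresh _) = True
isFresh _         = False

-- | One layer: walk left to right and unbox each fresh product whose
-- cost fits in the remaining room. Its fields become fresh slots, but
-- they are only examined in the next pass.
pass :: Int -> [Slot] -> (Int, [Slot])
pass room [] = (room, [])
pass room (Settled s : rest)  = fmap (Settled s :) (pass room rest)
pass room (Fresh Atom : rest) = fmap (Settled Atom :) (pass room rest)
pass room (Fresh (Prod fs) : rest)
  | cost <= room = let (room', rest') = pass (room - cost) rest
                   in (room', map Fresh fs ++ rest')
  | otherwise    = fmap (Settled (Prod fs) :) (pass room rest)
  where
    cost = length fs - 1  -- one argument is replaced by its fields

-- | Arguments the toy worker takes, given a cap on their number.
workerArgs :: Int -> [Shape] -> [Shape]
workerArgs budget args = go (max 0 (budget - length args)) (map Fresh args)
  where
    go room slots
      | any isFresh slots = uncurry go (pass room slots)
      | otherwise         = [s | Settled s <- slots]
```

Loaded into GHCi, `workerArgs 5 [Prod [Prod [Atom, Atom, Atom], Atom], Prod [Atom, Atom]]` evaluates to `[Prod [Atom,Atom,Atom],Atom,Atom,Atom]`: both top-level products are unboxed, and the nested triple stays boxed because it no longer fits. With a budget of 3 the same call gives `[Prod [Atom,Atom,Atom],Atom,Prod [Atom,Atom]]`, so the model still unboxes a bit rather than giving up.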
    
    Binary sizes increase slightly (around 1.8%) because of the increase
    in worker/wrapper generation.  The biggest effects are on GHC.Ix,
    GHC.Show, and GHC.IO.Handle.Internals. If we did a better job of
    dropping dead code, this effect might go away.
    
    Some nofib perf improvements:
    
            Program           Size    Allocs   Runtime   Elapsed  TotalMem
    --------------------------------------------------------------------------------
                VSD          +1.8%     -0.5%     0.017     0.017      0.0%
             awards          +1.8%     -0.1%     +2.3%     +2.3%      0.0%
             banner          +1.7%     -0.2%     +0.3%     +0.3%      0.0%
               bspt          +1.8%     -0.1%     +3.1%     +3.1%      0.0%
              eliza          +1.8%     -0.1%     +1.2%     +1.2%      0.0%
             expert          +1.7%     -0.1%     +9.6%     +9.6%      0.0%
     fannkuch-redux          +1.8%     -0.4%     -9.3%     -9.3%      0.0%
              kahan          +1.8%     -0.1%    +22.7%    +22.7%      0.0%
           maillist          +1.8%     -0.9%    +21.2%    +21.6%      0.0%
           nucleic2          +1.7%     -5.1%     +7.5%     +7.6%      0.0%
             pretty          +1.8%     -0.2%     0.000     0.000      0.0%
    reverse-complem          +1.8%     -2.5%    +12.2%    +12.2%      0.0%
               rfib          +1.8%     -0.2%     +2.5%     +2.5%      0.0%
                scc          +1.8%     -0.4%     0.000     0.000      0.0%
             simple          +1.7%     -1.3%    +17.0%    +17.0%     +7.4%
      spectral-norm          +1.8%     -0.1%     +6.8%     +6.7%      0.0%
             sphere          +1.7%     -2.0%    +13.3%    +13.3%      0.0%
                tak          +1.8%     -0.2%     +3.3%     +3.3%      0.0%
               x2n1          +1.8%     -0.4%     +8.1%     +8.1%      0.0%
    --------------------------------------------------------------------------------
                Min          +1.1%     -5.1%    -23.6%    -23.6%      0.0%
                Max          +1.8%     +0.0%    +36.2%    +36.2%     +7.4%
     Geometric Mean          +1.7%     -0.1%     +6.8%     +6.8%     +0.1%
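
For readers unfamiliar with the summary row, here is a minimal sketch of how a geometric mean of percentage changes is commonly computed. It assumes the usual convention of turning each change into a ratio first; `geoMeanPercent` is a hypothetical helper, not part of nofib. The table lists only a selection of programs (the Min and Max rows contain values that appear in no listed row), so the sketch is not expected to reproduce the summary row from the rows shown.

```haskell
-- Hypothetical helper, not taken from nofib.

-- | Geometric mean of percentage changes: convert each change to a
-- ratio (+1.8% becomes 1.018), average the logarithms, convert back.
-- Assumes a non-empty list with every change above -100%.
geoMeanPercent :: [Double] -> Double
geoMeanPercent changes = (exp meanLog - 1) * 100
  where
    ratios  = [1 + p / 100 | p <- changes]
    meanLog = sum (map log ratios) / fromIntegral (length ratios)
```

Unlike an arithmetic mean, this treats a doubling (+100%) and a halving (-50%) as cancelling out.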
    
    Compiler allocations in CI have a geometric mean of +0.1%: there are
    many small decreases, but also three bigger increases (7%), all
    because we do more worker/wrapper than before, so there is simply
    more code to compile.  That's OK.
    
    Perf benchmarks in perf/should_run improve in allocation by a geo mean
    of -0.2%, which is good.  None get worse. T12996 improves by -5.8%.
    
    Metric Decrease:
        T12996
    Metric Increase:
        T18282
        T18923
        T9630