Commit 0bd60059 authored by Simon Peyton Jones's avatar Simon Peyton Jones Committed by Marge Bot

This patch addresses the exponential blow-up in the simplifier.

Specifically:
  #13253 exponential inlining
  #10421 ditto
  #18140 strict constructors
  #18282 another nested-function call case

This patch makes one really significant changes: change the way that
mkDupableCont handles StrictArg.  The details are explained in
GHC.Core.Opt.Simplify Note [Duplicating StrictArg].

Specific changes

* In mkDupableCont, when making auxiliary bindings for the other arguments
  of a call, add extra plumbing so that we don't forget the demand on them.
  Otherwise we haev to wait for another round of strictness analysis. But
  actually all the info is to hand.  This change affects:
  - Make the strictness list in ArgInfo be [Demand] instead of [Bool],
    and rename it to ai_dmds.
  - Add as_dmd to ValArg
  - Simplify.makeTrivial takes a Demand
  - mkDupableContWithDmds takes a [Demand]

There are a number of other small changes

1. For Ids that are used at most once in each branch of a case, make
   the occurrence analyser record the total number of syntactic
   occurrences.  Previously we recorded just OneBranch or
   MultipleBranches.

   I thought this was going to be useful, but I ended up barely
   using it; see Note [Note [Suppress exponential blowup] in
   GHC.Core.Opt.Simplify.Utils

   Actual changes:
     * See the occ_n_br field of OneOcc.
     * postInlineUnconditionally

2. I found a small perf buglet in SetLevels; see the new
   function GHC.Core.Opt.SetLevels.hasFreeJoin

3. Remove the sc_cci field of StrictArg.  I found I could get
   its information from the sc_fun field instead.  Less to get
   wrong!

4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler
   invariant: they line up with the value arguments beyond ai_args
   This allowed a bit of nice refactoring; see isStrictArgInfo,
   lazyArgcontext, strictArgContext

There is virtually no difference in nofib. (The runtime numbers
are bogus -- I tried a few manually.)

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            fft          +0.0%     -2.0%    -48.3%    -49.4%      0.0%
     multiplier          +0.0%     -2.2%    -50.3%    -50.9%      0.0%
--------------------------------------------------------------------------------
            Min          -0.4%     -2.2%    -59.2%    -60.4%      0.0%
            Max          +0.0%     +0.1%     +3.3%     +4.9%      0.0%
 Geometric Mean          +0.0%     -0.0%    -33.2%    -34.3%     -0.0%

Test T18282 is an existing example of these deeply-nested strict calls.
We get a big decrease in compile time (-85%) because so much less
inlining takes place.

Metric Decrease:
    T18282
parent 0a815cea
Pipeline #22717 passed with stages
in 364 minutes and 44 seconds