Skip to content
  • Sebastian Graf's avatar
    9f57c96d
    Make DataCon workers strict in strict fields (#20749) · 9f57c96d
    Sebastian Graf authored and Marge Bot's avatar Marge Bot committed
    
    
    This patch tweaks `exprIsConApp_maybe`, `exprIsHNF` and friends, and Demand
    Analysis so that they exploit and maintain strictness of DataCon workers. See
    `Note [Strict fields in Core]` for details.
    
    Very little needed to change, and it puts field seq insertion done by Tag
    Inference into a new perspective: That of *implementing* strict field semantics.
    Before Tag Inference, DataCon workers are strict. Afterwards they are
    effectively lazy and field seqs happen around use sites. History has shown
    that there is no other way to guarantee taggedness and thus the STG Strict Field
    Invariant.
    
    Knock-on changes:
    
      * I reworked the whole narrative around "Tag inference".
        It's now called "EPT enforcement" and I recycyled the different overview
        Notes into `Note [EPT enforcement]`.
    
      * `exprIsHNF` previously used `exprOkForSpeculation` on unlifted arguments
        instead of recursing into `exprIsHNF`. That regressed the termination
        analysis in CPR analysis (which simply calls out to `exprIsHNF`), so I made
        it call `exprOkForSpeculation`, too.
    
      * There's a small regression in Demand Analysis, visible in the changed test
        output of T16859: Previously, a field seq on a variable would give that
        variable a "used exactly once" demand, now it's "used at least once",
        because `dmdTransformDataConSig` accounts for future uses of the field
        that actually all go through the case binder (and hence won't re-enter the
        potential thunk). The difference should hardly be observable.
    
      * The Simplifier's fast path for data constructors only applies to lazy
        data constructors now. I observed regressions involving Data.Binary.Put's
        `Pair` data type.
    
      * Unfortunately, T21392 does no longer reproduce after this patch, so I marked
        it as "not broken" in order to track whether we regress again in the future.
    
    Fixes #20749, the satisfying conclusion of an annoying saga (cf. the ideas
    in #21497 and #22475).
    
    Compiler perf generally improves, sometimes drastically:
    
                                                         Baseline
                                     Test    Metric          value      New value Change
    --------------------------------------------------------------------------------
                 ManyConstructors(normal) ghc/alloc  3,629,760,116  3,711,852,800  +2.3%  BAD
      MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,502,735,440  2,565,282,888  +2.5%  BAD
                           T12707(normal) ghc/alloc    804,399,798    791,807,320  -1.6% GOOD
                           T17516(normal) ghc/alloc    964,987,744  1,008,383,520  +4.5%
                           T18140(normal) ghc/alloc     75,381,152     49,860,560 -33.9% GOOD
                          T18698b(normal) ghc/alloc    232,614,457    184,262,736 -20.8% GOOD
                           T18923(normal) ghc/alloc     62,002,368     58,301,408  -6.0% GOOD
                           T20049(normal) ghc/alloc     75,719,168     70,494,368  -6.9% GOOD
                            T3294(normal) ghc/alloc  1,237,925,833  1,157,638,992  -6.5% GOOD
                            T9233(normal) ghc/alloc    686,490,105    635,166,688  -7.5% GOOD
    
                                geo. mean                                          -0.7%
                                minimum                                           -33.9%
                                maximum                                            +4.5%
    
    I looked at T17516. It seems we do a few more simplifier iterations and end up
    with a larger program. It seems that some things inline more, while other things
    inline less. I don't see low-hanging fruit.
    
    I also looked at MultiLayerModulesTH_OneShot. It appears we generate a strange
    join point in the `getUnique` method of `Uniquable GHC.Unit.Types.Module` that
    should better call-site inline, but does not. Perhaps with !11492.
    
    NoFib does not seem affected much either:
    
    +-------------------------------++--+------------+-----------+---------------+-----------+
    |                               ||  |      base/ | std. err. | T20749/ (rel) | std. err. |
    +===============================++==+============+===========+===============+===========+
    |           spectral/last-piece ||  |    7.263e8 |      0.0% |        +0.62% |      0.0% |
    +===============================++==+============+===========+===============+===========+
    |                     geom mean ||  |     +0.00% |           |               |           |
    +-------------------------------++--+------------+-----------+---------------+-----------+
    
    I had a look at last-piece. Nothing changes in stg-final, but there is a bit
    of ... movement around Data.Map.insert's use of GHC.Exts.lazy that is gone in
    stg-final.
    
    Co-Authored-By: default avatarJaro Reinders <jaro.reinders@gmail.com>
    
    Metric Decrease:
        T12707
        T18140
        T18698b
        T18923
        T19695
        T20049
        T3294
        T9233
        T21839c
    Metric Increase:
        ManyConstructors
        MultiLayerModulesTH_OneShot
    9f57c96d
    Make DataCon workers strict in strict fields (#20749)
    Sebastian Graf authored and Marge Bot's avatar Marge Bot committed
    
    
    This patch tweaks `exprIsConApp_maybe`, `exprIsHNF` and friends, and Demand
    Analysis so that they exploit and maintain strictness of DataCon workers. See
    `Note [Strict fields in Core]` for details.
    
    Very little needed to change, and it puts field seq insertion done by Tag
    Inference into a new perspective: That of *implementing* strict field semantics.
    Before Tag Inference, DataCon workers are strict. Afterwards they are
    effectively lazy and field seqs happen around use sites. History has shown
    that there is no other way to guarantee taggedness and thus the STG Strict Field
    Invariant.
    
    Knock-on changes:
    
      * I reworked the whole narrative around "Tag inference".
        It's now called "EPT enforcement" and I recycyled the different overview
        Notes into `Note [EPT enforcement]`.
    
      * `exprIsHNF` previously used `exprOkForSpeculation` on unlifted arguments
        instead of recursing into `exprIsHNF`. That regressed the termination
        analysis in CPR analysis (which simply calls out to `exprIsHNF`), so I made
        it call `exprOkForSpeculation`, too.
    
      * There's a small regression in Demand Analysis, visible in the changed test
        output of T16859: Previously, a field seq on a variable would give that
        variable a "used exactly once" demand, now it's "used at least once",
        because `dmdTransformDataConSig` accounts for future uses of the field
        that actually all go through the case binder (and hence won't re-enter the
        potential thunk). The difference should hardly be observable.
    
      * The Simplifier's fast path for data constructors only applies to lazy
        data constructors now. I observed regressions involving Data.Binary.Put's
        `Pair` data type.
    
      * Unfortunately, T21392 does no longer reproduce after this patch, so I marked
        it as "not broken" in order to track whether we regress again in the future.
    
    Fixes #20749, the satisfying conclusion of an annoying saga (cf. the ideas
    in #21497 and #22475).
    
    Compiler perf generally improves, sometimes drastically:
    
                                                         Baseline
                                     Test    Metric          value      New value Change
    --------------------------------------------------------------------------------
                 ManyConstructors(normal) ghc/alloc  3,629,760,116  3,711,852,800  +2.3%  BAD
      MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,502,735,440  2,565,282,888  +2.5%  BAD
                           T12707(normal) ghc/alloc    804,399,798    791,807,320  -1.6% GOOD
                           T17516(normal) ghc/alloc    964,987,744  1,008,383,520  +4.5%
                           T18140(normal) ghc/alloc     75,381,152     49,860,560 -33.9% GOOD
                          T18698b(normal) ghc/alloc    232,614,457    184,262,736 -20.8% GOOD
                           T18923(normal) ghc/alloc     62,002,368     58,301,408  -6.0% GOOD
                           T20049(normal) ghc/alloc     75,719,168     70,494,368  -6.9% GOOD
                            T3294(normal) ghc/alloc  1,237,925,833  1,157,638,992  -6.5% GOOD
                            T9233(normal) ghc/alloc    686,490,105    635,166,688  -7.5% GOOD
    
                                geo. mean                                          -0.7%
                                minimum                                           -33.9%
                                maximum                                            +4.5%
    
    I looked at T17516. It seems we do a few more simplifier iterations and end up
    with a larger program. It seems that some things inline more, while other things
    inline less. I don't see low-hanging fruit.
    
    I also looked at MultiLayerModulesTH_OneShot. It appears we generate a strange
    join point in the `getUnique` method of `Uniquable GHC.Unit.Types.Module` that
    should better call-site inline, but does not. Perhaps with !11492.
    
    NoFib does not seem affected much either:
    
    +-------------------------------++--+------------+-----------+---------------+-----------+
    |                               ||  |      base/ | std. err. | T20749/ (rel) | std. err. |
    +===============================++==+============+===========+===============+===========+
    |           spectral/last-piece ||  |    7.263e8 |      0.0% |        +0.62% |      0.0% |
    +===============================++==+============+===========+===============+===========+
    |                     geom mean ||  |     +0.00% |           |               |           |
    +-------------------------------++--+------------+-----------+---------------+-----------+
    
    I had a look at last-piece. Nothing changes in stg-final, but there is a bit
    of ... movement around Data.Map.insert's use of GHC.Exts.lazy that is gone in
    stg-final.
    
    Co-Authored-By: default avatarJaro Reinders <jaro.reinders@gmail.com>
    
    Metric Decrease:
        T12707
        T18140
        T18698b
        T18923
        T19695
        T20049
        T3294
        T9233
        T21839c
    Metric Increase:
        ManyConstructors
        MultiLayerModulesTH_OneShot
Loading