  1. Aug 03, 2023
  2. Aug 02, 2023
  3. Aug 01, 2023
  4. Jul 30, 2023
    • Make the occurrence analyser smarter about join points · d0369802
      Simon Peyton Jones authored and committed
      This MR addresses #22404.  There is a big Note
      
         Note [Occurrence analysis for join points]
      
      that explains it all.  Significant changes
      
      * New field occ_join_points in OccEnv
      
      * The NonRec case of occAnalBind splits into two cases:
        one for existing join points (which does the special magic for
        Note [Occurrence analysis for join points]), and one for other
        bindings.
      
      * mkOneOcc adds in info from occ_join_points.
      
      * All "bring into scope" activity is centralised in the
        new function `addInScope`.
      
      * I made a local data type LocalOcc for use inside the occurrence analyser.
        It is like OccInfo, but lacks IAmDead and IAmALoopBreaker, which in turn
        makes computations over it simpler and more efficient.
      
      * I found quite a bit of allocation in GHC.Core.Rules.getRules
        so I optimised it a bit.
      
      More minor changes
      
      * I found I was using (Maybe Arity) a lot, so I defined a new data
        type JoinPointHood and used it everywhere.  This touches a lot of
        non-occ-anal files, but it makes everything more perspicuous.
      
      * Renamed data constructor WithUsageDetails to WUD, and
        WithTailUsageDetails to WTUD
      
      This also fixes #21128, on the way.
      
      --------- Compiler perf -----------
      I spent quite a time on performance tuning, so even though it
      does more than before, the occurrence analyser runs slightly faster
      on average.  Here are the compile-time allocation changes over 0.5%
      
            CoOpt_Read(normal) ghc/alloc    766,025,520    754,561,992  -1.5%
      CoOpt_Singletons(normal) ghc/alloc    759,436,840    762,925,512  +0.5%
           LargeRecord(normal) ghc/alloc  1,814,482,440  1,799,530,456  -0.8%
             PmSeriesT(normal) ghc/alloc     68,159,272     67,519,720  -0.9%
                T10858(normal) ghc/alloc    120,805,224    118,746,968  -1.7%
                T11374(normal) ghc/alloc    164,901,104    164,070,624  -0.5%
                T11545(normal) ghc/alloc     79,851,808     78,964,704  -1.1%
                T12150(optasm) ghc/alloc     73,903,664     71,237,544  -3.6% GOOD
                T12227(normal) ghc/alloc    333,663,200    331,625,864  -0.6%
                T12234(optasm) ghc/alloc     52,583,224     52,340,344  -0.5%
                T12425(optasm) ghc/alloc     81,943,216     81,566,720  -0.5%
                T13056(optasm) ghc/alloc    294,517,928    289,642,512  -1.7%
            T13253-spj(normal) ghc/alloc    118,271,264     59,859,040 -49.4% GOOD
                T15164(normal) ghc/alloc  1,102,630,352  1,091,841,296  -1.0%
                T15304(normal) ghc/alloc  1,196,084,000  1,166,733,000  -2.5%
                T15630(normal) ghc/alloc    148,729,632    147,261,064  -1.0%
                T15703(normal) ghc/alloc    379,366,664    377,600,008  -0.5%
                T16875(normal) ghc/alloc     32,907,120     32,670,976  -0.7%
                T17516(normal) ghc/alloc  1,658,001,888  1,627,863,848  -1.8%
                T17836(normal) ghc/alloc    395,329,400    393,080,248  -0.6%
                T18140(normal) ghc/alloc     71,968,824     73,243,040  +1.8%
                T18223(normal) ghc/alloc    456,852,568    453,059,088  -0.8%
                T18282(normal) ghc/alloc    129,105,576    131,397,064  +1.8%
                T18304(normal) ghc/alloc     71,311,712     70,722,720  -0.8%
               T18698a(normal) ghc/alloc    208,795,112    210,102,904  +0.6%
               T18698b(normal) ghc/alloc    230,320,736    232,697,976  +1.0%  BAD
                T19695(normal) ghc/alloc  1,483,648,128  1,504,702,976  +1.4%
                T20049(normal) ghc/alloc     85,612,024     85,114,376  -0.6%
               T21839c(normal) ghc/alloc    415,080,992    410,906,216  -1.0% GOOD
                 T4801(normal) ghc/alloc    247,590,920    250,726,272  +1.3%
                 T6048(optasm) ghc/alloc     95,699,416     95,080,680  -0.6%
                  T783(normal) ghc/alloc    335,323,384    332,988,120  -0.7%
                 T9233(normal) ghc/alloc    709,641,224    685,947,008  -3.3% GOOD
                 T9630(normal) ghc/alloc    965,635,712    948,356,120  -1.8%
                 T9675(optasm) ghc/alloc    444,604,152    428,987,216  -3.5% GOOD
                 T9961(normal) ghc/alloc    303,064,592    308,798,800  +1.9%  BAD
                 WWRec(normal) ghc/alloc    503,728,832    498,102,272  -1.1%
      
                     geo. mean                                          -1.0%
                     minimum                                           -49.4%
                     maximum                                            +1.9%
      
      In fact these figures seem to vary between platforms; generally worse
      on i386 for some reason.  The Windows numbers vary by 1%, especially in
      benchmarks where the total allocation is low. But the geometric mean
      stays solidly negative, which is good.  The "increase/decrease" list
      below covers all platforms.
      
      The big win on T13253-spj comes because it has a big nest of join
      points, each occurring twice in the next one.  The new occ-anal takes
      only one iteration of the simplifier to do the inlining; the old one
      took four.  Moreover, we get much smaller code with the new one:
      
        New: Result size of Tidy Core
          = {terms: 429, types: 84, coercions: 0, joins: 14/14}
      
        Old: Result size of Tidy Core
          = {terms: 2,437, types: 304, coercions: 0, joins: 10/10}
      
      --------- Runtime perf -----------
      No significant changes in nofib results, except a 1% reduction in
      compiler allocation.
      
      Metric Decrease:
          CoOpt_Read
          T13253-spj
          T9233
          T9630
          T9675
          T12150
          T21839c
          LargeRecord
          MultiComponentModulesRecomp
          T10421
          T13701
          T12425
      
      Metric Increase:
          T18140
          T9961
          T18282
          T18698a
          T18698b
          T19695
  5. Jul 28, 2023
    • Add since pragmas to GHC.IO.Handle.FD · ee93edfd
      Bodigrim authored and Marge Bot committed
    • Bump filepath submodule to 1.4.100.4 · e9a0fa3f
      Bodigrim authored and Marge Bot committed
      Resolves #23741
      
      Metric Decrease:
          MultiComponentModules
          MultiComponentModulesRecomp
          MultiLayerModules
          MultiLayerModulesRecomp
          T10421
          T12234
          T12425
          T13035
          T13701
          T13719
          T16875
          T18304
          T18698a
          T18698b
          T21839c
          T9198
          TcPlugin_RewritePerf
          hard_hole_fits
      
      The metric decrease on Windows can probably be attributed to https://github.com/haskell/filepath/pull/183
    • Aarch64 NCG: Use encoded immediates for literals. · 40425c50
      Andreas Klebinger authored and Marge Bot committed
      Try to generate
      
          instr x2, <imm>
      
      instead of
      
          mov x1, lit
          instr x2, x1
      
      when possible. This gets rid of quite a few redundant
      mov instructions.
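As a rough illustration of what "when possible" can mean, here is a minimal sketch of an encodability check for one class of AArch64 immediates: ADD/SUB (immediate) takes a 12-bit unsigned value, optionally shifted left by 12. The helper name `fits_addsub_imm` is hypothetical, and the commit does not say which instructions or immediate classes the NCG actually handles. Logical instructions, for example, use a different bitmask encoding that is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from GHC: returns true if v can be
 * encoded directly in an AArch64 ADD/SUB (immediate) instruction,
 * i.e. as a 12-bit unsigned value with an optional LSL #12. */
static bool fits_addsub_imm(uint64_t v) {
    /* Case 1: only the low 12 bits are set (imm12, LSL #0). */
    if ((v & ~UINT64_C(0xFFF)) == 0) {
        return true;
    }
    /* Case 2: only bits 12..23 are set (imm12, LSL #12). */
    if ((v & ~(UINT64_C(0xFFF) << 12)) == 0) {
        return true;
    }
    /* Otherwise the literal must first be materialised in a register. */
    return false;
}
```

When such a check succeeds, a code generator can emit the literal as part of the instruction itself. Otherwise it falls back to the mov-then-use sequence shown above.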
      
      I believe this causes a metric decrease for LargeRecord as
      we reduce register pressure.
      
      -------------------------
      Metric Decrease:
          LargeRecord
      -------------------------
    • Include -haddock in DynFlags fingerprint · 0bfc8908
      Finley McIlwaine authored and Marge Bot committed
      The -haddock flag determines whether or not the resulting .hi files
      contain haddock documentation strings. If the existing .hi files do
      not contain haddock documentation strings and the user requests them,
      we should recompile.
    • ghc-prim: Use C11 atomics · f8fa1d08
      Ben Gamari authored and Marge Bot committed
      Previously `ghc-prim`'s atomic wrappers used the legacy `__sync_*`
      family of C builtins. Here we refactor these to use the
      appropriate C11 atomic equivalents instead, allowing us to be more
      explicit about the expected ordering semantics.
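The difference between the two styles can be sketched as follows. This is an illustration only, not the `ghc-prim` source: the wrapper names `fetch_add_legacy` and `fetch_add_c11` are hypothetical, and it assumes a GCC- or Clang-compatible compiler (for `__sync_fetch_and_add`) with `<stdatomic.h>` available.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Legacy style: the __sync builtin implies a full barrier, and the
 * ordering is not visible at the call site. */
static uint32_t fetch_add_legacy(uint32_t *p, uint32_t v) {
    return __sync_fetch_and_add(p, v);   /* returns the old value */
}

/* C11 style: the memory ordering is an explicit argument, so the
 * intended semantics are stated where the operation happens. */
static uint32_t fetch_add_c11(_Atomic uint32_t *p, uint32_t v) {
    return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}
```

Both wrappers return the value held before the addition. The C11 form makes it possible to choose a weaker ordering, such as `memory_order_relaxed`, where that is known to be safe.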