Skip to content
Snippets Groups Projects
  1. Apr 12, 2024
    • Ben Gamari's avatar
      ghc-internal: Fix mentions of ghc-internal in deprecation warnings · 55eb8c98
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Closes #24609.
      55eb8c98
    • Ben Gamari's avatar
      base: Deprecate GHC.Pack · a4bb3a51
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      As proposed in #21461.
      
      Closes #21540.
      a4bb3a51
    • Ben Gamari's avatar
      testsuite: Add broken test for CApiFFI with -fprefer-bytecode · d23afb8c
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      See #24634.
      d23afb8c
    • Andreas Klebinger's avatar
      RTS: Emit warning when -M < -H · 23c3e624
      Andreas Klebinger authored and Marge Bot's avatar Marge Bot committed
      Fixes #24487
      23c3e624
    • Zubin's avatar
      driver: Make `checkHomeUnitsClosed` faster · a933aff3
      Zubin authored and Marge Bot's avatar Marge Bot committed
      The implementation of `checkHomeUnitsClosed` was traversing every single path
      in the unit dependency graph - this grows exponentially and quickly grows to be
      infeasible on larger unit dependency graphs.
      
      Instead we replace this with a faster implementation which follows from the
      specificiation of the closure property - there is a closure error if there are
      units which are both are both (transitively) depended upon by home units and
      (transitively) depend on home units, but are not themselves home units.
      
      To compute the set of units required for closure, we first compute the closure
      of the unit dependency graph, then the transpose of this closure, and find all
      units that are reachable from the home units in the transpose of the closure.
      a933aff3
    • Ben Gamari's avatar
      users-guide: Clarify language extension documentation · 6e18ce2b
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Over the years the users guide's language extension documentation has
      gone through quite a few refactorings. In the process some of the
      descriptions have been rendered non-sensical. For instance, the
      description of `NoImplicitPrelude` actually describes the semantics of
      `ImplicitPrelude`.
      
      To fix this we:
      
       * ensure that all extensions are named in their "positive" sense (e.g.
         `ImplicitPrelude` rather than `NoImplicitPrelude`).
       * rework the documentation to avoid flag-oriented wording
         like "enable" and "disable"
       * ensure that the polarity of the documentation is consistent with
         reality.
      
      Fixes #23895.
      6e18ce2b
  2. Apr 10, 2024
    • Alan Zimmerman's avatar
      EPA: Remove unnecessary XRec in CompleteMatchSig · 1b1a92bd
      Alan Zimmerman authored and Marge Bot's avatar Marge Bot committed
      The XRec for [LIdP pass] is not needed for exact printing, remove it.
      1b1a92bd
    • Ben Gamari's avatar
      testsuite: Add test for lookupSymbolInNativeObj · dccd3ea1
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      dccd3ea1
    • Rodrigo Mesquita's avatar
      Use symbol cache in internal interpreter too · 12931698
      Rodrigo Mesquita authored and Marge Bot's avatar Marge Bot committed
      This commit makes the symbol cache that was used by the external
      interpreter available for the internal interpreter too.
      
      This follows from the analysis in #23415 that suggests the internal
      interpreter could benefit from this cache too, and that there is no good
      reason not to have the cache for it too. It also makes it a bit more
      uniform to have the symbol cache range over both the internal and
      external interpreter.
      
      This commit also refactors the cache into a function which is used by
      both `lookupSymbol` and also by `lookupSymbolInDLL`, extending the
      caching logic to `lookupSymbolInDLL` too.
      12931698
    • Rodrigo Mesquita's avatar
      rts: Make addDLL a wrapper around loadNativeObj · dcfaa190
      Rodrigo Mesquita authored and Marge Bot's avatar Marge Bot committed
      Rewrite the implementation of `addDLL` as a wrapper around the more
      principled `loadNativeObj` rts linker function. The latter should be
      preferred while the former is preserved for backwards compatibility.
      
      `loadNativeObj` was previously only available on ELF platforms, so this
      commit further refactors the rts linker to transform loadNativeObj_ELF
      into loadNativeObj_POSIX, which is available in ELF and MachO platforms.
      
      The refactor made it possible to remove the `dl_mutex` mutex in favour
      of always using `linker_mutex` (rather than a combination of both).
      
      Lastly, we implement `loadNativeObj` for Windows too.
      dcfaa190
    • Alexis King's avatar
      linker: Avoid linear search when looking up Haskell symbols via dlsym · e008a19a
      Alexis King authored and Marge Bot's avatar Marge Bot committed
      
      See the primary Note [Looking up symbols in the relevant objects] for a
      more in-depth explanation.
      
      When dynamically loading a Haskell symbol (typical when running a splice or
      GHCi expression), before this commit we would search for the symbol in
      all dynamic libraries that were loaded. However, this could be very
      inefficient when too many packages are loaded (which can happen if there are
      many package dependencies) because the time to lookup the would be
      linear in the number of packages loaded.
      
      This commit drastically improves symbol loading performance by
      introducing a mapping from units to the handles of corresponding loaded
      dlls. These handles are returned by dlopen when we load a dll, and can
      then be used to look up in a specific dynamic library.
      
      Looking up a given Name is now much more precise because we can get
      lookup its unit in the mapping and lookup the symbol solely in the
      handles of the dynamic libraries loaded for that unit.
      
      In one measurement, the wait time before the expression was executed
      went from +-38 seconds down to +-2s.
      
      This commit also includes Note [Symbols may not be found in pkgs_loaded],
      explaining the fallback to the old behaviour in case no dll can be found
      in the unit mapping for a given Name.
      
      Fixes #23415
      
      Co-authored-by: default avatarRodrigo Mesquita <(@alt-romes)>
      e008a19a
    • Rodrigo Mesquita's avatar
      rts: free error message before returning · dd530bb7
      Rodrigo Mesquita authored and Marge Bot's avatar Marge Bot committed
      Fixes a memory leak in rts/linker/PEi386.c
      dd530bb7
    • Jade's avatar
      Validate -main-is flag using parseIdentifier · 3d0806fc
      Jade authored and Marge Bot's avatar Marge Bot committed
      Fixes #24368
      3d0806fc
  3. Apr 09, 2024
  4. Apr 08, 2024
    • Alan Zimmerman's avatar
      EPA: Move DeltaPos and EpaLocation' into GHC.Types.SrcLoc · 12b997df
      Alan Zimmerman authored and Marge Bot's avatar Marge Bot committed
      This allows us to use a NoCommentsLocation for the possibly trailing
      comma location in a StringLiteral.
      This in turn allows us to correctly roundtrip via makeDeltaAst.
      12b997df
    • Alan Zimmerman's avatar
      EPA: Use EpaLocation in WarningTxt · 3b7b0c1c
      Alan Zimmerman authored and Marge Bot's avatar Marge Bot committed
      This allows us to use an EpDelta if needed when using makeDeltaAst.
      3b7b0c1c
    • Hannes Siebenhandl's avatar
      Eliminate name thunk in declaration fingerprinting · fbb91a63
      Hannes Siebenhandl authored and Marge Bot's avatar Marge Bot committed
      Thunk analysis showed that we have about 100_000 thunks (in agda and
      `-fwrite-simplified-core`) pointing to the name of the name decl.
      Forcing this thunk fixes this issue.
      
      The thunk created here is retained by the thunk created by forkM, it is
      better to eagerly force this because the result (a `Name`) is already
      retained indirectly via the `IfaceDecl`.
      fbb91a63
    • Matthew Pickering's avatar
      Force in_multi to avoid retaining entire hsc_env · c6def949
      Matthew Pickering authored and Marge Bot's avatar Marge Bot committed
      c6def949
    • Hannes Siebenhandl's avatar
      Never UNPACK `FastMutInt` for counting z-encoded `FastString`s · f2cc1107
      Hannes Siebenhandl authored and Marge Bot's avatar Marge Bot committed
      In `FastStringTable`, we count the number of z-encoded FastStrings
      that exist in a GHC session.
      We used to UNPACK the counters to not waste memory, but live retainer
      analysis showed that we allocate a lot of `FastMutInt`s, retained by
      `mkFastZString`.
      
      We lazily compute the `FastZString`, only incrementing the counter when the `FastZString` is
      forced.
      The function `mkFastStringWith` calls `mkZFastString` and boxes the
      `FastMutInt`, leading to the following core:
      
          mkFastStringWith
            = \ mk_fs _  ->
                   = case stringTable of
                      { FastStringTable _ n_zencs segments# _ ->
                          ...
                               case ((mk_fs (I# ...) (FastMutInt n_zencs))
                                  `cast` <Co:2> :: ...)
                                  ...
      
      Marking this field as `NOUNPACK` avoids this reboxing, eliminating the
      allocation of a fresh `FastMutInt` on every `FastString` allocation.
      f2cc1107
    • Hannes Siebenhandl's avatar
      Avoid UArray when indexing is not required · 88cb3e10
      Hannes Siebenhandl authored and Marge Bot's avatar Marge Bot committed
      `UnlinkedBCO`'s can occur many times in the heap. Each `UnlinkedBCO`
      references two `UArray`'s but never indexes them. They are only needed
      to encode the elements into a `ByteArray#`. The three words for
      the lower bound, upper bound and number of elements are essentially
      unused, thus we replace `UArray` with a wrapper around `ByteArray#`.
      This saves us up to three words for each `UnlinkedBCO`.
      
      Further, to avoid re-allocating these words for `ResolvedBCO`, we repeat
      the procedure for `ResolvedBCO` and add custom `Binary` and `Show` instances.
      
      For example, agda's repl session has around 360_000 UnlinkedBCO's,
      so avoiding these three words is already saving us around 8MB residency.
      88cb3e10
  5. Apr 05, 2024
  6. Apr 04, 2024
  7. Apr 03, 2024
    • Simon Peyton Jones's avatar
      Account for bottoming functions in OccurAnal · 271a7812
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      This fixes #24582, a small but long-standing bug
      271a7812
    • Simon Peyton Jones's avatar
      Testsuite message changes from simplifier improvements · 27db3c5e
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      27db3c5e
    • Simon Peyton Jones's avatar
      Simplifier improvements · e026bdf2
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      This MR started as: allow the simplifer to do more in one pass,
      arising from places I could see the simplifier taking two iterations
      where one would do.  But it turned into a larger project, because
      these changes unexpectedly made inlining blow up, especially join
      points in deeply-nested cases.
      
      The main changes are below.  There are also many new or rewritten Notes.
      
      Avoiding simplifying repeatedly
      ~~~~~~~~~~~~~~~
      See Note [Avoiding simplifying repeatedly]
      
      * The SimplEnv now has a seInlineDepth field, which says how deep
        in unfoldings we are.  See Note [Inline depth] in Simplify.Env.
        Currently used only for the next point: avoiding repeatedly
        simplifying coercions.
      
      * Avoid repeatedly simplifying coercions.
        see Note [Avoid re-simplifying coercions] in Simplify.Iteration
        As you'll see from the Note, this makes use of the seInlineDepth.
      
      * Allow Simplify.Iteration.simplAuxBind to inline used-once things.
        This is another part of Note [Post-inline for single-use things], and
        is really good for reducing simplifier iterations in situations like
            case K e of { K x -> blah }
        wher x is used once in blah.
      
      * Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
        elimination.  Note [Case elim in exprIsConApp_maybe]
      
      * Improve the case-merge transformation:
        - Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts`
          and friends.  See Note [Merge Nested Cases] in GHC.Core.Utils.
        - Add a new case for `tagToEnum#`; see wrinkle (MC3).
        - Add a new case to look through join points: see wrinkle (MC4)
      
      postInlineUnconditionally
      ~~~~~~~~~~~~~~~~~~~~~~~~~
      * Allow Simplify.Utils.postInlineUnconditionally to inline variables
        that are used exactly once. See Note [Post-inline for single-use things].
      
      * Do not postInlineUnconditionally join point, ever.
        Doing so does not reduce allocation, which is the main point,
        and with join points that are used a lot it can bloat code.
        See point (1) of Note [Duplicating join points] in
        GHC.Core.Opt.Simplify.Iteration.
      
      * Do not postInlineUnconditionally a strict (demanded) binding.
        It will not allocate a thunk (it'll turn into a case instead)
        so again the main point of inlining it doesn't hold.  Better
        to check per-call-site.
      
      * Improve occurrence analyis for bottoming function calls, to help
        postInlineUnconditionally.  See Note [Bottoming function calls]
        in GHC.Core.Opt.OccurAnal
      
      Inlining generally
      ~~~~~~~~~~~~~~~~~~
      * In GHC.Core.Opt.Simplify.Utils.interestingCallContext,
        use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case.
        See Note [Seq is boring]  Also, wrinkle (SB1), inline in that
        `seq` context only for INLINE functions (UnfWhen guidance).
      
      * In GHC.Core.Opt.Simplify.Utils.interestingArg,
        - return ValueArg for OtherCon [c1,c2, ...], but
        - return NonTrivArg for OtherCon []
        This makes a function a little less likely to inline if all we
        know is that the argument is evaluated, but nothing else.
      
      * isConLikeUnfolding is no longer true for OtherCon {}.
        This propagates to exprIsConLike.  Con-like-ness has /positive/
        information.
      
      Join points
      ~~~~~~~~~~~
      * Be very careful about inlining join points.
        See these two long Notes
          Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration
          Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline
      
      * When making join points, don't do so if the join point is so small
        it will immediately be inlined; check uncondInlineJoin.
      
      * In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining
        heuristics for join points. In general we /do not/ want to inline
        join points /even if they are small/.  See Note [Duplicating join points]
        GHC.Core.Opt.Simplify.Iteration.
      
        But sometimes we do: see Note [Inlining join points] in
        GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function.
      
      * Do not add an unfolding to a join point at birth.  This is a tricky one
        and has a long Note [Do not add unfoldings to join points at birth]
        It shows up in two places
        - In `mkDupableAlt` do not add an inlining
        - (trickier) In `simplLetUnfolding` don't add an unfolding for a
          fresh join point
        I am not fully satisifed with this, but it works and is well documented.
      
      * In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise
        having a non-inlined join point.
      
      Performance changes
      ~~~~~~~~~~~~~~~~~~~
      * Binary sizes fall by around 2.6%, according to nofib.
      
      * Compile times improve slightly. Here are the figures over 1%.
      
        I investiate the biggest differnce in T18304. It's a very small module, just
        a few hundred nodes. The large percentage difffence is due to a single
        function that didn't quite inline before, and does now, making code size a
        bit bigger.  I decided gains outweighed the losses.
      
          Metrics: compile_time/bytes allocated (changes over +/- 1%)
          ------------------------------------------------
                 CoOpt_Singletons(normal)   -9.2% GOOD
                      LargeRecord(normal)  -23.5% GOOD
      MultiComponentModulesRecomp(normal)   +1.2%
      MultiLayerModulesTH_OneShot(normal)   +4.1%  BAD
                        PmSeriesS(normal)   -3.8%
                        PmSeriesV(normal)   -1.5%
                           T11195(normal)   -1.3%
                           T12227(normal)  -20.4% GOOD
                           T12545(normal)   -3.2%
                           T12707(normal)   -2.1% GOOD
                           T13253(normal)   -1.2%
                       T13253-spj(normal)   +8.1%  BAD
                           T13386(normal)   -3.1% GOOD
                           T14766(normal)   -2.6% GOOD
                           T15164(normal)   -1.4%
                           T15304(normal)   +1.2%
                           T15630(normal)   -8.2%
                          T15630a(normal)          NEW
                           T15703(normal)  -14.7% GOOD
                           T16577(normal)   -2.3% GOOD
                           T17516(normal)  -39.7% GOOD
                           T18140(normal)   +1.2%
                           T18223(normal)  -17.1% GOOD
                           T18282(normal)   -5.0% GOOD
                           T18304(normal)  +10.8%  BAD
                           T18923(normal)   -2.9% GOOD
                            T1969(normal)   +1.0%
                           T19695(normal)   -1.5%
                           T20049(normal)  -12.7% GOOD
                          T21839c(normal)   -4.1% GOOD
                            T3064(normal)   -1.5%
                            T3294(normal)   +1.2%  BAD
                            T4801(normal)   +1.2%
                            T5030(normal)  -15.2% GOOD
                         T5321Fun(normal)   -2.2% GOOD
                            T6048(optasm)  -16.8% GOOD
                             T783(normal)   -1.2%
                            T8095(normal)   -6.0% GOOD
                            T9630(normal)   -4.7% GOOD
                            T9961(normal)   +1.9%  BAD
                            WWRec(normal)   -1.4%
              info_table_map_perf(normal)   -1.3%
                       parsing001(normal)   +1.5%
      
                                geo. mean   -2.0%
                                minimum    -39.7%
                                maximum    +10.8%
      
      * Runtimes generally improve. In the testsuite perf/should_run gives:
         Metrics: runtime/bytes allocated
         ------------------------------------------
                   Conversions(normal)   -0.3%
                       T13536a(optasm)  -41.7% GOOD
                         T4830(normal)   -0.1%
                 haddock.Cabal(normal)   -0.1%
                  haddock.base(normal)   -0.1%
              haddock.compiler(normal)   -0.1%
      
                             geo. mean   -0.8%
                             minimum    -41.7%
                             maximum     +0.0%
      
      * For runtime, nofib is a better test.  The news is mostly good.
        Here are the number more than +/- 0.1%:
      
          # bytes allocated
          ==========================++==========
             imaginary/digits-of-e1 ||  -14.40%
             imaginary/digits-of-e2 ||   -4.41%
                imaginary/paraffins ||   -0.17%
                     imaginary/rfib ||   -0.15%
             imaginary/wheel-sieve2 ||   -0.10%
                      real/compress ||   -0.47%
                         real/fluid ||   -0.10%
                        real/fulsom ||   +0.14%
                        real/gamteb ||   -1.47%
                            real/gg ||   -0.20%
                         real/infer ||   +0.24%
                           real/pic ||   -0.23%
                        real/prolog ||   -0.36%
                           real/scs ||   -0.46%
                       real/smallpt ||   +4.03%
              shootout/k-nucleotide ||  -20.23%
                    shootout/n-body ||   -0.42%
             shootout/spectral-norm ||   -0.13%
                    spectral/boyer2 ||   -3.80%
               spectral/constraints ||   -0.27%
                spectral/hartel/ida ||   -0.82%
                      spectral/mate ||  -20.34%
                      spectral/para ||   +0.46%
                   spectral/rewrite ||   +1.30%
                    spectral/sphere ||   -0.14%
          ==========================++==========
                          geom mean ||   -0.59%
      
          real/smallpt has a huge nest of local definitions, and I
          could not pin down a reason for a regression.  But there are
          three big wins!
      
      Metric Decrease:
          CoOpt_Singletons
          LargeRecord
          T12227
          T12707
          T13386
          T13536a
          T14766
          T15703
          T16577
          T17516
          T18223
          T18282
          T18923
          T21839c
          T20049
          T5321Fun
          T5030
          T6048
          T8095
          T9630
          T783
      Metric Increase:
          MultiLayerModulesTH_OneShot
          T13253-spj
          T18304
          T18698a
          T9961
          T3294
      e026bdf2
    • Simon Peyton Jones's avatar
      Remove a long-commented-out line · b4581e23
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      Pure refactoring
      b4581e23
    • Simon Peyton Jones's avatar
      Use named record fields for the CastIt { ... } data constructor · e9297181
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      This is a pure refactor
      e9297181
    • Simon Peyton Jones's avatar
      Slight improvement in WorkWrap · ae24c9bc
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      Ensure that WorkWrap preserves lambda binders, in case of join points.  Sadly I
      have forgotten why I made this change (it was while I was doing a lot of
      meddling in the Simplifier, but
        * it does no harm,
        * it is slightly more efficient, and
        * presumably it made something better!
      
      Anyway I have kept it in a separate commit.
      ae24c9bc
    • Simon Peyton Jones's avatar
      Inline GHC.HsToCore.Pmc.Solver.Types.trvVarInfo · 609cd32c
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      When exploring compile-time regressions after meddling with the Simplifier, I
      discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very delicately
      balanced.  It's a small, heavily used, overloaded function and it's important
      that it inlines. By a fluke it was before, but at various times in my journey it
      stopped doing so.  So I just added an INLINE pragma to it; no sense in depending
      on a delicately-balanced fluke.
      609cd32c
    • Simon Peyton Jones's avatar
      Improve exprIsConApp_maybe a little · bdf1660f
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      Eliminate a redundant case at birth.  This sometimes reduces
      Simplifier iterations.
      
      See Note [Case elim in exprIsConApp_maybe].
      bdf1660f
    • Simon Peyton Jones's avatar
      Spelling, layout, pretty-printing only · 95a9a172
      Simon Peyton Jones authored and Marge Bot's avatar Marge Bot committed
      95a9a172
Loading