Commits · 55eb8c98895308d2dd025f7bd64c0b80fce6ace3 · Reinier Maas / GHC

Apr 12, 2024

ghc-internal: Fix mentions of ghc-internal in deprecation warnings · 55eb8c98
Ben Gamari authored 11 months ago and Marge Bot committed 11 months ago
```
Closes #24609.
```
55eb8c98
base: Deprecate GHC.Pack · a4bb3a51
Ben Gamari authored 11 months ago and Marge Bot committed 11 months ago
```
As proposed in #21461.

Closes #21540.
```
a4bb3a51
testsuite: Add broken test for CApiFFI with -fprefer-bytecode · d23afb8c
Ben Gamari authored 11 months ago and Marge Bot committed 11 months ago
```
See #24634.
```
d23afb8c
RTS: Emit warning when -M < -H · 23c3e624
Andreas Klebinger authored 1 year ago and Marge Bot committed 11 months ago
```
Fixes #24487
```
23c3e624

driver: Make `checkHomeUnitsClosed` faster · a933aff3

Zubin authored 1 year ago and

Marge Bot committed 11 months ago

The implementation of `checkHomeUnitsClosed` was traversing every single path
in the unit dependency graph - this grows exponentially and quickly grows to be
infeasible on larger unit dependency graphs.

Instead we replace this with a faster implementation which follows from the
specificiation of the closure property - there is a closure error if there are
units which are both are both (transitively) depended upon by home units and
(transitively) depend on home units, but are not themselves home units.

To compute the set of units required for closure, we first compute the closure
of the unit dependency graph, then the transpose of this closure, and find all
units that are reachable from the home units in the transpose of the closure.

a933aff3

users-guide: Clarify language extension documentation · 6e18ce2b

Ben Gamari authored 1 year ago and

Marge Bot committed 11 months ago

Over the years the users guide's language extension documentation has
gone through quite a few refactorings. In the process some of the
descriptions have been rendered non-sensical. For instance, the
description of `NoImplicitPrelude` actually describes the semantics of
`ImplicitPrelude`.

To fix this we:

 * ensure that all extensions are named in their "positive" sense (e.g.
   `ImplicitPrelude` rather than `NoImplicitPrelude`).
 * rework the documentation to avoid flag-oriented wording
   like "enable" and "disable"
 * ensure that the polarity of the documentation is consistent with
   reality.

Fixes #23895.

6e18ce2b

Apr 10, 2024

EPA: Remove unnecessary XRec in CompleteMatchSig · 1b1a92bd
Alan Zimmerman authored 11 months ago and Marge Bot committed 11 months ago
```
The XRec for [LIdP pass] is not needed for exact printing, remove it.
```
1b1a92bd
testsuite: Add test for lookupSymbolInNativeObj · dccd3ea1
Ben Gamari authored 11 months ago and Marge Bot committed 11 months ago

dccd3ea1

Use symbol cache in internal interpreter too · 12931698

Rodrigo Mesquita authored 1 year ago and

Marge Bot committed 11 months ago

This commit makes the symbol cache that was used by the external
interpreter available for the internal interpreter too.

This follows from the analysis in #23415 that suggests the internal
interpreter could benefit from this cache too, and that there is no good
reason not to have the cache for it too. It also makes it a bit more
uniform to have the symbol cache range over both the internal and
external interpreter.

This commit also refactors the cache into a function which is used by
both `lookupSymbol` and also by `lookupSymbolInDLL`, extending the
caching logic to `lookupSymbolInDLL` too.

12931698

rts: Make addDLL a wrapper around loadNativeObj · dcfaa190

Rodrigo Mesquita authored 11 months ago and

Marge Bot committed 11 months ago

Rewrite the implementation of `addDLL` as a wrapper around the more
principled `loadNativeObj` rts linker function. The latter should be
preferred while the former is preserved for backwards compatibility.

`loadNativeObj` was previously only available on ELF platforms, so this
commit further refactors the rts linker to transform loadNativeObj_ELF
into loadNativeObj_POSIX, which is available in ELF and MachO platforms.

The refactor made it possible to remove the `dl_mutex` mutex in favour
of always using `linker_mutex` (rather than a combination of both).

Lastly, we implement `loadNativeObj` for Windows too.

dcfaa190

linker: Avoid linear search when looking up Haskell symbols via dlsym · e008a19a

Alexis King authored 1 year ago and

Marge Bot committed 11 months ago


See the primary Note [Looking up symbols in the relevant objects] for a
more in-depth explanation.

When dynamically loading a Haskell symbol (typical when running a splice or
GHCi expression), before this commit we would search for the symbol in
all dynamic libraries that were loaded. However, this could be very
inefficient when too many packages are loaded (which can happen if there are
many package dependencies) because the time to lookup the would be
linear in the number of packages loaded.

This commit drastically improves symbol loading performance by
introducing a mapping from units to the handles of corresponding loaded
dlls. These handles are returned by dlopen when we load a dll, and can
then be used to look up in a specific dynamic library.

Looking up a given Name is now much more precise because we can get
lookup its unit in the mapping and lookup the symbol solely in the
handles of the dynamic libraries loaded for that unit.

In one measurement, the wait time before the expression was executed
went from +-38 seconds down to +-2s.

This commit also includes Note [Symbols may not be found in pkgs_loaded],
explaining the fallback to the old behaviour in case no dll can be found
in the unit mapping for a given Name.

Fixes #23415

Co-authored-by: Rodrigo Mesquita <(@alt-romes)>

e008a19a

rts: free error message before returning · dd530bb7
Rodrigo Mesquita authored 1 year ago and Marge Bot committed 11 months ago
```
Fixes a memory leak in rts/linker/PEi386.c
```
dd530bb7
Validate -main-is flag using parseIdentifier · 3d0806fc
Jade authored 1 year ago and Marge Bot committed 11 months ago
```
Fixes #24368
```
3d0806fc

Apr 09, 2024

EPA: Capture all comments in a ClassDecl · be3bddde
Alan Zimmerman authored 11 months ago and Marge Bot committed 11 months ago
```
Hopefully the final fix needed for #24533
```
be3bddde

Prefer packed representation for CompiledByteCode · 868c8a78

Hannes Siebenhandl authored 11 months ago and

Marge Bot committed 11 months ago

As there are many 'CompiledByteCode' objects alive during a GHCi
session, representing its element in a more packed manner improves space
behaviour at a minimal cost.

When running GHCi on the agda codebase, we find around 380 live
'CompiledByteCode' objects. Packing their respective 'UnlinkedByteCode'
can save quite some pointers.

868c8a78

Apr 08, 2024

EPA: Move DeltaPos and EpaLocation' into GHC.Types.SrcLoc · 12b997df

Alan Zimmerman authored 11 months ago and

Marge Bot committed 11 months ago

This allows us to use a NoCommentsLocation for the possibly trailing
comma location in a StringLiteral.
This in turn allows us to correctly roundtrip via makeDeltaAst.

12b997df

EPA: Use EpaLocation in WarningTxt · 3b7b0c1c
Alan Zimmerman authored 11 months ago and Marge Bot committed 11 months ago
```
This allows us to use an EpDelta if needed when using makeDeltaAst.
```
3b7b0c1c

Eliminate name thunk in declaration fingerprinting · fbb91a63

Hannes Siebenhandl authored 11 months ago and

Marge Bot committed 11 months ago

Thunk analysis showed that we have about 100_000 thunks (in agda and
`-fwrite-simplified-core`) pointing to the name of the name decl.
Forcing this thunk fixes this issue.

The thunk created here is retained by the thunk created by forkM, it is
better to eagerly force this because the result (a `Name`) is already
retained indirectly via the `IfaceDecl`.

fbb91a63

Force in_multi to avoid retaining entire hsc_env · c6def949
Matthew Pickering authored 1 year ago and Marge Bot committed 11 months ago

c6def949

Never UNPACK `FastMutInt` for counting z-encoded `FastString`s · f2cc1107

Hannes Siebenhandl authored 11 months ago and

Marge Bot committed 11 months ago

In `FastStringTable`, we count the number of z-encoded FastStrings
that exist in a GHC session.
We used to UNPACK the counters to not waste memory, but live retainer
analysis showed that we allocate a lot of `FastMutInt`s, retained by
`mkFastZString`.

We lazily compute the `FastZString`, only incrementing the counter when the `FastZString` is
forced.
The function `mkFastStringWith` calls `mkZFastString` and boxes the
`FastMutInt`, leading to the following core:

    mkFastStringWith
      = \ mk_fs _  ->
             = case stringTable of
                { FastStringTable _ n_zencs segments# _ ->
                    ...
                         case ((mk_fs (I# ...) (FastMutInt n_zencs))
                            `cast` <Co:2> :: ...)
                            ...

Marking this field as `NOUNPACK` avoids this reboxing, eliminating the
allocation of a fresh `FastMutInt` on every `FastString` allocation.

f2cc1107

Avoid UArray when indexing is not required · 88cb3e10

Hannes Siebenhandl authored 1 year ago and

Marge Bot committed 11 months ago

`UnlinkedBCO`'s can occur many times in the heap. Each `UnlinkedBCO`
references two `UArray`'s but never indexes them. They are only needed
to encode the elements into a `ByteArray#`. The three words for
the lower bound, upper bound and number of elements are essentially
unused, thus we replace `UArray` with a wrapper around `ByteArray#`.
This saves us up to three words for each `UnlinkedBCO`.

Further, to avoid re-allocating these words for `ResolvedBCO`, we repeat
the procedure for `ResolvedBCO` and add custom `Binary` and `Show` instances.

For example, agda's repl session has around 360_000 UnlinkedBCO's,
so avoiding these three words is already saving us around 8MB residency.

88cb3e10

Apr 05, 2024

Remove accidentally committed test.hs · e8724327
Matthew Pickering authored 11 months ago and Marge Bot committed 11 months ago

e8724327
EPA: Use EpaLocation for RecFieldsDotDot · 19883a23
Alan Zimmerman authored 11 months ago and Marge Bot committed 11 months ago
```
So we can update it to a delta position in makeDeltaAst if needed.
```
19883a23
EPA: Use EpaLocation not SrcSpan in ForeignDecls · 1324b862
Alan Zimmerman authored 11 months ago and Marge Bot committed 11 months ago
```
This allows us to update them for makeDeltaAst in ghc-exactprint
```
1324b862

compiler: Allow more types in GHCForeignImportPrim · 9b9e031b

Ben Gamari authored 11 months ago and

Marge Bot committed 11 months ago

For many, many years `GHCForeignImportPrim` has suffered from the rather
restrictive limitation of not allowing any non-trivial types in arguments
or results. This limitation was justified by the code generator allegely
barfing in the presence of such types.

However, this restriction appears to originate well before the NCG
rewrite and the new NCG does not appear to have any trouble with such
types (see the added `T24598` test). Lift this restriction.

Fixes #24598.

9b9e031b

Apr 04, 2024

Fix off by one error in seekBinNoExpand and seekBin · 28009fbc
Matthew Pickering authored 1 year ago and Marge Bot committed 11 months ago

28009fbc

Change how invisible patterns represented in haskell syntax and TH AST (#24557) · 36a75b80

Andrei Borzenkov authored 11 months ago and

Marge Bot committed 11 months ago

Before this patch:
  data ArgPat p
    = InvisPat (LHsType p)
    | VisPat (LPat p)

With this patch:
  data Pat p
    = ...
    | InvisPat (LHsType p)
    ...

And the same transformation in the TH land. The rest of the
changes is just updating code to handle new AST and writing tests
to check if it is possible to create invalid states using TH.

Metric Increase:
    MultiLayerModulesTH_OneShot

36a75b80

Compact FlatBag array representation · 82cfe10c

Hannes Siebenhandl authored 1 year ago and

Marge Bot committed 11 months ago

`Array` contains three additional `Word`'s we do not need in `FlatBag`. Move
`FlatBag` to `SmallArray`.

Expand the API of SmallArray by `sizeofSmallArray` and add common
traversal functions, such as `mapSmallArray` and `foldMapSmallArray`.
Additionally, allow users to force the elements of a `SmallArray`
via `rnfSmallArray`.

82cfe10c

Replace `SizedSeq` with `FlatBag` for flattened structure · 5f085d3a

Hannes Siebenhandl authored 1 year ago and

Marge Bot committed 11 months ago

LinkedLists are notoriously memory ineffiecient when all we do is
traversing a structure.
As 'UnlinkedBCO' has been identified as a data structure that impacts
the overall memory usage of GHCi sessions, we avoid linked lists and
prefer flattened structure for storing.

We introduce a new memory efficient representation of sequential
elements that has special support for the cases:

* Empty
* Singleton
* Tuple Elements

This improves sharing in the 'Empty' case and avoids the overhead of
'Array' until its constant overhead is justified.

5f085d3a

Update correct counter in bumpTickyAllocd · 0c4a9686
Luite Stegeman authored 1 year ago and Marge Bot committed 11 months ago

0c4a9686
testsuite: Introduce template-haskell-exports test · 0fde229f
Ben Gamari authored 1 year ago and Marge Bot committed 11 months ago

0fde229f

Apr 03, 2024

Account for bottoming functions in OccurAnal · 271a7812
Simon Peyton Jones authored 1 year ago and Marge Bot committed 11 months ago
```
This fixes #24582, a small but long-standing bug
```
271a7812
Testsuite message changes from simplifier improvements · 27db3c5e
Simon Peyton Jones authored 1 year ago and Marge Bot committed 11 months ago

27db3c5e

Simplifier improvements · e026bdf2

Simon Peyton Jones authored 1 year ago and

Marge Bot committed 11 months ago

This MR started as: allow the simplifer to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do.  But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.

The main changes are below.  There are also many new or rewritten Notes.

Avoiding simplifying repeatedly
~~~~~~~~~~~~~~~
See Note [Avoiding simplifying repeatedly]

* The SimplEnv now has a seInlineDepth field, which says how deep
  in unfoldings we are.  See Note [Inline depth] in Simplify.Env.
  Currently used only for the next point: avoiding repeatedly
  simplifying coercions.

* Avoid repeatedly simplifying coercions.
  see Note [Avoid re-simplifying coercions] in Simplify.Iteration
  As you'll see from the Note, this makes use of the seInlineDepth.

* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
  This is another part of Note [Post-inline for single-use things], and
  is really good for reducing simplifier iterations in situations like
      case K e of { K x -> blah }
  wher x is used once in blah.

* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
  elimination.  Note [Case elim in exprIsConApp_maybe]

* Improve the case-merge transformation:
  - Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts`
    and friends.  See Note [Merge Nested Cases] in GHC.Core.Utils.
  - Add a new case for `tagToEnum#`; see wrinkle (MC3).
  - Add a new case to look through join points: see wrinkle (MC4)

postInlineUnconditionally
~~~~~~~~~~~~~~~~~~~~~~~~~
* Allow Simplify.Utils.postInlineUnconditionally to inline variables
  that are used exactly once. See Note [Post-inline for single-use things].

* Do not postInlineUnconditionally join point, ever.
  Doing so does not reduce allocation, which is the main point,
  and with join points that are used a lot it can bloat code.
  See point (1) of Note [Duplicating join points] in
  GHC.Core.Opt.Simplify.Iteration.

* Do not postInlineUnconditionally a strict (demanded) binding.
  It will not allocate a thunk (it'll turn into a case instead)
  so again the main point of inlining it doesn't hold.  Better
  to check per-call-site.

* Improve occurrence analyis for bottoming function calls, to help
  postInlineUnconditionally.  See Note [Bottoming function calls]
  in GHC.Core.Opt.OccurAnal

Inlining generally
~~~~~~~~~~~~~~~~~~
* In GHC.Core.Opt.Simplify.Utils.interestingCallContext,
  use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case.
  See Note [Seq is boring]  Also, wrinkle (SB1), inline in that
  `seq` context only for INLINE functions (UnfWhen guidance).

* In GHC.Core.Opt.Simplify.Utils.interestingArg,
  - return ValueArg for OtherCon [c1,c2, ...], but
  - return NonTrivArg for OtherCon []
  This makes a function a little less likely to inline if all we
  know is that the argument is evaluated, but nothing else.

* isConLikeUnfolding is no longer true for OtherCon {}.
  This propagates to exprIsConLike.  Con-like-ness has /positive/
  information.

Join points
~~~~~~~~~~~
* Be very careful about inlining join points.
  See these two long Notes
    Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration
    Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline

* When making join points, don't do so if the join point is so small
  it will immediately be inlined; check uncondInlineJoin.

* In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining
  heuristics for join points. In general we /do not/ want to inline
  join points /even if they are small/.  See Note [Duplicating join points]
  GHC.Core.Opt.Simplify.Iteration.

  But sometimes we do: see Note [Inlining join points] in
  GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function.

* Do not add an unfolding to a join point at birth.  This is a tricky one
  and has a long Note [Do not add unfoldings to join points at birth]
  It shows up in two places
  - In `mkDupableAlt` do not add an inlining
  - (trickier) In `simplLetUnfolding` don't add an unfolding for a
    fresh join point
  I am not fully satisifed with this, but it works and is well documented.

* In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise
  having a non-inlined join point.

Performance changes
~~~~~~~~~~~~~~~~~~~
* Binary sizes fall by around 2.6%, according to nofib.

* Compile times improve slightly. Here are the figures over 1%.

  I investiate the biggest differnce in T18304. It's a very small module, just
  a few hundred nodes. The large percentage difffence is due to a single
  function that didn't quite inline before, and does now, making code size a
  bit bigger.  I decided gains outweighed the losses.

    Metrics: compile_time/bytes allocated (changes over +/- 1%)
    ------------------------------------------------
           CoOpt_Singletons(normal)   -9.2% GOOD
                LargeRecord(normal)  -23.5% GOOD
MultiComponentModulesRecomp(normal)   +1.2%
MultiLayerModulesTH_OneShot(normal)   +4.1%  BAD
                  PmSeriesS(normal)   -3.8%
                  PmSeriesV(normal)   -1.5%
                     T11195(normal)   -1.3%
                     T12227(normal)  -20.4% GOOD
                     T12545(normal)   -3.2%
                     T12707(normal)   -2.1% GOOD
                     T13253(normal)   -1.2%
                 T13253-spj(normal)   +8.1%  BAD
                     T13386(normal)   -3.1% GOOD
                     T14766(normal)   -2.6% GOOD
                     T15164(normal)   -1.4%
                     T15304(normal)   +1.2%
                     T15630(normal)   -8.2%
                    T15630a(normal)          NEW
                     T15703(normal)  -14.7% GOOD
                     T16577(normal)   -2.3% GOOD
                     T17516(normal)  -39.7% GOOD
                     T18140(normal)   +1.2%
                     T18223(normal)  -17.1% GOOD
                     T18282(normal)   -5.0% GOOD
                     T18304(normal)  +10.8%  BAD
                     T18923(normal)   -2.9% GOOD
                      T1969(normal)   +1.0%
                     T19695(normal)   -1.5%
                     T20049(normal)  -12.7% GOOD
                    T21839c(normal)   -4.1% GOOD
                      T3064(normal)   -1.5%
                      T3294(normal)   +1.2%  BAD
                      T4801(normal)   +1.2%
                      T5030(normal)  -15.2% GOOD
                   T5321Fun(normal)   -2.2% GOOD
                      T6048(optasm)  -16.8% GOOD
                       T783(normal)   -1.2%
                      T8095(normal)   -6.0% GOOD
                      T9630(normal)   -4.7% GOOD
                      T9961(normal)   +1.9%  BAD
                      WWRec(normal)   -1.4%
        info_table_map_perf(normal)   -1.3%
                 parsing001(normal)   +1.5%

                          geo. mean   -2.0%
                          minimum    -39.7%
                          maximum    +10.8%

* Runtimes generally improve. In the testsuite perf/should_run gives:
   Metrics: runtime/bytes allocated
   ------------------------------------------
             Conversions(normal)   -0.3%
                 T13536a(optasm)  -41.7% GOOD
                   T4830(normal)   -0.1%
           haddock.Cabal(normal)   -0.1%
            haddock.base(normal)   -0.1%
        haddock.compiler(normal)   -0.1%

                       geo. mean   -0.8%
                       minimum    -41.7%
                       maximum     +0.0%

* For runtime, nofib is a better test.  The news is mostly good.
  Here are the number more than +/- 0.1%:

    # bytes allocated
    ==========================++==========
       imaginary/digits-of-e1 ||  -14.40%
       imaginary/digits-of-e2 ||   -4.41%
          imaginary/paraffins ||   -0.17%
               imaginary/rfib ||   -0.15%
       imaginary/wheel-sieve2 ||   -0.10%
                real/compress ||   -0.47%
                   real/fluid ||   -0.10%
                  real/fulsom ||   +0.14%
                  real/gamteb ||   -1.47%
                      real/gg ||   -0.20%
                   real/infer ||   +0.24%
                     real/pic ||   -0.23%
                  real/prolog ||   -0.36%
                     real/scs ||   -0.46%
                 real/smallpt ||   +4.03%
        shootout/k-nucleotide ||  -20.23%
              shootout/n-body ||   -0.42%
       shootout/spectral-norm ||   -0.13%
              spectral/boyer2 ||   -3.80%
         spectral/constraints ||   -0.27%
          spectral/hartel/ida ||   -0.82%
                spectral/mate ||  -20.34%
                spectral/para ||   +0.46%
             spectral/rewrite ||   +1.30%
              spectral/sphere ||   -0.14%
    ==========================++==========
                    geom mean ||   -0.59%

    real/smallpt has a huge nest of local definitions, and I
    could not pin down a reason for a regression.  But there are
    three big wins!

Metric Decrease:
    CoOpt_Singletons
    LargeRecord
    T12227
    T12707
    T13386
    T13536a
    T14766
    T15703
    T16577
    T17516
    T18223
    T18282
    T18923
    T21839c
    T20049
    T5321Fun
    T5030
    T6048
    T8095
    T9630
    T783
Metric Increase:
    MultiLayerModulesTH_OneShot
    T13253-spj
    T18304
    T18698a
    T9961
    T3294

e026bdf2

Remove a long-commented-out line · b4581e23
Simon Peyton Jones authored 1 year ago and Marge Bot committed 11 months ago
```
Pure refactoring
```
b4581e23
Use named record fields for the CastIt { ... } data constructor · e9297181
Simon Peyton Jones authored 1 year ago and Marge Bot committed 11 months ago
```
This is a pure refactor
```
e9297181

Slight improvement in WorkWrap · ae24c9bc

Simon Peyton Jones authored 1 year ago and

Marge Bot committed 11 months ago

Ensure that WorkWrap preserves lambda binders, in case of join points.  Sadly I
have forgotten why I made this change (it was while I was doing a lot of
meddling in the Simplifier, but
  * it does no harm,
  * it is slightly more efficient, and
  * presumably it made something better!

Anyway I have kept it in a separate commit.

ae24c9bc

Inline GHC.HsToCore.Pmc.Solver.Types.trvVarInfo · 609cd32c

Simon Peyton Jones authored 1 year ago and

Marge Bot committed 11 months ago

When exploring compile-time regressions after meddling with the Simplifier, I
discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very delicately
balanced. It's a small, heavily used, overloaded function and it's important
that it inlines. By a fluke it was before, but at various times in my journey it
stopped doing so. So I just added an INLINE pragma to it; no sense in depending
on a delicately-balanced fluke.

609cd32c

Improve exprIsConApp_maybe a little · bdf1660f

Simon Peyton Jones authored 1 year ago and

Marge Bot committed 11 months ago

Eliminate a redundant case at birth.  This sometimes reduces
Simplifier iterations.

See Note [Case elim in exprIsConApp_maybe].

bdf1660f

Spelling, layout, pretty-printing only · 95a9a172
Simon Peyton Jones authored 1 year ago and Marge Bot committed 11 months ago

95a9a172