Commits · ed0b69dc069a23692bff89cfcab1ad9d61999506 · Reinier Maas / GHC

Mar 01, 2024

Introduce ListTuplePuns extension · d91d00fc

Torsten Schmits authored 2 years ago and

Marge Bot committed 1 year ago

This implements Proposal 0475, introducing the `ListTuplePuns` extension
which is enabled by default.

Disabling this extension makes it invalid to refer to list, tuple and
sum type constructors by using built-in syntax like `[Int]`,
`(Int, Int)`, `(# Int#, Int# #)` or `(# Int | Int #)`.
Instead, this syntax exclusively denotes data constructors for use with
`DataKinds`.
The conventional way of referring to these data constructors by
prefixing them with a single quote (`'(Int, Int)`) is now a parser
error.

Tuple declarations have been moved to `GHC.Tuple.Prim` and the `Solo`
data constructor has been renamed to `MkSolo` (in a previous commit).
Unboxed tuples and sums now have real source declarations in `GHC.Types`.
Unit and solo types for tuples are now called `Unit`, `Unit#`, `Solo`
and `Solo#`.
Constraint tuples now have the unambiguous type constructors `CTuple<n>`
as well as `CUnit` and `CSolo`, defined in `GHC.Classes` like before.

A new parser construct has been added for the unboxed sum data
constructor declarations.

The type families `Tuple`, `Sum#` etc. that were intended to provide
nicer syntax have been omitted from this change set due to inference
problems, to be implemented at a later time.
See the MR discussion for more info.

Updates the submodule utils/haddock.
Updates the cabal submodule due to new language extension.

Metric Increase:
haddock.base

Metric Decrease:
MultiLayerModulesTH_OneShot
size_hello_artifact

Proposal document: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0475-tuple-syntax.rst

Merge request: ghc/ghc!8820

Tracking ticket: ghc/ghc#21294

d91d00fc

Feb 25, 2024
- ghc-internal: Move modules into GHC.Internal.* namespace · d8d6ad8c
  Ben Gamari authored 1 year ago
  
  Bumps haddock submodule due to testsuite output changes.
  d8d6ad8c
Jan 20, 2024
- Fix Spelling in the compiler · 5b7fa20c
  Jade authored 1 year ago and Marge Bot committed 1 year ago
  
  Tracking: #16591
  5b7fa20c
Dec 06, 2023

Zap OccInfo on case binders during StgCse #14895 #24233 · 7ac6006e

Sylvain Henry authored 1 year ago and

Marge Bot committed 1 year ago

StgCse can revive dead binders:

  case foo of dead { Foo x y -> Foo x y; ... }
  ===>
  case foo of dead { Foo x y -> dead; ... } -- dead is no longer dead

So we must zap occurrence information on case binders.
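
A toy model of the problem, not GHC's actual STG types: `OccInfo`, `Expr`, `Alt` and the function names below are hypothetical and kept far simpler than the real pass. It only illustrates why stale "dead" information becomes wrong once CSE reuses the case binder.

```haskell
-- Toy model only: these types are hypothetical and much simpler than GHC's STG.
data OccInfo = IsDead | NoOccInfo deriving (Eq, Show)

data Expr
  = Var String
  | ConApp String [String]              -- saturated constructor application
  | Case String (String, OccInfo) [Alt] -- scrutinee, case binder, alternatives
  deriving (Eq, Show)

data Alt = Alt String [String] Expr     -- constructor, field binders, right-hand side
  deriving (Eq, Show)

-- In an alternative `C xs -> C xs`, reuse the case binder instead of reallocating.
cseAlt :: String -> Alt -> Alt
cseAlt bndr (Alt con xs rhs)
  | rhs == ConApp con xs = Alt con xs (Var bndr)
  | otherwise            = Alt con xs rhs

-- Buggy variant: keeps whatever occurrence info the binder had before.
cseKeepOcc :: Expr -> Expr
cseKeepOcc (Case scrut (b, occ) alts) = Case scrut (b, occ) (map (cseAlt b) alts)
cseKeepOcc e = e

-- Fixed variant: zap the occurrence info, since the binder may now be used.
cseZapOcc :: Expr -> Expr
cseZapOcc (Case scrut (b, _) alts) = Case scrut (b, NoOccInfo) (map (cseAlt b) alts)
cseZapOcc e = e

-- A binder marked dead must not be returned by any alternative.
occInfoOk :: Expr -> Bool
occInfoOk (Case _ (b, IsDead) alts) = and [rhs /= Var b | Alt _ _ rhs <- alts]
occInfoOk _ = True

-- The example from the commit message: case foo of dead { Foo x y -> Foo x y }
example :: Expr
example = Case "foo" ("dead", IsDead) [Alt "Foo" ["x", "y"] (ConApp "Foo" ["x", "y"])]
```

In this model `occInfoOk (cseKeepOcc example)` is `False` while `occInfoOk (cseZapOcc example)` is `True`.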

Fix #14895 and #24233

7ac6006e

Oct 28, 2023
- Teach tag-inference about SeqOp/seq# · 9bc5cb92
  Matthew Craven authored 1 year ago and Marge Bot committed 1 year ago
  
  Fixes the STG/tag-inference analogue of #15226.

  Co-Authored-By: Simon Peyton Jones <simon.peytonjones@gmail.com>
  9bc5cb92
Sep 06, 2023
- Make STG rewriter produce updatable closures · 3930d793
  Jaro Reinders authored 1 year ago and Marge Bot committed 1 year ago
  
  3930d793
Feb 14, 2023
- Fix some correctness issues around tag inference when targeting the bytecode generator. · d6411d6c
  Andreas Klebinger authored 2 years ago and Marge Bot committed 2 years ago
  
  * Let binders are now always assumed untagged for bytecode.
  * Imported references are now always assumed to be untagged for bytecode.

  Fixes #22840
  d6411d6c
Jan 11, 2023

Misc cleanup · 083f7015

Krzysztof Gogolewski authored 2 years ago and

Marge Bot committed 2 years ago

- Remove unused mkWildEvBinder
- Use typeTypeOrConstraint - more symmetric and asserts that
  the type is Type or Constraint
- Fix escape sequences in Python; they raise a deprecation warning
  with -Wdefault

083f7015

Oct 18, 2022

Fix GHCi's interaction with tag inference. · 0ac60423

Andreas Klebinger authored 2 years ago and

Marge Bot committed 2 years ago

I had assumed that wrappers were not inlined in interactive mode,
meaning we would always execute the compiled wrapper, which properly
takes care of upholding the strict field invariant.
This turned out to be wrong, so we now run tag inference even
when we generate bytecode. In that case it runs only for correctness, not
for performance, although it will still be beneficial for runtime
in some cases.

I further fixed a bug where GHCi didn't tag nullary constructors
properly when used as arguments, which caused segfaults when calling
into compiled functions that expect the strict field invariant to
be upheld.

Fixes #22042 and #21083

-------------------------
Metric Increase:
    T4801

Metric Decrease:
    T13035
-------------------------

0ac60423

Sep 21, 2022

Don't use isUnliftedType in isTagged · 06ccad0d

sheaf authored 2 years ago and

Marge Bot committed 2 years ago

The function GHC.Stg.InferTags.Rewrite.isTagged can be given
the Id of a join point, which might be representation polymorphic.
This would cause the call to isUnliftedType to crash. It's better
to use typeLevity_maybe instead.
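
As a rough illustration of the difference between a partial query and a `Maybe`-returning one, here is a toy model. `Rep`, `levityMaybe`, `isUnliftedPartial` and `definitelyUnlifted` are hypothetical names, not GHC's API, and treating an unknown levity as "not known to be unlifted" is an assumption of this sketch rather than a description of what `isTagged` does.

```haskell
-- Toy model only: hypothetical types, not GHC's Type or RuntimeRep.
data Levity = Lifted | Unlifted deriving (Eq, Show)

-- A representation is either known, or a variable (representation polymorphism).
data Rep = BoxedRep Levity | RepVar String deriving (Show)

-- Total query: answers Nothing when the levity is not statically known.
levityMaybe :: Rep -> Maybe Levity
levityMaybe (BoxedRep l) = Just l
levityMaybe (RepVar _)   = Nothing

-- Partial query: crashes on representation-polymorphic input.
isUnliftedPartial :: Rep -> Bool
isUnliftedPartial r = case levityMaybe r of
  Just l  -> l == Unlifted
  Nothing -> error "isUnliftedPartial: representation-polymorphic type"

-- Caller built on the total query (assumption: unknown means "not known unlifted").
definitelyUnlifted :: Rep -> Bool
definitelyUnlifted r = levityMaybe r == Just Unlifted
```

With the total query, the caller decides what an unknown levity means instead of crashing.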

Fixes #22212

06ccad0d

Sep 15, 2022

Tag inference: Fix #21954 by retaining tagsigs of vars in function position. · d6ea8356

Andreas Klebinger authored 2 years ago

For an expression like:

    case x of y
      Con z -> z

If we also retain the tag sig for z we can generate code to immediately return
it rather than calling out to stg_ap_0_fast.

d6ea8356

May 09, 2022
- STG: only print cost-center if asked to · a4fbb589
  Sylvain Henry authored 2 years ago and Marge Bot committed 2 years ago
  
  a4fbb589
Feb 12, 2022

Tag inference work. · 0e93023e

Andreas Klebinger authored 3 years ago and

Matthew Pickering committed 3 years ago

This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allow functions to pass arguments unlifted (call-by-value).

The first is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
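
A source-level analogy of that wrapping, as a sketch only: the real transformation happens on STG, and the type `T` and the function names here are made up for illustration.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical example type: the first field is strict, the second is lazy.
data T = MkT !Int Int

-- What the programmer writes.
build :: Int -> Int -> T
build x y = MkT x y

-- Roughly what the wrapping amounts to: evaluate the argument destined for
-- the strict field first, then allocate the constructor with the evaluated value.
buildWrapped :: Int -> Int -> T
buildWrapped x y = case x of
  !x' -> MkT x' y

-- Reading the strict field never needs to enter a thunk.
firstField :: T -> Int
firstField (MkT a _) = a
```

Both functions build the same value; the second one just makes the evaluation of the strict field explicit.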

The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redundant cases for the above invariant.

StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]

Other minor changes:

* Add StgUtil module containing a few functions needed by, but
  not specific to the tag analysis.

-------------------------
Metric Decrease:
	T12545
	T18698b
	T18140
	T18923
	LargeRecord
Metric Increase:
	LargeRecord
	ManyAlternatives
	ManyConstructors
	T10421
	T12425
	T12707
	T13035
	T13056
	T13253
	T13253-spj
	T13379
	T15164
	T18282
	T18304
	T18698a
	T1969
	T20049
	T3294
	T4801
	T5321FD
	T5321Fun
	T783
	T9233
	T9675
	T9961
	T19695
	WWRec
-------------------------

0e93023e

Sep 30, 2021

Nested CPR light unleashed (#18174) · c261f220

Sebastian Graf authored 3 years ago and

Marge Bot committed 3 years ago

This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:

1. In `cprTransformDataConWork`, we need to look at the field expressions
   and their `CprType`s to see whether the evaluation of the expressions
   terminates quickly (= is in HNF) or if they are put in strict fields.
   If that is the case, then we retain their CPR info and may unbox nestedly
   later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
   fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
   `cprTransformDataConWork` for workers and treat wrappers by analysing their
   unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
   `isRecDataCon` to detect them. We really don't want to give `repeat` or
   `replicate` the Nested CPR property.
   See Note [CPR for recursive data structures] for which kind of recursive
   DataCons we target.
6. Fix a couple of tests and their outputs.
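
To make the idea concrete, here is a hand-written worker/wrapper pair for a function that returns a nested product. This is a sketch of the shape of the transformation, not the code GHC generates; `minMaxSum`, `wMinMaxSum` and `minMaxSum'` are made-up names.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Returns a nested product. The inner pair is a constructor application
-- (so it is in HNF), which is the situation item 1 above describes.
minMaxSum :: Int -> Int -> (Int, (Int, Int))
minMaxSum x y = (x + y, (min x y, max x y))

-- Hand-written worker: returns the components of both pairs in one flat
-- unboxed tuple, so neither pair is allocated by the worker.
wMinMaxSum :: Int -> Int -> (# Int, Int, Int #)
wMinMaxSum x y = (# x + y, min x y, max x y #)

-- Hand-written wrapper: reboxes the result. When it is inlined at a call
-- site that immediately takes the pairs apart, the boxes can disappear.
minMaxSum' :: Int -> Int -> (Int, (Int, Int))
minMaxSum' x y = case wMinMaxSum x y of
  (# s, lo, hi #) -> (s, (lo, hi))
{-# INLINE minMaxSum' #-}
```

The wrapper and the original function agree on every input; only the allocation behaviour differs.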

I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.

Nofib results:
```
--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
   ben-raytrace          -3.1%     -0.4%
   binary-trees          +0.8%     -2.9%
   digits-of-e2          +5.8%     +1.2%
          event          +0.8%     -2.1%
 fannkuch-redux          +0.0%     -1.4%
           fish           0.0%     -1.5%
         gamteb          -1.4%     -0.3%
        mkhprog          +1.4%     +0.8%
     multiplier          +0.0%     -1.9%
            pic          -0.6%     -0.1%
        reptile         -20.9%    -17.8%
      wave4main          +4.8%     +0.4%
           x2n1        -100.0%     -7.6%
--------------------------------------------------------------------------------
            Min         -95.0%    -17.8%
            Max          +5.8%     +1.2%
 Geometric Mean          -2.9%     -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.

There are a significant number of metric decreases. The most notable ones (>1%):
```
       ManyAlternatives(normal) ghc/alloc   771656002.7   762187472.0  -1.2%
       ManyConstructors(normal) ghc/alloc  4191073418.7  4114369216.0  -1.8%
      MultiLayerModules(normal) ghc/alloc  3095678333.3  3128720704.0  +1.1%
              PmSeriesG(normal) ghc/alloc    50096429.3    51495664.0  +2.8%
              PmSeriesS(normal) ghc/alloc    63512989.3    64681600.0  +1.8%
              PmSeriesV(normal) ghc/alloc    62575424.0    63767208.0  +1.9%
                 T10547(normal) ghc/alloc    29347469.3    29944240.0  +2.0%
                T11303b(normal) ghc/alloc    46018752.0    47367576.0  +2.9%
                 T12150(optasm) ghc/alloc    81660890.7    82547696.0  +1.1%
                 T12234(optasm) ghc/alloc    59451253.3    60357952.0  +1.5%
                 T12545(normal) ghc/alloc  1705216250.7  1751278952.0  +2.7%
                 T12707(normal) ghc/alloc   981000472.0   968489800.0  -1.3% GOOD
                 T13056(optasm) ghc/alloc   389322664.0   372495160.0  -4.3% GOOD
                 T13253(normal) ghc/alloc   337174229.3   341954576.0  +1.4%
                 T13701(normal) ghc/alloc  2381455173.3  2439790328.0  +2.4%  BAD
                   T14052(ghci) ghc/alloc  2162530642.7  2139108784.0  -1.1%
                 T14683(normal) ghc/alloc  3049744728.0  2977535064.0  -2.4% GOOD
                 T14697(normal) ghc/alloc   362980213.3   369304512.0  +1.7%
                 T15164(normal) ghc/alloc  1323102752.0  1307480600.0  -1.2%
                 T15304(normal) ghc/alloc  1304607429.3  1291024568.0  -1.0%
                 T16190(normal) ghc/alloc   281450410.7   284878048.0  +1.2%
                 T16577(normal) ghc/alloc  7984960789.3  7811668768.0  -2.2% GOOD
                 T17516(normal) ghc/alloc  1171051192.0  1153649664.0  -1.5%
                 T17836(normal) ghc/alloc  1115569746.7  1098197592.0  -1.6%
                T17836b(normal) ghc/alloc    54322597.3    55518216.0  +2.2%
                 T17977(normal) ghc/alloc    47071754.7    48403408.0  +2.8%
                T17977b(normal) ghc/alloc    42579133.3    43977392.0  +3.3%
                 T18923(normal) ghc/alloc    71764237.3    72566240.0  +1.1%
                  T1969(normal) ghc/alloc   784821002.7   773971776.0  -1.4% GOOD
                  T3294(normal) ghc/alloc  1634913973.3  1614323584.0  -1.3% GOOD
                  T4801(normal) ghc/alloc   295619648.0   292776440.0  -1.0%
                T5321FD(normal) ghc/alloc   278827858.7   276067280.0  -1.0%
                  T5631(normal) ghc/alloc   586618202.7   577579960.0  -1.5%
                  T5642(normal) ghc/alloc   494923048.0   487927208.0  -1.4%
                  T5837(normal) ghc/alloc    37758061.3    39261608.0  +4.0%
                  T9020(optasm) ghc/alloc   257362077.3   254672416.0  -1.0%
                  T9198(normal) ghc/alloc    49313365.3    50603936.0  +2.6%  BAD
                  T9233(normal) ghc/alloc   704944258.7   685692712.0  -2.7% GOOD
                  T9630(normal) ghc/alloc  1476621560.0  1455192784.0  -1.5%
                  T9675(optasm) ghc/alloc   443183173.3   433859696.0  -2.1% GOOD
                 T9872a(normal) ghc/alloc  1720926653.3  1693190072.0  -1.6% GOOD
                 T9872b(normal) ghc/alloc  2185618061.3  2162277568.0  -1.1% GOOD
                 T9872c(normal) ghc/alloc  1765842405.3  1733618088.0  -1.8% GOOD
   TcPlugin_RewritePerf(normal) ghc/alloc  2388882730.7  2365504696.0  -1.0%
                  WWRec(normal) ghc/alloc   607073186.7   597512216.0  -1.6%

                  T9203(normal) run/alloc   107284064.0   102881832.0  -4.1%
          haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0  -1.1%
           haddock.base(normal) run/alloc 25660521653.3 25370321824.0  -1.1%
       haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0  -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.

There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top of this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.

The build-cabal test improves by 1.2% in -O0.

Metric Increase:
    T10421
    T12234
    T12545
    T13035
    T13056
    T13701
    T14697
    T18923
    T5837
    T9198
Metric Decrease:
    ManyConstructors
    T12545
    T12707
    T13056
    T14683
    T16577
    T18223
    T1969
    T3294
    T9203
    T9233
    T9675
    T9872a
    T9872b
    T9872c
    T9961
    TcPlugin_RewritePerf

c261f220

Apr 17, 2021

Improve CSE in STG-land · 7bd12940

Simon Peyton Jones authored 3 years ago

This patch fixes #19717, a long-standing bug in CSE for STG, which
led to a stupid loss of CSE in some situations.

It's explained in Note [Trivial case scrutinee], which I have
substantially extended.

7bd12940

Apr 30, 2020
- Unit: split and rename modules · 8bfb0219
  Sylvain Henry authored 4 years ago and Marge Bot committed 4 years ago
  
  Introduce GHC.Unit.* hierarchy for everything concerning units, packages and modules. Update Haddock submodule
  8bfb0219
Mar 08, 2019

Always do the worker/wrapper split for NOINLINEs · 1675d40a

Sebastian Graf authored 6 years ago and

Marge Bot committed 6 years ago

Trac #10069 revealed that small NOINLINE functions didn't get split
into worker and wrapper. This was due to `certainlyWillInline`
saying that any unfoldings with a guidance of `UnfWhen` inline
unconditionally. That isn't the case for NOINLINE functions, so we
catch this case earlier now.
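
A hand-written sketch of what such a split looks like for a small NOINLINE function. The names `f`, `wf` and `f'` are illustrative, and placing the NOINLINE pragma on the worker while the wrapper inlines is my understanding of the intended outcome, not something this commit message spells out.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- A small function that the user does not want inlined.
f :: Int -> Int -> Int
f x y = x + y
{-# NOINLINE f #-}

-- Hand-written worker: takes and returns unboxed integers.
-- (Assumption: the worker is the part that stays out of line.)
wf :: Int# -> Int# -> Int#
wf x y = x +# y
{-# NOINLINE wf #-}

-- Hand-written wrapper: unboxes the arguments, calls the worker, reboxes.
f' :: Int -> Int -> Int
f' (I# x) (I# y) = I# (wf x y)
{-# INLINE f' #-}
```

Callers of `f'` still never see the body of the addition inlined, but they can pass already-unboxed arguments to `wf` once the wrapper is inlined.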

Nofib results:

--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
 fannkuch-redux          -0.3%      0.0%
             gg          +0.0%     +0.1%
       maillist          -0.2%     -0.2%
        minimax           0.0%     -0.8%
--------------------------------------------------------------------------------
            Min          -0.3%     -0.8%
            Max          +0.0%     +0.1%
 Geometric Mean          -0.0%     -0.0%

Fixes #10069.

-------------------------
Metric Increase:
    T9233
-------------------------

1675d40a

Nov 07, 2018

testsuite: Save performance metrics in git notes. · 932cd41d

davide authored 6 years ago

This patch makes the following improvement:
  - Automatically records test metrics (per test environment) so that
    the programmer need not supply nor update expected values in *.T
    files.
    - On expected metric changes, the programmer need only indicate the
      direction of change in the git commit message.
  - Provides a simple python tool "perf_notes.py" to compare metrics
    over time.

Issues:
  - Using just the previous commit allows performance to drift with each
    commit.
    - Currently we allow drift as we have a preference for minimizing
      false positives.
    - Some possible alternatives include:
      - Use metrics from a fixed commit per test: the last commit that
        allowed a change in performance (else the oldest metric)
      - Or use some sort of aggregate since the last commit that allowed
        a change in performance (else all available metrics)
      - These alternatives may result in a performance issue (with the
        test driver) having to heavily search git commits/notes.
  - Run locally, performance tests will trivially pass unless the tests
    were run locally on the previous commit. This is often not the case
    e.g.  after pulling recent changes.

Previously, *.T files contained statements such as:
```
stats_num_field('peak_megabytes_allocated', (2, 1))
compiler_stats_num_field('bytes allocated',
                         [(wordsize(64), 165890392, 10)])
```
This required the programmer to give the expected values and a tolerance
deviation (percentage). With this patch, the above statements are
replaced with:
```
collect_stats('peak_megabytes_allocated', 5)
collect_compiler_stats('bytes allocated', 10)
```
The programmer now only needs to specify which metrics to test and a tolerance
deviation. No expected value is required. CircleCI will then run the
tests per test environment and record the metrics to a git note for that
commit and push them to the git.haskell.org ghc repo. Metrics will be
compared to the previous commit. If they are different by the tolerance
deviation from the *.T file, then the corresponding test will fail. Adding
lines such as the following to the git commit message declares expected changes:
```
 # Metric (In|De)crease <metric(s)> <options>: <tests>
Metric Increase ['bytes allocated', 'peak_megabytes_allocated'] \
         (test_env='linux_x86', way='default'):
    Test012, Test345
Metric Decrease 'bytes allocated':
    Test678
Metric Increase:
    Test711
```
This will allow the noted changes (letting the test pass). Note that by
omitting metrics or options, the change will apply to all possible
metrics/options (i.e. in the above, an increase for all metrics in all
test environments is allowed for Test711)
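
The comparison described above can be sketched as follows. This is not the logic of the test driver or of perf_notes.py (both are Python); it is a minimal Haskell model with hypothetical names (`Change`, `classify`, `passes`), assuming the tolerance is a percentage of the previous commit's value.

```haskell
-- Toy model only: hypothetical names, not the test driver's implementation.
data Change = Increase | Decrease | Within deriving (Eq, Show)

-- Compare the current metric against the previous commit's value,
-- given a tolerance in percent (assumed to be relative to the previous value).
classify :: Double -> Double -> Double -> Change
classify tolPct prev cur
  | cur > prev * (1 + tolPct / 100) = Increase
  | cur < prev * (1 - tolPct / 100) = Decrease
  | otherwise                       = Within

-- A test passes if the metric stayed within tolerance, or if the commit
-- message declared the observed direction of change as expected.
passes :: [Change] -> Change -> Bool
passes allowed c = c == Within || c `elem` allowed
```

For example, with a tolerance of 10 and a previous value of 100, a new value of 111 classifies as `Increase` and only passes if the commit message allows an increase for that test.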

phabricator will use the message in the description

Reviewers: bgamari, hvr

Reviewed By: bgamari

Subscribers: rwbarton, carter

GHC Trac Issues: #12758

Differential Revision: https://phabricator.haskell.org/D5059

932cd41d

Jul 27, 2018

Run StgCse after unarise, fixes #15300 · 3c311e50

Ömer Sinan Ağacan authored 6 years ago

Given two unboxed sum terms:

    (# 1 | #) :: (# Int | Int# #)
    (# 1 | #) :: (# Int | Int  #)

These two terms are not equal as they unarise to different unboxed
tuples. However, StgCse treated them as equal and replaced
one of them with a binder to the other.
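
A simplified model of why the two terms end up with different representations. The types `Field` and `Slot` and the function `layout` are hypothetical, handle only sums whose alternatives have exactly one field each, and merely approximate how unarise assigns slots.

```haskell
-- Toy model only: a much simplified picture of unboxed sum layout.
data Field = Boxed | UnboxedWord deriving (Eq, Show)   -- e.g. Int vs Int#
data Slot  = TagSlot | PtrSlot | WordSlot deriving (Eq, Show)

-- One tag slot, plus one slot per kind of field that any alternative needs.
-- Alternatives with the same kind of field share a slot.
layout :: [Field] -> [Slot]
layout alts =
  TagSlot
    : [PtrSlot  | Boxed       `elem` alts]
   ++ [WordSlot | UnboxedWord `elem` alts]

-- (# Int | Int# #): needs a pointer slot and a word slot.
sumIntIntHash :: [Slot]
sumIntIntHash = layout [Boxed, UnboxedWord]

-- (# Int | Int #): both alternatives share a single pointer slot.
sumIntInt :: [Slot]
sumIntInt = layout [Boxed, Boxed]
```

In this model the first type unarises to three components and the second to two, so the two `(# 1 | #)` terms cannot be interchanged even though they look identical before unarise.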

To avoid dealing with unboxed sums in StgCse, we now run it after unarise. For
StgCse to maintain post-unarise invariants, we factor out the case binder
in-scopeness check into `stgCaseBndrInScope` and use it in StgCse.

Also did some refactoring in SimplStg.

Another way to fix this would be adding a special case in StgCse to not
bring unboxed sum binders in scope:

    diff --git a/compiler/simplStg/StgCse.hs b/compiler/simplStg/StgCse.hs
    index 6c740ca4cb..93a0f8f6ad 100644
    --- a/compiler/simplStg/StgCse.hs
    +++ b/compiler/simplStg/StgCse.hs
    @@ -332,7 +332,11 @@ stgCseExpr env (StgLetNoEscape binds body)
     stgCseAlt :: CseEnv -> OutId -> InStgAlt -> OutStgAlt
     stgCseAlt env case_bndr (DataAlt dataCon, args, rhs)
         = let (env1, args') = substBndrs env args
    -          env2 = addDataCon case_bndr dataCon (map StgVarArg args') env1
    +          env2
    +            | isUnboxedSumCon dataCon
    +            = env1
    +            | otherwise
    +            = addDataCon case_bndr dataCon (map StgVarArg args') env1
                 -- see note [Case 2: CSEing case binders]
               rhs' = stgCseExpr env2 rhs
           in (DataAlt dataCon, args', rhs')

I think this patch seems better in that it doesn't add a special case to
StgCse.

Test Plan:
Validate.

I tried to come up with a minimal example but failed. I thought a simple
program like

    data T = T (# Int | Int #) (# Int# | Int #)

    case T (# 1 | #) (# 1 | #) of ...

should be enough to trigger this bug, but for some reason StgCse doesn't do
anything on this program.

Reviewers: simonpj, bgamari

Reviewed By: simonpj

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #15300

Differential Revision: https://phabricator.haskell.org/D4962

3c311e50

Apr 19, 2017
- Simplify StgCases when all alts refer to the case binder · 21c35bda
  Joachim Breitner authored 7 years ago
  
  as proposed in #13588. Differential Revision: https://phabricator.haskell.org/D3467
  21c35bda
Apr 18, 2017
- Add failing test case for #13588 · ebb780f1
  Joachim Breitner authored 7 years ago
  
  ebb780f1
Apr 11, 2017
- Typos in comments [ci skip] · fc2a96a1
  Gabor Greif authored 7 years ago
  
  fc2a96a1
Apr 10, 2017

Add a second regression test for #13536 · ddc05912

Joachim Breitner authored 7 years ago

which counts allocations instead of observing recomputation directly.

ddc05912

StgCse: Do not re-use trivial case scrutinees · b55f310d

Joachim Breitner authored 7 years ago

as they might be marked as one-shot, and suddenly we’d evaluate them
multiple times. This came up in #13536 (test cases included).

The solution was laid out by SPJ in ticket:13536#comment:12.

Differential Revision: https://phabricator.haskell.org/D3437

b55f310d

Jan 05, 2017

Add a CSE pass to Stg (#9291) · 19d5c731

Joachim Breitner authored 8 years ago

This CSE pass only targets data constructor applications. This is
probably the best we can do, as function calls and primitive operations
might have side-effects.
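
A minimal sketch of the idea on a toy let-language, assuming unique binder names. `Rhs`, `Expr` and `cse` are hypothetical and unrelated to the real StgCse data structures; the point is only that repeated constructor applications are shared while function applications are left alone.

```haskell
-- Toy model only: hypothetical types, assuming every binder name is unique.
data Rhs
  = ConApp String [String]   -- constructor application: safe to share
  | FunApp String [String]   -- function call: never shared here
  deriving (Eq, Show)

data Expr
  = Let String Rhs Expr
  | Ret String
  deriving (Eq, Show)

cse :: Expr -> Expr
cse = go [] []
  where
    -- seen:  constructor applications already bound, with their binder
    -- subst: eliminated binders mapped to the binder that replaces them
    go _ subst (Ret v) = Ret (rename subst v)
    go seen subst (Let b rhs body) =
      case rhs' of
        ConApp{}
          | Just b' <- lookup rhs' seen -> go seen ((b, b') : subst) body
          | otherwise -> Let b rhs' (go ((rhs', b) : seen) subst body)
        FunApp{} -> Let b rhs' (go seen subst body)
      where
        rhs' = renameRhs subst rhs

    rename subst v = maybe v id (lookup v subst)

    renameRhs subst (ConApp c xs) = ConApp c (map (rename subst) xs)
    renameRhs subst (FunApp f xs) = FunApp f (map (rename subst) xs)
```

Given two bindings of `Just x` followed by two calls `f b`, the sketch drops the second `Just x` binding and renames its uses, but keeps both calls to `f`.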

Introduces the flag -fstg-cse, enabled by default with -O for now. It
might also be a good candidate for -O2.

Differential Revision: https://phabricator.haskell.org/D2871

19d5c731