- Oct 28, 2019
-
-
Due to the way `DerivEnv` is currently structured, there is an invariant that every derived instance must consist of a class applied to a non-empty list of argument types, where the last argument *must* be an application of a type constructor to some arguments. This works for many cases, but there are also some design patterns in standalone `anyclass`/`via` deriving that are made impossible by enforcing this invariant, as documented in #13154. This fixes #13154 by refactoring `TcDeriv` and friends to perform fewer validity checks when using the `anyclass` or `via` strategies. The highlights are as follows:

* Five fields of `DerivEnv` have been factored out into a new `DerivInstTys` data type. These fields only make sense for instances that satisfy the invariant mentioned above, so `DerivInstTys` is now only used in `stock` and `newtype` deriving, but not in other deriving strategies.
* There is now a `Note [DerivEnv and DerivSpecMechanism]` describing the bullet point above in more detail, as well as explaining the exact requirements that each deriving strategy imposes.
* I've refactored `mkEqnHelp`'s call graph to be slightly less complicated. Instead of the previous `mkDataTypeEqn`/`mkNewTypeEqn` dichotomy, there is now a single entrypoint `mk_eqn`.
* Various bits of code were tweaked so as not to use fields that are specific to `DerivInstTys`, so that they may be used by all deriving strategies, since not all deriving strategies use `DerivInstTys`.
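For orientation, here is a minimal sketch (my own example, not taken from the patch or its tests) of the two deriving strategies whose checks are being relaxed:

```
{-# LANGUAGE DeriveAnyClass, DerivingStrategies, DerivingVia #-}
-- A minimal sketch using only standard GHC extensions.
import Data.Monoid (Sum(..))

class Describe a where
  describe :: a -> String
  describe _ = "something"            -- default method, so anyclass suffices

data Widget = Widget
  deriving anyclass Describe          -- the anyclass strategy

newtype Count = Count Int
  deriving (Semigroup, Monoid) via (Sum Int)   -- the via strategy
```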
-
Applicative-do has a bug where it fails to use the monadic fail method when desugaring pattern matches which can fail. See #15344. This patch fixes that problem. It required more rewiring than I had expected. Applicative-do happens mostly in the renamer; that's where decisions about scheduling are made. This schedule is then carried through the typechecker and into the desugarer, which performs the actual translation. Fixing this bug required sending information about the fail method from the renamer, through the type checker, and into the desugarer. Previously, the desugarer didn't have enough information to actually desugar pattern matches correctly. As a side effect, we also fix #16628, where GHC wouldn't catch missing MonadFail instances with -XApplicativeDo.
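As a hedged illustration (my own example, not the ticket's test case), this is the shape of do-block involved: the `Just x <-` pattern can fail, so the desugaring has to go through the MonadFail method even when ApplicativeDo schedules the statements:

```
{-# LANGUAGE ApplicativeDo #-}
-- Skips the Nothing elements; the failable pattern forces a MonadFail
-- constraint, which the list monad satisfies.
firsts :: [Maybe Int] -> [Int]
firsts xs = do
  Just x <- xs
  pure (x + 1)
```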
-
- Oct 27, 2019
-
-
Ben Gamari authored
Previously we would allow the output from the check for SMP support introduced by 83655b06 to leak to stdout. Silence this. See #16873.
-
We were using `appPrec`, not `sigPrec`, as the precedence when determining whether or not to parenthesize `() :: Constraint`, which led to the parentheses being omitted in function contexts like `(() :: Constraint) => String`. Easily fixed. Fixes #17403.
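For reference, a hedged example (mine, not necessarily the ticket's test case) of the kind of signature affected; it should pretty-print with the inner parentheses kept, i.e. `(() :: Constraint) => String`:

```
{-# LANGUAGE ConstraintKinds, KindSignatures #-}
-- A kind-annotated empty constraint in a function context.
import Data.Kind (Constraint)

foo :: (() :: Constraint) => String
foo = "ok"
```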
-
- Oct 26, 2019
-
-
These were probably added with some GLOBAL_VARs, but those GLOBAL_VARs are now gone.
-
* Prefer #pragma once over guard macros
* Drop redundant #includes
* Fix order to ensure that necessary macros are defined when we condition on them
-
-
This is a unit test for the native code generator's register allocator; naturally, the NCG is required.
-
-
`:steplocal` enables only breakpoints in the current top-level binding. When a normal breakpoint is hit, the module name and the break id from the `BRK_FUN` byte code allow us to access the corresponding entry in a ModBreak table. From this entry we then get the SrcSpan (see compiler/main/InteractiveEval.hs:bindLocalsAtBreakpoint). With this source span we can then determine the current top-level binding, needed for the `:steplocal` command.

However, if we break at an exception or at an error, we don't have a `BRK_FUN` byte code, so we don't have any source information. The function `bindLocalsAtBreakpoint` then creates an `UnhelpfulSpan`, which doesn't allow us to determine the current top-level binding. To avoid a `panic`, we have to check for `UnhelpfulSpan` in the function ghc/GHCi/UI.hs:stepLocalCmd. Hence a `:steplocal` command after a break-on-exception or a break-on-error is not possible.
-
This is a part of GHC Proposal #25: "Offer more array resizing primitives".

Resources related to the proposal:

- Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
- Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst

Only shrinkSmallMutableArray# is implemented as a primop since a library-space implementation of resizeSmallMutableArray# (in GHC.Exts) is no less efficient than a primop would be. This may be replaced by a primop in the future if someone devises a strategy for growing arrays in-place. The library-space implementation always copies the array when growing it.

This commit also tweaks the documentation of the deprecated sizeofMutableByteArray#, removing the mention of concurrency. That primop is unsound even in single-threaded applications. Additionally, the non-negativity assertion on the existing shrinkMutableByteArray# primop has been removed since this predicate is trivially always true.
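A hedged usage sketch (my own example; I'm assuming the primop's type follows the proposal, `SmallMutableArray# s a -> Int# -> State# s -> State# s`):

```
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.ST (ST(..), runST)

-- Allocate a 10-element small array and shrink it to 5 elements in place.
shrinkDemo :: ()
shrinkDemo = runST (ST (\s0 ->
  case newSmallArray# 10# 'x' s0 of
    (# s1, marr #) ->
      case shrinkSmallMutableArray# marr 5# s1 of
        s2 -> (# s2, () #)))
```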
-
- Oct 25, 2019
-
-
-
GCC 4.6 was released 7 years ago. I think we can finally assume that it's available. This is a simplification prompted by #15742.
-
-
-
It looks like this use of `skip` snuck through my previous refactoring.
-
Due to #17018.
-
-
This commit makes Hadrian use the `-dynamic-too` flag when the current Flavour's libraryWays contains both vanilla and dynamic, cutting down the amount of repeated work caused by separate compilation of dynamic and static files. It does this for the basic case where '.o' and '.dyn_o' files are built with one command, but does not generalise to cases like '.prof_o' and '.prof_dyn_o'.
-
- Oct 24, 2019
-
-
We applied a similar fix for `ConT` in #15572 but forgot to apply the fix to `InfixT` as well. This patch fixes #17394 by doing just that.
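For reference, a hedged illustration (my own, not the patch's test case) of the Template Haskell constructor in question; `InfixT` represents a type-level infix application:

```
{-# LANGUAGE TemplateHaskellQuotes #-}
import Language.Haskell.TH

-- The type  Int `Either` Bool  written with InfixT, the analogue of the
-- ConT case fixed earlier.
eitherTy :: Type
eitherTy = InfixT (ConT ''Int) ''Either (ConT ''Bool)
```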
-
`isTcLevPoly` gives an approximate answer for when a type constructor is levity polymorphic when fully applied, where `True` means "possibly levity polymorphic" and `False` means "definitely not levity polymorphic". `isTcLevPoly` returned `False` for newtypes, which is incorrect in the presence of `UnliftedNewtypes`, leading to #17360. This patch tweaks `isTcLevPoly` to return `True` for newtypes instead. Fixes #17360.
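As a hedged illustration (adapted from the UnliftedNewtypes design, not from this patch's test), here is a newtype that is levity polymorphic when fully applied, which is exactly what the old `False` answer ruled out:

```
{-# LANGUAGE UnliftedNewtypes, GADTs, PolyKinds, RankNTypes, KindSignatures #-}
import GHC.Exts (TYPE, RuntimeRep)

-- Ident a :: TYPE r, so the fully applied newtype can be levity polymorphic.
newtype Ident :: forall (r :: RuntimeRep). TYPE r -> TYPE r where
  MkIdent :: forall (r :: RuntimeRep) (a :: TYPE r). a -> Ident a
```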
-
We were using `pprIfaceAppArgs` instead of `pprParendIfaceAppArgs` in `pprIfaceConDecl`. Oops. Fixes #17384.
-
See #16873.
-
- Oct 23, 2019
-
-
Ben Gamari authored
This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets.

Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below.

Collector design
================

The full design of the collector implemented here is described in detail in a technical note

> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)

This document can be requested from @bgamari.

The basic heap structure used in this design is heavily inspired by

> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)

This design is intended to allow both marking and sweeping concurrently with the execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase.

To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocates into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (also used to track liveness).

This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch).

Implementation structure
========================

The majority of the collector is implemented in a handful of files:

* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap.

Branch overview
===============

```
 * wip/gc/opt-pause:
 |  A variety of small optimisations to further reduce pause times.
 |
 * wip/gc/compact-nfdata:
 |  Introduce support for compact regions into the non-moving
 |\ collector
 | \
 |  \
 | | * wip/gc/segment-header-to-bdescr:
 | | |  Another optimization that we are considering, pushing
 | | |  some segment metadata into the segment descriptor for
 | | |  the sake of locality during mark
 | | |
 | * | wip/gc/shortcutting:
 | | |  Support for indirection shortcutting and the selector optimization
 | | |  in the non-moving heap.
 | | |
 * | | wip/gc/docs:
 | |/   Work on implementation documentation.
 | /
 |/
 * wip/gc/everything:
 |  A roll-up of everything below.
 |\
 | \
 | |\
 | | \
 | | * wip/gc/optimize:
 | | |  A variety of optimizations, primarily to the mark loop.
 | | |  Some of these are microoptimizations but a few are quite
 | | |  significant. In particular, the prefetch patches have
 | | |  produced a nontrivial improvement in mark performance.
 | | |
 | | * wip/gc/aging:
 | | |  Enable support for aging in major collections.
 | | |
 | * | wip/gc/test:
 | | |  Fix up the testsuite to more or less pass.
 | | |
 * | | wip/gc/instrumentation:
 | | |  A variety of runtime instrumentation including statistics
 | | /  support, the nonmoving census, and eventlog support.
 | |/
 | /
 |/
 * wip/gc/nonmoving-concurrent:
 |  The concurrent write barriers.
 |
 * wip/gc/nonmoving-nonconcurrent:
 |  The nonmoving collector without the write barriers necessary
 |  for concurrent collection.
 |
 * wip/gc/preparation:
 |  A merge of the various preparatory patches that aren't directly
 |  implementing the GC.
 |
 | * GHC HEAD
 .
 .
 .
```
-
Unfortunately this was introduced in base-4.11.0 (GHC 8.4.1) whereas the other locking primitives were added in base-4.10.0 (GHC 8.2.1).
-
Previously -ddump-stg would dump pre- and post-unarise STG. Now we have a new flag for post-unarise STG and -ddump-stg only dumps coreToStg output.

STG dump flags after this commit:

- -ddump-stg: Dumps CoreToStg output
- -ddump-stg-unarised: Unarise output
- -ddump-stg-final: STG right before code gen (includes CSE and lambda lifting)
-
cmd uses RawCommand, which uses Windows semantics to find the executable; this sometimes seems to fail for unclear reasons. If we invoke ghc via bash then bash will find the ghc executable and the issue goes away.
-
Previously the make build system would pass things like `-optl-optl-Wl,-x -optl-optl-Wl,noexecstack` to GHC. This would naturally result in mass confusion as GHC would pass `-optl-Wl,-x` to GCC. GCC would in turn interpret this as `-o ptl-Wl,-x`, setting the output path of the invocation. The problem was that `-optl` was added to the command line in two places in the build system. Fix this. Fixes #17385.
-
Fixes #17382.
-
The "new" performance testing infrastructure resets the baseline after every test so it's easy to miss gradual performance regressions over time. We should at least make these numbers smaller to catch patches which affect performance earlier.
-
19 times out of 20 we already have dynflags in scope. We could just always use `return dflags`. But this is in fact not free. When looking at some STG code I noticed that we always allocate a closure for this expression in the heap. Clearly a waste in these cases. For the other cases we can either just modify the callsite to get dynflags or use the _D variants of withTiming I added which will use getDynFlags under the hood.
-
-
Previously a partial rollback of a branch of an `orElse` was attempted if a validation failure was observed. Validation here, however, does not account for which part of the transaction observed the inconsistent state. This commit fixes this by fully aborting and restarting the transaction.
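For context, a small self-contained example (mine, unrelated to the RTS internals being fixed) of the `orElse` construct whose rollback behaviour is at stake: if the left branch retries, its effects must be rolled back before the right branch runs.

```
import Control.Concurrent.STM

-- Take n from whichever account has enough funds.
takeFromEither :: TVar Int -> TVar Int -> Int -> STM ()
takeFromEither a b n = takeFrom a `orElse` takeFrom b
  where
    takeFrom v = do
      balance <- readTVar v
      check (balance >= n)          -- retries this branch if underfunded
      writeTVar v (balance - n)
```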
-
This is part two of fixing #17334.

There are two parts to this commit:
- A bugfix for computing loop levels
- A bugfix of basic block invariants in the NCG.

-----------------------------------------------------------

In the first bug we ended up with a CFG of the sort:

    [A -> B -> C]

This was represented via maps as fromList [(A,B),(B,C)] and later transformed into an adjacency array. However the transformation did not include block C in the array (since we only looked at the keys of the map). This was still fine until we tried to look up successors for C and tried to read outside of the array bounds when accessing C.

In order to prevent this in the future I refactored the code to include all nodes as keys in the map representation, and made this an invariant which is checked in a few places.

Overall I expect this to make the code more robust, as now any failed lookup will represent an error, versus failed lookups sometimes being expected and sometimes not (a standalone sketch of this pattern follows below).

In terms of performance this makes some things cheaper (getting a list of all nodes) and others more expensive (adding a new edge). Overall this adds up to no noticeable performance difference.

-----------------------------------------------------------

Part 2: When the NCG generated a new basic block, it did not always insert a NEWBLOCK meta instruction in the stream, which caused a quite subtle bug.

During instruction selection a statement `s` in a block B with control flow of the sort:

    B -> C

will sometimes result in control flow of the sort:

         ┌ < ┐
         v   ^
    B -> B1 ┴ -> C

as is the case for some atomic operations.

Now to keep the CFG in sync when introducing B1 we clearly want to insert it between B and C. However there is a catch when we have to deal with self loops.

We might start with code and a CFG of these forms:

    loop:
        stmt1            ┌ < ┐
        ....             v   ^
        stmtX           loop ┘
        stmtY
        ....
        goto loop:

Now we introduce B1:

                          ┌ ─ ─ ─ ─ ─┐
    loop:                 │  ┌ < ┐   │
        instrs            v  │   ^   │
        ....             loop ┴ B1 ┴ ┘
        instrsFromX
        stmtY
        goto loop:

This is simple, all outgoing edges from loop now simply start from B1 instead, and the code generator knows which new edges it introduced for the self loop of B1.

Disaster strikes if the statement Y follows the same pattern. If we apply the same rule that all outgoing edges change then we end up with:

    loop ─> B1 ─> B2 ┬─┐
            │   │    └─<┤ │
            │   └───<───┘ │
            └───────<─────┘

This is problematic. The edge B1->B1 is modified as expected. However the modification is wrong!

The assembly in this case looked like this:

    _loop:
        <instrs>
    _B1:
        ...
        cmpxchgq ...
        jne _B1
        <instrs>
        <end _B1>
    _B2:
        ...
        cmpxchgq ...
        jne _B2
        <instrs>
        jmp loop

There is no edge _B2 -> _B1 here. It's still a self loop onto _B1.

The problem here is that really B1 should be two basic blocks. Otherwise we have control flow in the *middle* of a basic block. A contradiction!

So to account for this we add yet another basic block marker:

    _B:
        <instrs>
    _B1:
        ...
        cmpxchgq ...
        jne _B1
        jmp _B1'
    _B1':
        <instrs>
        <end _B1>
    _B2:
        ...

Now when inserting B2 we will only look at the outgoing edges of B1' and everything will work out nicely.

You might also wonder why we don't insert jumps at the end of _B1'. There is no way another block ends up jumping to the labels _B1 or _B2 since they are essentially invisible to other blocks. View them as control flow labels local to the basic block if you'd like. Not doing this ultimately caused (part 2 of) #17334.
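The first bug above can be illustrated with a small self-contained Haskell sketch (my own example, not GHC's actual CFG code; all names are made up):

```
import qualified Data.Map as M
import qualified Data.Set as S

-- Deriving the node set from the *keys* of a successor map loses sink
-- nodes such as C, which appear only as values; successor lookups for C
-- then fall outside the adjacency array built from that node set.
type Node = Int
type CFG  = M.Map Node [Node]

allNodesBuggy :: CFG -> S.Set Node
allNodesBuggy = M.keysSet

allNodesFixed :: CFG -> S.Set Node
allNodesFixed cfg = M.keysSet cfg `S.union` S.fromList (concat (M.elems cfg))

-- With cfg = M.fromList [(0, [1]), (1, [2])]  (i.e. A -> B -> C):
--   allNodesBuggy cfg == S.fromList [0,1]
--   allNodesFixed cfg == S.fromList [0,1,2]
```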
-