- Oct 28, 2019
-
-
Due to the way `DerivEnv` is currently structured, there is an invariant that every derived instance must consist of a class applied to a non-empty list of argument types, where the last argument *must* be an application of a type constructor to some arguments. This works for many cases, but there are also some design patterns in standalone `anyclass`/`via` deriving that are made impossible by enforcing this invariant, as documented in #13154. This fixes #13154 by refactoring `TcDeriv` and friends to perform fewer validity checks when using the `anyclass` or `via` strategies. The highlights are as follows:

* Five fields of `DerivEnv` have been factored out into a new `DerivInstTys` data type. These fields only make sense for instances that satisfy the invariant mentioned above, so `DerivInstTys` is now only used in `stock` and `newtype` deriving, but not in other deriving strategies.
* There is now a `Note [DerivEnv and DerivSpecMechanism]` describing the bullet point above in more detail, as well as explaining the exact requirements that each deriving strategy imposes.
* I've refactored `mkEqnHelp`'s call graph to be slightly less complicated. Instead of the previous `mkDataTypeEqn`/`mkNewTypeEqn` dichotomy, there is now a single entrypoint `mk_eqn`.
* Various bits of code were tweaked so as not to use fields that are specific to `DerivInstTys`, so that they may be used by all deriving strategies, since not all deriving strategies use `DerivInstTys`.
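For orientation, here is a minimal sketch (my own example, not taken from the patch or its tests) of the two deriving strategies whose checks are being relaxed:

```
{-# LANGUAGE DeriveAnyClass, DerivingStrategies, DerivingVia #-}
-- A minimal sketch using only standard GHC extensions.
import Data.Monoid (Sum(..))

class Describe a where
  describe :: a -> String
  describe _ = "something"            -- default method, so anyclass suffices

data Widget = Widget
  deriving anyclass Describe          -- the anyclass strategy

newtype Count = Count Int
  deriving (Semigroup, Monoid) via (Sum Int)   -- the via strategy
```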
-
Applicative-do has a bug where it fails to use the monadic fail method when desugaring pattern matches which can fail. See #15344. This patch fixes that problem. It required more rewiring than I had expected. Applicative-do happens mostly in the renamer; that's where decisions about scheduling are made. This schedule is then carried through the typechecker and into the desugarer, which performs the actual translation. Fixing this bug required sending information about the fail method from the renamer, through the type checker, and into the desugarer. Previously, the desugarer didn't have enough information to actually desugar pattern matches correctly. As a side effect, we also fix #16628, where GHC wouldn't catch missing MonadFail instances with -XApplicativeDo.
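As a hedged illustration (my own example, not the ticket's test case), this is the shape of do-block involved: the `Just x <-` pattern can fail, so the desugaring has to go through the MonadFail method even when ApplicativeDo schedules the statements:

```
{-# LANGUAGE ApplicativeDo #-}
-- Skips the Nothing elements; the failable pattern forces a MonadFail
-- constraint, which the list monad satisfies.
firsts :: [Maybe Int] -> [Int]
firsts xs = do
  Just x <- xs
  pure (x + 1)
```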
-
- Oct 27, 2019
-
-
Ben Gamari authored
Previously we would allow the output from the check for SMP support introduced by 83655b06 to leak to stdout. Silence this. See #16873.
-
We were using `appPrec`, not `sigPrec`, as the precedence when determining whether or not to parenthesize `() :: Constraint`, which led to the parentheses being omitted in function contexts like `(() :: Constraint) => String`. Easily fixed. Fixes #17403.
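For reference, a hedged example (mine, not necessarily the ticket's test case) of the kind of signature affected; it should pretty-print with the inner parentheses kept, i.e. `(() :: Constraint) => String`:

```
{-# LANGUAGE ConstraintKinds, KindSignatures #-}
-- A kind-annotated empty constraint in a function context.
import Data.Kind (Constraint)

foo :: (() :: Constraint) => String
foo = "ok"
```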
-
- Oct 26, 2019
-
-
These were probably added with some GLOBAL_VARs, but those GLOBAL_VARs are now gone.
-
* Prefer #pragma once over guard macros
* Drop redundant #includes
* Fix order to ensure that necessary macros are defined when we condition on them
-
-
This is a unit test for the native code generator's register allocator; naturally, the NCG is required.
-
-
`:steplocal` enables only breakpoints in the current top-level binding. When a normal breakpoint is hit, the module name and the break id from the `BRK_FUN` byte code allow us to access the corresponding entry in a ModBreak table. From this entry we then get the SrcSpan (see compiler/main/InteractiveEval.hs:bindLocalsAtBreakpoint). With this source span we can then determine the current top-level binding, needed for the `:steplocal` command.

However, if we break at an exception or at an error, we don't have a `BRK_FUN` byte code, so we don't have any source information. The function `bindLocalsAtBreakpoint` then creates an `UnhelpfulSpan`, which doesn't allow us to determine the current top-level binding. To avoid a `panic`, we have to check for `UnhelpfulSpan` in the function ghc/GHCi/UI.hs:stepLocalCmd. Hence a `:steplocal` command after a break-on-exception or a break-on-error is not possible.
-
This is a part of GHC Proposal #25: "Offer more array resizing primitives".

Resources related to the proposal:

- Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
- Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst

Only shrinkSmallMutableArray# is implemented as a primop since a library-space implementation of resizeSmallMutableArray# (in GHC.Exts) is no less efficient than a primop would be. This may be replaced by a primop in the future if someone devises a strategy for growing arrays in-place. The library-space implementation always copies the array when growing it.

This commit also tweaks the documentation of the deprecated sizeofMutableByteArray#, removing the mention of concurrency. That primop is unsound even in single-threaded applications. Additionally, the non-negativity assertion on the existing shrinkMutableByteArray# primop has been removed since this predicate is trivially always true.
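A hedged usage sketch (my own example; I'm assuming the primop's type follows the proposal, `SmallMutableArray# s a -> Int# -> State# s -> State# s`):

```
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.ST (ST(..), runST)

-- Allocate a 10-element small array and shrink it to 5 elements in place.
shrinkDemo :: ()
shrinkDemo = runST (ST (\s0 ->
  case newSmallArray# 10# 'x' s0 of
    (# s1, marr #) ->
      case shrinkSmallMutableArray# marr 5# s1 of
        s2 -> (# s2, () #)))
```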
-
- Oct 25, 2019
-
-
-
GCC 4.6 was released 7 years ago. I think we can finally assume that it's available. This is a simplification prompted by #15742.
-
-
-
It looks like this use of `skip` snuck through my previous refactoring.
-
Due to #17018.
-
-
This commit makes Hadrian use the `-dynamic-too` flag when the current Flavour's libraryWays contains both vanilla and dynamic, cutting down the amount of repeated work caused by separate compilation of dynamic and static files. It does this for the basic case where '.o' and '.dyn_o' files are built with one command, but does not generalise to cases like '.prof_o' and '.prof_dyn_o'.
-
- Oct 24, 2019
-
-
We applied a similar fix for `ConT` in #15572 but forgot to apply the fix to `InfixT` as well. This patch fixes #17394 by doing just that.
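For reference, a hedged illustration (my own, not the patch's test case) of the Template Haskell constructor in question; `InfixT` represents a type-level infix application:

```
{-# LANGUAGE TemplateHaskellQuotes #-}
import Language.Haskell.TH

-- The type  Int `Either` Bool  written with InfixT, the analogue of the
-- ConT case fixed earlier.
eitherTy :: Type
eitherTy = InfixT (ConT ''Int) ''Either (ConT ''Bool)
```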
-
`isTcLevPoly` gives an approximate answer for when a type constructor is levity polymorphic when fully applied, where `True` means "possibly levity polymorphic" and `False` means "definitely not levity polymorphic". `isTcLevPoly` returned `False` for newtypes, which is incorrect in the presence of `UnliftedNewtypes`, leading to #17360. This patch tweaks `isTcLevPoly` to return `True` for newtypes instead. Fixes #17360.
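As a hedged illustration (adapted from the UnliftedNewtypes design, not from this patch's test), here is a newtype that is levity polymorphic when fully applied, which is exactly what the old `False` answer ruled out:

```
{-# LANGUAGE UnliftedNewtypes, GADTs, PolyKinds, RankNTypes, KindSignatures #-}
import GHC.Exts (TYPE, RuntimeRep)

-- Ident a :: TYPE r, so the fully applied newtype can be levity polymorphic.
newtype Ident :: forall (r :: RuntimeRep). TYPE r -> TYPE r where
  MkIdent :: forall (r :: RuntimeRep) (a :: TYPE r). a -> Ident a
```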
-
We were using `pprIfaceAppArgs` instead of `pprParendIfaceAppArgs` in `pprIfaceConDecl`. Oops. Fixes #17384.
-
See #16873.
-
- Oct 23, 2019
-
-
Ben Gamari authored
This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets.

Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below.

Collector design
================

The full design of the collector implemented here is described in detail in a technical note

> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)

This document can be requested from @bgamari.

The basic heap structure used in this design is heavily inspired by

> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)

This design is intended to allow both marking and sweeping concurrently with the execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase.

To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocates into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (also used to track liveness).

This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch).

Implementation structure
========================

The majority of the collector is implemented in a handful of files:

* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap.

Branch overview
===============

```
 * wip/gc/opt-pause:
 |  A variety of small optimisations to further reduce pause times.
 |
 * wip/gc/compact-nfdata:
 |  Introduce support for compact regions into the non-moving
 |\ collector
 | \
 |  \
 | | * wip/gc/segment-header-to-bdescr:
 | | |  Another optimization that we are considering, pushing
 | | |  some segment metadata into the segment descriptor for
 | | |  the sake of locality during mark
 | | |
 | * | wip/gc/shortcutting:
 | | |  Support for indirection shortcutting and the selector optimization
 | | |  in the non-moving heap.
 | | |
 * | | wip/gc/docs:
 | |/   Work on implementation documentation.
 | /
 |/
 * wip/gc/everything:
 |  A roll-up of everything below.
 |\
 | \
 | |\
 | | \
 | | * wip/gc/optimize:
 | | |  A variety of optimizations, primarily to the mark loop.
 | | |  Some of these are microoptimizations but a few are quite
 | | |  significant. In particular, the prefetch patches have
 | | |  produced a nontrivial improvement in mark performance.
 | | |
 | | * wip/gc/aging:
 | | |  Enable support for aging in major collections.
 | | |
 | * | wip/gc/test:
 | | |  Fix up the testsuite to more or less pass.
 | | |
 * | | wip/gc/instrumentation:
 | | |  A variety of runtime instrumentation including statistics
 | | /  support, the nonmoving census, and eventlog support.
 | |/
 | /
 |/
 * wip/gc/nonmoving-concurrent:
 |  The concurrent write barriers.
 |
 * wip/gc/nonmoving-nonconcurrent:
 |  The nonmoving collector without the write barriers necessary
 |  for concurrent collection.
 |
 * wip/gc/preparation:
 |  A merge of the various preparatory patches that aren't directly
 |  implementing the GC.
 |
 | * GHC HEAD
 .
 .
 .
```
-
Unfortunately this was introduced in base-4.11.0 (GHC 8.4.1) whereas the other locking primitives were added in base-4.10.0 (GHC 8.2.1).
-
Previously -ddump-stg would dump pre- and post-unarise STG. Now we have a new flag for post-unarise STG and -ddump-stg only dumps coreToStg output.

STG dump flags after this commit:

- -ddump-stg: Dumps CoreToStg output
- -ddump-stg-unarised: Unarise output
- -ddump-stg-final: STG right before code gen (includes CSE and lambda lifting)
-
cmd uses RawCommand, which uses Windows semantics to find the executable; this sometimes seems to fail for unclear reasons. If we invoke ghc via bash then bash will find the ghc executable and the issue goes away.
-
Previously the make build system would pass things like `-optl-optl-Wl,-x -optl-optl-Wl,noexecstack` to GHC. This would naturally result in mass confusion as GHC would pass `-optl-Wl,-x` to GCC. GCC would in turn interpret this as `-o ptl-Wl,-x`, setting the output path of the invocation. The problem was that `-optl` was added to the command line in two places in the build system. Fix this. Fixes #17385.
-
Fixes #17382.
-
The "new" performance testing infrastructure resets the baseline after every test so it's easy to miss gradual performance regressions over time. We should at least make these numbers smaller to catch patches which affect performance earlier.
-
19 times out of 20 we already have dynflags in scope. We could just always use `return dflags`. But this is in fact not free. When looking at some STG code I noticed that we always allocate a closure for this expression in the heap. Clearly a waste in these cases. For the other cases we can either just modify the callsite to get dynflags or use the _D variants of withTiming I added which will use getDynFlags under the hood.
-
-
Previously a partial rollback of a branch of an `orElse` was attempted if a validation failure was observed. Validation here, however, does not account for which part of the transaction observed the inconsistent state. This commit fixes this by fully aborting and restarting the transaction.
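For context, a small self-contained example (mine, unrelated to the RTS internals being fixed) of the `orElse` construct whose rollback behaviour is at stake: if the left branch retries, its effects must be rolled back before the right branch runs.

```
import Control.Concurrent.STM

-- Take n from whichever account has enough funds.
takeFromEither :: TVar Int -> TVar Int -> Int -> STM ()
takeFromEither a b n = takeFrom a `orElse` takeFrom b
  where
    takeFrom v = do
      balance <- readTVar v
      check (balance >= n)          -- retries this branch if underfunded
      writeTVar v (balance - n)
```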
-
This is part two of fixing #17334.

There are two parts to this commit:
- A bugfix for computing loop levels
- A bugfix of basic block invariants in the NCG.

-----------------------------------------------------------

In the first bug we ended up with a CFG of the sort:

    [A -> B -> C]

This was represented via maps as fromList [(A,B),(B,C)] and later transformed into an adjacency array. However the transformation did not include block C in the array (since we only looked at the keys of the map). This was still fine until we tried to look up successors for C and tried to read outside of the array bounds when accessing C.

In order to prevent this in the future I refactored the code to include all nodes as keys in the map representation, and made this an invariant which is checked in a few places.

Overall I expect this to make the code more robust, as now any failed lookup will represent an error, versus failed lookups sometimes being expected and sometimes not (a standalone sketch of this pattern follows below).

In terms of performance this makes some things cheaper (getting a list of all nodes) and others more expensive (adding a new edge). Overall this adds up to no noticeable performance difference.

-----------------------------------------------------------

Part 2: When the NCG generated a new basic block, it did not always insert a NEWBLOCK meta instruction in the stream, which caused a quite subtle bug.

During instruction selection a statement `s` in a block B with control flow of the sort:

    B -> C

will sometimes result in control flow of the sort:

         ┌ < ┐
         v   ^
    B -> B1 ┴ -> C

as is the case for some atomic operations.

Now to keep the CFG in sync when introducing B1 we clearly want to insert it between B and C. However there is a catch when we have to deal with self loops.

We might start with code and a CFG of these forms:

    loop:
        stmt1            ┌ < ┐
        ....             v   ^
        stmtX           loop ┘
        stmtY
        ....
        goto loop:

Now we introduce B1:

                          ┌ ─ ─ ─ ─ ─┐
    loop:                 │  ┌ < ┐   │
        instrs            v  │   ^   │
        ....             loop ┴ B1 ┴ ┘
        instrsFromX
        stmtY
        goto loop:

This is simple, all outgoing edges from loop now simply start from B1 instead, and the code generator knows which new edges it introduced for the self loop of B1.

Disaster strikes if the statement Y follows the same pattern. If we apply the same rule that all outgoing edges change then we end up with:

    loop ─> B1 ─> B2 ┬─┐
            │   │    └─<┤ │
            │   └───<───┘ │
            └───────<─────┘

This is problematic. The edge B1->B1 is modified as expected. However the modification is wrong!

The assembly in this case looked like this:

    _loop:
        <instrs>
    _B1:
        ...
        cmpxchgq ...
        jne _B1
        <instrs>
        <end _B1>
    _B2:
        ...
        cmpxchgq ...
        jne _B2
        <instrs>
        jmp loop

There is no edge _B2 -> _B1 here. It's still a self loop onto _B1.

The problem here is that really B1 should be two basic blocks. Otherwise we have control flow in the *middle* of a basic block. A contradiction!

So to account for this we add yet another basic block marker:

    _B:
        <instrs>
    _B1:
        ...
        cmpxchgq ...
        jne _B1
        jmp _B1'
    _B1':
        <instrs>
        <end _B1>
    _B2:
        ...

Now when inserting B2 we will only look at the outgoing edges of B1' and everything will work out nicely.

You might also wonder why we don't insert jumps at the end of _B1'. There is no way another block ends up jumping to the labels _B1 or _B2 since they are essentially invisible to other blocks. View them as control flow labels local to the basic block if you'd like. Not doing this ultimately caused (part 2 of) #17334.
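The first bug above can be illustrated with a small self-contained Haskell sketch (my own example, not GHC's actual CFG code; all names are made up):

```
import qualified Data.Map as M
import qualified Data.Set as S

-- Deriving the node set from the *keys* of a successor map loses sink
-- nodes such as C, which appear only as values; successor lookups for C
-- then fall outside the adjacency array built from that node set.
type Node = Int
type CFG  = M.Map Node [Node]

allNodesBuggy :: CFG -> S.Set Node
allNodesBuggy = M.keysSet

allNodesFixed :: CFG -> S.Set Node
allNodesFixed cfg = M.keysSet cfg `S.union` S.fromList (concat (M.elems cfg))

-- With cfg = M.fromList [(0, [1]), (1, [2])]  (i.e. A -> B -> C):
--   allNodesBuggy cfg == S.fromList [0,1]
--   allNodesFixed cfg == S.fromList [0,1,2]
```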
-