1. 24 Oct, 2019 4 commits
  2. 23 Oct, 2019 20 commits
    • Ben Gamari's avatar
      Merge non-moving garbage collector · 7f72b540
      Ben Gamari authored
      This introduces a concurrent mark & sweep garbage collector to manage the old
      generation. The concurrent nature of this collector typically results in
      significantly reduced maximum and mean pause times in applications with large
      working sets.
      Due to the large and intricate nature of the change I have opted to
      preserve the fully-buildable history, including merge commits, which is
      described in the "Branch overview" section below.
      Collector design
      The full design of the collector implemented here is described in detail
      in a technical note
      > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
      > Compiler" (2018)
      This document can be requested from @bgamari.
      The basic heap structure used in this design is heavily inspired by
      > K. Ueno & A. Ohori. "A fully concurrent garbage collector for
      > functional programs on multicore processors." /ACM SIGPLAN Notices/
      > Vol. 51. No. 9 (presented at ICFP 2016)
      This design is intended to allow both marking and sweeping
      concurrent to execution of a multi-core mutator. Unlike the Ueno design,
      which requires no global synchronization pauses, the collector
      introduced here requires a stop-the-world pause at the beginning and end
      of the mark phase.
      To avoid heap fragmentation, the allocator consists of a number of
      fixed-size /sub-allocators/. Each of these sub-allocators allocators into
      its own set of /segments/, themselves allocated from the block
      allocator. Each segment is broken into a set of fixed-size allocation
      blocks (which back allocations) in addition to a bitmap (used to track
      the liveness of blocks) and some additional metadata (used also used
      to track liveness).
      This heap structure enables collection via mark-and-sweep, which can be
      performed concurrently via a snapshot-at-the-beginning scheme (although
      concurrent collection is not implemented in this patch).
      Implementation structure
      The majority of the collector is implemented in a handful of files:
       * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
         to the nonmoving collector (`nonmoving_collect`), as well as the allocator
         (`nonmoving_allocate`) and a number of utilities for manipulating the heap.
       * `rts/NonmovingMark.c` implements the mark queue functionality, update
         remembered set, and mark loop.
       * `rts/NonmovingSweep.c` implements the sweep loop.
       * `rts/NonmovingScav.c` implements the logic necessary to scavenge the
         nonmoving heap.
      Branch overview
       * wip/gc/opt-pause:
       |   A variety of small optimisations to further reduce pause times.
       * wip/gc/compact-nfdata:
       |   Introduce support for compact regions into the non-moving
       |\  collector
       | \
       |  \
       | | * wip/gc/segment-header-to-bdescr:
       | | |   Another optimization that we are considering, pushing
       | | |   some segment metadata into the segment descriptor for
       | | |   the sake of locality during mark
       | | |
       | * | wip/gc/shortcutting:
       | | |   Support for indirection shortcutting and the selector optimization
       | | |   in the non-moving heap.
       | | |
       * | | wip/gc/docs:
       | |/    Work on implementation documentation.
       | /
       * wip/gc/everything:
       |   A roll-up of everything below.
       | \
       | |\
       | | \
       | | * wip/gc/optimize:
       | | |   A variety of optimizations, primarily to the mark loop.
       | | |   Some of these are microoptimizations but a few are quite
       | | |   significant. In particular, the prefetch patches have
       | | |   produced a nontrivial improvement in mark performance.
       | | |
       | | * wip/gc/aging:
       | | |   Enable support for aging in major collections.
       | | |
       | * | wip/gc/test:
       | | |   Fix up the testsuite to more or less pass.
       | | |
       * | | wip/gc/instrumentation:
       | | |   A variety of runtime instrumentation including statistics
       | | /   support, the nonmoving census, and eventlog support.
       | |/
       | /
       * wip/gc/nonmoving-concurrent:
       |   The concurrent write barriers.
       * wip/gc/nonmoving-nonconcurrent:
       |   The nonmoving collector without the write barriers necessary
       |   for concurrent collection.
       * wip/gc/preparation:
       |   A merge of the various preparatory patches that aren't directly
       |   implementing the GC.
       * GHC HEAD
    • Ben Gamari's avatar
      base: Add @since on GHC.IO.Handle.Lock.hUnlock · 8abddac8
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Unfortunately this was introduced in base-4.11.0 (GHC 8.4.1)
      whereas the other locking primitives were added in base-4.10.0 (GHC
    • Ömer Sinan Ağacan's avatar
      Add new flag for unarised STG dumps · 266435a7
      Ömer Sinan Ağacan authored and Marge Bot's avatar Marge Bot committed
      Previously -ddump-stg would dump pre and post-unarise STGs. Now we have
      a new flag for post-unarise STG and -ddump-stg only dumps coreToStg
      STG dump flags after this commit:
      - -ddump-stg: Dumps CoreToStg output
      - -ddump-stg-unarised: Unarise output
      - -ddump-stg-final: STG right before code gen (includes CSE and lambda
    • Andreas Klebinger's avatar
      Hadrian: Invoke ghc0 via bash when running tests to fix #17362. · bb0dc5a5
      Andreas Klebinger authored and Marge Bot's avatar Marge Bot committed
      cmd uses RawCommand which uses Windows semantics to find the executable
      which sometimes seems to fail for unclear reasons.
      If we invoke ghc via bash then bash will find the ghc executable and
      the issue goes away.
    • Ben Gamari's avatar
      Drop duplicate -optl's from GHC invocations · 21663693
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Previously the make build system would pass things like
      `-optl-optl-Wl,-x -optl-optl-Wl,noexecstack` to GHC. This would
      naturally result in mass confusion as GHC would pass `-optl-Wl,-x` to
      GCC. GCC would in turn interpret this as `-o ptl-Wl,-x`, setting the
      output pass of the invocation.
      The problem that `-optl` was added to the command-line in two places in
      the build system. Fix this.
      Fixes #17385.
    • Ben Gamari's avatar
      users-guide: Fix :since: for -Wunused-packages · 4af20bbc
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Fixes #17382.
    • Matthew Pickering's avatar
      Performance tests: Reduce acceptance threshold for bytes allocated tests · 8dd480cc
      Matthew Pickering authored and Marge Bot's avatar Marge Bot committed
      The "new" performance testing infrastructure resets the baseline after
      every test so it's easy to miss gradual performance regressions over
      time. We should at least make these numbers smaller to catch patches
      which affect performance earlier.
    • Andreas Klebinger's avatar
      Make dynflag argument for withTiming pure. · 6beea836
      Andreas Klebinger authored and Marge Bot's avatar Marge Bot committed
      19 times out of 20 we already have dynflags in scope.
      We could just always use `return dflags`. But this is in fact not free.
      When looking at some STG code I noticed that we always allocate a
      closure for this expression in the heap. Clearly a waste in these cases.
      For the other cases we can either just modify the callsite to
      get dynflags or use the _D variants of withTiming I added which
      will use getDynFlags under the hood.
    • Ben Gamari's avatar
      Bump stm submodule · 9c1f0f7c
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
    • ryates@cs.rochester.edu's avatar
      Full abort on validate failure merging `orElse`. · 1f40e68a
      ryates@cs.rochester.edu authored and Marge Bot's avatar Marge Bot committed
      Previously partial roll back of a branch of an `orElse` was attempted
      if validation failure was observed.  Validation here, however, does
      not account for what part of the transaction observed inconsistent
      state.  This commit fixes this by fully aborting and restarting the
    • Andreas Klebinger's avatar
      Fix bug in the x86 backend involving the CFG. · aa778152
      Andreas Klebinger authored and Marge Bot's avatar Marge Bot committed
      This is part two of fixing #17334.
      There are two parts to this commit:
      - A bugfix for computing loop levels
      - A bugfix of basic block invariants in the NCG.
      In the first bug we ended up with a CFG of the sort: [A -> B -> C]
      This was represented via maps as fromList [(A,B),(B,C)] and later
      transformed into a adjacency array. However the transformation did
      not include block C in the array (since we only looked at the keys of
      the map).
      This was still fine until we tried to look up successors for C and tried
      to read outside of the array bounds when accessing C.
      In order to prevent this in the future I refactored to code to include
      all nodes as keys in the map representation. And make this a invariant
      which is checked in a few places.
      Overall I expect this to make the code more robust as now any failed
      lookup will represent an error, versus failed lookups sometimes being
      expected and sometimes not.
      In terms of performance this makes some things cheaper (getting a list
      of all nodes) and others more expensive (adding a new edge). Overall
      this adds up to no noteable performance difference.
      Part 2: When the NCG generated a new basic block, it did
      not always insert a NEWBLOCK meta instruction in the stream which
      caused a quite subtle bug.
          During instruction selection a statement `s`
          in a block B with control of the sort: B -> C
          will sometimes result in control
          flow of the sort:
                  ┌ < ┐
                  v   ^
            B ->  B1  ┴ -> C
          as is the case for some atomic operations.
          Now to keep the CFG in sync when introducing B1 we clearly
          want to insert it between B and C. However there is
          a catch when we have to deal with self loops.
          We might start with code and a CFG of these forms:
              stmt1               ┌ < ┐
              ....                v   ^
              stmtX              loop ┘
              goto loop:
          Now we introduce B1:
                                  ┌ ─ ─ ─ ─ ─┐
              loop:               │   ┌ <  ┐ │
              instrs              v   │    │ ^
              ....               loop ┴ B1 ┴ ┘
              goto loop:
          This is simple, all outgoing edges from loop now simply
          start from B1 instead and the code generator knows which
          new edges it introduced for the self loop of B1.
          Disaster strikes if the statement Y follows the same pattern.
          If we apply the same rule that all outgoing edges change then
          we end up with:
              loop ─> B1 ─> B2 ┬─┐
                │      │    └─<┤ │
                │      └───<───┘ │
          This is problematic. The edge B1->B1 is modified as expected.
          However the modification is wrong!
          The assembly in this case looked like this:
              cmpxchgq ...
              jne _B1
              <end _B1>
              cmpxchgq ...
              jne _B2
              jmp loop
          There is no edge _B2 -> _B1 here. It's still a self loop onto _B1.
          The problem here is that really B1 should be two basic blocks.
          Otherwise we have control flow in the *middle* of a basic block.
          A contradiction!
          So to account for this we add yet another basic block marker:
              cmpxchgq ...
              jne _B1
              jmp _B1'
              <end _B1>
          Now when inserting B2 we will only look at the outgoing edges of B1' and
          everything will work out nicely.
          You might also wonder why we don't insert jumps at the end of _B1'. There is
          no way another block ends up jumping to the labels _B1 or _B2 since they are
          essentially invisible to other blocks. View them as control flow labels local
          to the basic block if you'd like.
          Not doing this ultimately caused (part 2 of) #17334.
    • Takenobu Tani's avatar
      Allow command name resolution for GHCi commands with option `!` #17345 · 4798f3b9
      Takenobu Tani authored and Marge Bot's avatar Marge Bot committed
      This commit allows command name resolution for GHCi commands
      with option `!` as follows:
          ghci> :k! Int
          Int :: *
          = Int
      This commit changes implementation as follows:
        * Prefix match with full string including the option `!` (e.g. `k!`)
      After (this patch):
        * Prefix match without option suffix `!` (e.g. `k`)
        * in addition, suffix match with option `!`
      See also #8305 and #8113
    • Matthew Pickering's avatar
      eventlog: Dump cost centre stack on each sample · 17987a4b
      Matthew Pickering authored and Marge Bot's avatar Marge Bot committed
      With this change it is possible to reconstruct the timing portion of a
      `.prof` file after the fact. By logging the stacks at each time point
      a more precise executation trace of the program can be observed rather
      than all identical cost centres being identified in the report.
      There are two new events:
      1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate
      the tick interval
      2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the
      current call stack.
      Fixes #17322
    • Ömer Sinan Ağacan's avatar
      Refactor Compact.c: · b521e8b6
      Ömer Sinan Ağacan authored and Marge Bot's avatar Marge Bot committed
      - Remove forward declarations
      - Introduce UNTAG_PTR and GET_PTR_TAG for dealing with pointer tags
        without having to cast arguments to StgClosure*
      - Remove dead code
      - Use W_ instead of StgWord
      - Use P_ instead of StgPtr
    • Ben Gamari's avatar
      testsuite: Don't run T7653 in ghci and profiled ways · 9b2a5008
      Ben Gamari authored and Marge Bot's avatar Marge Bot committed
      Currently this routinely fails in the i386 job.
      See #7653.
    • Ryan Scott's avatar
      Reify oversaturated data family instances correctly (#17296) · a19c7d17
      Ryan Scott authored and Marge Bot's avatar Marge Bot committed
      `TcSplice` was not properly handling oversaturated data family
      instances, such as the example in #17296, as it dropped arguments due
      to carelessly zipping data family instance arguments with
      `tyConTyVars`. For data families, the number of `tyConTyVars` can
      sometimes be less than the number of arguments it can accept in a
      data family instance due to the fact that data family instances can
      be oversaturated.
      To account for this, `TcSplice.mkIsPolyTvs` has now been renamed to
      `tyConArgsPolyKinded` and now factors in `tyConResKind` in addition
      to `tyConTyVars`. I've also added
      `Note [Reified instances and explicit kind signatures]` which
      explains the various subtleties in play here.
      Fixes #17296.
    • Alp Mestanogullari's avatar
      compiler: introduce DynFlags plugins · 900cf195
      Alp Mestanogullari authored and Marge Bot's avatar Marge Bot committed
      They have type '[CommandLineOpts] -> Maybe (DynFlags -> IO DynFlags)'.
      All plugins that supply a non-Nothing 'dynflagsPlugin' will see their
      updates applied to the current DynFlags right after the plugins are
      One use case for this is to superseede !1580 for registering hooks
      from a plugin. Frontend/parser plugins were considered to achieve this
      but they respectively conflict with how this plugin is going to be used
      and don't allow overriding/modifying the DynFlags, which is how hooks have
      to be registered.
      This commit comes with a test, 'test-hook-plugin', that registers a "fake"
      meta hook that replaces TH expressions with the 0 integer literal.
    • Richard Eisenberg's avatar
      Implement a coverage checker for injectivity · 1cd3fa29
      Richard Eisenberg authored and Marge Bot's avatar Marge Bot committed
      This fixes #16512.
      There are lots of parts of this patch:
      * The main payload is in FamInst. See
      Note [Coverage condition for injective type families] there
      for the overview. But it doesn't fix the bug.
      * We now bump the reduction depth every time we discharge
      a CFunEqCan. See Note [Flatten when discharging CFunEqCan]
      in TcInteract.
      * Exploration of this revealed a new, easy to maintain invariant
      for CTyEqCans. See Note [Almost function-free] in TcRnTypes.
      * We also realized that type inference for injectivity was a
      bit incomplete. This means we exchanged lookupFlattenTyVar for
      rewriteTyVar. See Note [rewriteTyVar] in TcFlatten. The new
      function is monadic while the previous one was pure, necessitating
      some faff in TcInteract. Nothing too bad.
      * zonkCt did not maintain invariants on CTyEqCan. It's not worth
      the bother doing so, so we just transmute CTyEqCans to
      * The pure unifier was finding the fixpoint of the returned
      substitution, even when doing one-way matching (in tcUnifyTysWithTFs).
      Fixed now.
      Test cases: typecheck/should_fail/T16512{a,b}
    • Andreas Klebinger's avatar
      Warn about missing profiled libs when using the Interpreter. · faa30dcb
      Andreas Klebinger authored and Marge Bot's avatar Marge Bot committed
      When GHC itself, or it's interpreter is profiled we need to load
      profiled libraries as well.
      This requirement is not always obvious, especially when TH
      implicilty uses the interpreter.
      When the libs were not found we fall back to assuming the
      are in a DLL. This is usually not the case so now we warn
      users when we do so. This makes it more obvious what is
      happening and gives users a way to fix the issue.
      This fixes #17121.
    • David Feuer's avatar
      Use an IORef for QSemN · 96c5411a
      David Feuer authored and Marge Bot's avatar Marge Bot committed
      Replace the outer `MVar` in `QSemN` with an `IORef`. This should
      probably be lighter, and it removes the need for `uninterruptibleMask`.
      Previously Differential Revision https://phabricator.haskell.org/D4896
  3. 22 Oct, 2019 16 commits