1. 04 Jan, 2020 2 commits
  2. 31 Dec, 2019 1 commit
  3. 18 Dec, 2019 1 commit
    • Add GHC-API logging hooks · 58655b9d
      Sylvain Henry authored
      * Add 'dumpAction' hook to DynFlags.
      
      It allows GHC API users to catch dumped intermediate code and other
      information. The format of the dump (Core, Stg, raw text, etc.) is now
      reported, allowing easier automatic handling.
      
      * Add 'traceAction' hook to DynFlags.
      
      Some dumps go through the trace mechanism (for instance unfoldings that
      have been considered for inlining). This is problematic because:
      1) dumps aren't written into files even with -ddump-to-file on
      2) dumps are written on stdout even with GHC API
      3) in this specific case, dumping depends on unsafe globally stored
      DynFlags which is bad for GHC API users
      
      We introduce a 'traceAction' hook, which allows GHC API users to catch
      those traces and to avoid using globally stored DynFlags.
      
      * Avoid dumping empty logs via dumpAction/traceAction (but still write
      empty files to keep the existing behavior)
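
The hook pattern described above can be sketched in isolation. This is a minimal model, not the GHC API: `Flags`, `dumpHook`, `emitDump`, and `captureDumps` are hypothetical names, and the real DynFlags hooks take different arguments.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for the dump format reported to the hook.
data DumpFormat = FormatCore | FormatStg | FormatText
  deriving (Eq, Show)

-- Hypothetical stand-in for DynFlags: a record carrying a replaceable hook.
newtype Flags = Flags
  { dumpHook :: DumpFormat -> String -> String -> IO () }

-- Default behaviour in this sketch: print the dump to stdout.
defaultFlags :: Flags
defaultFlags = Flags
  { dumpHook = \fmt title body ->
      putStrLn ("==== " ++ title ++ " [" ++ show fmt ++ "] ====\n" ++ body) }

-- Compiler side: route every dump through the hook, skipping empty ones.
emitDump :: Flags -> DumpFormat -> String -> String -> IO ()
emitDump flags fmt title body
  | null body = return ()
  | otherwise = dumpHook flags fmt title body

-- API-user side: install a hook that collects dumps instead of printing them.
captureDumps :: (Flags -> IO a) -> IO (a, [(DumpFormat, String, String)])
captureDumps act = do
  ref <- newIORef []
  result <- act (Flags { dumpHook = \fmt title body ->
                           modifyIORef' ref (++ [(fmt, title, body)]) })
  dumps <- readIORef ref
  return (result, dumps)
```

With `captureDumps`, a caller receives each non-empty dump together with its format and never touches stdout or global state.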
  4. 07 Dec, 2019 1 commit
    • Implement pointer tagging for big families (#14373) · 9897e8c8
      Gabor Greif authored
      Formerly we punted on these and evaluated constructors always got a tag
      of 1.
      
      We now cascade switches because we have to check the tag first and when
      it is MAX_PTR_TAG then get the precise tag from the info table and
      switch on that. The only technically tricky part is that the default
      case needs (logical) duplication. To do this we emit an extra label for
      it and branch to that from the second switch. This avoids duplicated
      codegen.
      
      Here's a simple example of the new code gen:
      
          data D = D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8
      
      On a 64-bit system previously all constructors would be tagged 1. With
      the new code gen D7 and D8 are tagged 7:
      
          [Lib.D7_con_entry() {
               ...
               {offset
                 c1eu: // global
                     R1 = R1 + 7;
                     call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
               }
           }]
      
          [Lib.D8_con_entry() {
               ...
               {offset
                 c1ez: // global
                     R1 = R1 + 7;
                     call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
               }
           }]
      
      When switching we now look at the info table only when the tag is 7. For
      example, if we derive Enum for the type above, the Cmm looks like this:
      
          c2Le:
              _s2Js::P64 = R1;
              _c2Lq::P64 = _s2Js::P64 & 7;
              switch [1 .. 7] _c2Lq::P64 {
                  case 1 : goto c2Lk;
                  case 2 : goto c2Ll;
                  case 3 : goto c2Lm;
                  case 4 : goto c2Ln;
                  case 5 : goto c2Lo;
                  case 6 : goto c2Lp;
                  case 7 : goto c2Lj;
              }
      
          // Read info table for tag
          c2Lj:
              _c2Lv::I64 = %MO_UU_Conv_W32_W64(I32[I64[_s2Js::P64 & (-8)] - 4]);
              if (_c2Lv::I64 != 6) goto c2Lu; else goto c2Lt;
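
The tagging rule behind the Cmm above can be modelled in a few lines of Haskell. This is a sketch under stated assumptions: a 64-bit target with three tag bits, and families of at most seven constructors keeping precise tags. The names `conPtrTag` and `tagSource` are illustrative, not GHC functions.

```haskell
import Data.Bits ((.&.))

-- Assumption: 64-bit target with 3 pointer-tag bits, so the largest tag is 7.
maxPtrTag :: Int
maxPtrTag = 7

-- | Pointer tag for the constructor at the given 1-based position.
-- In a big family every constructor from position 7 onwards shares tag 7.
conPtrTag :: Int -> Int
conPtrTag conNo = min conNo maxPtrTag

-- | Where a case expression finds the precise constructor.
data TagSource = FromPointer Int | FromInfoTable
  deriving (Eq, Show)

-- | Given the family size and a tagged pointer (modelled as an Int), decide
-- whether the pointer tag is precise or the info table must be consulted.
tagSource :: Int -> Int -> TagSource
tagSource familySize ptr
  | familySize > maxPtrTag && tag == maxPtrTag = FromInfoTable
  | otherwise                                  = FromPointer tag
  where
    tag = ptr .&. maxPtrTag
```

For the eight-constructor type `D`, `conPtrTag` gives D7 and D8 the same tag, and `tagSource` falls back to the info table only for that tag.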
      
      Generated Cmm sizes do not change too much, but binaries are very
      slightly larger, due to the fact that the new instructions are longer in
      encoded form. E.g. previously entry code for D8 above would be
      
          00000000000001c0 <Lib_D8_con_info>:
           1c0:	48 ff c3             	inc    %rbx
           1c3:	ff 65 00             	jmpq   *0x0(%rbp)
      
      With this patch
      
          00000000000001d0 <Lib_D8_con_info>:
           1d0:	48 83 c3 07          	add    $0x7,%rbx
           1d4:	ff 65 00             	jmpq   *0x0(%rbp)
      
      This is one byte longer.
      
      Secondly, reading the info table directly and then switching is shorter
      
          _c1co:
                  movq -1(%rbx),%rax
                  movl -4(%rax),%eax
                  // Switch on info table tag
                  jmp *_n1d5(,%rax,8)
      
      than doing the same switch, and then for the tag 7 doing another switch:
      
          // When tag is 7
          _c1ct:
                  andq $-8,%rbx
                  movq (%rbx),%rax
                  movl -4(%rax),%eax
                  // Switch on info table tag
                  ...
      
      Some changes of binary sizes in actual programs:
      
      - In NoFib the worst case is 0.1% increase in benchmark "parser" (see
        NoFib results below). All programs get slightly larger.
      
      - Stage 2 compiler size does not change.
      
      - In "containers" (the library) size of all object files increases
        0.0005%. Size of the test program "bitqueue-properties" increases
        0.03%.
      
      nofib benchmarks kindly provided by Ömer (@osa1):
      
      NoFib Results
      =============
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  CSD          +0.0%      0.0%      0.0%     +0.0%     +0.0%
                   FS          +0.0%      0.0%      0.0%     +0.0%      0.0%
                    S          +0.0%      0.0%     -0.0%      0.0%      0.0%
                   VS          +0.0%      0.0%     -0.0%     +0.0%     +0.0%
                  VSD          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
                  VSM          +0.0%      0.0%      0.0%      0.0%      0.0%
                 anna          +0.0%      0.0%     +0.1%     -0.9%     -0.0%
                 ansi          +0.0%      0.0%     -0.0%     +0.0%     +0.0%
                 atom          +0.0%      0.0%      0.0%      0.0%      0.0%
               awards          +0.0%      0.0%     -0.0%     +0.0%      0.0%
               banner          +0.0%      0.0%     -0.0%     +0.0%      0.0%
           bernouilli          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         binary-trees          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                boyer          +0.0%      0.0%     +0.0%      0.0%     -0.0%
               boyer2          +0.0%      0.0%     +0.0%      0.0%     -0.0%
                 bspt          +0.0%      0.0%     +0.0%     +0.0%      0.0%
            cacheprof          +0.0%      0.0%     +0.1%     -0.8%      0.0%
             calendar          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
             cichelli          +0.0%      0.0%     +0.0%      0.0%      0.0%
              circsim          +0.0%      0.0%     -0.0%     -0.1%     -0.0%
             clausify          +0.0%      0.0%     +0.0%     +0.0%      0.0%
        comp_lab_zift          +0.0%      0.0%     +0.0%      0.0%     -0.0%
             compress          +0.0%      0.0%     +0.0%     +0.0%      0.0%
            compress2          +0.0%      0.0%      0.0%      0.0%      0.0%
          constraints          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm1          +0.0%      0.0%     +0.0%      0.0%      0.0%
         cryptarithm2          +0.0%      0.0%     +0.0%     -0.0%      0.0%
                  cse          +0.0%      0.0%     +0.0%     +0.0%      0.0%
         digits-of-e1          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e2          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
               dom-lt          +0.0%      0.0%     +0.0%     +0.0%      0.0%
                eliza          +0.0%      0.0%     -0.0%     +0.0%      0.0%
                event          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
          exact-reals          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               exp3_8          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
               expert          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
       fannkuch-redux          +0.0%      0.0%     +0.0%      0.0%      0.0%
                fasta          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fem          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                  fft          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                 fft2          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
             fibheaps          +0.0%      0.0%     +0.0%     +0.0%      0.0%
                 fish          +0.0%      0.0%     +0.0%     +0.0%      0.0%
                fluid          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               fulsom          +0.0%      0.0%     +0.0%     -0.0%     +0.0%
               gamteb          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                  gcd          +0.0%      0.0%     +0.0%     +0.0%      0.0%
          gen_regexps          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
               genfft          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   gg          +0.0%      0.0%      0.0%     -0.0%      0.0%
                 grep          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               hidden          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                  hpg          +0.0%      0.0%     +0.0%     -0.1%     -0.0%
                  ida          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                infer          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
              integer          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
            integrate          +0.0%      0.0%      0.0%     +0.0%      0.0%
         k-nucleotide          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                kahan          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
              knights          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
               lambda          +0.0%      0.0%     +1.2%     -6.1%     -0.0%
           last-piece          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                 lcss          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                 life          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                 lift          +0.0%      0.0%     +0.0%     +0.0%      0.0%
               linear          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            listcompr          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
             listcopy          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
             maillist          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
               mandel          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
              mandel2          +0.0%      0.0%     +0.0%     +0.0%     -0.0%
                 mate          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
              minimax          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
              mkhprog          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
           multiplier          +0.0%      0.0%      0.0%     +0.0%     -0.0%
               n-body          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
             nucleic2          +0.0%      0.0%     +0.0%     +0.0%     -0.0%
                 para          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            paraffins          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               parser          +0.1%      0.0%     +0.4%     -1.7%     -0.0%
              parstof          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  pic          +0.0%      0.0%     +0.0%      0.0%     -0.0%
             pidigits          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
               pretty          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               primes          +0.0%      0.0%     +0.0%      0.0%      0.0%
            primetest          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               prolog          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               puzzle          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
               queens          +0.0%      0.0%      0.0%     +0.0%     +0.0%
              reptile          +0.0%      0.0%     +0.0%     +0.0%      0.0%
      reverse-complem          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
              rewrite          +0.0%      0.0%     +0.0%      0.0%     -0.0%
                 rfib          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                  rsa          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                  scc          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                sched          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                  scs          +0.0%      0.0%     +0.0%     +0.0%      0.0%
               simple          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                solid          +0.0%      0.0%     +0.0%     +0.0%      0.0%
              sorting          +0.0%      0.0%     +0.0%     -0.0%      0.0%
        spectral-norm          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
               sphere          +0.0%      0.0%     +0.0%     -1.0%      0.0%
               symalg          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
                  tak          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            transform          +0.0%      0.0%     +0.4%     -1.3%     +0.0%
             treejoin          +0.0%      0.0%     +0.0%     -0.0%      0.0%
            typecheck          +0.0%      0.0%     -0.0%     +0.0%      0.0%
              veritas          +0.0%      0.0%     +0.0%     -0.1%     +0.0%
                 wang          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            wave4main          +0.0%      0.0%     +0.0%      0.0%     -0.0%
         wheel-sieve1          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         wheel-sieve2          +0.0%      0.0%     +0.0%     +0.0%      0.0%
                 x2n1          +0.0%      0.0%     +0.0%     +0.0%      0.0%
      --------------------------------------------------------------------------------
                  Min          +0.0%      0.0%     -0.0%     -6.1%     -0.0%
                  Max          +0.1%      0.0%     +1.2%     +0.0%     +0.0%
       Geometric Mean          +0.0%     -0.0%     +0.0%     -0.1%     -0.0%
      
      NoFib GC Results
      ================
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
              circsim          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          +0.0%      0.0%     -0.0%      0.0%     -0.0%
             fibheaps          +0.0%      0.0%      0.0%     -0.0%     -0.0%
               fulsom          +0.0%      0.0%      0.0%     -0.6%     -0.0%
             gc_bench          +0.0%      0.0%      0.0%      0.0%     -0.0%
                 hash          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          +0.0%      0.0%      0.0%     -0.0%      0.0%
            mutstore1          +0.0%      0.0%      0.0%     -0.0%     -0.0%
            mutstore2          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
                power          +0.0%      0.0%     -0.0%      0.0%     -0.0%
           spellcheck          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          +0.0%      0.0%     -0.0%     -0.6%     -0.0%
                  Max          +0.0%      0.0%     +0.0%      0.0%      0.0%
       Geometric Mean          +0.0%     +0.0%     +0.0%     -0.1%     +0.0%
      
      Fixes #14373
      
      These performance regressions appear to be a fluke in CI. See the
      discussion in !1742 for details.
      
      Metric Increase:
          T6048
          T12234
          T12425
          Naperian
          T12150
          T5837
          T13035
  5. 03 Dec, 2019 1 commit
  6. 28 Nov, 2019 1 commit
  7. 24 Nov, 2019 1 commit
  8. 17 Nov, 2019 1 commit
  9. 13 Nov, 2019 1 commit
  10. 23 Oct, 2019 1 commit
    • Make dynflag argument for withTiming pure. · 6beea836
      Andreas Klebinger authored
      19 times out of 20 we already have dynflags in scope.
      
      We could just always use `return dflags`. But this is in fact not free.
      When looking at some STG code I noticed that we always allocate a
      closure for this expression in the heap. Clearly a waste in these cases.
      
      For the other cases we can either just modify the callsite to
      get dynflags or use the _D variants of withTiming I added which
      will use getDynFlags under the hood.
  11. 21 Oct, 2019 1 commit
    • rts: Implement concurrent collection in the nonmoving collector · bd8e3ff4
      Ben Gamari authored
      This extends the non-moving collector to allow concurrent collection.
      
      The full design of the collector implemented here is described in detail
      in a technical note
      
          B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
          Compiler" (2018)
      
      This extension involves the introduction of a capability-local
      remembered set, known as the /update remembered set/, which tracks
      objects which may no longer be visible to the collector due to mutation.
      To maintain this remembered set we introduce a write barrier on
      mutations which is enabled while a concurrent mark is underway.
      
      The update remembered set representation is similar to that of the
      nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
      `Capability` maintains a single accumulator chunk, which it flushes
      when (a) the chunk is filled, or (b) the nonmoving collector enters its
      post-mark synchronization phase.
      
      While the write barrier touches a significant amount of code it is
      conceptually straightforward: the mutator must ensure that the referee
      of any pointer it overwrites is added to the update remembered set.
      However, there are a few details:
      
       * In the case of objects with a dirty flag (e.g. `MVar`s) we can
         exploit the fact that only the *first* mutation requires a write
         barrier.
      
       * Weak references, as usual, complicate things. In particular, we must
         ensure that the referee of a weak object is marked if dereferenced by
         the mutator. For this we (unfortunately) must introduce a read
         barrier, as described in Note [Concurrent read barrier on deRefWeak#]
         (in `NonMovingMark.c`).
      
       * Stable names are also a bit tricky as described in Note [Sweeping
         stable names in the concurrent collector] (`NonMovingSweep.c`).
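
The write-barrier rule and the dirty-flag shortcut can be illustrated with a toy model built on `IORef`s. This is only a sketch: `Collector`, `writeField`, and `writeDirtyCell` are hypothetical names, the remembered set is a plain list rather than a chunked array, and resetting the dirty flag is left out.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical model of the collector state (not the RTS data structures).
data Collector obj = Collector
  { markActive :: IORef Bool   -- is a concurrent mark underway?
  , updRemSet  :: IORef [obj]  -- stand-in for the update remembered set
  }

newCollector :: IO (Collector obj)
newCollector = Collector <$> newIORef False <*> newIORef []

-- | Overwrite a pointer field. While marking is active, the old referee is
-- first added to the update remembered set so the collector still sees it.
writeField :: Collector obj -> IORef obj -> obj -> IO ()
writeField gc field new = do
  active <- readIORef (markActive gc)
  when active $ do
    old <- readIORef field
    modifyIORef' (updRemSet gc) (old :)
  writeIORef field new

-- | An object with a dirty flag, loosely modelled on the `MVar` case.
data DirtyCell obj = DirtyCell
  { cellDirty :: IORef Bool
  , cellValue :: IORef obj
  }

-- | Only the first mutation (while the cell is still clean) pays for the
-- barrier; later writes see the dirty flag and skip it.
writeDirtyCell :: Collector obj -> DirtyCell obj -> obj -> IO ()
writeDirtyCell gc cell new = do
  active <- readIORef (markActive gc)
  dirty  <- readIORef (cellDirty cell)
  when (active && not dirty) $ do
    old <- readIORef (cellValue cell)
    modifyIORef' (updRemSet gc) (old :)
  writeIORef (cellDirty cell) True
  writeIORef (cellValue cell) new
```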
      
      We take quite some pains to ensure that the high thread count often seen
      in parallel Haskell applications doesn't affect pause times. To this end
      we allow thread stacks to be marked either by the thread itself (when it
      is executed or stack-underflows) or the concurrent mark thread (if the
      thread owning the stack is never scheduled). There is a non-trivial
      handshake to ensure that this happens without racing which is described
      in Note [StgStack dirtiness flags and concurrent marking].
      Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
  12. 16 Oct, 2019 1 commit
    • Add loop level analysis to the NCG backend. · 535a88e1
      Andreas Klebinger authored
      For backends maintaining the CFG during codegen
      we can now find loops and their nesting level.
      
      This is based on the Cmm CFG and dominator analysis.
      
      As a result we can estimate edge frequencies a lot better
      for methods, resulting in far better code layout.
      
      Speedup on nofib: ~1.5%
      Increase in compile times: ~1.9%
      
      To make this feasible this commit adds:
      * Dominator analysis based on the Lengauer-Tarjan Algorithm.
      * An algorithm estimating global edge frequencies from branch
      probabilities - In CFG.hs
      
      A few static branch prediction heuristics:
      
      * Expect to take the backedge in loops.
      * Expect to take the branch NOT exiting a loop.
      * Expect integer vs constant comparisons to be false.
      
      We also treat heap/stack checks specially for branch prediction
      to avoid them being treated as loops.
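
To illustrate how dominators identify loops, here is a small sketch that finds back edges in a CFG. It deliberately uses the simple iterative fixed-point formulation of dominators rather than Lengauer-Tarjan, and represents the CFG as a plain edge list. All names are illustrative and unrelated to the code in CFG.hs.

```haskell
import Data.List (intersect, nub, sort)

type Node = Int
type Edge = (Node, Node)

-- All nodes mentioned by the edge list, sorted and without duplicates.
nodesOf :: [Edge] -> [Node]
nodesOf es = sort (nub (concat [[a, b] | (a, b) <- es]))

-- Predecessors of a node.
predsOf :: [Edge] -> Node -> [Node]
predsOf es n = [a | (a, b) <- es, b == n]

-- | Dominator sets by fixed-point iteration:
-- Dom(entry) = {entry}; Dom(n) = {n} plus the intersection of Dom(p)
-- over all predecessors p of n.
dominators :: Node -> [Edge] -> [(Node, [Node])]
dominators entry es = go [(n, if n == entry then [entry] else ns) | n <- ns]
  where
    ns = nodesOf es
    go doms
      | doms' == doms = doms
      | otherwise     = go doms'
      where
        doms' = [(n, refine doms n) | n <- ns]
    refine doms n
      | n == entry = [entry]
      | otherwise  =
          sort (nub (n : meet [d | p <- predsOf es n, Just d <- [lookup p doms]]))
    meet [] = []
    meet ds = foldr1 intersect ds

-- | An edge (u, v) is a back edge, and v a loop header, when v dominates u.
backEdges :: Node -> [Edge] -> [Edge]
backEdges entry es =
  [(u, v) | (u, v) <- es, Just du <- [lookup u doms], v `elem` du]
  where
    doms = dominators entry es
```

For the CFG `[(0,1),(1,2),(2,1),(2,3)]` with entry 0, the only back edge is `(2,1)`, so block 1 heads a loop containing block 2. A heuristic such as "expect to take the backedge" would then weight that edge more heavily than the exit edge `(2,3)`.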
  13. 25 Sep, 2019 1 commit
    • PmCheck: Only ever check constantly many models against a single pattern · ebc65025
      Sebastian Graf authored
      Introduces a new flag `-fmax-pmcheck-deltas` to achieve that. Deprecates
      the old `-fmax-pmcheck-iter` mechanism in favor of this new flag.
      
      From the user's guide:
      
      Pattern match checking can be exponential in some cases. This limit makes sure
      we scale polynomially in the number of patterns, by forgetting refined
      information gained from a partially successful match. For example, when
      matching `x` against `Just 4`, we split each incoming matching model into two
      sub-models: One where `x` is `Nothing` and one where `x` is `Just y` but
      `y` is not `4`. When the number of incoming models exceeds the limit, we
      continue checking the next clause with the original, unrefined model.
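
The limit can be sketched as a wrapper around the refinement step. This is a toy model, not the checker's code: `throttleModels` is a hypothetical name, and a "model" is whatever type the `split` function refines.

```haskell
-- | Refine every incoming model with `split`. If that produces more models
-- than the limit allows, forget the refinement and carry on with the
-- incoming, unrefined models.
throttleModels :: Int -> (model -> [model]) -> [model] -> [model]
throttleModels limit split incoming
  | length refined > limit = incoming
  | otherwise              = refined
  where
    refined = concatMap split incoming
```

Since each clause hands on at most `limit` models or its unrefined input, the number of models no longer doubles with every clause.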
      
      This also retires the incredibly hard to understand "maximum number of
      refinements" mechanism, because the current mechanism is more general
      and should also catch the same exponential cases, such as PrelRules.
      
      -------------------------
      Metric Decrease:
          T11822
      -------------------------
  14. 20 Sep, 2019 1 commit
    • ErrUtils: split withTiming into withTiming and withTimingSilent · b3e5c731
      Alp Mestanogullari authored
      'withTiming' becomes a function that, when passed '-vN' (N >= 2) or
      '-ddump-timings', will print timing (and possibly allocations) related
      information. When additionally built with '-eventlog' and executed with
      '+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent'
      events to the eventlog.
      
      'withTimingSilent' on the other hand will never print any timing information,
      under any circumstance, and will only emit 'traceEvent' events to the eventlog.
      As pointed out in !1672, 'traceMarker' is better suited for things that we
      might want to visualize in tools like eventlog2html, while 'traceEvent'
      is better suited for internal events that occur a lot more often and that we
      don't necessarily want to visualize.
      
      This addresses #17138 by using 'withTimingSilent' for all the codegen bits
      that are expressed as a bunch of small computations over streams of codegen
      ASTs.
  15. 09 Sep, 2019 1 commit
    • Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
  16. 05 Sep, 2019 1 commit
  17. 02 Sep, 2019 1 commit
  18. 29 Aug, 2019 1 commit
    • Small optimization in the SRT algorithm · 304067a0
      Ömer Sinan Ağacan authored
      Noticed by @simonmar in !1362:
      
          If the srtEntry is Nothing, then it should be safe to omit
          references to this SRT from other SRTs, even if it is a static
          function.
      
      When updating SRT map we don't omit references to static functions (see
      Note [Invalid optimisation: shortcutting]), but there's no reason to add
      an SRT entry for a static function if the function is not CAFFY.
      
      (Previously we'd add SRT entries for static functions even when they're
      not CAFFY)
      
      Using 9151b99e I checked sizes of all SRTs when building GHC and
      containers:
      
      - GHC: 583736 (HEAD), 581695 (this patch). 2041 fewer SRT entries.
      - containers: 2457 (HEAD), 2381 (this patch). 76 fewer SRT entries.
  19. 28 Aug, 2019 1 commit
    • Return results of Cmm streams in backends · 1c7ec449
      Ömer Sinan Ağacan authored
      This generalizes code generators (outputAsm, outputLlvm, outputC, and
      the call site codeOutput) so that they'll return the return values of
      the passed Cmm streams.
      
      This allows accumulating data during Cmm generation and returning it to
      the call site in HscMain.
      
      Previously the Cmm streams were assumed to return (), so the code
      generators returned () as well.
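
The generalization can be sketched with a toy stream type whose final result has an arbitrary type `r` instead of `()`. The definitions below are illustrative assumptions and may not match GHC's own `Stream` module.

```haskell
-- A toy effectful stream: each step either finishes with a result `r`
-- or yields an element `a` together with the rest of the stream.
newtype Stream m a r = Stream { runStream :: m (Either r (a, Stream m a r)) }

-- | Build a stream that yields the given elements, then returns `r`.
yieldAll :: Monad m => [a] -> r -> Stream m a r
yieldAll []       r = Stream (return (Left r))
yieldAll (x : xs) r = Stream (return (Right (x, yieldAll xs r)))

-- | Process elements one at a time as they are produced and hand the
-- stream's final result back to the caller instead of discarding it.
consume :: Monad m => Stream m a r -> (a -> m ()) -> m r
consume s f = do
  step <- runStream s
  case step of
    Left r          -> return r
    Right (a, rest) -> f a >> consume rest f
```

A code generator written in this style can emit output for each element and still return whatever the producer accumulated, which is the shape of the change described above.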
      
      This change is required by !1304 and !1530.
      
      Skipping CI as this was tested before and I only updated the commit
      message.
      
      [skip ci]
  20. 23 Aug, 2019 1 commit
    • Make non-streaming LLVM and C backends streaming · a8300520
      Ömer Sinan Ağacan authored
      This adds a Stream.consume function, uses it in LLVM and C code
      generators, and removes the use of Stream.collect function which was
      used to collect streaming Cmm generation results into a list.
      
      LLVM and C backends now properly use streamed Cmm generation, instead of
      collecting Cmm groups into a list before generating LLVM/C code.
  21. 15 Aug, 2019 1 commit
    • Remove unused imports of the form 'import foo ()' (Fixes #17065) · ca71d551
      James Foster authored
      These kinds of imports are necessary in some cases such as
      importing instances of typeclasses or intentionally creating
      dependencies in the build system, but '-Wunused-imports' can't
      detect when they are no longer needed. This commit removes the
      unused ones currently in the code base (not including test files
      or submodules), with the hope that doing so may increase
      parallelism in the build system by removing unnecessary
      dependencies.
  22. 07 Aug, 2019 1 commit
  23. 03 Aug, 2019 1 commit
  24. 26 Jul, 2019 1 commit
  25. 17 Jul, 2019 1 commit
    • Create {Int,Word}32Rep · 0a9b77b8
      John Ericson authored
      This prepares the way for making Int32# and Word32# the actual size they
      claim to be.
      
      Updates binary submodule for (de)serializing the new runtime reps.
  26. 16 Jul, 2019 1 commit
  27. 13 Jul, 2019 2 commits
    • Minor refactoring in CmmBuildInfoTables · a7176fa1
      Ömer Sinan Ağacan authored
      - Replace `catMaybes (map ...)` with `mapMaybe ...`
      - Remove a list->set->list conversion
    • Add two CmmSwitch optimizations. · 348cc8eb
      Andreas Klebinger authored
      Move switch expressions into a local variable when generating switches.
      This avoids duplicating the expression if we translate the switch
      to a tree search. This fixes #16933.
      
      Further we now check if all branches of a switch have the same
      destination, replacing the switch with a direct branch if that
      is the case.
      
      Both of these patterns appear in the ENTER macro used by the RTS
      but are unlikely to occur in intermediate Cmm generated by GHC.
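
The second optimization can be sketched on a toy statement type. This is not GHC's Cmm representation: `Stmt` and `simplifySwitch` are hypothetical, and the switch is assumed to list a target for every value in its range.

```haskell
type Label = String

-- Toy statements: a switch on a variable with (case value, target) pairs,
-- or an unconditional branch.
data Stmt
  = Switch String [(Int, Label)]
  | Goto Label
  deriving (Eq, Show)

-- | Replace a switch whose cases all jump to the same label with a direct
-- branch; leave every other statement unchanged.
simplifySwitch :: Stmt -> Stmt
simplifySwitch (Switch _ ((_, target) : rest))
  | all ((== target) . snd) rest = Goto target
simplifySwitch stmt = stmt
```

Applied to a switch such as `Switch "x" [(8, "c4"), (9, "c4"), (10, "c4")]`, this yields `Goto "c4"`.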
      
      Nofib result summary:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
                  Min          -0.0%     -0.0%    -15.7%    -15.6%      0.0%
                  Max          -0.0%      0.0%     +5.4%     +5.5%      0.0%
       Geometric Mean          -0.0%     -0.0%     -1.0%     -1.0%     -0.0%
      
      Compiler allocations go up slightly: +0.2%
      
      Example output before and after the change taken from RTS code below.
      
      All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated.
      Instead the data is loaded once from memory in block c6.
      
      Also the switch in block `ud` in the original code has been
      eliminated completely.
      
      Cmm without this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;   // CmmAssign
                goto c2;   // CmmBranch
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
            ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
            uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
            ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
                    case 8, 9, 10, 11, 12, 13, 14 : goto c4;
                }
            ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
            uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
            uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
            ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
      
      Cmm with this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;
                goto c2;
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
                if (_ub::I64 < 26) goto uc; else goto uh;
            uc: if (_ub::I64 < 15) goto ud; else goto uf;
            ud: if (_ub::I64 < 8) goto c7; else goto c4;
            uf: if (_ub::I64 >= 25) goto c4; else goto ug;
            ug: if (_ub::I64 != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            uh: if (_ub::I64 < 28) goto ui; else goto uj;
            ui: if (_ub::I64 < 27) goto c7; else goto c8;
            uj: if (_ub::I64 < 29) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
  28. 03 Jul, 2019 1 commit
  29. 28 Jun, 2019 1 commit
    • Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause problems.
      Consider:
      
      - An observer is about to case on a value in a closure's info table.
      - The observer speculatively reads one or more of the closure's fields.
      - An updater writes to the closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but it's too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow this
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow this pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
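      
      The two orderings above can be sketched in portable C11 atomics. This is an
      illustration only, not the RTS code: the Closure struct, the INFO_* constants
      and the function names are hypothetical stand-ins, and the C11 release and
      acquire fences used here are somewhat stronger than the plain write and read
      barriers described in this message.
      
      ```c
      #include <stdatomic.h>
      #include <stddef.h>
      
      /* Hypothetical stand-ins; NOT the real StgClosure layout. */
      enum { INFO_THUNK = 1, INFO_IND = 2 };
      
      typedef struct Closure {
          _Atomic int info;            /* plays the role of the info pointer */
          struct Closure *indirectee;  /* only meaningful once info == INFO_IND */
          int value;                   /* some other payload field */
      } Closure;
      
      /* Set up a closure that is not (yet) an indirection. */
      static void init_thunk(Closure *c, int value) {
          c->indirectee = NULL;
          c->value = value;
          atomic_init(&c->info, INFO_THUNK);
      }
      
      /* Updater: write the non-info fields, then a write barrier, then the info. */
      static void update_with_indirection(Closure *c, Closure *target) {
          c->indirectee = target;
          atomic_thread_fence(memory_order_release);  /* write barrier */
          atomic_store_explicit(&c->info, INFO_IND, memory_order_relaxed);
      }
      
      /* Observer: read the info, then a read barrier, then the non-info fields. */
      static Closure *follow_indirections(Closure *c) {
          while (atomic_load_explicit(&c->info, memory_order_relaxed) == INFO_IND) {
              atomic_thread_fence(memory_order_acquire);  /* read barrier */
              c = c->indirectee;
          }
          return c;
      }
      ```
      
      With the fences in place, an observer that sees INFO_IND is guaranteed to see
      the indirectee written before it. Dropping either fence permits the bad
      reorderings listed earlier on weakly ordered hardware.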
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
      11bac115
  30. 25 Jun, 2019 1 commit
  31. 20 Jun, 2019 1 commit
    • Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdirectory within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good effect too.
      bff2f24b
  32. 19 Jun, 2019 1 commit
  33. 16 Jun, 2019 1 commit
  34. 12 Jun, 2019 1 commit
  35. 09 Jun, 2019 1 commit
  36. 08 Jun, 2019 1 commit
  37. 29 May, 2019 1 commit
    • Inline `Settings` into `DynFlags` · bfccd832
      John Ericson authored
      After the previous commit, `Settings` is just a thin wrapper around
      other groups of settings. While `Settings` is used by GHC-the-executable
      to initialize `DynFlags`, in principle another consumer of
      GHC-the-library could initialize `DynFlags` a different way. It
      therefore doesn't make sense for `DynFlags` itself (library code) to
      separate the settings that typically come from `Settings` from the
      settings that typically don't.
      bfccd832
  38. 27 May, 2019 1 commit
    • Add missing opening braces in Cmm dumps · db8e3275
      Ömer Sinan Ağacan authored
      Previously -ddump-cmm was generating code with unbalanced curly braces:
      
           stg_atomically_entry() //  [R1]
                   { info_tbls: [(cfl,
                                  label: stg_atomically_info
                                  rep: tag:16 HeapRep 1 ptrs { Thunk }
                                  srt: Nothing)]
                     stack_info: arg_space: 8 updfr_space: Just 8
                   }
               {offset
                 cfl: // cfk
                     unwind Sp = Just Sp + 0;
                     _cfk::P64 = R1;
                     //tick src<rts/PrimOps.cmm:(1243,1)-(1245,1)>
                     R1 = I64[_cfk::P64 + 8 + 8 + 0 * 8];
                     call stg_atomicallyzh(R1) args: 8, res: 0, upd: 8;
               }
           }, <---- OPENING BRACE MISSING
      
      After this patch:
      
           stg_atomically_entry() { //  [R1] <---- MISSING OPENING BRACE HERE
                   { info_tbls: [(cfl,
                                  label: stg_atomically_info
                                  rep: tag:16 HeapRep 1 ptrs { Thunk }
                                  srt: Nothing)]
                     stack_info: arg_space: 8 updfr_space: Just 8
                   }
               {offset
                 cfl: // cfk
                     unwind Sp = Just Sp + 0;
                     _cfk::P64 = R1;
                     //tick src<rts/PrimOps.cmm:(1243,1)-(1245,1)>
                     R1 = I64[_cfk::P64 + 8 + 8 + 0 * 8];
                     call stg_atomicallyzh(R1) args: 8, res: 0, upd: 8;
               }
           },
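      
      One way to check a dump for this kind of problem is to count braces. The
      helper below is a hypothetical sketch and not part of GHC; it assumes the
      dump text contains no braces inside comments or string literals.
      
      ```c
      /* Returns 0 when every '{' has a matching '}', the number of unclosed '{'
         when some remain open, or -1 as soon as a '}' appears with nothing open. */
      static int brace_balance(const char *text) {
          int depth = 0;
          for (const char *p = text; *p != '\0'; p++) {
              if (*p == '{') {
                  depth++;
              } else if (*p == '}') {
                  if (depth == 0) {
                      return -1;  /* closing brace without an opening one */
                  }
                  depth--;
              }
          }
          return depth;
      }
      ```
      
      Applied to a dump shaped like the first listing, the final closing brace has
      no partner and the function returns -1; for the second listing it returns 0.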
      db8e3275