  1. Jul 12, 2019
    • Add two CmmSwitch optimizations. · 9fe1e067
      Andreas Klebinger authored
      Move switch expressions into a local variable when generating switches.
      This avoids duplicating the expression if we translate the switch
      to a tree search. This fixes #16933.
      
      Further we now check if all branches of a switch have the same
      destination, replacing the switch with a direct branch if that
      is the case.
      
      Both of these patterns appear in the ENTER macro used by the RTS
      but are unlikely to occur in intermediate Cmm generated by GHC.
      
      Nofib result summary:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
                  Min          -0.0%     -0.0%    -15.7%    -15.6%      0.0%
                  Max          -0.0%      0.0%     +5.4%     +5.5%      0.0%
       Geometric Mean          -0.0%     -0.0%     -1.0%     -1.0%     -0.0%
      
      Compiler allocations go up slightly: +0.2%
      
      Example output before and after the change taken from RTS code below.
      
      All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated.
      Instead the data is loaded once from memory in block c6.
      
      Also the switch in block `ud` in the original code has been
      eliminated completely.
      
      Cmm without this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;   // CmmAssign
                goto c2;   // CmmBranch
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
            ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
            uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
            ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
                    case 8, 9, 10, 11, 12, 13, 14 : goto c4;
                }
            ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
            uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
            uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
            ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
      
      Cmm with this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;
                goto c2;
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
                if (_ub::I64 < 26) goto uc; else goto uh;
            uc: if (_ub::I64 < 15) goto ud; else goto uf;
            ud: if (_ub::I64 < 8) goto c7; else goto c4;
            uf: if (_ub::I64 >= 25) goto c4; else goto ug;
            ug: if (_ub::I64 != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            uh: if (_ub::I64 < 28) goto ui; else goto uj;
            ui: if (_ub::I64 < 27) goto c7; else goto c8;
            uj: if (_ub::I64 < 29) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
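
      The two rewrites can be imitated in plain C to make the shape of the
      change easier to see. This is only an illustrative sketch, not GHC
      code: `dispatch_before`, `dispatch_after` and the `target` labels are
      hypothetical names, and the constants are copied from the dumps above.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical labels for the three exits of the Cmm above. */
      enum target { ENTER /* c7 */, RETURN /* c4 */, FOLLOW /* c8 */ };

      /* Shape of the old output: the scrutinee expression is repeated in
         every comparison, and one switch sends all of its cases to the
         same destination. */
      static enum target dispatch_before(const int32_t *type_field)
      {
          if (*type_field < 26) {
              if (*type_field < 15) {
                  if (*type_field < 8) return ENTER;
                  switch (*type_field) {
                  case 8: case 9: case 10: case 11: case 12: case 13: case 14:
                      return RETURN;
                  }
                  return RETURN; /* not reached: only 8..14 can arrive here */
              }
              if (*type_field >= 25) return RETURN;
              return (*type_field != 23) ? ENTER : RETURN;
          }
          if (*type_field < 28)
              return (*type_field < 27) ? ENTER : FOLLOW;
          return (*type_field < 29) ? FOLLOW : ENTER;
      }

      /* Shape of the new output: the scrutinee is loaded once into a local
         (like _ub::I64), and the uniform switch is a direct branch. */
      static enum target dispatch_after(const int32_t *type_field)
      {
          int64_t t = (int64_t)*type_field;
          if (t < 26) {
              if (t < 15) return (t < 8) ? ENTER : RETURN;
              if (t >= 25) return RETURN;
              return (t != 23) ? ENTER : RETURN;
          }
          if (t < 28) return (t < 27) ? ENTER : FOLLOW;
          return (t < 29) ? FOLLOW : ENTER;
      }
      ```

      Both functions agree on every input; a C compiler would normally merge
      the repeated loads itself, so the sketch mirrors the structure of the
      Cmm rather than any expected machine code.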
  2. Jul 08, 2019
  3. Jul 05, 2019
    • Fix #16895 by checking whether infix expression operator is a variable · 2fd1ed54
      Alex D authored and Marge Bot committed
    • More sensible SrcSpans for recursive pattern synonym errors (#16900) · 62b82135
      Ryan Scott authored and Marge Bot committed
      Attach the `SrcSpan` of the first pattern synonym binding involved in
      the recursive group when throwing the corresponding error message,
      similarly to how it is done for type synonyms.
      
      Fixes #16900.
    • Make all submodules have absolute URLs · a76b233d
      Artem Pelenitsyn authored and Marge Bot committed
      The relative URLs were a workaround to let most contributors fork from
      GitHub due to a weakness in the haskell.org server.

      This workaround is no longer needed, and relative submodule URLs are
      an impediment to forking, which makes contributions harder than they
      should be.

      The URLs are chosen to clone over https, because this ensures that
      anybody, even someone who is not a registered GitLab user, can clone
      a fork recursively.
    • Don't gather ticks when only stripping them in STG. · f002250a
      Andreas Klebinger authored and Marge Bot committed
      Adds stripStgTicksTopE which only returns the stripped expression.
      So far we also allocated a list for the stripped ticks which was
      never used.
      
      The allocation difference is, as expected, very small but present:
      about 0.02% when compiling with -O.
    • Fix over-eager implication constraint discard · 80afdf6b
      Simon Peyton Jones authored and Marge Bot committed
      Ticket #16247 showed that we were discarding an implication
      constraint that had empty ic_wanted, when we still needed to
      keep it so we could check whether it had a bad telescope.
      
      Happily it's a one line fix.  All the rest is comments!
    • rts: Fix -hT option with profiling rts · ed662901
      Daniel Gröber (dxld) authored and Marge Bot committed
      In dumpCensus we switch/case on doHeapProfile twice. The second switch
      tries to barf on unknown doHeapProfile modes but HEAP_BY_CLOSURE_TYPE is
      checked by the first switch and not included in the second.
      
      So when trying to pass -hT to the profiling rts it barfs.
      
      This commit simply merges the two switches into one which fixes this
      problem.
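
      The failure mode can be sketched in C as follows. This is not the
      code of dumpCensus: the function names are hypothetical, the enum is
      a cut-down stand-in for the real doHeapProfile modes, and returning
      `false` stands in for the RTS calling barf.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Cut-down, illustrative set of heap profiling modes. */
      enum heap_profile_mode { HEAP_BY_CCS, HEAP_BY_CLOSURE_TYPE };

      /* Two switches: the first knows about HEAP_BY_CLOSURE_TYPE, the
         second does not, so that mode falls into the error default. */
      static bool census_two_switches(enum heap_profile_mode mode)
      {
          switch (mode) {
          case HEAP_BY_CLOSURE_TYPE:
              /* ... emit the closure-type census here ... */
              break;
          default:
              break;
          }

          switch (mode) {
          case HEAP_BY_CCS:
              /* ... emit the cost-centre census here ... */
              return true;
          default:
              return false; /* stands in for barf("invalid mode") */
          }
      }

      /* One merged switch: every handled mode appears exactly once, so
         the error default is only reached for modes nobody handles. */
      static bool census_one_switch(enum heap_profile_mode mode)
      {
          switch (mode) {
          case HEAP_BY_CLOSURE_TYPE:
              /* ... emit the closure-type census here ... */
              return true;
          case HEAP_BY_CCS:
              /* ... emit the cost-centre census here ... */
              return true;
          default:
              return false; /* stands in for barf("invalid mode") */
          }
      }
      ```

      With the split version, `census_two_switches(HEAP_BY_CLOSURE_TYPE)`
      reports failure even though the mode was handled; the merged version
      accepts it.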
    • Add a missing zonk (fixes #16902) · 53aa59f3
      Simon Peyton Jones authored and Marge Bot committed
      In the eager unifier, when unifying (tv1 ~ tv2),
      when we decide to swap them over, to unify (tv2 ~ tv1),
      I'd forgotten to ensure that tv1's kind was fully zonked,
      which is an invariant of uUnfilledTyVar2.
      
      That could lead us to build an infinite kind, or (in the
      case of #16902) update the same unification variable twice.
      
      Yikes.
      
      Now we get an error message rather than non-termination,
      which is much better.  The error message is not great,
      but it's a very strange program, and I can't see an easy way
      to improve it, so for now I'm just committing this fix.
      
      Here's the decl
       data F (a :: k) :: (a ~~ k) => Type where
          MkF :: F a
      
      and the error message, of which I am not proud
      
        T16902.hs:11:10: error:
          • Expected a type, but found something with kind ‘a1’
          • In the type ‘F a’
    • Produce all DerivInfo in tcTyAndClassDecls · 679427f8
      Vladislav Zavialov authored and Marge Bot committed
      Before this refactoring:
      
      * DerivInfo for data family instances was returned from tcTyAndClassDecls
      * DerivInfo for data declarations was generated with mkDerivInfos and added at a
        later stage of the pipeline in tcInstDeclsDeriv
      
      After this refactoring:
      
      * DerivInfo for both data family instances and data declarations is returned from
        tcTyAndClassDecls in a single list.
      
      This uniform treatment results in a more convenient arrangement to fix #16731.
    • gitlab: Reduce size of template headings · 675d27fc
      Ben Gamari authored and Marge Bot committed
    • Make printer untag when chasing a pointer in a RET_FUN frame · d7f7e1ed
      Siddharth Bhat authored and Marge Bot committed
      This is to mimic what `Scav.c` does. This should fix a crash in
      the printer.
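
      Untagging itself is a small bit operation, sketched below in C. The
      helper names are hypothetical (the RTS has its own macros for this),
      and the mask of 7 is an assumption for a 64-bit target, where the low
      three bits of an aligned closure pointer are free to carry a tag.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Assumed tag mask for a 64-bit target: the low three bits. */
      #define TAG_MASK ((uintptr_t)7)

      /* Return the tag stored in the low bits of a tagged pointer. */
      static inline unsigned tag_of(const void *p)
      {
          return (unsigned)((uintptr_t)p & TAG_MASK);
      }

      /* Clear the tag bits so the pointer can be dereferenced safely. */
      static inline void *untag(const void *p)
      {
          return (void *)((uintptr_t)p & ~TAG_MASK);
      }
      ```

      Dereferencing a pointer that still carries its tag reads from a
      misaligned address inside or next to the closure, which is the kind
      of crash an untagging step avoids.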
  4. Jul 04, 2019
  5. Jul 03, 2019
  6. Jul 02, 2019
  7. Jun 28, 2019
    • rts: Assert that LDV profiling isn't used with parallel GC · bd660ede
      Ben Gamari authored
      I'm not entirely sure we are careful about ensuring this; this is a
      last-ditch check.
    • Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored and Ben Gamari committed
      
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-of-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weak memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND implies the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause issues.
      Consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but it's too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
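
      The update and observation patterns above can be sketched with C11
      atomics. This is a minimal illustration, not the RTS code: the
      `closure` struct and both functions are hypothetical, and
      `atomic_thread_fence` is used here as a stand-in for the write and
      read barriers that the patch adds.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stddef.h>

      /* Hypothetical, heavily simplified closure layout. */
      enum { INFO_THUNK = 0, INFO_IND = 1 };

      typedef struct closure {
          _Atomic int info;            /* stands in for the info pointer */
          struct closure *indirectee;  /* only valid once info is INFO_IND */
          int value;
      } closure;

      /* Updater: fields first, then the barrier, then the info word. */
      static void update_with_indirection(closure *c, closure *target)
      {
          assert(target != NULL);
          c->indirectee = target;                      /* 1. non-info fields */
          atomic_thread_fence(memory_order_release);   /* 2. write barrier   */
          atomic_store_explicit(&c->info, INFO_IND,
                                memory_order_relaxed); /* 3. info word       */
      }

      /* Observer: info word first, then the barrier, then the fields. */
      static closure *follow(closure *c)
      {
          int info = atomic_load_explicit(&c->info,
                                          memory_order_relaxed); /* 1. info  */
          atomic_thread_fence(memory_order_acquire);   /* 2. read barrier    */
          if (info == INFO_IND)
              return c->indirectee;                    /* 3. non-info fields */
          return c;
      }
      ```

      An observer that sees INFO_IND is then guaranteed to see the
      indirectee written before the release fence. Without the two fences
      a weakly ordered machine may make the info word visible first, which
      is the bad reordering described above.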
      
      Co-Authored-By: Ben Gamari <ben@well-typed.com>
    • typo in the docs for DynFlags.hs · ef6d9a50
      Artem Pelenitsyn authored and Marge Bot committed
    • Fix GCC warnings with __clear_cache builtin (#16867) · 4ec233ec
      Sylvain Henry authored and Marge Bot committed
  8. Jun 27, 2019
  9. Jun 26, 2019