1. 22 Nov, 2018 3 commits
    • Fix unused-import warnings · 6353efc7
      David Eichmann authored
      This patch fixes a fairly long-standing bug (dating back to 2015) in
      RdrName.bestImport, namely
      
         commit 9376249b
         Author: Simon Peyton Jones <simonpj@microsoft.com>
         Date:   Wed Oct 28 17:16:55 2015 +0000
      
         Fix unused-import stuff in a better way
      
      That patch got the sense of the comparison back to front, and
      thereby failed to implement the unused-import rules described in
        Note [Choosing the best import declaration] in RdrName
      
      This led to Trac #13064 and #15393
      
      Fixing this bug revealed a bunch of unused imports in libraries;
      the ones in the GHC repo are part of this commit.
      
      The two important changes are
      
      * Fix the bug in bestImport
      
      * Modified the rules by adding (a) in
           Note [Choosing the best import declaration] in RdrName
        Reason: the previous rules made Trac #5211 go bad again.  And
        the new rule (a) makes sense to me.
      
      In unravelling this I also ended up doing a few other things
      
      * Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the
        things that are used, rather than [AvailInfo]. This is simpler
        and more direct.
      
      * Rename greParentName to greParent_maybe, to follow GHC
        naming conventions
      
      * Delete dead code RdrName.greUsedRdrName
      
      Bumps a few submodules.
      
      Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5312
    • Fixup the new code layout patch for SplitObjs. · 6c26b3f8
      Andreas Klebinger authored
      When splitting objects we sometimes generate
      dummy CmmProcs containing bottom in some fields.
      
      Code introduced in the new code layout patch looked
      at these, which blew up the compiler. Now we instead
      check first if the function actually contains code.
      
      Reviewers: bgamari
      
      Subscribers: simonpj, rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5357
    • Rename literal constructors · 13bb4bf4
      Sylvain Henry authored
      In a previous patch we replaced some built-in literal constructors
      (MachInt, MachWord, etc.) with a single LitNumber constructor.
      
      In this patch we replace the `Mach` prefix of the remaining constructors
      with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
      
      Sadly the name `LitString` was already taken for a kind of FastString
      and it would become misleading to have both `LitStr` (literal
      constructor renamed after `MachStr`) and `LitString` (FastString
      variant). Hence this patch renames the FastString variant `PtrString`
      (which is more accurate) and the literal string constructor now uses the
      least surprising `LitString` name.
      
      Both `Literal` and `LitString/PtrString` have recently seen breaking
      changes so doing this kind of renaming now shouldn't harm much.
      
      Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
      
      Subscribers: tdammers, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4881
  2. 17 Nov, 2018 1 commit
    • NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly by CPU/machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall, depending on
      flags/machine. Performance in other benchmarks improved significantly.
      
      Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      Three different CPUs were tested: Sandy Bridge (Xeon), Haswell and Skylake.
      All got faster, although the magnitude of the gains differed.
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as GHC compiled with the new layout
      is faster than the overhead introduced by using the new layout algorithm.
      
      Things this patch does:
      
      * Move code responsible for block layout into its own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this cfg to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old codelayout.
      * Assign weights to the edges in the CFG based on type and limited static
        analysis which are then used for block layout.
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular turn a sequence of:
            jne .foo
            jmp .bar
          foo:
        into
            je bar
          foo:
            ..
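The jump elimination described above can be sketched as a small peephole pass. This is only an illustration over a toy instruction type: `Instr`, `Cond`, `invert` and `dropRedundantJumps` are hypothetical names, not the types or functions used in GHC's native code generator.

```haskell
-- Toy instruction set; hypothetical, not GHC's X86 instruction type.
data Cond = E | NE deriving (Eq, Show)

data Instr
  = Jcc Cond String   -- conditional jump to a label
  | Jmp String        -- unconditional jump to a label
  | Label String      -- start of a block
  | Other String      -- any other instruction
  deriving (Eq, Show)

-- Invert a condition code.
invert :: Cond -> Cond
invert E  = NE
invert NE = E

-- Rewrite "jcc foo; jmp bar; foo:" into "j(not cc) bar; foo:".
-- The conditional jump targets the block that follows immediately, so
-- inverting the condition lets control fall through to it instead.
dropRedundantJumps :: [Instr] -> [Instr]
dropRedundantJumps (Jcc c l1 : Jmp l2 : Label l3 : rest)
  | l1 == l3 = Jcc (invert c) l2 : Label l3 : dropRedundantJumps rest
dropRedundantJumps (i : rest) = i : dropRedundantJumps rest
dropRedundantJumps []         = []
```

Applied to `[Jcc NE "foo", Jmp "bar", Label "foo"]`, the pass leaves a single inverted jump to `bar` followed by the label. Sequences where the conditional jump does not target the next label are left unchanged.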
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
  3. 02 Nov, 2018 1 commit
    • Add Int8# and Word8# · 2c959a18
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
          primops.txt.pp gets two new sections for two new primitive types for
          signed and unsigned 8-bit integers (Int8# and Word8# respectively) along
          with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
          two new constructors for them. All of the primops translate into the
          existing MachOps.
      
          For CmmCalls the codegen will now zero-extend the values at call
          site (so that they can be moved to the right register) and then truncate
          them back to their original width.
      
          x86 native codegen needed some updates, since it wasn't able to deal
          with the new widths, but all the changes are quite localized. LLVM
          backend seems to just work.
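The zero-extend-then-truncate round trip mentioned for calls can be modelled with ordinary boxed types. This is a sketch only: it assumes a 64-bit register and uses `Word8`, `Int8` and `Word64` from `base` rather than the new unboxed primitives, and the function names are made up for illustration.

```haskell
import Data.Int (Int8)
import Data.Word (Word8, Word64)

-- Widen an unsigned 8-bit value to a full register: the upper bits
-- are cleared and the low 8 bits are unchanged.
zeroExtend :: Word8 -> Word64
zeroExtend = fromIntegral

-- Narrow back to the original width by dropping the upper bits.
truncate8 :: Word64 -> Word8
truncate8 = fromIntegral

-- A signed 8-bit value travels as its bit pattern: reinterpret it as
-- Word8 first, so the widening is a zero-extension, not a sign-extension.
zeroExtendInt8 :: Int8 -> Word64
zeroExtendInt8 x = fromIntegral (fromIntegral x :: Word8)

-- Truncating recovers the signed value from the low 8 bits.
truncateInt8 :: Word64 -> Int8
truncateInt8 = fromIntegral
```

In this model `zeroExtendInt8 (-1)` is `255`, and truncating `255` gives `-1` again, so the round trip preserves the original 8-bit value.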
      
      This is the second attempt at merging this, after the first attempt in
      D4475 had to be backed out due to regressions on i386.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate (on both x86-{32,64})
      
      Reviewers: bgamari, hvr, goldfire, simonmar
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5258
  4. 28 Oct, 2018 1 commit
    • Fix rare undefined asm temp end label error in x86 · 3c452d0d
      Zejun Wu authored
      Summary:
      Encountered an assembly error due to the undefined label `.LcaDcU_info_end` for
      the following code generated by `pprFrameProc`:
      
      ```
      .Lsat_sa8fp{v}_info_fde_end:
        .long .Lblock{v caDcU}_info_fde_end-.Lblock{v caDcU}_info_fde
      .Lblock{v caDcU}_info_fde:
        .long _nbHlD-.Lsection_frame
        .quad block{v caDcU}_info-1
        .quad .Lblock{v caDcU}_info_end-block{v caDcU}_info+1
        .byte 1
      ```
      
      This diff fixed the error.
      
      Test Plan:
        ./validate
      
      Also the case where we used to have an assembly error is now fixed.
      Unfortunately, I have limited insight here and cannot get a small enough repro
      or test case for this.
      
      Ben says:
      
      > I think I see: Previously we only produced end symbols for the info
      > tables of top-level procedures. However, blocks within a procedure may
      > also have info tables, which we will dutifully generate debug information
      > for, and consequently we get undefined symbols.
      
      Reviewers: simonmar, scpmw, last_g, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5246
  5. 09 Oct, 2018 1 commit
    • Revert "Add Int8# and Word8#" · d728c3c5
      Ben Gamari authored
      This unfortunately broke i386 support since it introduced references to
      byte-sized registers that don't exist on that architecture.
      
      Reverts binary submodule
      
      This reverts commit 5d5307f9.
  6. 07 Oct, 2018 1 commit
    • Add Int8# and Word8# · 5d5307f9
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
      - `primops.txt.pp` gets two new sections for two new primitive types
        for signed and unsigned 8-bit integers (`Int8#` and `Word8#`
        respectively) along with basic arithmetic and comparison
        operations. `PrimRep`/`RuntimeRep` get two new constructors for
        them. All of the primops translate into the existing `MachOp`s.
      
      - For `CmmCall`s the codegen will now zero-extend the values at call
        site (so that they can be moved to the right register) and then
        truncate them back to their original width.
      
      - x86 native codegen needed some updates, since it wasn't able to deal
        with the new widths, but all the changes are quite localized. LLVM
        backend seems to just work.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate with new tests
      
      Reviewers: hvr, goldfire, bgamari, simonmar
      
      Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4475
  7. 18 Sep, 2018 1 commit
    • Invert FP conditions to eliminate the explicit NaN check. · 6bb9bc7d
      Andreas Klebinger authored
      Summary:
      Optimisation: we don't have to test the parity flag if we
      know the test has already excluded the unordered case: e.g. >
      and >= test for a zero carry flag, which can only occur for
      ordered operands.
      
      By reversing comparisons we can avoid testing the parity
      for < and <= as well. This works since:
      * If any of the arguments is a NaN, CF gets set, resulting in a false result.
      * Since this allows us to rule out NaN, we can exchange the arguments and invert the
        direction of the arrows.
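The argument can be checked on a small model of the flags. The sketch below assumes the usual x86 convention for unordered floating-point compares (ZF, PF and CF all set when either operand is NaN). `Flags`, `compareFlags` and the predicate names are hypothetical and exist only for this illustration.

```haskell
-- Model of the flags set by an x86 unordered floating-point compare.
data Flags = Flags { zf :: Bool, pf :: Bool, cf :: Bool }

-- Flags for comparing a against b.
compareFlags :: Double -> Double -> Flags
compareFlags a b
  | isNaN a || isNaN b = Flags True  True  True   -- unordered
  | a > b              = Flags False False False
  | a < b              = Flags False False True
  | otherwise          = Flags True  False False  -- equal

-- Condition codes expressed on the flags.
above, aboveOrEqual, below :: Flags -> Bool
above f        = not (cf f) && not (zf f)
aboveOrEqual f = not (cf f)
below f        = cf f

-- a < b tested directly: "below" is also true for NaN, so the parity
-- flag has to be checked explicitly.
ltWithParity :: Double -> Double -> Bool
ltWithParity a b = let f = compareFlags a b in below f && not (pf f)

-- a < b rewritten as b > a: "above" needs CF clear, which already
-- excludes the unordered case, so no parity check is needed.
ltInverted :: Double -> Double -> Bool
ltInverted a b = above (compareFlags b a)

-- a <= b rewritten as b >= a, by the same reasoning.
leInverted :: Double -> Double -> Bool
leInverted a b = aboveOrEqual (compareFlags b a)
```

Both inverted forms agree with Haskell's own `<` and `<=` on `Double`, including when either argument is NaN, while using one flag test fewer than `ltWithParity`.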
      
      Test Plan: ci/nofib
      
      Reviewers: carter, bgamari, alpmestan
      
      Reviewed By: alpmestan
      
      Subscribers: alpmestan, simonpj, jmct, rwbarton, thomie
      
      GHC Trac Issues: #15196
      
      Differential Revision: https://phabricator.haskell.org/D4990
  8. 14 Sep, 2018 1 commit
    • Mark code related symbols as @function not @object · c23f057f
      Sergei Azovskov authored
      Summary:
      This diff is part of a bigger project whose goal is to improve
      support for common profiling tools (perf) for GHC binaries.
      
      A similar job was already done and reverted in the past:
       * https://phabricator.haskell.org/rGHCb1f453e16f0ce11a2ab18cc4c350bdcbd36299a6
       * https://phabricator.haskell.org/rGHCf1f3c4f50650110ad0f700d6566a44c515b0548f
      
      Reasoning:
      
      `Perf` and similar tools build an in-memory symbol table from the .symtab
      section of the ELF file to display human-readable function names instead
      of the addresses in the output. `Perf` uses only two types of symbols:
      `@function` and `@notype`, but GHC is not capable of producing any
      `@function` symbols, so the `perf` output is pretty useless (all the
      Haskell symbols that you can see in `perf` now are `@notype` internal
      symbols extracted by mistake/hack).
      
      The changes:
       * mark code related symbols as @function
       * small hack to mark InfoTable symbols as code if TABLES_NEXT_TO_CODE is true
      
      Limitations:
       * The perf symbolization support is not complete after this patch but
         I'm working on the second patch.
       * Constructor symbols are not supported. To fix that we can issue extra
         local symbols which mark code sections as code and will be only used
         for debug.
      
      Test Plan:
      tests
      any additional ideas?
      
      Perf output on stock ghc 8.4.1:
      ```
           9.78%  FibbSlow  FibbSlow            [.] ckY_info
           9.59%  FibbSlow  FibbSlow            [.] cjqd_info
           7.17%  FibbSlow  FibbSlow            [.] c3sg_info
           6.62%  FibbSlow  FibbSlow            [.] c1X_info
           5.32%  FibbSlow  FibbSlow            [.] cjsX_info
           4.18%  FibbSlow  FibbSlow            [.] s3rN_info
           3.82%  FibbSlow  FibbSlow            [.] c2m_info
           3.68%  FibbSlow  FibbSlow            [.] cjlJ_info
           3.26%  FibbSlow  FibbSlow            [.] c3sb_info
           3.19%  FibbSlow  FibbSlow            [.] cjPQ_info
           3.05%  FibbSlow  FibbSlow            [.] cjQd_info
           2.97%  FibbSlow  FibbSlow            [.] cjAB_info
           2.78%  FibbSlow  FibbSlow            [.] cjzP_info
           2.40%  FibbSlow  FibbSlow            [.] cjOS_info
           2.38%  FibbSlow  FibbSlow            [.] s3rK_info
           2.27%  FibbSlow  FibbSlow            [.] cjq0_info
           2.18%  FibbSlow  FibbSlow            [.] cKQ_info
           2.13%  FibbSlow  FibbSlow            [.] cjSl_info
           1.99%  FibbSlow  FibbSlow            [.] s3rL_info
           1.98%  FibbSlow  FibbSlow            [.] c2cC_info
           1.80%  FibbSlow  FibbSlow            [.] s3rO_info
           1.37%  FibbSlow  FibbSlow            [.] c2f2_info
      ...
      ```
      
      Perf output on patched ghc:
      ```
           7.97%  FibbSlow  FibbSlow            [.] c3rM_info
           6.75%  FibbSlow  FibbSlow            [.] 0x000000000032cfa8
           6.63%  FibbSlow  FibbSlow            [.] cifA_info
           4.98%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
           4.55%  FibbSlow  FibbSlow            [.] chXn_info
           4.52%  FibbSlow  FibbSlow            [.] c3rH_info
           4.45%  FibbSlow  FibbSlow            [.] chZB_info
           4.04%  FibbSlow  FibbSlow            [.] Main_fibbzuslow_info
           4.03%  FibbSlow  FibbSlow            [.] stg_ap_0_fast
           3.76%  FibbSlow  FibbSlow            [.] chXA_info
           3.67%  FibbSlow  FibbSlow            [.] cifu_info
           3.25%  FibbSlow  FibbSlow            [.] ci4r_info
           2.64%  FibbSlow  FibbSlow            [.] s3rf_info
           2.42%  FibbSlow  FibbSlow            [.] s3rg_info
           2.39%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
           2.25%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
           2.17%  FibbSlow  FibbSlow            [.] ghczmprim_GHCziClasses_zeze_info
           2.09%  FibbSlow  FibbSlow            [.] cicc_info
           2.03%  FibbSlow  FibbSlow            [.] 0x0000000000331e15
           2.02%  FibbSlow  FibbSlow            [.] s3ri_info
           1.91%  FibbSlow  FibbSlow            [.] 0x0000000000331bb8
           1.89%  FibbSlow  FibbSlow            [.] ci4N_info
      ...
      ```
      
      Reviewers: simonmar, niteria, bgamari, goldfire
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: lelf, rwbarton, thomie, carter
      
      GHC Trac Issues: #15501
      
      Differential Revision: https://phabricator.haskell.org/D4713
  9. 21 Aug, 2018 2 commits
    • Fix precision of asinh/acosh/atanh by making them primops · c6f4eb4f
      Artem Pelenitsyn authored
      Reviewers: hvr, bgamari, simonmar, jrtc27
      
      Reviewed By: bgamari
      
      Subscribers: alpmestan, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D5034
    • Replace most occurrences of foldl with foldl'. · 09c1d5af
      Andreas Klebinger authored
      This patch adds foldl' to GhcPrelude and changes most occurrences
      of foldl to foldl'. This leads to better performance especially
      for quick builds where GHC does not perform strictness analysis.
      
      It does change strictness behaviour when we use foldl' to turn
      an argument list into function applications. But this is only a
      drawback if code looks ONLY at the last argument but not at the first.
      And, as the benchmarks show, it leads to fewer allocations in practice
      at O2.
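The difference in strictness can be seen on a small example outside GHC's code base. The functions below are hypothetical illustrations; only `foldl` and `foldl'` themselves come from `base`.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of (+) thunks that is only evaluated
-- at the end, unless strictness analysis rescues it.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so no
-- thunk chain builds up even without optimisation.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- A fold that looks ONLY at the last element. With the lazy foldl the
-- earlier elements are never evaluated, so a bottom among them is harmless.
lastOr :: a -> [a] -> a
lastOr = foldl (\_ x -> x)
```

Here `lastOr 0 [undefined, 2]` evaluates to `2`. Writing the same fold with `foldl'` would force the intermediate accumulator `undefined` and throw, which is the kind of behaviour change the message refers to.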
      
      Compiler performance for Nofib:
      
      O2 Allocations:
              -1 s.d.                -----            -0.0%
              +1 s.d.                -----            -0.0%
              Average                -----            -0.0%
      
      O2 Compile Time:
              -1 s.d.                -----            -2.8%
              +1 s.d.                -----            +1.3%
              Average                -----            -0.8%
      
      O0 Allocations:
              -1 s.d.                -----            -0.2%
              +1 s.d.                -----            -0.1%
              Average                -----            -0.2%
      
      Test Plan: ci
      
      Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
      
      Reviewed By: bgamari, monoidal
      
      Subscribers: tdammers, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4929
  10. 07 Aug, 2018 1 commit
    • Turn on MonadFail desugaring by default · aab8656b
      Herbert Valerio Riedel authored
      Summary:
      This contains two commits:
      
      ----
      
      Make GHC's code-base compatible w/ `MonadFail`
      
      There were a couple of use-sites which implicitly used pattern-matches
      in `do`-notation even though the underlying `Monad` didn't explicitly
      support `fail`
      
      This refactoring turns those use-sites into explicit case
      discriminations and adds a `MonadFail` instance for `UniqSM`
      (`UniqSM` was the worst offender so this has been postponed for a
      follow-up refactoring)
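The kind of rewrite described here looks roughly as follows. This is a sketch on `Maybe` and `Identity` from `base`, not code taken from GHC; `headMaybe` and `headOr` are made-up names.

```haskell
import Data.Functor.Identity (Identity (..))

-- Maybe has a MonadFail instance, so a failable pattern in do-notation
-- is accepted; a failed match calls 'fail', which is Nothing for Maybe.
headMaybe :: [Int] -> Maybe Int
headMaybe xs = do
  (y : _) <- Just xs
  pure y

-- Identity has no MonadFail instance, so the same pattern bind is
-- rejected once MonadFail desugaring is on. The match is made explicit
-- with a case expression that handles the failing branch itself.
headOr :: Int -> [Int] -> Identity Int
headOr def xs = do
  ys <- pure xs
  case ys of
    (y : _) -> pure y
    []      -> pure def
```

The explicit `case` forces the author to decide what happens when the pattern does not match, instead of relying on an implicit call to `fail`.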
      
      ---
      
      Turn on MonadFail desugaring by default
      
      This finally implements the phase scheduled for GHC 8.6 according to
      
      https://prime.haskell.org/wiki/Libraries/Proposals/MonadFail#Transitionalstrategy
      
      This also preserves some tests that assumed MonadFail desugaring to be
      active; all ghc boot libs were already made compatible with this
      `MonadFail` long ago, so no changes were needed there.
      
      Test Plan: Locally performed ./validate --fast
      
      Reviewers: bgamari, simonmar, jrtc27, RyanGlScott
      
      Reviewed By: bgamari
      
      Subscribers: bgamari, RyanGlScott, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D5028
  11. 18 Jul, 2018 1 commit
    • stack: fix stack allocations on Windows · d0bbe1bf
      Tamar Christina authored
      Summary:
      On Windows one is not allowed to drop the stack by more than a page size.
      The reason for this is that the OS only allocates enough stack till what
      the TEB specifies. After that a guard page is placed and the rest of the
      virtual address space is unmapped.
      
      The intention is that doing stack allocations will cause you to hit the
      guard which will then map the next page in and move the guard.  This is
      done to prevent what in the Linux world is known as stack clash
      vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364.
      
      There are modules in GHC for which the liveness analysis thinks the
      reserved 8KB of spill slots isn't enough.  One being DynFlags and the
      other being Cabal.
      
      Though I think the Cabal one is likely a bug:
      
      ```
        4d6544:       81 ec 00 46 00 00       sub    $0x4600,%esp
        4d654a:       8d 85 94 fe ff ff       lea    -0x16c(%ebp),%eax
        4d6550:       3b 83 1c 03 00 00       cmp    0x31c(%ebx),%eax
        4d6556:       0f 82 de 8d 02 00       jb     4ff33a <_cLpg_info+0x7a>
        4d655c:       c7 45 fc 14 3d 50 00    movl   $0x503d14,-0x4(%ebp)
        4d6563:       8b 75 0c                mov    0xc(%ebp),%esi
        4d6566:       83 c5 fc                add    $0xfffffffc,%ebp
        4d6569:       66 f7 c6 03 00          test   $0x3,%si
        4d656e:       0f 85 a6 d7 02 00       jne    503d1a <_cLpb_info+0x6>
        4d6574:       81 c4 00 46 00 00       add    $0x4600,%esp
      ```
      
      It allocates nearly 18KB of spill slots for a simple 4 line function
      and doesn't even use it.  Note that this doesn't happen on x64 or
      when making a validate build, only when making a build without
      validate and build.mk.
      
      This and the allocation in DynFlags means the stack allocation will jump
      over the guard page into unmapped memory areas and GHC or an end program
      segfaults.
      
      The pagesize on x86 Windows is 4KB which means we hit it very easily for
      these two modules, which explains the total DOA of GHC 32bit for the past
      3 releases and the "random" segfaults on Windows.
      
      ```
      0:000> bp 00503d29
      0:000> gn
      Breakpoint 0 hit
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d29:
      00503d29 89442440        mov     dword ptr [esp+40h],eax ss:002b:013e973c=????????
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013eb000
      ```
      
      This doesn't fix the liveness analysis but does fix the allocations, by
      emitting a function call to `__chkstk_ms` when doing allocations larger
      than a page. This makes sure the stack is probed every page so the kernel
      maps in the next page.
      
      `__chkstk_ms` is provided by `libGCC`, which is under the
      `GNU runtime exclusion license`, so it's safe to link against it, even for
      proprietary code. (Technically we already do since we link compiled C code in.)
      
      For allocations smaller than a page we drop the stack and probe the new address.
      This avoids the function call and still makes sure we hit the guard if needed.
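The two allocation strategies can be modelled in a few lines. This is a simplified sketch, not the code the patch emits: the names are hypothetical, the 4KB page size is the x86 Windows value quoted above, treating an allocation of exactly one page as the inline case is an assumption, and `probeOffsets` only approximates what a page-by-page probe such as `__chkstk_ms` touches.

```haskell
-- Page size on x86 Windows, as stated in the commit message.
pageSize :: Int
pageSize = 4096

-- How a stack allocation of a given size is performed in this model.
data StackAlloc
  = ProbeInline   -- drop the stack pointer and touch the new address
  | CallChkstk    -- call a probing routine that walks page by page
  deriving (Eq, Show)

-- Allocations larger than a page need the page-by-page probe; smaller
-- ones can be probed inline (exactly one page is an assumption here).
strategy :: Int -> StackAlloc
strategy n
  | n > pageSize = CallChkstk
  | otherwise    = ProbeInline

-- Offsets below the old stack pointer that are touched when an
-- allocation of n bytes is probed one page at a time, ending at the
-- final stack pointer.
probeOffsets :: Int -> [Int]
probeOffsets n = takeWhile (< n) [pageSize, 2 * pageSize ..] ++ [n]
```

For the `sub $0x4600,%esp` shown earlier (17920 bytes), the model picks `CallChkstk` and probes at offsets 4096, 8192, 12288, 16384 and 17920, so every page between the old and new stack pointer is touched in order and the guard page is never skipped.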
      
      PS: In case anyone is wondering why we didn't notice this before, it's because we
      only test x86_64 and on Windows 10.  On x86_64 the page size is 8KB and also the
      kernel is a bit more lenient on Windows 10 in that it seems to catch the segfault
      and resize the stack if it was unmapped:
      
      ```
      0:000> t
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d2d:
      00503d2d 8b461b          mov     eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013e9000
      ```
      
      Likely Windows 10 has a guard page larger than previous versions.
      
      This fixes the stack allocations, and as soon as I get the time I will look at
      the liveness analysis. I find it highly unlikely that a simple Cabal function
      requires ~2200 spill slots.
      
      Test Plan: ./validate
      
      Reviewers: simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: AndreasK, rwbarton, thomie, carter
      
      GHC Trac Issues: #15154
      
      Differential Revision: https://phabricator.haskell.org/D4917
  12. 26 Jun, 2018 1 commit
  13. 03 Jun, 2018 1 commit
  14. 31 May, 2018 1 commit
    • Change jump targets in JMP_TBL from blocks to X86.JumpDest. · 5748c79e
      Andreas Klebinger authored
      Jump tables always point to blocks when we first generate them.  However
      there are rare situations where we can shortcut one of these blocks to a
      static address during the asm shortcutting pass.
      
      While we already updated the data section accordingly this patch also
      extends this to the references stored in JMP_TBL.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      GHC Trac Issues: #15104
      
      Differential Revision: https://phabricator.haskell.org/D4595
  15. 16 May, 2018 1 commit
    • Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
  16. 05 May, 2018 1 commit
    • Add 'addWordC#' PrimOp · 6243bba7
      Sebastian Graf authored
      This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
      I found 'plusWord2#' while implementing this, which both lacks
      documentation and has a slightly different specification than
      'addWordC#', which means the generic implementation is unnecessarily
      complex.
      
      While I was at it, I also added lacking meta-information on PrimOps
      and refactored 'subWordC#'s generic implementation to be branchless.
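The carry and borrow semantics can be written down without branches on ordinary boxed words. This is a model for illustration only: `addC` and `subC` are hypothetical names on `Word64`, not the primops themselves and not GHC's generic implementation.

```haskell
import Data.Word (Word64)

-- Add with carry-out: returns the wrapped sum and whether it overflowed.
-- Unsigned addition wrapped around exactly when the result is smaller
-- than one of the operands, so a single comparison yields the carry.
addC :: Word64 -> Word64 -> (Word64, Bool)
addC a b = (r, r < a)
  where
    r = a + b

-- Subtract with borrow-out: returns the wrapped difference and whether
-- the subtraction underflowed.
subC :: Word64 -> Word64 -> (Word64, Bool)
subC a b = (a - b, a < b)
```

For example, `addC maxBound 1` gives `(0, True)` and `subC 0 1` gives `(maxBound, True)`. In both functions the flag comes from a comparison rather than from a conditional.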
      
      Reviewers: bgamari, simonmar, jrtc27, dfeuer
      
      Reviewed By: bgamari, dfeuer
      
      Subscribers: dfeuer, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4592
  17. 03 May, 2018 2 commits
    • Correctly add unwinding info in manifestSp and makeFixupBlocks · 6132d7c5
      niteria authored
      In `manifestSp` the unwind info was before the relevant instruction, not
      after.  I added some notes to establish semantics.  Also removes
      redundant annotation in stg_catch_frame.
      
      For `makeFixupBlocks` it looks like we were off by `wORD_SIZE dflags`.
      I'm not sure why, but it lines up with `manifestSp`.  In fact it lines
      up so well so that I can consolidate the Sp unwind logic in
      `maybeAddUnwind`.  I detected the problems with `makeFixupBlocks` by
      running T14779b after patching D4559.
      
      Test Plan: added a new test
      
      Reviewers: bgamari, scpmw, simonmar, erikd
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      GHC Trac Issues: #14999
      
      Differential Revision: https://phabricator.haskell.org/D4606
    • Compute DW_FORM_block length correctly; also fixes #15068 · 358b5080
      Bertram Felgenhauer authored
      Before this patch, the pprUnwindExpr function computed the length of
      the expression data with the following assembly fragment:
      
      	.uleb128 1f-.-1
      	<expression data>
      1:
      
      That is, to compute the length, it takes the difference of the label 1
      and the address of the .uleb128 directive, and subtracts 1.
      
      In #15068 it was reported that `as` from binutils 2.30 has trouble with
      evaluating the `.` part of the expression. However, there is actually a
      problem with the expression if the length of the data ever reaches
      128 or more: in that case, the .uleb128 directive will emit more
      than 1 byte, and the computed length will be wrong.
      
      The present patch changes the assembly fragment to use two labels,
      which fixes both these problems.
      
      	.uleb128 2f-1f
      1:
      	<expression data>
      2:
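Why the single-label version breaks for longer blocks follows from the ULEB128 encoding itself, which uses 7 payload bits per byte. The encoder below is a standalone sketch (the function name `uleb128` is chosen here, not taken from GHC) to show where the encoded length grows from one byte to two.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Encode an unsigned number as ULEB128: 7 bits per byte, least
-- significant group first, with the high bit set on every byte
-- except the last.
uleb128 :: Word -> [Word8]
uleb128 n
  | n < 0x80  = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7F) .|. 0x80) : uleb128 (n `shiftR` 7)
```

`uleb128 127` is the single byte `0x7F`, while `uleb128 128` is the two bytes `0x80 0x01`. The old expression `1f-.-1` subtracts exactly one byte for the length field, so it only gives the right answer while the length fits in one byte. The two-label form measures the data alone and does not depend on the size of the length field.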
      
      Test Plan: validate
      
      Reviewers: bgamari, osa1
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      GHC Trac Issues: #15068
      
      Differential Revision: https://phabricator.haskell.org/D4654
  18. 13 Apr, 2018 2 commits
    • Make shortcutting at the asm stage toggleable and default for O2. · 3c7f9e74
      Andreas Klebinger authored
      Shortcutting during the asm stage of codegen is often redundant as most
      cases get caught during the Cmm passes.  For example during compilation
      of all of nofib only 508 jumps are eliminated.
      
      For this reason I moved the pass from -O1 to -O2. I also made it
      toggleable with -fasm-shortcutting.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4555
    • Update JMP_TBL targets during shortcutting in X86 NCG. · 120a2617
      Andreas Klebinger authored
      Without updating the JMP_TBL information the block list in
      JMP_TBL contained blocks which were eliminated in some circumstances.
      
      The actual assembly generation doesn't look at these fields so this
      didn't cause any bugs yet. However as long as we carry this information
      around we should make an effort to keep it correct.
      
      Especially since it's useful for debugging purposes and can be used
      for passes near the end of the codegen pipeline.
      In particular it's used by jumpDestsOfInstr which without these changes
      returns the wrong destinations.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4566
      120a2617
  19. 01 Apr, 2018 1 commit
    • Track type variable scope more carefully. · faec8d35
      Richard Eisenberg authored
      The main job of this commit is to track more accurately the scope
      of tyvars introduced by user-written foralls. For example, it would
      be an error to have something like this:
      
        forall a. Int -> (forall k (b :: k). Proxy '[a, b]) -> Bool
      
      In that type, a's kind must be k, but k isn't in scope. We had a
      terrible way of doing this before (not worth repeating or describing
      here, but see the old tcImplicitTKBndrs and friends), but now
      we have a principled approach: make an Implication when kind-checking
      a forall. Doing so then hooks into the existing machinery for
      preventing skolem-escape, performing floating, etc. This also means
      that we bump the TcLevel whenever going into a forall.
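
      To make the scoping problem concrete, the sketch below contrasts
      the ill-scoped type from above (kept in a comment, since it is meant
      to be rejected) with a well-scoped variant. The variant and the name
      wellScoped are illustrative assumptions, not part of the commit.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds      #-}
{-# LANGUAGE RankNTypes     #-}

import Data.Proxy (Proxy (..))

-- Ill-scoped: a must have kind k, but k is only bound by the inner forall.
--
--   forall a. Int -> (forall k (b :: k). Proxy '[a, b]) -> Bool

-- Well-scoped variant: k is bound before a, so a's kind can mention it.
wellScoped :: forall k (a :: k). Int -> (forall (b :: k). Proxy '[a, b]) -> Bool
wellScoped n _ = n > 0
```

      In the second signature every kind mentions only variables bound
      further out, so nothing escapes its scope.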
      
      The new behavior is done in TcHsType.scopeTyVars, but see also
      TcHsType.tc{Im,Ex}plicitTKBndrs, which have undergone significant
      rewriting. There are several Notes near there to guide you. Of
      particular interest there is that Implication constraints can now
      have skolems that are out of order; this situation is reported in
      TcErrors.
      
      A major consequence of this is a slightly tweaked process for type-
      checking type declarations. The new Note [Use SigTvs in kind-checking
      pass] in TcTyClsDecls lays it out.
      
      The error message for dependent/should_fail/TypeSkolEscape has become
      noticeably worse. However, this is because the code in TcErrors goes to
      some length to preserve pre-8.0 error messages for kind errors. It's time
      to rip off that plaster and get rid of much of the kind-error-specific
      error messages. I tried this, and doing so led to a lovely error message
      for TypeSkolEscape. So: I'm accepting the error message quality regression
      for now, but will open up a new ticket to fix it, along with a larger
      error-message improvement I've been pondering. This applies also to
      dependent/should_fail/{BadTelescope2,T14066,T14066e}, polykinds/T11142.
      
      Other minor changes:
       - isUnliftedTypeKind didn't look for tuples and sums. It does now.
      
       - check_type used check_arg_type on both sides of an AppTy. But the left
         side of an AppTy isn't an arg, and this was causing a bad error message.
         I've changed it to use check_type on the left-hand side.
      
       - Some refactoring around when we print (TYPE blah) in error messages.
         The changes decrease the times when we do so, to good effect.
         Of course, this is still all controlled by
         -fprint-explicit-runtime-reps
      
      Fixes #14066 #14749
      
      Test cases: dependent/should_compile/{T14066a,T14749},
                  dependent/should_fail/T14066{,c,d,e,f,g,h}
      faec8d35
  20. 25 Mar, 2018 1 commit
    • Don't refer to blocks in debug info when -g1 · 0cbb13b3
      niteria authored
      -g1 removes block information, but it turns out that procs can
      refer to block information through parents.
      Note [Splitting DebugBlocks] explains the parentage relationship.
      
      Test Plan:
      * ./validate
      * added a new test
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #14894
      
      Differential Revision: https://phabricator.haskell.org/D4496
      0cbb13b3
  21. 19 Mar, 2018 3 commits
  22. 08 Mar, 2018 1 commit
    • Add -fexternal-dynamic-refs · d99a65a8
      Simon Marlow authored
      Summary:
      The `-dynamic` flag does two things:
      
      * In the code generator, it generates code designed to link against
        external shared libraries.  References outside of the current module
        go through platform-specific indirection tables (e.g. the GOT on ELF).
      
      * It enables a "way", which changes which hi files we look
        for (`Foo.dyn_hi`) and which libraries we link against.
      
      Some specialised applications want the first of these without the
      second. (I could go into detail here but it's probably not all that
      important).
      
      This diff splits out the code-generation effects of `-dynamic` from the
      "way" parts of its behaviour, via a new flag `-fexternal-dynamic-refs`.
      
      Test Plan: validate
      
      Reviewers: niteria, bgamari, erikd
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4477
      d99a65a8
  23. 06 Feb, 2018 1 commit
    • Improve X86CodeGen's pprASCII. · 2987b041
      Tao He authored
      The original implementation generates a list of SDoc and then concatenates
      them using `hcat`. To save memory, we can instead transform the given
      literal string into an escaped string and then construct the SDoc directly.
      
      This optimization decreases memory allocation when there are big
      literal strings in Haskell code; see Trac #14741.
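
      A minimal sketch of the escaping step, assuming the literal arrives
      as a list of bytes and that unprintable bytes are written as
      three-digit octal escapes. The names escapeAscii, escapeByte and
      octal are hypothetical; the exact escape rules of pprASCII may differ.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Escape one byte and prepend it to the already escaped remainder.
escapeByte :: Word8 -> String -> String
escapeByte w rest = case chr (fromIntegral w) of
  '\t' -> '\\' : 't'  : rest
  '\n' -> '\\' : 'n'  : rest
  '"'  -> '\\' : '"'  : rest
  '\\' -> '\\' : '\\' : rest
  c | w >= 0x20 && w < 0x7f -> c : rest
    | otherwise             -> '\\' : (octal w ++ rest)

-- Three octal digits for a byte, e.g. 255 becomes "377".
octal :: Word8 -> String
octal w = [digit (w `div` 64), digit ((w `div` 8) `mod` 8), digit (w `mod` 8)]
  where
    digit d = chr (ord '0' + fromIntegral d)

-- Build the whole escaped literal as one String, so that a single
-- document can be made from it instead of one document per character.
escapeAscii :: [Word8] -> String
escapeAscii = foldr escapeByte ""
```

      The resulting String would then be wrapped into a single SDoc, which
      avoids allocating one SDoc per input character.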
      Signed-off-by: HE, Tao <sighingnow@gmail.com>
      
      Reviewers: bgamari, mpickering, simonpj
      
      Reviewed By: simonpj
      
      Subscribers: simonpj, rwbarton, thomie, carter
      
      GHC Trac Issues: #14741
      
      Differential Revision: https://phabricator.haskell.org/D4384
      2987b041
  24. 02 Feb, 2018 1 commit
    • Hoopl.Collections: change right folds to strict left folds · 2974b2b8
      Michal Terepeta authored
      It seems that most uses of these folds should be strict left folds
      (I could only find a single place that benefits from a right fold).
      So this removes the existing `setFold`/`mapFold`/`mapFoldWithKey`
      and replaces them with:
      - `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds)
      - `setFoldr`/`mapFoldr` (for the less common case where a right fold
        actually makes sense, e.g., `CmmProcPoint`)
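
      The difference between the two kinds of fold can be illustrated on
      plain lists, which stand in here for Hoopl's sets and maps. The
      functions sumElems and keepEvens are made up for this sketch.

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is forced at every step, so
-- reducing a large collection to a number does not build up thunks.
sumElems :: [Int] -> Int
sumElems = foldl' (+) 0

-- Right fold: a natural fit when the result is a lazily built
-- structure whose element order mirrors the input.
keepEvens :: [Int] -> [Int]
keepEvens = foldr (\x acc -> if even x then x : acc else acc) []
```

      Most folds over label sets and maps compute a summary value, as
      sumElems does, which is why the strict left fold is the better default.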
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter, kavon
      
      Differential Revision: https://phabricator.haskell.org/D4356
      2974b2b8
  25. 01 Feb, 2018 1 commit
  26. 26 Jan, 2018 1 commit
    • Handle the likely:True case in CmmContFlowOpt · 52dfb25c
      Andreas Klebinger authored
      It's better to fall through to the likely case than to jump to it.
      
      We optimize for this in CmmContFlowOpt when likely:False.
      This commit extends the logic there to handle cases with likely:True
      as well.
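
      A simplified model of the transformation, assuming that the false
      target of a conditional branch is the one laid out as the
      fall-through. The types Cond and CondBranch and the function
      preferFallThrough are hypothetical; the real pass has further
      side conditions that this sketch ignores.

```haskell
type Label = Int

-- Hypothetical condition type that can always be negated.
data Cond = Cond String | Not Cond
  deriving (Eq, Show)

data CondBranch = CondBranch
  { cbCond   :: Cond
  , cbTrue   :: Label       -- taken by jumping
  , cbFalse  :: Label       -- assumed to be the fall-through
  , cbLikely :: Maybe Bool  -- Just True: the true target is the likely one
  } deriving (Eq, Show)

invert :: Cond -> Cond
invert (Not c) = c
invert c       = Not c

-- If the true target is the likely one, negate the condition and swap
-- the targets so that the likely block becomes the fall-through.
preferFallThrough :: CondBranch -> CondBranch
preferFallThrough (CondBranch c t f (Just True)) =
  CondBranch (invert c) f t (Just False)
preferFallThrough b = b
```

      After the swap the likeliness flag is flipped as well, so the branch
      still records which target is expected to be taken.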
      
      Test Plan: ci
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: simonmar, alexbiehl, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4306
      52dfb25c
  27. 21 Jan, 2018 1 commit
    • Add new mbmi and mbmi2 compiler flags · f8557696
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
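
      For readers unfamiliar with these operations, the following is a
      bit-by-bit reference model of parallel deposit and parallel extract
      on 64-bit words. The names pdepRef and pextRef are hypothetical;
      this is a plain software model, not the code the patch generates.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word64)

-- Parallel deposit: scatter the low bits of src to the positions of
-- the set bits in mask, lowest mask bit first.
pdepRef :: Word64 -> Word64 -> Word64
pdepRef src mask = go 0 0 0
  where
    go :: Int -> Int -> Word64 -> Word64
    go i k acc            -- i: position in mask, k: next source bit
      | i >= 64        = acc
      | testBit mask i =
          let acc' = if testBit src k then setBit acc i else acc
          in go (i + 1) (k + 1) acc'
      | otherwise      = go (i + 1) k acc

-- Parallel extract: gather the bits of src selected by mask into the
-- low bits of the result, lowest mask bit first.
pextRef :: Word64 -> Word64 -> Word64
pextRef src mask = go 0 0 0
  where
    go :: Int -> Int -> Word64 -> Word64
    go i k acc            -- i: position in mask, k: next result bit
      | i >= 64        = acc
      | testBit mask i =
          let acc' = if testBit src i then setBit acc k else acc
          in go (i + 1) (k + 1) acc'
      | otherwise      = go (i + 1) k acc
```

      For any mask, extracting what was deposited gives back the source
      bits that fit in the mask, so the two operations are inverses on
      that range.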
      
      Implement x86 code generator for pdep and pext.  Properly initialise
      bmiVersion field.
      
      pdep and pext test cases
      
      Fix pattern match for pdep and pext instructions
      
      Fix build of pdep and pext code for 32-bit architectures
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, angerman
      
      Reviewed By: bgamari
      
      Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4236
      f8557696
  28. 19 Dec, 2017 1 commit
  29. 28 Nov, 2017 3 commits
  30. 22 Nov, 2017 1 commit
  31. 15 Nov, 2017 1 commit
    • Add new mbmi and mbmi2 compiler flags · f5dc8ccc
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4063
      f5dc8ccc