1. 09 Apr, 2019 2 commits
    • Artem Pyanykh's avatar
      bd2de4f0
    • codegen: fix memset unroll for small bytearrays, add 64-bit sets · af4cea7f
      Artem Pyanykh authored
      Fixes #16052
      
      When the offset in `setByteArray#` is statically known, we can provide
      better alignment guarantees than just 1 byte.
      
      Also, memset can now do 64-bit wide sets.
      
      The current memset intrinsic is not optimal however and can be
      improved for the case when we know that we deal with
      
      (baseAddress at known alignment) + offset
      
      For instance, on 64-bit
      
      `setByteArray# s 1# 23# 0#`
      
      given that the bytearray is 8-byte aligned, could be unrolled into
      `movb, movw, movl, movq, movq`; but currently it is
      `movb` x23, since an alignment of 1 is all we can embed into the MO_Memset op.
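
      The unrolling described above can be sketched as a small width planner.
      This is only an illustration under stated assumptions: `storeWidths` and
      its parameters are hypothetical names, not GHC's code, and the base
      address is assumed to be aligned to a power of two.

```haskell
-- | Widths (in bytes) of the stores needed to fill @len@ bytes starting at
-- @base + off@, assuming @base@ is aligned to @baseAlign@ bytes (a power of
-- two) and stores of up to @maxWidth@ bytes exist.
-- Hypothetical helper for illustration; not taken from GHC.
storeWidths :: Int -> Int -> Int -> Int -> [Int]
storeWidths baseAlign maxWidth = go
  where
    go _ len | len <= 0 = []
    go off len =
      let w = widest off len
      in w : go (off + w) (len - w)

    -- Largest power-of-two width that divides the current offset (so the
    -- store is naturally aligned), fits in the remaining length, and does
    -- not exceed the base alignment or the widest available store.
    widest off len =
      last [ w
           | w <- takeWhile (<= min baseAlign maxWidth) (iterate (* 2) 1)
           , off `mod` w == 0
           , w <= len
           ]
```

      With an 8-byte aligned base, `storeWidths 8 8 1 23` gives `[1,2,4,8,8]`,
      matching the `movb, movw, movl, movq, movq` sequence; with a known
      alignment of only 1, `storeWidths 1 8 1 23` degenerates to 23 one-byte
      stores.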
  2. 01 Apr, 2019 1 commit
  3. 10 Feb, 2019 1 commit
  4. 30 Jan, 2019 3 commits
  5. 17 Nov, 2018 1 commit
    • NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly by CPU/machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall, depending on
      flags/machine. Performance in other benchmarks improved significantly.
      
      Other benchmarks include at least those of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      Three different CPUs were tested: Sandy Bridge (Xeon), Haswell and Skylake.
      All got faster, although to differing degrees.
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as the speedup of a GHC compiled with
      the new layout outweighs the overhead of running the new layout algorithm.
      
      Things this patch does:
      
      * Move code responsible for block layout into its own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this cfg to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old code layout.
      * Assign weights to the edges in the CFG based on type and limited static
        analysis which are then used for block layout.
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular turn a sequence of:
            jne .foo
            jmp .bar
          .foo:
        into
            je .bar
          .foo:
            ..
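
      The jump rewrite above can be sketched as a peephole pass over an
      instruction list. The `Instr` and `Cond` types here are hypothetical
      miniatures made up for this sketch; GHC's real x86 instruction type is
      far richer and the actual pass may differ.

```haskell
-- Hypothetical miniature instruction set, for illustration only.
data Cond = E | NE deriving (Eq, Show)

data Instr
  = Jcc Cond String   -- conditional jump to a label
  | Jmp String        -- unconditional jump to a label
  | Label String      -- label definition
  | Other String      -- any other instruction, kept opaque
  deriving (Eq, Show)

-- | Negate a condition code.
invert :: Cond -> Cond
invert E  = NE
invert NE = E

-- | Rewrite @jcc L1; jmp L2; L1:@ into @j(not cc) L2; L1:@.
-- Valid only because L1 immediately follows, so not taking the inverted
-- jump falls through to L1 exactly as the original conditional jump did.
dropRedundantJumps :: [Instr] -> [Instr]
dropRedundantJumps (Jcc c l1 : Jmp l2 : Label l1' : rest)
  | l1 == l1' = Jcc (invert c) l2 : dropRedundantJumps (Label l1' : rest)
dropRedundantJumps (i : rest) = i : dropRedundantJumps rest
dropRedundantJumps []         = []
```

      The rewrite only fires once the final block order is known, since it
      depends on the jump target being the very next label.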
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
  6. 02 Nov, 2018 1 commit
    • Add Int8# and Word8# · 2c959a18
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      
      
      The main highlights/changes:
      
          primops.txt.pp gets two new sections for two new primitive types for
          signed and unsigned 8-bit integers (Int8# and Word8# respectively) along
          with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
          two new constructors for them. All of the primops translate into the
          existing MachOps.
      
          For CmmCalls the codegen will now zero-extend the values at call
          site (so that they can be moved to the right register) and then truncate
          them back to their original width.
      
          x86 native codegen needed some updates, since it wasn't able to deal
          with the new widths, but all the changes are quite localized. LLVM
          backend seems to just work.
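
      The zero-extend/truncate round trip mentioned above can be modelled
      with ordinary boxed types. This is a sketch of the arithmetic only,
      assuming a 64-bit register; `widen` and `narrow` are hypothetical names
      and do not correspond to anything in the codegen.

```haskell
import Data.Int (Int8)
import Data.Word (Word8, Word64)

-- | Zero-extend an 8-bit value to a 64-bit register-sized word.
-- Going through Word8 first ensures a negative Int8 is not sign-extended.
widen :: Int8 -> Word64
widen x = fromIntegral (fromIntegral x :: Word8)

-- | Truncate back to the original 8-bit width (keeps the low 8 bits).
narrow :: Word64 -> Int8
narrow = fromIntegral
```

      For example, `widen (-1)` is `255` rather than `0xFFFFFFFFFFFFFFFF`,
      and `narrow (widen x) == x` holds for every `Int8`.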
      
      This is the second attempt at merging this, after the first attempt in
      D4475 had to be backed out due to regressions on i386.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate (on both x86-{32,64})
      
      Reviewers: bgamari, hvr, goldfire, simonmar
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5258
  7. 09 Oct, 2018 1 commit
    • Revert "Add Int8# and Word8#" · d728c3c5
      Ben Gamari authored
      This unfortunately broke i386 support since it introduced references to
      byte-sized registers that don't exist on that architecture.
      
      Reverts binary submodule
      
      This reverts commit 5d5307f9.
  8. 07 Oct, 2018 1 commit
    • Add Int8# and Word8# · 5d5307f9
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      
      
      The main highlights/changes:
      
      - `primops.txt.pp` gets two new sections for two new primitive types
        for signed and unsigned 8-bit integers (`Int8#` and `Word8#`
        respectively) along with basic arithmetic and comparison
        operations. `PrimRep`/`RuntimeRep` get two new constructors for
        them. All of the primops translate into the existing `MachOp`s.
      
      - For `CmmCall`s the codegen will now zero-extend the values at call
        site (so that they can be moved to the right register) and then
        truncate them back to their original width.
      
      - x86 native codegen needed some updates, since it wasn't able to deal
        with the new widths, but all the changes are quite localized. LLVM
        backend seems to just work.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate with new tests
      
      Reviewers: hvr, goldfire, bgamari, simonmar
      
      Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4475
  9. 18 Sep, 2018 1 commit
    • Invert FP conditions to eliminate the explicit NaN check. · 6bb9bc7d
      Andreas Klebinger authored
      Summary:
      Optimisation: we don't have to test the parity flag if we
      know the test has already excluded the unordered case: e.g. >
      and >= test for a zero carry flag, which can only occur for
      ordered operands.
      
      By reversing comparisons we can avoid testing the parity
      for < and <= as well. This works since:
      * If any of the arguments is a NaN, CF gets set, resulting in a false result.
      * Since this allows us to rule out NaN we can exchange the arguments and invert the
        direction of the arrows.
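
      The argument above can be checked with a small flag model. This is a
      sketch, assuming the usual x86 unordered-compare behaviour (an unordered
      result sets ZF, PF and CF; "less" sets only CF; "equal" sets only ZF).
      `Flags`, `compareFlags` and the predicate names are hypothetical and not
      part of GHC.

```haskell
-- | The three flags an unordered floating-point compare produces.
data Flags = Flags { cf :: Bool, zf :: Bool, pf :: Bool }

-- | Flags after comparing @x@ with @y@ (modelled, not executed on hardware).
compareFlags :: Double -> Double -> Flags
compareFlags x y
  | isNaN x || isNaN y = Flags True  True  True   -- unordered
  | x < y              = Flags True  False False
  | x == y             = Flags False True  False
  | otherwise          = Flags False False False  -- x > y

-- Condition codes: "above" needs CF = 0 and ZF = 0, "above or equal"
-- needs CF = 0, "below" needs CF = 1.
above, aboveEq, below :: Flags -> Bool
above f   = not (cf f) && not (zf f)
aboveEq f = not (cf f)
below f   = cf f

-- | x < y tested directly: "below" alone is also true for NaN, so the
-- parity flag must be checked as well.
ltNaive :: Double -> Double -> Bool
ltNaive x y = let f = compareFlags x y in below f && not (pf f)

-- | x < y and x <= y with swapped operands: no parity check needed,
-- because CF = 0 already rules out the unordered case.
ltSwapped, leSwapped :: Double -> Double -> Bool
ltSwapped x y = above   (compareFlags y x)
leSwapped x y = aboveEq (compareFlags y x)
```

      In this model `ltSwapped` and `leSwapped` agree with `<` and `<=` for
      all inputs including NaN, while `below` on its own wrongly reports true
      when an operand is NaN.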
      
      Test Plan: ci/nofib
      
      Reviewers: carter, bgamari, alpmestan
      
      Reviewed By: alpmestan
      
      Subscribers: alpmestan, simonpj, jmct, rwbarton, thomie
      
      GHC Trac Issues: #15196
      
      Differential Revision: https://phabricator.haskell.org/D4990
  10. 21 Aug, 2018 1 commit
  11. 31 May, 2018 1 commit
    • Change jump targets in JMP_TBL from blocks to X86.JumpDest. · 5748c79e
      Andreas Klebinger authored
      Jump tables always point to blocks when we first generate them.  However
      there are rare situations where we can shortcut one of these blocks to a
      static address during the asm shortcutting pass.
      
      While we already updated the data section accordingly, this patch also
      extends this to the references stored in JMP_TBL.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      GHC Trac Issues: #15104
      
      Differential Revision: https://phabricator.haskell.org/D4595
  12. 16 May, 2018 1 commit
    • Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
  13. 05 May, 2018 1 commit
    • Add 'addWordC#' PrimOp · 6243bba7
      Sebastian Graf authored
      This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
      I found 'plusWord2#' while implementing this, which both lacks
      documentation and has a slightly different specification than
      'addWordC#', which means the generic implementation is unnecessarily
      complex.
      
      While I was at it, I also added lacking meta-information on PrimOps
      and refactored 'subWordC#'s generic implementation to be branchless.
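
      For readers unfamiliar with carry-reporting arithmetic, the semantics
      can be modelled branchlessly on boxed `Word`. This is a sketch of the
      idea only; `addC` and `subC` are hypothetical names and this is not the
      generic implementation from the patch.

```haskell
-- | Wrapping addition plus a carry flag. The sum wrapped around exactly
-- when it ends up smaller than one of the operands.
addC :: Word -> Word -> (Word, Bool)
addC a b = let s = a + b in (s, s < a)

-- | Wrapping subtraction plus a borrow flag. A borrow happens exactly
-- when the subtrahend is larger than the minuend.
subC :: Word -> Word -> (Word, Bool)
subC a b = (a - b, a < b)
```

      For instance, `addC maxBound 1` yields `(0, True)` and `subC 0 1`
      yields `(maxBound, True)`; neither needs a conditional branch.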
      
      Reviewers: bgamari, simonmar, jrtc27, dfeuer
      
      Reviewed By: bgamari, dfeuer
      
      Subscribers: dfeuer, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4592
  14. 19 Mar, 2018 1 commit
  15. 26 Jan, 2018 1 commit
    • Handle the likely:True case in CmmContFlowOpt · 52dfb25c
      Andreas Klebinger authored
      It's better to fall through to the likely case than to jump to it.
      
      We optimize for this in CmmContFlowOpt when likely:False.
      This commit extends the logic there to handle cases with likely:True
      as well.
      
      Test Plan: ci
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: simonmar, alexbiehl, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4306
  16. 21 Jan, 2018 1 commit
    • Add new mbmi and mbmi2 compiler flags · f8557696
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Implement x86 code generator for pdep and pext.  Properly initialise
      bmiVersion field.
      
      pdep and pext test cases
      
      Fix pattern match for pdep and pext instructions
      
      Fix build of pdep and pext code for 32-bit architectures
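
      As a reminder of what bit deposit and bit extract compute, here is a
      slow reference model in plain Haskell. `pdepRef` and `pextRef` are
      hypothetical names for this sketch; they show the semantics only and
      say nothing about how the code generator emits the instructions.

```haskell
import Data.Bits ((.&.), (.|.), shiftR, testBit, setBit)
import Data.Word (Word64)

-- | Bit deposit: scatter the low bits of @src@ to the positions of the
-- set bits in @mask@, lowest mask bit first.
pdepRef :: Word64 -> Word64 -> Word64
pdepRef src mask = go src mask 0
  where
    go _ 0 acc = acc
    go s m acc =
      let lowest = m .&. negate m            -- isolate lowest set mask bit
          acc'   = if testBit s 0 then acc .|. lowest else acc
      in go (s `shiftR` 1) (m .&. (m - 1)) acc'  -- clear that mask bit

-- | Bit extract: gather the bits of @src@ at the positions of the set
-- bits in @mask@ and pack them into the low bits of the result.
pextRef :: Word64 -> Word64 -> Word64
pextRef src mask = go mask 0 0
  where
    go :: Word64 -> Int -> Word64 -> Word64
    go 0 _ acc = acc
    go m i acc =
      let lowest = m .&. negate m
          acc'   = if src .&. lowest /= 0 then setBit acc i else acc
      in go (m .&. (m - 1)) (i + 1) acc'
```

      With mask `0x1A` (bits 1, 3 and 4 set), `pdepRef 5 0x1A` is `18` and
      `pextRef 18 0x1A` is `5`, so the two operations undo each other on the
      masked bits.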
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, angerman
      
      Reviewed By: bgamari
      
      Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4236
  17. 22 Nov, 2017 1 commit
  18. 15 Nov, 2017 1 commit
    • Add new mbmi and mbmi2 compiler flags · f5dc8ccc
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4063
  19. 30 Oct, 2017 2 commits
  20. 26 Sep, 2017 1 commit
  21. 19 Sep, 2017 2 commits
  22. 22 Aug, 2017 1 commit
  23. 01 Aug, 2017 1 commit
    • Drop GHC 7.10 compatibility · c13720c8
      Ryan Scott authored
      GHC 8.2.1 is out, so now GHC's support window only extends back to GHC
      8.0. This means we can delete gobs of code that was only used for GHC
      7.10 support. Hooray!
      
      Test Plan: ./validate
      
      Reviewers: hvr, bgamari, austin, goldfire, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: Phyx, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3781
  24. 23 Jun, 2017 1 commit
    • Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      
      
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior)
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
  25. 28 Apr, 2017 1 commit
  26. 10 Mar, 2017 1 commit
  27. 07 Mar, 2017 1 commit
  28. 23 Feb, 2017 1 commit
  29. 14 Feb, 2017 1 commit
    • Debug: Use local symbols for unwind points (#13278) · 2d6e91ea
      Ben Gamari authored
      While this apparently didn't matter on Linux, the OS X toolchain seems
      to treat local and external symbols differently during linking. Namely,
      the linker assumes that an external symbol marks the beginning of a new,
      unused procedure, and consequently drops it.
      
      Fixes regression introduced in D2741.
      
      Test Plan: `debug` testcase on OS X
      
      Reviewers: austin, simonmar, rwbarton
      
      Reviewed By: rwbarton
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3135
  30. 08 Feb, 2017 2 commits
    • Cmm: Add support for undefined unwinding statements · 3328ddb8
      Ben Gamari authored
      And use them to mark `stg_stack_underflow_frame`, for which we are unable
      to determine a caller.
      
      To simplify parsing at the moment we steal the `return` keyword to
      indicate an undefined unwind value. Perhaps this should be revisited.
      
      Reviewers: scpmw, simonmar, austin, erikd
      
      Subscribers: dfeuer, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2738
    • Generalize CmmUnwind and pass unwind information through NCG · 3eb737ee
      Ben Gamari authored
      As discussed in D1532, Trac #11337, and Trac #11338, the stack
      unwinding information produced by GHC is currently quite approximate.
      Essentially we assume that register values do not change at all within a
      basic block. While this is somewhat true in normal Haskell code, blocks
      containing foreign calls often break this assumption. This results in
      unreliable call stacks, especially in the code containing foreign calls.
      This is worse than it sounds as unreliable unwinding information can at
      times result in segmentation faults.
      
      This patch set attempts to improve this situation by tracking unwinding
      information with finer granularity. By dispensing with the assumption of
      one unwinding table per block, we allow the compiler to accurately
      represent the areas surrounding foreign calls.
      
      Towards this end we generalize the representation of unwind information
      in the backend in three ways:
      
       * Multiple CmmUnwind nodes can occur per block
      
       * CmmUnwind nodes can now carry unwind information for multiple
         registers (while not strictly necessary, this makes emitting
         unwinding information a bit more convenient in the compiler)
      
       * The NCG backend is given an opportunity to modify the unwinding
         records since it may need to make adjustments due to, for instance,
         native calling convention requirements for foreign calls (see
         #11353).
      
      This sets the stage for resolving #11337 and #11338.
      
      Test Plan: Validate
      
      Reviewers: scpmw, simonmar, austin, erikd
      
      Subscribers: qnikst, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2741
  31. 01 Oct, 2016 1 commit
    • CodeGen X86: fix unsafe foreign calls wrt inlining · b61b7c24
      Sylvain HENRY authored
      Foreign calls (unsafe and safe) interact badly with inlining and
      register passing ABIs (see #11792 and #12614):
      the inlined code to compute a parameter of the call may overwrite a
      register already set to pass a preceding parameter.
      
      With this patch, we compute all parameters which are not simple
      expressions before assigning them to fixed registers required by the
      ABI.
      
      Test Plan:
         - Add test (test both reg and stack parameters)
         - Validate
      
      Reviewers: osa1, bgamari, austin, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2263
      
      GHC Trac Issues: #11792, #12614
  32. 15 Sep, 2016 1 commit
    • Fix codegen bug in PIC version of genSwitch (#12433) · 86836a2e
      Simon Marlow authored
      Summary:
      * getNonClobberedReg instead of getSomeReg, because the reg needs to
        survive across t_code
      * Use a new reg for the table offset calculation instead of clobbering
        the reg returned by expr (this was the bug affecting #12433)
      
      Test Plan: New unit test; validate
      
      Reviewers: rwbarton, bgamari, austin, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2529
      
      GHC Trac Issues: #12433
  33. 19 Aug, 2016 1 commit
  34. 05 Aug, 2016 1 commit
    • codeGen: Remove binutils<2.17 hack, fixes T11758 · e3e2e49a
      avd authored
      There was a complication on the x86_64 platform, where pointers were 64
      bits, but the tools didn't support 64-bit relative relocations.  This
      was true before binutils 2.17, which nowadays is quite standard (even
      CentOS 5 is shipped with 2.17).
      
      Hacks were removed from x86 genSwitch and asm pretty printer. Also
      [x86-64-relative] note was dropped from
      includes/rts/storage/InfoTables.h as it's not referenced anywhere now.
      
      Reviewers: austin, simonmar, rwbarton, erikd, bgamari
      
      Reviewed By: simonmar, erikd, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2426