1. 02 Feb, 2018 1 commit
    • Hoopl.Collections: change right folds to strict left folds · 2974b2b8
      Michal Terepeta authored
      It seems that most uses of these folds should be strict left folds
      (I could only find a single place that benefits from a right fold).
      So this removes the existing `setFold`/`mapFold`/`mapFoldWithKey` and
      replaces them with:
      - `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds)
      - `setFoldr`/`mapFoldr` (for the less common case where a right fold
        actually makes sense, e.g., `CmmProcPoint`)
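
      A minimal sketch of the distinction, using `Data.IntMap` and
      `Data.IntSet` from `containers` as stand-ins for the Hoopl collections.
      The names `sumValues` and `keysAscending` are illustrative only and do
      not appear in the patch:

      ```haskell
      import qualified Data.IntMap.Strict as IntMap
      import qualified Data.IntSet as IntSet

      -- Strict left fold: the accumulator is forced at every step, so
      -- summing does not build up a chain of unevaluated thunks.
      sumValues :: IntMap.IntMap Int -> Int
      sumValues = IntMap.foldl' (+) 0

      -- Right fold: a natural fit when the result is built by consing,
      -- because the elements come out in ascending order.
      keysAscending :: IntSet.IntSet -> [Int]
      keysAscending = IntSet.foldr (:) []
      ```

      For example, `sumValues (IntMap.fromList [(1,10),(2,20),(3,30)])`
      evaluates to 60 and `keysAscending (IntSet.fromList [3,1,2])` to
      `[1,2,3]`.
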
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter, kavon
      
      Differential Revision: https://phabricator.haskell.org/D4356
  2. 29 Jan, 2018 1 commit
  3. 26 Jan, 2018 4 commits
    • cmm: Use two equality checks for two alt switch with default · 7ff60235
      U-Maokai\andi authored
      For code like:
      f 1 = e1
      f 7 = e2
      f _ = e3
      
      We can treat it as a sparse jump table, check if we are outside of the
      range in one direction first and then start checking the values.
      
      GHC currently does this by checking for x > 7, then x <= 7, and finally
      x == 1.

      This patch changes this such that we only compare for equality against
      the two values and jump to the default if none are equal.
      
      The resulting code is both faster and smaller.
      wheel-sieve1 improves by 4-8% depending on problem size.
      
      This implements the idea from #14644
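
      The two strategies can be modelled as small decision trees. This is a
      sketch only: `Plan`, `run`, `rangeFirst` and `twoEq` are hypothetical
      names, not GHC's switch-plan types, and `rangeFirst` is just one
      plausible shape for the range-check-first code described above.

      ```haskell
      -- Hypothetical mini-model of a switch plan (not GHC's own type).
      data Plan
        = Jump String            -- unconditional jump to a label
        | IfEqual Int Plan Plan  -- if x == n then .. else ..
        | IfLT Int Plan Plan     -- if x <  n then .. else ..

      -- Run a plan on a scrutinee; return the chosen label together with
      -- the number of comparisons executed on the way.
      run :: Plan -> Int -> (String, Int)
      run plan x = go plan 0
        where
          go (Jump l) n        = (l, n)
          go (IfEqual k t e) n = go (if x == k then t else e) (n + 1)
          go (IfLT k t e) n    = go (if x < k then t else e) (n + 1)

      -- Range check first, then the individual values (three branches).
      rangeFirst :: Plan
      rangeFirst =
        IfLT 8 (IfLT 7 (IfEqual 1 (Jump "e1") (Jump "e3")) (Jump "e2"))
               (Jump "e3")

      -- Two equality checks, then the default (two branches).
      twoEq :: Plan
      twoEq = IfEqual 1 (Jump "e1") (IfEqual 7 (Jump "e2") (Jump "e3"))
      ```

      Both plans select the same label for every input. In this model
      `run twoEq 1` needs one comparison where `run rangeFirst 1` needs
      three. The model says nothing about actual machine-code performance.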
      
      Reviewers: bgamari, simonmar, simonpj, nomeata
      
      Reviewed By: simonpj, nomeata
      
      Subscribers: nomeata, simonpj, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4294
    • Remove Hoopl.Unique · bd58e290
      Michal Terepeta authored
      Reasons to remove:
      - It's confusing - we already have a widely used `Unique` module in
        `basicTypes/` that defines a newtype called `Unique`
      - `Hoopl.Unique` is not actually used much
      
      I've also moved the `Unique{Map,Set}` from `Hoopl.Unique` to
      `Hoopl.Collections` to keep things together. But that module is also a
      bit funny - it defines two type-classes that have only one instance
      each. So we should probably either remove them or use them more
      widely... In any case, that will be a separate change.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: kavon, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4331
    • Add ability to parse likely flags for ifs in Cmm. · e7dcc708
      Andreas Klebinger authored
      Adding the ability to parse likely flags in Cmm allows better codegen
      for cmm files.
      
      Test Plan: ci
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #14672
      
      Differential Revision: https://phabricator.haskell.org/D4316
    • Handle the likely:True case in CmmContFlowOpt · 52dfb25c
      Andreas Klebinger authored
      It's better to fall through to the likely case than to jump to it.
      
      We optimize for this in CmmContFlowOpt when likely:False.
      This commit extends the logic there to handle cases with likely:True
      as well.
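
      A simplified sketch of the idea, assuming that the false target of a
      conditional branch is the one the code generator falls through to.
      `CondBranch` and its fields are hypothetical; the real
      `CmmCondBranch` carries a Cmm expression and block ids.

      ```haskell
      -- Hypothetical, simplified model of a conditional branch.
      data CondBranch = CondBranch
        { cond    :: String      -- condition, kept symbolic here
        , negated :: Bool        -- True once the condition has been inverted
        , onTrue  :: String      -- label taken when the condition holds
        , onFalse :: String      -- label reached otherwise (fall-through)
        , likely  :: Maybe Bool  -- Just True: the true branch is the likely one
        } deriving (Eq, Show)

      -- Express the same control flow with the two targets swapped.
      invert :: CondBranch -> CondBranch
      invert b = b
        { negated = not (negated b)
        , onTrue  = onFalse b
        , onFalse = onTrue b
        , likely  = fmap not (likely b)
        }

      -- Make the likely target the fall-through (false) target.
      preferFallThrough :: CondBranch -> CondBranch
      preferFallThrough b = case likely b of
        Just True -> invert b
        _         -> b
      ```

      A branch marked `Just True` comes back inverted, with its hot label
      in the fall-through position and the flag flipped to `Just False`.
      Branches marked `Just False` or `Nothing` are left alone.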
      
      Test Plan: ci
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: simonmar, alexbiehl, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4306
  4. 21 Jan, 2018 2 commits
    • Use IntSet in Dataflow · 88297438
      niteria authored
      Before this change, a list was used as a substitute for a heap.
      This led to quadratic behavior on a simple program (see new
      test case).
      
      This change replaces it with IntSet in effect reverting
      5a1a2633. @simonmar said it's fine to revert as long as nofib
      results are good.
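
      A minimal sketch of an `IntSet`-backed worklist. The `step` callback
      and the function name `runWorklist` are assumptions made for
      illustration and are not the interface used in `Hoopl.Dataflow`.

      ```haskell
      import qualified Data.IntSet as IntSet

      -- Process block indices in ascending order. `step b` returns the
      -- indices that must be (re)visited after handling block b. The
      -- result is the order in which blocks were processed.
      runWorklist :: (Int -> [Int]) -> [Int] -> [Int]
      runWorklist step = go . IntSet.fromList
        where
          go todo = case IntSet.minView todo of
            Nothing        -> []
            Just (b, rest) -> b : go (foldr IntSet.insert rest (step b))
      ```

      Because the set deduplicates its elements, a block requested several
      times before it is reached is still processed only once, and taking
      the minimum does not require scanning the whole pending list.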
      
      Test Plan:
      new test case:
      
      20% improvement
      3x improvement when N=10000
      
      nofib:
      
      I ran it twice for both before and after because the compile time
      results are noisy.
      
      - Compile Allocations:
      
      ```
                before    before re-run    after     after re-run
      -1 s.d.   -----     -0.0%            -0.1%     -0.1%
      +1 s.d.   -----     +0.0%            +0.1%     +0.1%
      Average   -----     +0.0%            -0.0%     -0.0%
      ```
      - Compile Time:
      
      ```
                before    before re-run    after     after re-run
      -1 s.d.   -----     -0.1%            -2.3%     -2.6%
      +1 s.d.   -----     +5.2%            +3.7%     +4.4%
      Average   -----     +2.5%            +0.7%     +0.8%
      
      ```
      I checked each case and couldn't find consistent slow-down/speed-up on
      compile time. Full results here: P173
      
      Reviewers: simonpj, simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter, simonmar
      
      GHC Trac Issues: #14667
      
      Differential Revision: https://phabricator.haskell.org/D4329
    • Add new mbmi and mbmi2 compiler flags · f8557696
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
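
      For reference, the semantics of the two operations can be written as
      a plain bit-by-bit loop. This is a portable sketch for 64-bit words,
      not the code the x86 backend emits:

      ```haskell
      import Data.Bits (setBit, testBit)
      import Data.Word (Word64)

      -- pext: gather the bits of src selected by mask into the low bits
      -- of the result.
      pext :: Word64 -> Word64 -> Word64
      pext src mask = go 0 0 0
        where
          go :: Int -> Int -> Word64 -> Word64
          go i k acc
            | i >= 64        = acc
            | testBit mask i = go (i + 1) (k + 1)
                                  (if testBit src i then setBit acc k else acc)
            | otherwise      = go (i + 1) k acc

      -- pdep: scatter the low bits of src to the positions selected by mask.
      pdep :: Word64 -> Word64 -> Word64
      pdep src mask = go 0 0 0
        where
          go :: Int -> Int -> Word64 -> Word64
          go i k acc
            | i >= 64        = acc
            | testBit mask i = go (i + 1) (k + 1)
                                  (if testBit src k then setBit acc i else acc)
            | otherwise      = go (i + 1) k acc
      ```

      With the mask `0x0F0F`, `pext 0xABCD 0x0F0F` yields `0xBD` and
      `pdep 0xBD 0x0F0F` yields `0x0B0D`.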
      
      Implement x86 code generator for pdep and pext.  Properly initialise
      bmiVersion field.
      
      pdep and pext test cases
      
      Fix pattern match for pdep and pext instructions
      
      Fix build of pdep and pext code for 32-bit architectures
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, angerman
      
      Reviewed By: bgamari
      
      Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4236
  5. 18 Jan, 2018 2 commits
  6. 15 Jan, 2018 1 commit
    • Simplify guard in createSwitchPlan. · bc383f20
      Andreas Klebinger authored
      Given that we have two unique keys (guaranteed by Map) checking that
      `|range| == 1` is faster.
      
      The fact that `x1 == lo` and `x2 == hi` is guaranteed by mkSwitchTargets
      which removes values outside of the range.
      
      Test Plan: ci
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4295
  7. 19 Dec, 2017 1 commit
  8. 28 Nov, 2017 6 commits
  9. 22 Nov, 2017 2 commits
  10. 15 Nov, 2017 3 commits
  11. 09 Nov, 2017 1 commit
    • Fix PPC NCG after blockID patch · f8e7fece
      Peter Trommler authored
      Commit rGHC8b007ab assigns the same label to the first basic block
      of a proc and to the proc entry point. This violates the PPC 64-bit ELF
      v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols.
      
      This patch fixes duplicate symbols caused by block labels.

      In commit rGHCd7b8da1 an info table label is generated from a block id.
      Getting the entry label from that info label leads to an undefined
      symbol because a suffix "_entry" is appended that is not present in the
      block label.
      
      To fix that issue add a new info table label flavour for labels
      derived from block ids. Converting such a label with toEntryLabel
      produces the original block label.
      
      Fixes #14311
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14311
      
      Differential Revision: https://phabricator.haskell.org/D4149
  12. 06 Nov, 2017 2 commits
    • cmm/CBE: Fix a few more zip uses · a27056f9
      Ben Gamari authored
      Ensure that we don't consider lists of different lengths to be equal
      when they are not. I noticed these while working on the fix for #14361.
      
      Reviewers: austin, simonmar, michalt
      
      Reviewed By: michalt
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14361
      
      Differential Revision: https://phabricator.haskell.org/D4153
    • cmm/CBE: Fix comparison between blocks of different lengths · 6f990c54
      Ben Gamari authored
      Previously CBE computed equality by taking the lists of middle nodes of
      the blocks being compared and zipping them together. It would then map
      over this list with the equality relation, and accumulate the result.
      
      However, this is completely wrong: Consider what will happen when we
      compare a block with no middle nodes with one with one or more. The
      result of `zip` will be empty and consequently the pass may conclude
      that the two are indeed equivalent (if their last nodes also match).
      This is very bad and the cause of #14361.
      
      The solution I chose was just to write out an explicit recursion, like I
      distinctly recall considering doing when I first wrote this code.
      Unfortunately I was feeling clever at the time.
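
      The pitfall and the fix can be reproduced on ordinary lists. This is
      a sketch of the technique only: `eqZip` and `eqRec` are illustrative
      names, and the real code compares Cmm nodes rather than arbitrary
      `Eq` values.

      ```haskell
      -- Buggy: `zip` truncates to the shorter list, so a block with no
      -- middle nodes compares equal to any other block.
      eqZip :: Eq a => [a] -> [a] -> Bool
      eqZip xs ys = all (uncurry (==)) (zip xs ys)

      -- Fixed: explicit recursion that also requires both lists to end
      -- at the same time.
      eqRec :: Eq a => [a] -> [a] -> Bool
      eqRec []     []     = True
      eqRec (x:xs) (y:ys) = x == y && eqRec xs ys
      eqRec _      _      = False
      ```

      Here `eqZip [] ["x = 1"]` is `True`, which is exactly the wrong
      answer described above, while `eqRec [] ["x = 1"]` is `False`.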
      
      Unfortunately this case was just rare enough not to be triggered by the
      testsuite. I still need to find a testcase that doesn't have external
      dependencies.
      
      Test Plan: Need to find a more minimal testcase
      
      Reviewers: austin, simonmar, michalt
      
      Reviewed By: michalt
      
      Subscribers: michalt, rwbarton, thomie, hvr
      
      GHC Trac Issues: #14361
      
      Differential Revision: https://phabricator.haskell.org/D4152
  13. 03 Nov, 2017 1 commit
    • CmmSink: Use a IntSet instead of a list · 43537568
      alexbiehl authored
      CmmProcs which have *lots* of local variables take a considerable
      amount of time in CmmSink. This was noticed by @tdammers in #7258
      while compiling files with large records (~200-400 fields).
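
      The kind of change involved can be sketched as follows. Registers
      are represented by plain `Int` keys, and the names `RegKey`,
      `usedInList` and `usedInSet` are assumptions for illustration rather
      than code from `CmmSink`.

      ```haskell
      import qualified Data.IntSet as IntSet

      -- Assumption: a register is identified by an Int key.
      type RegKey = Int

      -- List version: every membership test walks the list, so the cost
      -- grows linearly with the number of local variables.
      usedInList :: [RegKey] -> RegKey -> Bool
      usedInList regs r = r `elem` regs

      -- IntSet version: same answers without the linear scan.
      usedInSet :: IntSet.IntSet -> RegKey -> Bool
      usedInSet regs r = r `IntSet.member` regs
      ```

      When such a test runs once per node and the list holds one entry per
      local variable, the list version is the sort of code that becomes
      expensive for procedures with hundreds of locals.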
      
      Before:
      
      ```
              Sun Oct 29 19:58 2017 Time and Allocation Profiling Report (Final)
      
                 ghc-stage2 +RTS -p -RTS
      -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs
      -fforce-recomp -O2
      
              total time  =       26.00 secs   (25996 ticks @ 1000 us, 1 processor)
              total alloc = 14,921,627,912 bytes  (excludes profiling overheads)
      
      COST CENTRE     MODULE      SRC                                                %time %alloc

      sink            CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)       55.7   15.9
      SimplTopBinds   SimplCore   compiler/simplCore/SimplCore.hs:761:39-74           19.5   30.6
      FloatOutwards   SimplCore   compiler/simplCore/SimplCore.hs:471:40-66            4.2    9.0
      RegAlloc-linear AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   4.0   11.1
      pprNativeCode   AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   2.8    6.3
      NewStranal      SimplCore   compiler/simplCore/SimplCore.hs:480:40-63            1.6    3.7
      OccAnal         SimplCore   compiler/simplCore/SimplCore.hs:(739,22)-(740,67)    1.5    3.5
      StgCmm          HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)         1.2    2.4
      regLiveness     AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)   1.2    1.9
      genMachCode     AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)   0.9    1.8
      NativeCodeGen   CodeOutput  compiler/main/CodeOutput.hs:171:18-78                0.9    2.1
      CoreTidy        HscMain     compiler/main/HscMain.hs:1253:27-67                  0.8    1.9
      ```
      
      After:
      
      ```
              Sun Oct 29 19:18 2017 Time and Allocation Profiling Report (Final)
      
                 ghc-stage2 +RTS -p -RTS
      -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs
      -fforce-recomp -O2
      
              total time  =       13.31 secs   (13307 ticks @ 1000 us, 1 processor)
              total alloc = 15,772,184,488 bytes  (excludes profiling overheads)
      
      COST CENTRE     MODULE         SRC                                                %time %alloc

      SimplTopBinds   SimplCore      compiler/simplCore/SimplCore.hs:761:39-74           38.3   29.0
      sink            CmmPipeline    compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)       13.2   20.3
      RegAlloc-linear AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   8.3   10.5
      FloatOutwards   SimplCore      compiler/simplCore/SimplCore.hs:471:40-66            8.1    8.5
      pprNativeCode   AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   5.4    5.9
      NewStranal      SimplCore      compiler/simplCore/SimplCore.hs:480:40-63            3.1    3.5
      OccAnal         SimplCore      compiler/simplCore/SimplCore.hs:(739,22)-(740,67)    2.9    3.3
      StgCmm          HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)         2.3    2.3
      regLiveness     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)   2.1    1.8
      NativeCodeGen   CodeOutput     compiler/main/CodeOutput.hs:171:18-78                1.7    2.0
      genMachCode     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)   1.6    1.7
      CoreTidy        HscMain        compiler/main/HscMain.hs:1253:27-67                  1.4    1.8
      foldNodesBwdOO  Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(397,1)-(403,17)      1.1    0.8
      ```
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: duog, rwbarton, thomie, tdammers
      
      GHC Trac Issues: #7258
      
      Differential Revision: https://phabricator.haskell.org/D4145
  14. 30 Oct, 2017 3 commits
    • Allow packing constructor fields · cca2d6b7
      Michal Terepeta authored
      This is another step for fixing #13825 and is based on D38 by Simon
      Marlow.
      
      The change allows storing multiple constructor fields within the same
      word. This currently applies only to `Float`s, e.g.,
      ```
      data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float
      ```
      on 64-bit arch, will now store both fields within the same constructor
      word. For `WordX/IntX` we'll need to introduce new primop types.
      
      Main changes:
      
      - We now use sizes in bytes when we compute the offsets for
        constructor fields in `StgCmmLayout` and introduce padding if
        necessary (word-sized fields are still word-aligned)
      
      - `ByteCodeGen` had to be updated to correctly construct the data
        types. This required some new bytecode instructions to allow pushing
        things that are not full words onto the stack (and updating
        `Interpreter.c`). Note that we only use the packed stuff when
        constructing data types (i.e., for `PACK`), in all other cases the
        behavior should not change.
      
      - `RtClosureInspect` was changed to handle the new layout when
        extracting subterms. This seems to be used by things like `:print`.
        I've also added a test for this.
      
      - I deviated slightly from Simon's approach and use `PrimRep` instead
        of `ArgRep` for computing the size of fields.  This seemed more
        natural and in the future we'll probably want to introduce new
        primitive types (e.g., `Int8#`) and `PrimRep` seems like a better
        place to do that (where we already have `Int64Rep` for example).
        `ArgRep` on the other hand seems to be more focused on calling
        functions.
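
      The byte-based offset computation can be sketched as below. This is
      an illustration under stated assumptions, not the code in
      `StgCmmLayout`: the target is assumed to be 64-bit, every field size
      is assumed to be positive, and each field is assumed to be aligned to
      its own size, capped at the word size. The names `wordSize`,
      `alignTo` and `layout` are hypothetical.

      ```haskell
      -- Assumed word size in bytes for a 64-bit target.
      wordSize :: Int
      wordSize = 8

      -- Round an offset up to the next multiple of the alignment.
      alignTo :: Int -> Int -> Int
      alignTo a off = ((off + a - 1) `div` a) * a

      -- Given field sizes in bytes (all positive), compute each field's
      -- byte offset and the total payload size rounded up to whole words.
      layout :: [Int] -> ([Int], Int)
      layout sizes = (reverse offs, alignTo wordSize end)
        where
          (offs, end) = foldl place ([], 0) sizes
          place (acc, off) sz =
            let start = alignTo (min sz wordSize) off  -- padding, if needed
            in (start : acc, start + sz)
      ```

      Two 4-byte `Float` fields give `layout [4,4] == ([0,4], 8)`, that is,
      both fields share a single word. A word-sized field after one
      `Float` is padded to the next word boundary:
      `layout [4,8,4] == ([0,8,16], 24)`.
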
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: maoe, rwbarton, thomie
      
      GHC Trac Issues: #13825
      
      Differential Revision: https://phabricator.haskell.org/D3809
    • Turn `compareByteArrays#` out-of-line primop into inline primop · 76735615
      alexbiehl authored
      Depends on D4090
      
      Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D4091
    • Add -falignment-sanitization flag · cecd2f2d
      Ben Gamari authored
      Here we add a flag to instruct the native code generator to add
      alignment checks in all info table dereferences. This is helpful in
      catching pointer tagging issues.
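
      The kind of check involved can be modelled on plain machine words.
      This sketch assumes a 64-bit target where the low three bits of a
      pointer may carry a tag. `tagMask`, `isAligned` and `checkedDeref`
      are hypothetical names, and the check GHC actually emits may differ.

      ```haskell
      import Data.Bits ((.&.))
      import Data.Word (Word64)

      -- Assumed 64-bit target: words are 8 bytes, so the low 3 bits of a
      -- pointer are free to hold a tag.
      tagMask :: Word64
      tagMask = 7

      -- An untagged pointer to word-aligned data has all tag bits clear.
      isAligned :: Word64 -> Bool
      isAligned p = p .&. tagMask == 0

      -- Simulated dereference that rejects pointers whose tag bits are
      -- still set, instead of silently reading from a wrong address.
      checkedDeref :: (Word64 -> a) -> Word64 -> Either String a
      checkedDeref load p
        | isAligned p = Right (load p)
        | otherwise   =
            Left ("misaligned pointer, tag bits = " ++ show (p .&. tagMask))
      ```

      A pointer that was never untagged, such as `0x1001`, fails the check
      right at the dereference site rather than causing a crash later.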
      
      Thanks to @jrtc27 for uncovering the tagging issues on Sparc which
      inspired this flag.
      
      Test Plan: Validate
      
      Reviewers: simonmar, austin, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, trofi, thomie, jrtc27
      
      Differential Revision: https://phabricator.haskell.org/D4101
  15. 18 Oct, 2017 1 commit
  16. 26 Sep, 2017 1 commit
  17. 24 Sep, 2017 1 commit
  18. 23 Sep, 2017 1 commit
    • Fix AsmTempLabel · d5596126
      Moritz Angermann authored
      Summary:
      This is more fallout from 8b007abb and should fix Trac #14264. I am
      not sure if this is complete. It does, however, allow me to build an
      iOS LLVM cross compiler.
      
      Reviewers: bgamari, trofi, austin, simonmar
      
      Reviewed By: trofi
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14264
      
      Differential Revision: https://phabricator.haskell.org/D4014
  19. 22 Sep, 2017 1 commit
    • Fix broken LLVM code gen · d7b8da1f
      Moritz Angermann authored
      In 8b007abb (nativeGen: Consistently use blockLbl to generate
      CLabels from BlockIds) all blockLbls were changed. This interfered with
      the `toInfoLbl` call in CmmProcPoint, and caused the LLVM backend to
      fall over.
      
      Reviewers: bgamari, austin, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D4006
  20. 21 Sep, 2017 1 commit
    • cmm/CBE: Use foldLocalRegsDefd · 9aa73892
      Ben Gamari authored
      Simonpj suggested this as a follow-on to #14226 to avoid code
      duplication. This also gives us the ability to CBE cases involving
      foreign calls for free.
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, simonpj
      
      Reviewed By: simonpj
      
      Subscribers: michalt, simonpj, rwbarton, thomie
      
      GHC Trac Issues: #14226
      
      Differential Revision: https://phabricator.haskell.org/D3999
  21. 19 Sep, 2017 3 commits
    • cmm/CBE: Collapse blocks equivalent up to alpha renaming of local registers · 7920a7d9
      Ben Gamari authored
      As noted in #14226, the common block elimination pass currently
      implements an extremely strict equivalence relation, demanding that two
      blocks are equivalent including the names of their local registers. This
      is quite restrictive and severely hampers the effectiveness of the pass.
      
      Here we allow the CBE pass to collapse blocks which are equivalent up to
      alpha renaming of locally-bound local registers. This is completely safe
      and catches many more duplicate blocks.
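
      A sketch of comparison up to renaming on a toy instruction set.
      `Instr` and `eqUpToRenaming` are hypothetical and far simpler than
      the CBE pass. The sketch also assumes that registers bound in a block
      are not used outside it.

      ```haskell
      import qualified Data.Map.Strict as Map

      -- Hypothetical mini instruction: `Assign d s` binds local register
      -- d from register s. Registers are just names.
      data Instr = Assign String String
        deriving (Eq, Show)

      -- Two blocks match if they have the same shape and every use refers
      -- either to the same binding site in both blocks or to the same
      -- free (not locally bound) register name.
      eqUpToRenaming :: [Instr] -> [Instr] -> Bool
      eqUpToRenaming = go (0 :: Int) Map.empty Map.empty
        where
          go _ _ _ [] [] = True
          go n e1 e2 (Assign d1 s1 : is1) (Assign d2 s2 : is2) =
            sameReg && go (n + 1) (Map.insert d1 n e1) (Map.insert d2 n e2) is1 is2
            where
              sameReg = case (Map.lookup s1 e1, Map.lookup s2 e2) of
                (Just i, Just j)   -> i == j    -- bound at the same position
                (Nothing, Nothing) -> s1 == s2  -- both free: names must agree
                _                  -> False
          go _ _ _ _ _ = False
      ```

      Each side's bound names are mapped to the position where they were
      bound. This rules out the unsound case where a name is bound in one
      block but free in the other.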
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, michalt
      
      Reviewed By: michalt
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14226
      
      Differential Revision: https://phabricator.haskell.org/D3973
    • compiler: introduce custom "GhcPrelude" Prelude · f63bc730
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude and an `import GhcPrelude` is inserted in all
      modules.

      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))` which would cause lots of name clashes in every
      module which also imports `Outputable`.
      
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D3989
    • nativeGen: Consistently use blockLbl to generate CLabels from BlockIds · 8b007abb
      Ben Gamari authored
      This fixes #14221, where the NCG and the DWARF code were apparently
      giving two different names to the same block.
      
      Test Plan: Validate with DWARF support enabled.
      
      Reviewers: simonmar, austin
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14221
      
      Differential Revision: https://phabricator.haskell.org/D3977
  22. 14 Sep, 2017 1 commit