    • Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
    • Remove unused imports of the form 'import foo ()' (Fixes #17065)
      James Foster authored
      These kinds of imports are necessary in some cases, such as
      importing instances of typeclasses or intentionally creating
      dependencies in the build system, but '-Wunused-imports' can't
      detect when they are no longer needed. This commit removes the
      unused ones currently in the code base (not including test files
      or submodules), with the hope that doing so may increase
      parallelism in the build system by removing unnecessary
    • Move 'Platform' to ghc-boot
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdirectory within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good effect too.
    • Replace most occurrences of foldl with foldl'.
      Andreas Klebinger authored
      This patch adds foldl' to GhcPrelude and changes most occurrences
      of foldl to foldl'. This leads to better performance, especially
      for quick builds where GHC does not perform strictness analysis.
      It does change strictness behaviour when we use foldl' to turn
      an argument list into function applications. But this is only a
      drawback if code looks ONLY at the last argument and not at the first.
      And, as the benchmarks show, it leads to fewer allocations in practice
      at O2.
      Compiler performance for Nofib:
      O2 Allocations:
              -1 s.d.                -----            -0.0%
              +1 s.d.                -----            -0.0%
              Average                -----            -0.0%
      O2 Compile Time:
              -1 s.d.                -----            -2.8%
              +1 s.d.                -----            +1.3%
              Average                -----            -0.8%
      O0 Allocations:
              -1 s.d.                -----            -0.2%
              +1 s.d.                -----            -0.1%
              Average                -----            -0.2%
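      The strictness trade-off described above can be seen in a small, self-contained sketch. The names `sumStrict`, `sumLazy` and `lastLazy` are illustrative only and do not come from GHC. `foldl'` forces the accumulator to weak head normal form at every step, which avoids the thunk chain a lazy `foldl` builds when no strictness analysis runs; the price is that a combining function that keeps only the latest element can no longer skip over an undefined earlier one.

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is forced to weak head normal form
-- at every step, so no chain of (+) thunks builds up, even when GHC
-- performs no strictness analysis (e.g. at -O0).
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- Lazy left fold: builds the thunk ((0 + x1) + x2) + ... and only
-- evaluates it once the whole list has been consumed.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- A combining function that ignores the accumulator and keeps only the
-- most recent element. Under foldl the earlier accumulators are never
-- forced, so an undefined element before the last one is harmless.
-- The same definition with foldl' would force each intermediate
-- accumulator, including an undefined one, and therefore diverge.
lastLazy :: [Int] -> Int
lastLazy = foldl (\_ x -> x) 0
```

      With `foldl`, `lastLazy [undefined, 5]` evaluates to `5`; writing the same function with `foldl'` would force the intermediate accumulator `undefined` and diverge.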
      Test Plan: ci
      Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
      Reviewed By: bgamari, monoidal
      Subscribers: tdammers, rwbarton, thomie, carter
      Differential Revision: https://phabricator.haskell.org/D4929
    • Hoopl: improve postorder calculation
      Michal Terepeta authored
      - Fix the naming and comments to indicate that we are calculating
        *reverse* postorder (and not the standard postorder).
      - Rewrite the calculation to avoid CPS code. I found it fairly
        difficult to understand, and the new one seems faster (according to
        nofib, it decreases compiler allocations by 0.2%).
      - Remove `LabelsPtr`, which seems unnecessary and could be *really*
        confusing. For instance, previously:
        `postorder_dfs_from <block with label X>`
        `postorder_dfs_from <label X>`
        would actually mean quite different things (and give different
      - Change the `Dataflow` module to always use entry of the graph for
        reverse postorder calculation. This should be the only change in
        behavior of this commit.
        Previously, if the caller provided initial facts for some of the
        labels, we would use those labels for our postorder calculation.
        However, I don't think that's correct in general - if the initial
        facts did not contain the entry of the graph, we would never analyze
        the blocks reachable from the entry but unreachable from the labels
        provided with the initial facts. It seems that the only analysis that
        used this was proc-point analysis, which I think would always include
        the entry block (so I don't think there's any bug due to this).
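      A minimal sketch of a direct-style (non-CPS) reverse postorder computation, assuming a hypothetical graph given as a `Data.Map` from block labels to successor labels. It is not GHC's actual code, and `Label`, `Graph` and `reversePostorder` are illustrative names. As in the new `Dataflow` behaviour, the traversal always starts from the graph entry, so blocks that are unreachable from the entry are never visited.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical graph representation: each block label maps to the
-- labels of its successor blocks.
type Label = Int
type Graph = Map.Map Label [Label]

-- Reverse postorder of the blocks reachable from the entry label,
-- written as a plain recursive depth-first search (no CPS).
-- A block is consed onto the accumulator only after all of its
-- unvisited successors have been consed, so the accumulator ends up
-- holding the postorder reversed, with no explicit 'reverse' needed.
reversePostorder :: Graph -> Label -> [Label]
reversePostorder g entry = snd (go (Set.empty, []) entry)
  where
    go :: (Set.Set Label, [Label]) -> Label -> (Set.Set Label, [Label])
    go (visited, acc) l
      | l `Set.member` visited = (visited, acc)
      | otherwise =
          let succs = Map.findWithDefault [] l g
              (visited', acc') = foldl' go (Set.insert l visited, acc) succs
          in (visited', l : acc')
```

      For a diamond in which block 0 branches to blocks 1 and 2, which both jump to block 3, this yields `[0, 2, 1, 3]`: every block appears before its successors, except along back edges.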
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      Test Plan: ./validate
      Reviewers: bgamari, simonmar
      Reviewed By: simonmar
      Subscribers: rwbarton, thomie, carter
      Differential Revision: https://phabricator.haskell.org/D4464
    • Be more selective in which conditionals we invert
      Simon Marlow authored
      Test Plan: validate
      Reviewers: bgamari, AndreasK, erikd
      Reviewed By: AndreasK
      Subscribers: rwbarton, thomie, carter
      Differential Revision: https://phabricator.haskell.org/D4398
    • CmmSink: Use an IntSet instead of a list
      alexbiehl authored
      CmmProcs which have *lots* of local variables take a considerable
      amount of time in CmmSink. This was noticed by @tdammers in #7258
      while compiling files with large records (~200-400 fields).
              Sun Oct 29 19:58 2017 Time and Allocation Profiling Report (Final)
                 ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2
              total time  =       26.00 secs   (25996 ticks @ 1000 us, 1 processor)
              total alloc = 14,921,627,912 bytes  (excludes profiling overheads)
      COST CENTRE     MODULE         SRC                                                %time %alloc
      sink            CmmPipeline    compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)       55.7   15.9
      SimplTopBinds   SimplCore      compiler/simplCore/SimplCore.hs:761:39-74           19.5   30.6
      FloatOutwards   SimplCore      compiler/simplCore/SimplCore.hs:471:40-66            4.2    9.0
      RegAlloc-linear AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   4.0   11.1
      pprNativeCode   AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   2.8    6.3
      NewStranal      SimplCore      compiler/simplCore/SimplCore.hs:480:40-63            1.6    3.7
      OccAnal         SimplCore      compiler/simplCore/SimplCore.hs:(739,22)-(740,67)    1.5    3.5
      StgCmm          HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)         1.2    2.4
      regLiveness     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)   1.2    1.9
      genMachCode     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)   0.9    1.8
      NativeCodeGen   CodeOutput     compiler/main/CodeOutput.hs:171:18-78                0.9    2.1
      CoreTidy        HscMain        compiler/main/HscMain.hs:1253:27-67                  0.8    1.9
              Sun Oct 29 19:18 2017 Time and Allocation Profiling Report (Final)
                 ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2
              total time  =       13.31 secs   (13307 ticks @ 1000 us, 1 processor)
              total alloc = 15,772,184,488 bytes  (excludes profiling overheads)
      COST CENTRE     MODULE         SRC                                                %time %alloc
      SimplTopBinds   SimplCore      compiler/simplCore/SimplCore.hs:761:39-74           38.3   29.0
      sink            CmmPipeline    compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)       13.2   20.3
      RegAlloc-linear AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   8.3   10.5
      FloatOutwards   SimplCore      compiler/simplCore/SimplCore.hs:471:40-66            8.1    8.5
      pprNativeCode   AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   5.4    5.9
      NewStranal      SimplCore      compiler/simplCore/SimplCore.hs:480:40-63            3.1    3.5
      OccAnal         SimplCore      compiler/simplCore/SimplCore.hs:(739,22)-(740,67)    2.9    3.3
      StgCmm          HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)         2.3    2.3
      regLiveness     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)   2.1    1.8
      NativeCodeGen   CodeOutput     compiler/main/CodeOutput.hs:171:18-78                1.7    2.0
      genMachCode     AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)   1.6    1.7
      CoreTidy        HscMain        compiler/main/HscMain.hs:1253:27-67                  1.4    1.8
      foldNodesBwdOO  Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(397,1)-(403,17)      1.1    0.8
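      To illustrate why the change matters when a CmmProc has hundreds of locals, here is a hedged sketch contrasting list membership with `Data.IntSet` membership. `LocalReg` here is a hypothetical newtype over an `Int` key standing in for a register's unique; it does not reproduce CmmSink's actual data structures. If, as assumed here, the pass performs one membership test per assignment against a set of n registers, the list version costs O(n) per test, while `IntSet.member` is O(min(n, W)), with W the number of bits in an `Int`.

```haskell
import qualified Data.IntSet as IntSet

-- Hypothetical stand-in for a Cmm local register, identified by the
-- Int key of its unique.
newtype LocalReg = LocalReg Int deriving (Eq, Show)

regKey :: LocalReg -> Int
regKey (LocalReg u) = u

-- List-based set: every membership test is a linear scan, so doing one
-- test per assignment over hundreds of locals costs O(n) each time.
usedInList :: [LocalReg] -> LocalReg -> Bool
usedInList regs r = r `elem` regs

-- IntSet-based set keyed on the unique: membership is O(min(n, W)),
-- where W is the number of bits in an Int.
type RegSet = IntSet.IntSet

mkRegSet :: [LocalReg] -> RegSet
mkRegSet = IntSet.fromList . map regKey

usedInSet :: RegSet -> LocalReg -> Bool
usedInSet s r = regKey r `IntSet.member` s
```

      Both functions answer the same question; only the representation, and hence the per-query cost, differs.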
      Reviewers: austin, bgamari, simonmar
      Reviewed By: bgamari
      Subscribers: duog, rwbarton, thomie, tdammers
      GHC Trac Issues: #7258
      Differential Revision: https://phabricator.haskell.org/D4145
    • compiler: introduce custom "GhcPrelude" Prelude
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude and an `import GhcPrelude` is inserted in all
      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))`, which would cause lots of name clashes in every
      module which also imports `Outputable`.
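      A single-file sketch of the clash and the workaround. In the real change the hiding happens once, in a separate `GhcPrelude` module that every compiler module imports; here it is done inline because one file cannot hold two modules. `Doc`, `(<>)` and `render` are hypothetical stand-ins for `Outputable`'s `SDoc` machinery. With base 4.11 or later, where `Prelude` exports `Semigroup((<>))`, dropping the `hiding` clause makes every unqualified use of `<>` in this module ambiguous.

```haskell
{-# LANGUAGE NoImplicitPrelude #-}
-- In GHC the hiding lives in a separate GhcPrelude module that every
-- compiler module imports; a single file can only do it inline.
import Prelude hiding ((<>))

-- Hypothetical pretty-printing document, standing in for Outputable's SDoc.
newtype Doc = Doc String

-- Outputable-style (<>). Without the 'hiding' above, every unqualified
-- use of (<>) here would be ambiguous with Prelude's Semigroup (<>).
(<>) :: Doc -> Doc -> Doc
Doc a <> Doc b = Doc (a ++ b)

render :: Doc -> String
render (Doc s) = s
```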
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      Reviewed By: bgamari
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      Differential Revision: https://phabricator.haskell.org/D3989
    • Hoopl: remove dependency on Hoopl package
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of
        Hoopl's and add GHC's own)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to make invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior).
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      Test Plan: ./validate
      Reviewers: austin, bgamari, simonmar
      Reviewed By: bgamari, simonmar
      Subscribers: simonpj, kavon, rwbarton, thomie
      Differential Revision: https://phabricator.haskell.org/D3616
    • Improve code generation for conditionals
      Simon Peyton Jones authored
      This patch is in preparation for the fix to Trac #13397.
      The code generator has a special case for
        case tagToEnum# (a>#b) of
          False -> e1
          True  -> e2
      but it was not doing nearly so well on
        case a>#b of
          DEFAULT -> e1
          1#      -> e2
      This patch arranges for both cases to behave essentially
      identically. In due course we can eliminate the special
      case for tagToEnum#, once we've completed Trac #13397.
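      The two source forms quoted above can be written as ordinary GHC Haskell with `MagicHash`. The function names are illustrative; the sketch only shows that both forms compute the same result, which is why it makes sense for the code generator to treat them alike. `(>#)` returns an `Int#` that is `1#` when the comparison holds and `0#` otherwise, and `tagToEnum#` must be used at a concrete type, hence the `:: Bool` annotation.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts

-- Form 1: turn the Int# comparison result into a Bool with tagToEnum#
-- and branch on the Bool constructors.
viaTagToEnum :: Int -> Int -> Int
viaTagToEnum (I# a) (I# b) =
  case (tagToEnum# (a ># b) :: Bool) of
    False -> 1
    True  -> 2

-- Form 2: branch on the raw Int# result directly; the comparison
-- primop returns 1# when it holds and 0# otherwise.
viaIntHash :: Int -> Int -> Int
viaIntHash (I# a) (I# b) =
  case a ># b of
    1# -> 2
    _  -> 1
```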
      The changes are:
      * Make CmmSink swizzle the order of a conditional where necessary;
        see Note [Improving conditionals] in CmmSink
      * Hack the general case of StgCmmExpr.cgCase so that it uses
        NoGcInAlts for conditionals.  This doesn't seem right, but it's
        the same choice as the tagToEnum version. Without it, code size
        increases a lot (more heap checks).
        There's a loose end here.
      * Add comments in CmmOpt.cmmMachOpFoldM
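      The "swizzle" is an instance of branch inversion: swapping the two targets of a conditional and negating its test leaves behaviour unchanged, so the code generator is free to pick whichever orientation lays out better. The sketch below uses a hypothetical, simplified branch type (`CondBranch`, `invert` and `arrange` are not GHC names), and the layout policy in `arrange` is an assumption for illustration, not the rule from Note [Improving conditionals]. The negation shown is only valid for integer comparisons: with a floating-point NaN, both `a > b` and `a <= b` are false.

```haskell
-- Hypothetical, simplified model of a two-way conditional branch
-- "if (lhs op rhs) goto trueL else goto falseL" on machine integers.
data Cmp = CGt | CGe | CLt | CLe | CEq | CNe
  deriving (Eq, Show, Enum, Bounded)

-- Negating an integer comparison. This is NOT valid for floating-point
-- comparisons, where NaN makes both (a > b) and (a <= b) false.
negateCmp :: Cmp -> Cmp
negateCmp CGt = CLe
negateCmp CGe = CLt
negateCmp CLt = CGe
negateCmp CLe = CGt
negateCmp CEq = CNe
negateCmp CNe = CEq

evalCmp :: Cmp -> Int -> Int -> Bool
evalCmp CGt = (>)
evalCmp CGe = (>=)
evalCmp CLt = (<)
evalCmp CLe = (<=)
evalCmp CEq = (==)
evalCmp CNe = (/=)

data CondBranch = CondBranch
  { cmp :: Cmp, lhs :: Int, rhs :: Int, trueL :: String, falseL :: String }
  deriving (Eq, Show)

-- Swap the targets and negate the test: the branch still goes to the
-- same block for every input.
invert :: CondBranch -> CondBranch
invert b = b { cmp = negateCmp (cmp b), trueL = falseL b, falseL = trueL b }

-- Assumed layout policy, for illustration only: if the true target is
-- the block laid out next, invert so it becomes the fall-through edge.
arrange :: String -> CondBranch -> CondBranch
arrange next b
  | trueL b == next = invert b
  | otherwise       = b

-- Which block the branch jumps to at run time.
target :: CondBranch -> String
target b = if evalCmp (cmp b) (lhs b) (rhs b) then trueL b else falseL b
```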
  25 Jun, 2015 · 1 commit
  21 Oct, 2014 · 1 commit
    • Fixes the ARM build
      Moritz Angermann authored
      CodeGen.Platform.hs was changed with the following diff:
          globalRegMaybe _                        = Nothing
         +#elif MACHREGS_NO_REGS
         +globalRegMaybe _ = Nothing
         +#else
         +globalRegMaybe = panic "globalRegMaybe not defined for this platform"
      which causes globalRegMaybe to panic on ARM.
      This patch ensures globalRegMaybe is not called on ARM.
      Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de>
      Test Plan: Building arm cross-compiler (e.g. --target=arm-apple-darwin10)
      Reviewers: hvr, ezyang, simonmar, rwbarton, austin
      Reviewed By: austin
      Subscribers: dterei, bgamari, simonmar, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D208
      GHC Trac Issues: #9593
    • Re-add more primops for atomic ops on byte arrays
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
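      A hedged usage sketch of some of these primops from Haskell, via `GHC.Exts` with `MagicHash` and `UnboxedTuples`. `Counter`, `newCounter`, `fetchAdd`, `readCounter` and `cas` are hypothetical wrappers, not part of any library. The array offset is given in machine words, so the array is sized to hold a single `Int`. Both `fetchAddIntArray#` and `casIntArray#` return the element's value from before the operation. The example exercises only sequential behaviour; the point of the primops is that these operations stay atomic when several threads share the array.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Bits (finiteBitSize)
import GHC.Exts
import GHC.IO (IO (..))

-- A one-element mutable byte array used as a shared counter. The
-- primops below index the array in machine words, so it is sized to
-- hold exactly one Int.
data Counter = Counter (MutableByteArray# RealWorld)

newCounter :: Int -> IO Counter
newCounter (I# n) = IO $ \s0 ->
  case finiteBitSize (0 :: Int) `quot` 8 of
    I# bytes -> case newByteArray# bytes s0 of
      (# s1, mba #) -> case atomicWriteIntArray# mba 0# n s1 of
        s2 -> (# s2, Counter mba #)

-- Atomically adds the delta; returns the value held before the addition.
fetchAdd :: Counter -> Int -> IO Int
fetchAdd (Counter mba) (I# d) = IO $ \s ->
  case fetchAddIntArray# mba 0# d s of
    (# s', old #) -> (# s', I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s ->
  case atomicReadIntArray# mba 0# s of
    (# s', v #) -> (# s', I# v #)

-- Atomic compare-and-swap: writes 'new' only if the current value equals
-- 'expected'. Returns the value held before the operation either way.
cas :: Counter -> Int -> Int -> IO Int
cas (Counter mba) (I# expected) (I# new) = IO $ \s ->
  case casIntArray# mba 0# expected new s of
    (# s', old #) -> (# s', I# old #)
```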
    • Add more primops for atomic ops on byte arrays
      tibbe authored
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
    • Discard dead assignments in tryToInline
      Simon Marlow authored
      Inlining global registers and constants made code slightly larger in
      some cases.  I finally got around to looking into why, and discovered
      one reason: we weren't discarding dead code in some cases.  This patch
      fixes it.
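      A minimal sketch of dead-assignment elimination over straight-line code, assuming side-effect-free right-hand sides and a known set of registers live on exit. `Assign`, `dropDead` and the `String` register names are hypothetical; this shows the general technique, not the specific check added to `tryToInline`.

```haskell
import qualified Data.Set as Set

type Reg = String

-- Hypothetical straight-line code: "lhs := e", with the expression e
-- summarised only by the registers it reads. Expressions are assumed
-- to have no side effects, so an assignment to a dead register can go.
data Assign = Assign { lhs :: Reg, uses :: [Reg] } deriving (Eq, Show)

-- Backward liveness pass: with foldr, the live set flows from the last
-- assignment back to the first, starting from the registers live on
-- exit. An assignment is kept only if its target is live at that point;
-- keeping it kills the target and makes the registers it reads live.
dropDead :: Set.Set Reg -> [Assign] -> [Assign]
dropDead liveOut = fst . foldr step ([], liveOut)
  where
    step a (kept, live)
      | lhs a `Set.member` live =
          (a : kept, Set.union (Set.fromList (uses a)) (Set.delete (lhs a) live))
      | otherwise = (kept, live)
```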
    • Improve sinking pass
      Jan Stolarek authored
      This commit does two things:
        * Allows duplicating global registers and literals by inlining
          them. Previously we would only inline a global register or literal
          if it was used only once.
        * Changes the method of determining conflicts between a node and an
          assignment. The new method has two advantages. It relies on the
          DefinerOfRegs and UserOfRegs typeclasses, so if the set of registers
          defined or used by a node should ever change, the `conflicts`
          function will use the changed definition. This definition also
          catches more cases than the previous one (namely CmmCall and
          CmmForeignCall), which is a step towards making it possible to run
          the sinking pass before stack layout (currently this doesn't work).
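      A hedged sketch of the second point: a register-only `conflicts` check written against two type classes, so that changing what a node uses or defines automatically changes what counts as a conflict. `UsesRegs` and `DefinesRegs` are simplified, hypothetical classes in the spirit of GHC's `UserOfRegs` and `DefinerOfRegs` (whose real methods are fold-based and take extra context), and the `Node` type is made up. Memory effects, which a real check must also consider, are ignored here.

```haskell
import qualified Data.Set as Set

type Reg = String

-- Simplified, hypothetical classes in the spirit of GHC's UserOfRegs
-- and DefinerOfRegs.
class UsesRegs a where
  regsUsed :: a -> Set.Set Reg

class DefinesRegs a where
  regsDefined :: a -> Set.Set Reg

-- A pending assignment "r := e", with e summarised by the registers it reads.
data Assignment = Assignment Reg (Set.Set Reg)

-- Hypothetical node kinds.
data Node
  = Assign Reg (Set.Set Reg)   -- r := e
  | Call [Reg] (Set.Set Reg)   -- results := call(args)

instance UsesRegs Node where
  regsUsed (Assign _ us) = us
  regsUsed (Call _ args) = args

instance DefinesRegs Node where
  regsDefined (Assign r _) = Set.singleton r
  regsDefined (Call rs _)  = Set.fromList rs

-- Register conflicts only: "r := e" cannot be sunk past node n if
-- n reads r, n writes r, or n writes a register that e reads. Because
-- the check is written against the classes, changing an instance
-- changes the result of conflicts as well.
conflicts :: (UsesRegs n, DefinesRegs n) => Assignment -> n -> Bool
conflicts (Assignment r eUses) n =
  r `Set.member` regsUsed n
    || r `Set.member` regsDefined n
    || not (Set.null (eUses `Set.intersection` regsDefined n))
```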
      This patch also adds a lot of comments that are the result of a roughly
      two-week-long investigation into how the sinking pass works and why it
      does what it does.
