1. 27 Jan, 2020 1 commit
  2. 25 Jan, 2020 1 commit
  3. 18 Dec, 2019 1 commit
    • Sylvain Henry's avatar
      Add GHC-API logging hooks · 58655b9d
      Sylvain Henry authored
      * Add 'dumpAction' hook to DynFlags.
      
      It allows GHC API users to catch dumped intermediate codes and
      information. The format of the dump (Core, Stg, raw text, etc.) is now
      reported allowing easier automatic handling.
      
      * Add 'traceAction' hook to DynFlags.
      
      Some dumps go through the trace mechanism (for instance unfoldings that
      have been considered for inlining). This is problematic because:
      1) dumps aren't written into files even with -ddump-to-file on
      2) dumps are written on stdout even with GHC API
      3) in this specific case, dumping depends on unsafe globally stored
      DynFlags which is bad for GHC API users
      
      We introduce 'traceAction' hook which allows GHC API to catch those
      traces and to avoid using globally stored DynFlags.
      
      * Avoid dumping empty logs via dumpAction/traceAction (but still write
      empty files to keep the existing behavior)
      58655b9d
  4. 23 Oct, 2019 2 commits
    • Andreas Klebinger's avatar
      Make dynflag argument for withTiming pure. · 6beea836
      Andreas Klebinger authored
      19 times out of 20 we already have dynflags in scope.
      
      We could just always use `return dflags`. But this is in fact not free.
      When looking at some STG code I noticed that we always allocate a
      closure for this expression in the heap. Clearly a waste in these cases.
      
      For the other cases we can either just modify the callsite to
      get dynflags or use the _D variants of withTiming I added which
      will use getDynFlags under the hood.
      6beea836
    • Andreas Klebinger's avatar
      Fix bug in the x86 backend involving the CFG. · aa778152
      Andreas Klebinger authored
      This is part two of fixing #17334.
      
      There are two parts to this commit:
      
      - A bugfix for computing loop levels
      - A bugfix of basic block invariants in the NCG.
      
      -----------------------------------------------------------
      
      In the first bug we ended up with a CFG of the sort: [A -> B -> C]
      This was represented via maps as fromList [(A,B),(B,C)] and later
      transformed into a adjacency array. However the transformation did
      not include block C in the array (since we only looked at the keys of
      the map).
      
      This was still fine until we tried to look up successors for C and tried
      to read outside of the array bounds when accessing C.
      
      In order to prevent this in the future I refactored to code to include
      all nodes as keys in the map representation. And make this a invariant
      which is checked in a few places.
      
      Overall I expect this to make the code more robust as now any failed
      lookup will represent an error, versus failed lookups sometimes being
      expected and sometimes not.
      
      In terms of performance this makes some things cheaper (getting a list
      of all nodes) and others more expensive (adding a new edge). Overall
      this adds up to no noteable performance difference.
      
      -----------------------------------------------------------
      
      Part 2: When the NCG generated a new basic block, it did
      not always insert a NEWBLOCK meta instruction in the stream which
      caused a quite subtle bug.
      
          During instruction selection a statement `s`
          in a block B with control of the sort: B -> C
          will sometimes result in control
          flow of the sort:
      
                  ┌ < ┐
                  v   ^
            B ->  B1  ┴ -> C
      
          as is the case for some atomic operations.
      
          Now to keep the CFG in sync when introducing B1 we clearly
          want to insert it between B and C. However there is
          a catch when we have to deal with self loops.
      
          We might start with code and a CFG of these forms:
      
          loop:
              stmt1               ┌ < ┐
              ....                v   ^
              stmtX              loop ┘
              stmtY
              ....
              goto loop:
      
          Now we introduce B1:
                                  ┌ ─ ─ ─ ─ ─┐
              loop:               │   ┌ <  ┐ │
              instrs              v   │    │ ^
              ....               loop ┴ B1 ┴ ┘
              instrsFromX
              stmtY
              goto loop:
      
          This is simple, all outgoing edges from loop now simply
          start from B1 instead and the code generator knows which
          new edges it introduced for the self loop of B1.
      
          Disaster strikes if the statement Y follows the same pattern.
          If we apply the same rule that all outgoing edges change then
          we end up with:
      
              loop ─> B1 ─> B2 ┬─┐
                │      │    └─<┤ │
                │      └───<───┘ │
                └───────<────────┘
      
          This is problematic. The edge B1->B1 is modified as expected.
          However the modification is wrong!
      
          The assembly in this case looked like this:
      
          _loop:
              <instrs>
          _B1:
              ...
              cmpxchgq ...
              jne _B1
              <instrs>
              <end _B1>
          _B2:
              ...
              cmpxchgq ...
              jne _B2
              <instrs>
              jmp loop
      
          There is no edge _B2 -> _B1 here. It's still a self loop onto _B1.
      
          The problem here is that really B1 should be two basic blocks.
          Otherwise we have control flow in the *middle* of a basic block.
          A contradiction!
      
          So to account for this we add yet another basic block marker:
      
          _B:
              <instrs>
          _B1:
              ...
              cmpxchgq ...
              jne _B1
              jmp _B1'
          _B1':
              <instrs>
              <end _B1>
          _B2:
              ...
      
          Now when inserting B2 we will only look at the outgoing edges of B1' and
          everything will work out nicely.
      
          You might also wonder why we don't insert jumps at the end of _B1'. There is
          no way another block ends up jumping to the labels _B1 or _B2 since they are
          essentially invisible to other blocks. View them as control flow labels local
          to the basic block if you'd like.
      
          Not doing this ultimately caused (part 2 of) #17334.
      aa778152
  5. 22 Oct, 2019 1 commit
    • Stefan Schulze Frielinghaus's avatar
      Implement s390x LLVM backend. · fd8b666a
      Stefan Schulze Frielinghaus authored
      This patch adds support for the s390x architecture for the LLVM code
      generator. The patch includes a register mapping of STG registers onto
      s390x machine registers which enables a registerised build.
      fd8b666a
  6. 16 Oct, 2019 1 commit
    • Andreas Klebinger's avatar
      Add loop level analysis to the NCG backend. · 535a88e1
      Andreas Klebinger authored
      For backends maintaining the CFG during codegen
      we can now find loops and their nesting level.
      
      This is based on the Cmm CFG and dominator analysis.
      
      As a result we can estimate edge frequencies a lot better
      for methods, resulting in far better code layout.
      
      Speedup on nofib: ~1.5%
      Increase in compile times: ~1.9%
      
      To make this feasible this commit adds:
      * Dominator analysis based on the Lengauer-Tarjan Algorithm.
      * An algorithm estimating global edge frequences from branch
      probabilities - In CFG.hs
      
      A few static branch prediction heuristics:
      
      * Expect to take the backedge in loops.
      * Expect to take the branch NOT exiting a loop.
      * Expect integer vs constant comparisons to be false.
      
      We also treat heap/stack checks special for branch prediction
      to avoid them being treated as loops.
      535a88e1
  7. 13 Oct, 2019 1 commit
    • Andreas Klebinger's avatar
      Fix #17334 where NCG did not properly update the CFG. · c1bd07cd
      Andreas Klebinger authored
      Statements can change the basic block in which instructions
      are placed during instruction selection.
      
      We have to keep track of this switch of the current basic block
      as we need this information in order to properly update the CFG.
      
      This commit implements this change and fixes #17334.
      
      We do so by having stmtToInstr return the new block id
      if a statement changed the basic block.
      c1bd07cd
  8. 20 Sep, 2019 1 commit
    • Alp Mestanogullari's avatar
      ErrUtils: split withTiming into withTiming and withTimingSilent · b3e5c731
      Alp Mestanogullari authored
      'withTiming' becomes a function that, when passed '-vN' (N >= 2) or
      '-ddump-timings', will print timing (and possibly allocations) related
      information. When additionally built with '-eventlog' and executed with
      '+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent'
      events to the eventlog.
      
      'withTimingSilent' on the other hand will never print any timing information,
      under any circumstance, and will only emit 'traceEvent' events to the eventlog.
      As pointed out in !1672, 'traceMarker' is better suited for things that we
      might want to visualize in tools like eventlog2html, while 'traceEvent'
      is better suited for internal events that occur a lot more often and that we
      don't necessarily want to visualize.
      
      This addresses #17138 by using 'withTimingSilent' for all the codegen bits
      that are expressed as a bunch of small computations over streams of codegen
      ASTs.
      b3e5c731
  9. 13 Sep, 2019 1 commit
  10. 09 Sep, 2019 1 commit
    • Sylvain Henry's avatar
      Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
      447864a9
  11. 02 Sep, 2019 1 commit
  12. 28 Aug, 2019 1 commit
    • Ömer Sinan Ağacan's avatar
      Return results of Cmm streams in backends · 1c7ec449
      Ömer Sinan Ağacan authored
      This generalizes code generators (outputAsm, outputLlvm, outputC, and
      the call site codeOutput) so that they'll return the return values of
      the passed Cmm streams.
      
      This allows accumulating data during Cmm generation and returning it to
      the call site in HscMain.
      
      Previously the Cmm streams were assumed to return (), so the code
      generators returned () as well.
      
      This change is required by !1304 and !1530.
      
      Skipping CI as this was tested before and I only updated the commit
      message.
      
      [skip ci]
      1c7ec449
  13. 03 Aug, 2019 1 commit
  14. 20 Jun, 2019 1 commit
    • John Ericson's avatar
      Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdire within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good affect too.
      bff2f24b
  15. 12 Jun, 2019 1 commit
  16. 22 May, 2019 1 commit
  17. 14 May, 2019 1 commit
    • John Ericson's avatar
      Remove all target-specific portions of Config.hs · e529c65e
      John Ericson authored
      1. If GHC is to be multi-target, these cannot be baked in at compile
         time.
      
      2. Compile-time flags have a higher maintenance than run-time flags.
      
      3. The old way makes build system implementation (various bootstrapping
         details) with the thing being built. E.g. GHC doesn't need to care
         about which integer library *will* be used---this is purely a crutch
         so the build system doesn't need to pass flags later when using that
         library.
      
      4. Experience with cross compilation in Nixpkgs has shown things work
         nicer when compiler's can *optionally* delegate the bootstrapping the
         package manager. The package manager knows the entire end-goal build
         plan, and thus can make top-down decisions on bootstrapping. GHC can
         just worry about GHC, not even core library like base and ghc-prim!
      e529c65e
  18. 11 Apr, 2019 1 commit
    • Carter Schonwald's avatar
      removing x87 register support from native code gen · 42504f4a
      Carter Schonwald authored
      * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
      * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
      behavior in 32bit haskell code
      * removes the 80bit floating point representation from the supported float sizes
      * theres still 1 tiny bit of x87 support needed,
      for handling float and double return values in FFI calls  wrt the C ABI on x86_32,
      but this one piece does not leak into the rest of NCG.
      * Lots of code thats not been touched in a long time got deleted as a
      consequence of all of this
      
      all in all, this change paves the way towards a lot of future further
      improvements in how GHC handles floating point computations, along with
      making the native code gen more accessible to a larger pool of contributors.
      42504f4a
  19. 09 Mar, 2019 1 commit
  20. 06 Mar, 2019 1 commit
    • Ben Gamari's avatar
      Rip out object splitting · 37f257af
      Ben Gamari authored
      The splitter is an evil Perl script that processes assembler code.
      Its job can be done better by the linker's --gc-sections flag. GHC
      passes this flag to the linker whenever -split-sections is passed on
      the command line.
      
      This is based on @DemiMarie's D2768.
      
      Fixes Trac #11315
      Fixes Trac #9832
      Fixes Trac #8964
      Fixes Trac #8685
      Fixes Trac #8629
      37f257af
  21. 15 Feb, 2019 1 commit
    • Andreas Klebinger's avatar
      Don't wrap the entry map for LiveInfo in Maybe. · bcaba30a
      Andreas Klebinger authored
      It never really encoded a invariant.
      
      * The linear register allocator just did partial pattern matches
      * The graph allocator just set it to (Just mapEmpty) for Nothing
      
      So I changed LiveInfo to directly contain the map.
      
      Further natCmmTopToLive which filled in Nothing is no longer exported.
      Instead we know call cmmTopLiveness which changes the type AND fills
      in the map.
      bcaba30a
  22. 08 Feb, 2019 1 commit
    • Andreas Klebinger's avatar
      Allow resizing the stack for the graph allocator. · 03b7abc1
      Andreas Klebinger authored
      The graph allocator now dynamically resizes the number of stack
      slots when running into the limit.
      
      This fixes #8657.
      
      Also loop membership of basic blocks is now available
      in the register allocator for cost heuristics.
      03b7abc1
  23. 17 Nov, 2018 1 commit
    • Andreas Klebinger's avatar
      NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly be CPU/Machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall depending on
      flags/machine performance in other benchmarks improved significantly.
      
      Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      While the magnitude of gains differed three different CPUs where tested with
      all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
      Skylake
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as GHC compiled with the new layout
      is faster than the overhead introduced by using the new layout algorithm,
      
      Things this patch does:
      
      * Move code responsilbe for block layout in it's own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this cfg to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old codelayout.
      * Assign weights to the edges in the CFG based on type and limited static
        analysis which are then used for block layout.
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular turn a sequences of:
            jne .foo
            jmp .bar
          foo:
        into
            je bar
          foo:
            ..
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
      912fd2b6
  24. 21 Aug, 2018 1 commit
    • Andreas Klebinger's avatar
      Replace most occurences of foldl with foldl'. · 09c1d5af
      Andreas Klebinger authored
      This patch adds foldl' to GhcPrelude and changes must occurences
      of foldl to foldl'. This leads to better performance especially
      for quick builds where GHC does not perform strictness analysis.
      
      It does change strictness behaviour when we use foldl' to turn
      a argument list into function applications. But this is only a
      drawback if code looks ONLY at the last argument but not at the first.
      And as the benchmarks show leads to fewer allocations in practice
      at O2.
      
      Compiler performance for Nofib:
      
      O2 Allocations:
              -1 s.d.                -----            -0.0%
              +1 s.d.                -----            -0.0%
              Average                -----            -0.0%
      
      O2 Compile Time:
              -1 s.d.                -----            -2.8%
              +1 s.d.                -----            +1.3%
              Average                -----            -0.8%
      
      O0 Allocations:
              -1 s.d.                -----            -0.2%
              +1 s.d.                -----            -0.1%
              Average                -----            -0.2%
      
      Test Plan: ci
      
      Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
      
      Reviewed By: bgamari, monoidal
      
      Subscribers: tdammers, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4929
      09c1d5af
  25. 13 Apr, 2018 1 commit
  26. 28 Nov, 2017 2 commits
  27. 19 Sep, 2017 1 commit
    • Herbert Valerio Riedel's avatar
      compiler: introduce custom "GhcPrelude" Prelude · f63bc730
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
      modules.
      
      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))` which would cause lots of name clashes in every
      modulewhich imports also `Outputable`
      
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D3989
      f63bc730
  28. 22 Aug, 2017 1 commit
  29. 11 Jul, 2017 1 commit
    • Ben Gamari's avatar
      Use correct section types syntax for architecture · 9b9f978f
      Ben Gamari authored
      Previously GHC would always assume that section types began with `@` while
      producing assembly, which is not true. For instance, in ARM assembly syntax
      section types begin with `%`. This abstracts out section type pretty-printing
      and adjusts it to correctly account for the target architectures assembly
      flavor.
      
      Reviewers: austin, hvr, Phyx
      
      Reviewed By: Phyx
      
      Subscribers: Phyx, rwbarton, thomie, erikd
      
      GHC Trac Issues: #13937
      
      Differential Revision: https://phabricator.haskell.org/D3712
      9b9f978f
  30. 23 Jun, 2017 1 commit
    • Michal Terepeta's avatar
      Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior)
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
      42eee6ea
  31. 05 Apr, 2017 1 commit
  32. 08 Feb, 2017 1 commit
    • Ben Gamari's avatar
      Generalize CmmUnwind and pass unwind information through NCG · 3eb737ee
      Ben Gamari authored
      As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack
      unwinding information produced by GHC is currently quite approximate.
      Essentially we assume that register values do not change at all within a
      basic block. While this is somewhat true in normal Haskell code, blocks
      containing foreign calls often break this assumption. This results in
      unreliable call stacks, especially in the code containing foreign calls.
      This is worse than it sounds as unreliable unwinding information can at
      times result in segmentation faults.
      
      This patch set attempts to improve this situation by tracking unwinding
      information with finer granularity. By dispensing with the assumption of
      one unwinding table per block, we allow the compiler to accurately
      represent the areas surrounding foreign calls.
      
      Towards this end we generalize the representation of unwind information
      in the backend in three ways,
      
       * Multiple CmmUnwind nodes can occur per block
      
       * CmmUnwind nodes can now carry unwind information for multiple
         registers (while not strictly necessary; this makes emitting
         unwinding information a bit more convenient in the compiler)
      
       * The NCG backend is given an opportunity to modify the unwinding
         records since it may need to make adjustments due to, for instance,
         native calling convention requirements for foreign calls (see
         #11353).
      
      This sets the stage for resolving #11337 and #11338.
      
      Test Plan: Validate
      
      Reviewers: scpmw, simonmar, austin, erikd
      
      Subscribers: qnikst, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2741
      3eb737ee
  33. 10 Jan, 2017 1 commit
    • Rufflewind's avatar
      Fix terminal corruption bug and clean up SDoc interface. · 22845adc
      Rufflewind authored
      - Fix #13076 by wrapping `printDoc_` so that the terminal color is
        reset even if an exception occurs.
      
      - Add `printSDoc`, `printSDocLn`, and `bufLeftRenderSDoc` to keep `SDoc`
        values abstract (they are wrappers of `printDoc_`, `printDoc`, and
        `bufLeftRender` respectively).
      
      - Remove unused function: `printForAsm`
      
      Test Plan: manual
      
      Reviewers: RyanGlScott, austin, dfeuer, bgamari
      
      Reviewed By: dfeuer, bgamari
      
      Subscribers: dfeuer, mpickering, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2932
      
      GHC Trac Issues: #13076
      22845adc
  34. 08 Dec, 2016 1 commit
  35. 29 Nov, 2016 2 commits
  36. 16 Nov, 2016 1 commit
  37. 06 Aug, 2016 1 commit