1. 19 Sep, 2017 1 commit
  2. 22 Aug, 2017 1 commit
  3. 23 Jun, 2017 1 commit
    • Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior)
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
  4. 01 May, 2017 1 commit
  5. 25 Apr, 2017 1 commit
    • PPC NCG: Implement callish prim ops · 89a3241f
      Peter Trommler authored
      Provide PowerPC optimised implementations of callish prim ops.
      
      MO_?_QuotRem
      The generic implementation of quotient remainder prim ops uses
      a division and a remainder operation. There is no remainder on
      PowerPC and so we need to implement remainder "by hand" which
      results in a duplication of the divide operation when using the
      generic code.
      
      Avoid this duplication by implementing the prim op in the native
      code generator.
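
      A minimal Haskell sketch of the idea, not the NCG code itself: divide
      once, then recover the remainder with a multiply and a subtract. The
      name `quotRemOneDivide` is hypothetical.

      ```haskell
      -- Illustrative model of the instruction sequence; not GHC's code generator.
      quotRemOneDivide :: Int -> Int -> (Int, Int)
      quotRemOneDivide n d =
        let q = n `quot` d  -- the single divide
            r = n - q * d   -- remainder "by hand": multiply, then subtract
        in (q, r)
      ```

      For a non-zero divisor this should agree with `quotRem n d`.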
      
      MO_U_Mul2
      Use PowerPC's instructions for long multiplication.
      
      Addition and subtraction
      Use PowerPC add/subtract with carry/overflow instructions
      
      MO_Clz and MO_Ctz
      Use PowerPC's CNTLZ instruction and implement count trailing
      zeros using count leading zeros
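
      One well-known way to express count trailing zeros through count
      leading zeros is sketched below in Haskell. The commit does not say
      which instruction sequence is emitted, so treat the identity and the
      name `ctzViaClz` as assumptions made for illustration.

      ```haskell
      import Data.Bits (complement, countLeadingZeros, finiteBitSize, (.&.))
      import Data.Word (Word64)

      -- `complement x .&. (x - 1)` keeps exactly the bits below the lowest
      -- set bit of x, so the number of trailing zeros is the word size
      -- minus the leading zeros of that mask.
      ctzViaClz :: Word64 -> Int
      ctzViaClz x = finiteBitSize x - countLeadingZeros (complement x .&. (x - 1))
      ```

      With this formulation a zero input yields the word size.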
      
      MO_QuotRem2
      Implement an algorithm given by Henry Warren in "Hacker's Delight"
      using PowerPC divide instruction. TODO: Use long division instructions
      when available (POWER7 and later).
      
      Test Plan: validate on AIX and 32-bit Linux
      
      Reviewers: simonmar, erikd, hvr, austin, bgamari
      
      Reviewed By: erikd, hvr, bgamari
      
      Subscribers: trofi, kgardas, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2973
  6. 07 Mar, 2017 1 commit
  7. 17 Oct, 2016 1 commit
  8. 02 Oct, 2016 1 commit
  9. 31 Aug, 2016 1 commit
    • PPC NCG: Implement minimal stack frame header. · 010b07aa
      Peter Trommler authored
      According to the ABI specifications a minimal stack frame consists
      of a header and a minimum size parameter save area. We reserve the
      minimal size for each ABI.
      
      On PowerPC 64-bit Linux and AIX the parameter save area can accommodate
      up to eight parameters. So calls with eight parameters and fewer
      can be done without allocating a new stack frame and deallocating
      that stack frame after the call. On AIX one additional spill slot
      is available on the stack.
      
      Code size for all nofib benchmarks is 0.3 % smaller on powerpc64.
      
      Test Plan: validate on AIX
      
      Reviewers: hvr!, erikd, austin, simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2445
  10. 18 Jun, 2016 2 commits
    • PPC NCG: Fix and refactor TOC handling. · f4b0488d
      Peter Trommler authored
      In a call to a fixed function the TOC does not need to be saved.
      The linker handles TOC saving.
      
      Refactor TOC handling by folding the two functions toc_before and
      toc_after into the code generating the call sequence. This saves
      repeating the case distinction in those two functions.
      
      Test Plan: validate on PowerPC 32-bit Linux and AIX
      
      Reviewers: hvr, simonmar, austin, erikd, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2328
    • PPC NCG: Fix float parameter passing on 64-bit. · 2897be77
      Peter Trommler authored
      On Linux 64-bit PowerPC the first 13 floating point parameters are
      passed in registers. We only passed the first 8 floating point params.
      
      The alignment of a floating point single precision value in ELF v1.9 is
      the second word of a doubleword. For ELF v2 we support only little
      endian and the least significant word of a doubleword is the first word,
      so no special handling is required.
      
      Add a regression test.
      
      Test Plan: validate on powerpc Linux and AIX
      
      Reviewers: erikd, hvr, austin, simonmar, bgamari
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2327
      
      GHC Trac Issues: #12134
  11. 29 Apr, 2016 1 commit
    • PPC NCG: Improve pointer de-tagging code · b725fe0a
      Peter Trommler authored
      Generate a clrr[wd]i instruction to clear the tag bits in a pointer.
      This saves one instruction and one temporary register.
      
      Optimize signed comparison with zero after an andi. operation. This saves
      one instruction when comparing a pointer tag with zero.
      
      This reduces code size by 0.6 % in all nofib benchmarks.
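
      What de-tagging computes can be modelled in a few lines of Haskell.
      This is a sketch only: the 3-bit tag width (a 64-bit target) and the
      names `tagMask`, `untag` and `tagOf` are assumptions for illustration.

      ```haskell
      import Data.Bits (complement, (.&.))
      import Data.Word (Word64)

      -- Assumed for illustration: the low 3 bits of a pointer hold the tag.
      tagMask :: Word64
      tagMask = 7

      -- Clear the tag bits; this is the effect a single clear-right-bits
      -- instruction achieves without a separate mask register.
      untag :: Word64 -> Word64
      untag p = p .&. complement tagMask

      -- Extract the tag, as an and-immediate would.
      tagOf :: Word64 -> Word64
      tagOf p = p .&. tagMask
      ```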
      
      Test Plan: validate on AIX and 32-bit Linux
      
      Reviewed By: erikd, hvr
      
      Differential Revision: https://phabricator.haskell.org/D2093
  12. 24 Mar, 2016 2 commits
    • Remove code-duplication in the PPC NCG · 4dc88356
      Herbert Valerio Riedel authored
      Reviewed By: bgamari, trommler
      
      Differential Revision: https://phabricator.haskell.org/D2020
    • Add NCG support for AIX/ppc32 · df26b955
      Herbert Valerio Riedel authored
      This extends the previous work to revive the unregisterised GHC build
      for AIX/ppc32. Strictly speaking, AIX runs on POWER4 (and later)
      hardware, but the PPC32 instructions implemented in the PPC NCG
      represent a compatible subset of the POWER4 ISA.
      
      IBM AIX follows the PowerOpen ABI (and shares many similarities with the
      Linux PPC64 ELF V1 NCG backend) but uses the rather limited XCOFF
      format (compared to ELF).
      
      This doesn't support dynamic libraries yet.
      
      A major limiting factor is that the AIX assembler does not support the
      `@ha`/`@l` relocation types nor the ha16()/lo16() functions Darwin's
      assembler supports. Therefore we need to avoid emitting those. In case
      of numeric literals we simply compute the functions ourselves, while for
      labels we have to use local TOCs and hope everything fits into a 16bit
      offset (for ppc32 this gives us at most 16384 entries per TOC section,
      which is enough to compile GHC).
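
      For numeric literals, "computing the functions ourselves" could look
      like the following Haskell sketch. It uses the usual definitions of
      the high-adjusted and low halves; the function names are chosen for
      illustration and are not taken from the PPC NCG sources.

      ```haskell
      import Data.Bits (shiftL, shiftR, (.&.))
      import Data.Int (Int16)
      import Data.Word (Word32)

      -- Low 16 bits, as `@l` / lo16() would yield.
      lo16 :: Word32 -> Word32
      lo16 x = x .&. 0xFFFF

      -- "High adjusted" 16 bits, as `@ha` / ha16() would yield: the +0x8000
      -- compensates for the low half being sign-extended when added back.
      ha16 :: Word32 -> Word32
      ha16 x = ((x + 0x8000) `shiftR` 16) .&. 0xFFFF

      -- Recombine the halves the way a lis/addi pair would.
      rebuild :: Word32 -> Word32
      rebuild x = (ha16 x `shiftL` 16) + signExtend16 (lo16 x)
        where
          signExtend16 w = fromIntegral (fromIntegral w :: Int16)

      -- A 16-bit offset spans 2^16 bytes; with 4-byte entries on ppc32
      -- that gives the 16384 TOC entries mentioned above.
      maxTocEntries :: Int
      maxTocEntries = 2 ^ (16 :: Int) `div` 4
      ```

      `rebuild x == x` should hold for every 32-bit value, which is the
      property the adjustment in `ha16` exists to guarantee.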
      
      Another issue is that XCOFF doesn't seem to have a relocation type for
      label-differences, and therefore the label-differences placed into
      tables-next-to-code can't be relocated, but the linker may rearrange
      different sections, so we need to place all read-only sections into the
      same `.text[PR]` section to work around this.
      
      Finally, the PowerOpen ABI distinguishes between function-descriptors
      and actual entry-point addresses. For AIX we need to be specific when
      emitting assembler code whether we want the address of the function
      descriptor (`printf`) or of the entry-point (`.printf`). So we let the
      asm pretty-printer prefix a dot to all emitted subroutine
      calls (i.e. `BL`) on AIX only. For now, STG routines' entry-point labels
      are not prefixed by a dot and don't have any associated
      function-descriptor.
      
      Reviewers: austin, trommler, erikd, bgamari
      
      Reviewed By: trommler, erikd, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D2019
  13. 12 Nov, 2015 1 commit
    • Implement function-sections for Haskell code, #8405 · 4a32bf92
      olsner authored
      This adds a flag -split-sections that does similar things to
      -split-objs, but using sections in single object files instead of
      relying on the Satanic Splitter and other abominations. This is very
      similar to the GCC flags -ffunction-sections and -fdata-sections.
      
      The --gc-sections linker flag, which allows unused sections to actually
      be removed, is added to all link commands (if the linker supports it) so
      that space savings from having base compiled with sections can be
      realized.
      
      Supported both in LLVM and the native code-gen, in theory for all
      architectures, but really tested on x86 only.
      
      In the GHC build, a new SplitSections variable enables -split-sections
      for relevant parts of the build.
      
      Test Plan: validate with both settings of SplitSections
      
      Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
      
      Reviewed By: simonmar, thomie, bgamari
      
      Subscribers: hsyl20, erikd, kgardas, thomie
      
      Differential Revision: https://phabricator.haskell.org/D1242
      
      GHC Trac Issues: #8405
  14. 31 Oct, 2015 1 commit
  15. 23 Sep, 2015 1 commit
    • Annotate CmmBranch with an optional likely target · 939a7d63
      Simon Marlow authored
      Summary:
      This allows the code generator to give hints to later code generation
      steps about which branch is most likely to be taken.  Right now it
      is only taken into account in one place: a special case in
      CmmContFlowOpt that swapped branches over to maximise the chance of
      fallthrough, which is now disabled when there is a likelihood setting.
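
      The interaction between the hint and the fallthrough swap can be
      modelled with toy types. Everything below (the `Branch` record, its
      fields, `swapForFallthrough`) is a hypothetical stand-in and not
      GHC's Cmm representation.

      ```haskell
      -- Toy conditional branch; not GHC's Cmm node type.
      data Branch = Branch
        { cond    :: String      -- condition, kept as text
        , ifTrue  :: Int         -- target block when the condition holds
        , ifFalse :: Int         -- target block otherwise
        , likely  :: Maybe Bool  -- Just True: true target likely; Nothing: no hint
        } deriving (Eq, Show)

      -- If the true target is the block that follows, invert the branch so
      -- the false target falls through, but only when no hint is present.
      swapForFallthrough :: Int -> Branch -> Branch
      swapForFallthrough next b
        | likely b == Nothing, ifTrue b == next =
            b { cond = "!(" ++ cond b ++ ")", ifTrue = ifFalse b, ifFalse = next }
        | otherwise = b
      ```

      A branch carrying `Just True` or `Just False` passes through
      unchanged, mirroring the "disabled when there is a likelihood
      setting" behaviour described above.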
      
      Test Plan: validate
      
      Reviewers: austin, simonpj, bgamari, ezyang, tibbe
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1273
  16. 07 Jul, 2015 1 commit
  17. 03 Jul, 2015 1 commit
    • Implement PowerPC 64-bit native code backend for Linux · d3c1dda6
      Peter Trommler authored
      Extend the PowerPC 32-bit native code generator for "64-bit
      PowerPC ELF Application Binary Interface Supplement 1.9" by
      Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification --
      OpenPOWER ABI for Linux Supplement" by IBM.
      The latter ABI is mainly used on POWER7/7+ and POWER8
      Linux systems running in little-endian mode. The code generator
      supports both static and dynamic linking. PowerPC 64-bit
      code for ELF ABI 1.9 and 2 is mostly position independent
      anyway, and thus so is all the code emitted by the code
      generator. In other words, -fPIC does not make a difference.
      
      rts/stg/SMP.h support is implemented.
      
      Following the spirit of the introductory comment in
      PPC/CodeGen.hs, the rest of the code is a straightforward
      extension of the 32-bit implementation.
      
      Limitations:
      * Code is generated only in the medium code model, which
        is also gcc's default
      * Local symbols are not accessed directly, which seems to
        also be the case for 32-bit
      * LLVM does not work, but this does not work on 32-bit either
      * Must use the system runtime linker in GHCi, because the
        GHC linker for "static" object files (rts/Linker.c) for
        PPC 64-bit is not implemented. The system runtime
        (dynamic) linker works.
      * The handling of the system stack (register 1) is not ELF-
        compliant so stack traces break. Instead of allocating a new
        stack frame, spill code should use the "official" spill area
        in the current stack frame and deallocation code should restore
        the back chain
      * DWARF support is missing
      
      Fixes #9863
      
      Test Plan: validate (on powerpc, too)
      
      Reviewers: simonmar, trofi, erikd, austin
      
      Reviewed By: trofi
      
      Subscribers: bgamari, arnons1, kgardas, thomie
      
      Differential Revision: https://phabricator.haskell.org/D629
      
      GHC Trac Issues: #9863
  18. 16 Jun, 2015 1 commit
  19. 30 Mar, 2015 1 commit
    • Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining these changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
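
      To give a feel for what "consecutive jump tables" means, the sketch
      below splits ascending case values into dense runs, each of which
      could back one jump table. The gap threshold `maxGap` and the
      function `denseRuns` are invented for this example; the actual
      heuristics are described in the CmmSwitch notes and are not
      reproduced here.

      ```haskell
      -- Split ascending case values into runs whose neighbouring values
      -- differ by at most maxGap.
      denseRuns :: Integer -> [Integer] -> [[Integer]]
      denseRuns maxGap = foldr step []
        where
          -- Extend the run to the right when x is close enough to its head.
          step x ((y : ys) : rest)
            | y - x <= maxGap = (x : y : ys) : rest
          -- Otherwise start a new run.
          step x rest = [x] : rest
      ```

      A larger threshold trades table holes for fewer, longer tables.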
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the Hoopl data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
  20. 10 Feb, 2015 1 commit
  21. 16 Dec, 2014 3 commits
    • Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        Then again, when you're writing Cmm by hand you really ought to
        know what you're doing.
      
      (From Phabricator D169)
    • Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      
      Notes:
      
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
        later.
      
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things to be too
        coarse-grained.
      
      (From Phabricator D169)
    • Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatsoever.
      
      Notes:
      
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      
      (From Phabricator D169)
  22. 14 Dec, 2014 1 commit
    • powerpc: fix and enable shared libraries by default on linux · fa31e8f4
      Sergei Trofimovich authored
      Summary:
      And fix things all the way down to it. Namely:
          - remove 'r30' from free registers, it's an .LCTOC1 register
            for gcc. generated .plt stubs expect it to be initialised.
          - fix PicBase computation, which originally forgot to use 'tmp'
            reg in 'initializePicBase_ppc.fetchPC'
          - mark 'ForeignTarget's as implicitly using 'PicBase' register
            (see comment for details)
          - add 64-bit MO_Sub and test on alloclimit3/4 regtests
          - fix dynamic label offsets to match with .LCTOC1 offset
      Signed-off-by: Sergei Trofimovich <siarheit@google.com>
      
      Test Plan: validate passes equal amount of vanilla/dyn tests
      
      Reviewers: simonmar, erikd, austin
      
      Reviewed By: erikd, austin
      
      Subscribers: carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D560
      
      GHC Trac Issues: #8024, #9831
  23. 23 Aug, 2014 1 commit
    • Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend · cfd08a99
      rwbarton authored
      Summary:
      These MachOps are used by addIntC# and subIntC#, which in turn are
      used in integer-gmp when adding or subtracting small Integers. The
      following benchmark shows a ~6% speedup after this commit on x86_64
      (building GHC with BuildFlavour=perf).
      
          {-# LANGUAGE MagicHash #-}
      
          import GHC.Exts
          import Criterion.Main
      
          count :: Int -> Integer
          count (I# n#) = go n# 0
            where go :: Int# -> Integer -> Integer
                  go 0# acc = acc
                  go n# acc = go (n# -# 1#) $! acc + 1
      
          main = defaultMain [bgroup "count"
                                [bench "100" $ whnf count 100]]
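
      What `addIntC#` reports can be seen with a small wrapper. This is a
      sketch assuming a GHC in which comparison primops return `Int#`;
      `addOverflows` is a hypothetical helper, not part of the commit.

      ```haskell
      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts (Int (I#), addIntC#, isTrue#, (/=#))

      -- Add two Ints and report whether the signed addition overflowed.
      -- addIntC# returns the (possibly wrapped) sum and a non-zero flag
      -- on overflow.
      addOverflows :: Int -> Int -> (Int, Bool)
      addOverflows (I# a) (I# b) =
        case addIntC# a b of
          (# r, c #) -> (I# r, isTrue# (c /=# 0#))
      ```

      Small-Integer addition can take the fast path when the flag is
      clear and fall back to the big-number representation otherwise.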
      
      Differential Revision: https://phabricator.haskell.org/D140
  24. 14 Aug, 2014 1 commit
    • Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
      
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      
      which provide efficient implementations of the popular
      count-leading-zero and count-trailing-zero respectively
      (see testcase for a pure Haskell reference implementation).
      
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
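
      A bit-scanning reference for count-leading-zeros might look like the
      sketch below. It is not the reference implementation from the
      testcase; `clzRef` is a hypothetical name, and the zero case is
      assumed to return the word size.

      ```haskell
      import Data.Bits (finiteBitSize, testBit)
      import Data.Word (Word32)

      -- Count clear bits from the most significant end until the first
      -- set bit; an all-zero word yields the full word size.
      clzRef :: Word32 -> Int
      clzRef x = length (takeWhile (not . testBit x) [n - 1, n - 2 .. 0])
        where
          n = finiteBitSize x
      ```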
      
      Test Plan: validate and successful tests on i686 and amd64
      
      Reviewers: rwbarton, simonmar, ezyang, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D144
      
      GHC Trac Issues: #9340
  25. 10 Jul, 2014 1 commit
  26. 30 Jun, 2014 1 commit
    • Re-add more primops for atomic ops on byte arrays · 4ee4ab01
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#
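
      A usage sketch for some of these primops, written against the
      signatures exported by `GHC.Exts` in recent GHC releases. The
      wrapper `fetchAddDemo` is hypothetical, and the code assumes that
      `fetchAddIntArray#` returns the value held before the addition.

      ```haskell
      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts
      import GHC.IO (IO (..))

      -- Store `start` in a one-element array, atomically add `delta`,
      -- and return (value before the add, value after the add).
      fetchAddDemo :: Int -> Int -> IO (Int, Int)
      fetchAddDemo (I# start) (I# delta) = IO (\s0 ->
        case newByteArray# 8# s0 of  -- 8 bytes: room for one machine word
          (# s1, arr #) ->
            case atomicWriteIntArray# arr 0# start s1 of
              s2 -> case fetchAddIntArray# arr 0# delta s2 of
                (# s3, old #) -> case atomicReadIntArray# arr 0# s3 of
                  (# s4, new #) -> (# s4, (I# old, I# new) #))
      ```

      Real code would normally reach these operations through a library
      wrapper rather than threading `State#` tokens by hand.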
  27. 26 Jun, 2014 1 commit
  28. 24 Jun, 2014 1 commit
    • Add more primops for atomic ops on byte arrays · d8abf85f
      tibbe authored
      Summary:
      Add more primops for atomic ops on byte arrays
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#
  29. 15 May, 2014 1 commit
    • Add LANGUAGE pragmas to compiler/ source files · 23892440
      Herbert Valerio Riedel authored
      In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
      reorganized, while following the convention, to
      
      - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
        any `{-# OPTIONS_GHC #-}`-lines.
      
      - Moreover, if the list of language extensions fits into a single
        `{-# LANGUAGE ... #-}`-line (shorter than 80 characters), keep it on one
        line. Otherwise split into `{-# LANGUAGE ... #-}`-lines for each
        individual language extension. In both cases, try to keep the
        enumeration alphabetically ordered.
        (The latter layout is preferable as it's more diff-friendly)
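
      Applied to a small file, the convention could look like this. The
      module body is hypothetical and exists only so that the pragmas
      have something to apply to.

      ```haskell
      {-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
      {-# OPTIONS_GHC -Wall #-}

      -- LANGUAGE comes first, fits on one line, and is alphabetically
      -- ordered; OPTIONS_GHC follows it.
      sumStrict :: forall a. Num a => [a] -> a
      sumStrict = go 0
        where
          go :: a -> [a] -> a
          go !acc []       = acc
          go !acc (x : xs) = go (acc + x) xs
      ```

      With a longer extension list, each extension would get its own
      `{-# LANGUAGE ... #-}` line, still in alphabetical order.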
      
      While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
      occurrences by `{-# OPTIONS_GHC ... #-}` pragmas.
  30. 02 Oct, 2013 1 commit
  31. 17 Jul, 2013 1 commit
  32. 19 Jun, 2013 1 commit
  33. 11 Jun, 2013 1 commit
  34. 09 Jun, 2013 1 commit
    • Add support for byte endian swapping for Word 16/32/64. · 1c5b0511
      ian@well-typed.com authored
      * Exposes bSwap{,16,32,64}# primops
      * Add a new MachOp MO_BSwap
      * Use an Stg implementation (hs_bswap{16,32,64}) for the other
        NCG backends.
      * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
        instead of using xchg.
      * Generate llvm.bswap intrinsics in llvm codegen.
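
      The 16-bit case (a 32-bit swap followed by a right shift) can be
      checked with a short Haskell model. `bswap16ViaBswap32` is a
      hypothetical name, and `byteSwap32` from `Data.Word` stands in for
      the machine instruction.

      ```haskell
      import Data.Bits (shiftR)
      import Data.Word (Word16, byteSwap32)

      -- Zero-extend to 32 bits, swap all four bytes, then shift the two
      -- interesting bytes back down into the low half.
      bswap16ViaBswap32 :: Word16 -> Word16
      bswap16ViaBswap32 x = fromIntegral (byteSwap32 (fromIntegral x) `shiftR` 16)
      ```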
      
      Patch from Vincent Hanquez.
  35. 13 May, 2013 1 commit
  36. 01 Feb, 2013 1 commit