1. 03 Jul, 2015 1 commit
    • Implement PowerPC 64-bit native code backend for Linux · d3c1dda6
      Peter Trommler authored
      Extend the PowerPC 32-bit native code generator for "64-bit
      PowerPC ELF Application Binary Interface Supplement 1.9" by
      Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification --
      OpenPOWER ABI for Linux Supplement" by IBM.
      The latter ABI is mainly used on POWER7/7+ and POWER8
      Linux systems running in little-endian mode. The code generator
      supports both static and dynamic linking. PowerPC 64-bit
      code for ELF ABI 1.9 and 2 is mostly position independent
      anyway, and thus so is all the code emitted by the code
      generator. In other words, -fPIC does not make a difference.
      
      rts/stg/SMP.h support is implemented.
      
      Following the spirit of the introductory comment in
      PPC/CodeGen.hs, the rest of the code is a straightforward
      extension of the 32-bit implementation.
      
      Limitations:
      * Code is generated only in the medium code model, which
        is also gcc's default
      * Local symbols are not accessed directly, which seems to
        also be the case for 32-bit
      * ...
  2. 16 Jun, 2015 1 commit
  3. 11 Jun, 2015 1 commit
    • Fix DWARF generation for MinGW (#10468) · a66ef356
      Peter Wortmann authored
      Fortunately this is relatively straightforward - all we need to do is
      switch to a non-ELF-specific way of specifying object file sections and
      make sure that section-relative addresses work correctly. This is enough
      to make "gdb" work on MinGW builds.
  4. 16 May, 2015 1 commit
  5. 03 Apr, 2015 1 commit
  6. 30 Mar, 2015 1 commit
    • Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining these changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
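
As a rough illustration of what "larger consecutive jump tables" can mean, the sketch below groups case labels into runs whose gaps stay within a threshold; each run could then be served by one jump table. This is not the algorithm in CmmSwitch: `splitRuns` and its `maxGap` parameter are hypothetical names chosen for this example.

```haskell
import Data.List (sort)

-- | Split case labels into runs in which neighbouring labels differ by
-- at most 'maxGap'. Hypothetical helper, not part of GHC.
splitRuns :: Int -> [Int] -> [[Int]]
splitRuns maxGap = foldr step [] . sort
  where
    -- Extend the run to the right if the gap is small enough ...
    step x ((y : ys) : rest)
      | y - x <= maxGap = (x : y : ys) : rest
    -- ... otherwise start a new run.
    step x acc = [x] : acc
```

With `maxGap = 1`, the labels `[1,2,3,10,11,20]` fall into three runs, so a code generator following this idea would emit three small tables (or compare chains) rather than one sparse table.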
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the Hoopl data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
  7. 10 Feb, 2015 1 commit
  8. 13 Jan, 2015 1 commit
    • Dwarf generation fixed pt 2 · 36df0988
      Peter Wortmann authored
      - Don't bracket HsTick expression unnecessarily
      - Generate debug information in UTF8
      - Reduce amount of information generated - we do not currently need
        block information, for example.
      
      Special thanks to slyfox for the reports!
  9. 06 Jan, 2015 1 commit
  10. 19 Dec, 2014 1 commit
    • Some Dwarf generation fixes · f85db756
      Peter Wortmann authored
      - Make abbrev offset absolute on non-Mac systems
      - Add another termination byte at the end of the abbrev section
        (readelf complains)
      - Scope combination was wrong for the simpler cases
      - Shouldn't have a "global/" in front of all scopes
  11. 17 Dec, 2014 3 commits
    • Generate DWARF unwind information · edd6d676
      Peter Wortmann authored
      This tells debuggers such as GDB how to "unwind" a program state,
      which allows them to walk the stack up.
      
      Notes:
      
      * The code is quite general, perhaps unnecessarily so. Unless we get
        more unwind information, only the first case of pprSetUnwind will
        get used - and pprUnwindExpr and pprUndefUnwind will never be
        called. It just so happens that this is a point where we can get a
        lot of features cheaply, even if we don't use them.
      
      * When determining what location to show for a return address, most
        debuggers check the map for "rip-1", assuming that's where the
        "call" instruction is. For tables-next-to-code, that happens to
        always be the end of an info table. We therefore cheat a bit here by
        shifting .debug_frame information so it covers the end of the info
        table, as well as generating a .loc directive for the info table
        data.
      
        Debuggers will still show the wrong label for the return address,
        though.  Haven't found a way around that one yet.
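
The "rip-1" lookup described above can be modelled with a small address-range table. This is only a sketch with made-up types and addresses (`Range`, `lookupAddr` and `lookupReturn` are hypothetical, not debugger or GHC APIs); it shows why a range that starts exactly at the return address is missed unless it is shifted to cover the preceding byte.

```haskell
type Addr = Int

-- | A half-open address range [rStart, rEnd) with an attached label.
data Range = Range { rStart :: Addr, rEnd :: Addr, rLabel :: String }

-- | Find the label of the first range containing the address.
lookupAddr :: [Range] -> Addr -> Maybe String
lookupAddr rs a =
  case [rLabel r | r <- rs, rStart r <= a, a < rEnd r] of
    (l : _) -> Just l
    []      -> Nothing

-- | What a debugger does for a return address: look up the byte before it.
lookupReturn :: [Range] -> Addr -> Maybe String
lookupReturn rs ret = lookupAddr rs (ret - 1)
```

If a return point sits at address 116 directly after its info table, a range `Range 116 200 "f"` is not found by `lookupReturn`, whereas `Range 115 200 "f"` is.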
      
      (From Phabricator D396)
    • Generate DWARF info section · cc481ec8
      Peter Wortmann authored
      This is where we actually make GHC emit DWARF code. The info section
      contains all the general meta information bits as well as an entry for
      every block of native code.
      
      Notes:
      
      * We need quite a few new labels in order to properly address starts
        and ends of blocks.
      
      * Thanks to Nathan Howell for taking the initiative to get our own Haskell
        language ID for DWARF!
      
      (From Phabricator D396)
    • Generate .loc/.file directives from source ticks · 64678e9e
      Peter Wortmann authored
      This generates DWARF, albeit indirectly using the assembler. This is
      the easiest (and, apparently, quite standard) method of generating the
      .debug_line DWARF section.
      
      Notes:
      
      * Note we have to make sure that .file directives appear correctly
        before the respective .loc. Right now we ppr them manually, which makes
        them absent from dumps. Fixing this would require .file to become a
        native instruction.
      
      * We have to pass a lot of things around the native code generator. I
        know Ian did quite a bit of refactoring already, but having one common
        monad could *really* simplify things here...
      
      * To support SplitObjs, we need to emit/reset all DWARF data at every
        split. We use the occasion to move split marker generation to
        cmmNativeGenStream as well, so debug data extraction doesn't have to
        choke on it.
      
      (From Phabricator D396)
  12. 16 Dec, 2014 4 commits
    • Debug data extraction (NCG support) · f46aa733
      Peter Wortmann authored
      The purpose of the Debug module is to collect all required information
      to generate debug information (DWARF etc.) in the back-ends. Our main
      data structure is the "debug block", which carries all information we have
      about a block of code that is going to get produced.
      
      Notes:
      
      * Debug blocks are arranged into a tree according to tick scopes. This
        makes it easier to reason about inheritance rules. Note however that
        tick scopes are not guaranteed to form a tree, which requires us to
        "copy" ticks to not lose them.
      
      * This is also where we decide what source location we regard as
        representing a code block the "best". The heuristic is basically that
        we want the most specific source reference that comes from the same file
        we are currently compiling. This seems to be the most useful choice in
        my experience.
      
      * We are careful to not be too lazy so we don't end up breaking streaming.
        Debug data will be kept alive until the end of codegen, after all.
      
      * We change native assembler dumps to happen right away for every Cmm group.
        This simplifies the code somewhat and is consistent with how pretty much
        all of GHC handles dumps with respect to streamed code.
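
The "best" source location heuristic from the notes above might be sketched as follows. The `Span` type and `bestSpan` are hypothetical stand-ins rather than the Debug module's real definitions, "most specific" is approximated by the smallest span width, and falling back to spans from other files when none match is an assumption of this sketch.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Simplified source span: a file plus a start/end position.
data Span = Span
  { spanFile  :: FilePath
  , spanStart :: Int
  , spanEnd   :: Int
  } deriving (Eq, Show)

-- | Prefer spans from the file being compiled, then the narrowest span.
bestSpan :: FilePath -> [Span] -> Maybe Span
bestSpan _ [] = Nothing
bestSpan current spans = Just (minimumBy (comparing rank) spans)
  where
    -- False sorts before True, so same-file spans win.
    rank s = (spanFile s /= current, spanEnd s - spanStart s)
```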
      
      (From Phabricator D169)
    • Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        On the other hand, when you're writing Cmm by hand you really ought to
        know what you're doing.
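
To make the "simplify the expression until it can be converted" step concrete, here is a toy model, assuming the back-end only understands the form "register plus constant offset". The `Expr` type and `toRegOffset` are hypothetical; they are not CmmExpr or the actual DWARF encoder, and a `Nothing` result stands in for the panic case mentioned above.

```haskell
-- | Toy expression language standing in for unwind expressions.
data Expr
  = Reg String
  | Lit Int
  | Add Expr Expr
  | Sub Expr Expr
  | Load Expr
  deriving (Eq, Show)

-- | Reduce an expression to (register, offset), or fail.
toRegOffset :: Expr -> Maybe (String, Int)
toRegOffset (Reg r)         = Just (r, 0)
toRegOffset (Add e (Lit n)) = addOffset n <$> toRegOffset e
toRegOffset (Add (Lit n) e) = addOffset n <$> toRegOffset e
toRegOffset (Sub e (Lit n)) = addOffset (negate n) <$> toRegOffset e
toRegOffset _               = Nothing  -- anything else is unsupported here

addOffset :: Int -> (String, Int) -> (String, Int)
addOffset n (r, off) = (r, off + n)
```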
      
      (From Phabricator D169)
    • Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      
      Notes:
      
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
        later.
      
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things to be too
        coarse-grained.
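
A simplified model of sub-scopes and merged scopes, for illustration only: the `Scope` type below uses plain `Int` identifiers and the `covers` relation is an assumption about how such scopes could be queried, not the definition used in the patch.

```haskell
-- | Simplified tick scopes: a global scope, nested sub-scopes, and
-- merged scopes that belong to both of their parents.
data Scope
  = Global
  | Sub Int Scope
  | Combined Scope Scope
  deriving (Eq, Show)

-- | @outer `covers` inner@ holds if a tick attached to @outer@ should
-- also apply to code living in @inner@.
covers :: Scope -> Scope -> Bool
covers Global _ = True
covers outer inner
  | outer == inner = True
covers outer (Sub _ parent)  = covers outer parent
covers outer (Combined a b)  = covers outer a || covers outer b
covers _ Global              = False
```

Under this model a tick in an enclosing scope applies to everything nested below it, while code in a merged scope inherits ticks from both sides.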
      
      (From Phabricator D169)
    • Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatsoever.
      
      Notes:
      
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      
      (From Phabricator D169)
  13. 14 Dec, 2014 1 commit
    • powerpc: fix and enable shared libraries by default on linux · fa31e8f4
      Sergei Trofimovich authored
      
      
      Summary:
      And fix things all the way down to it. Namely:
          - remove 'r30' from free registers, it's an .LCTOC1 register
            for gcc. generated .plt stubs expect it to be initialised.
          - fix PicBase computation, which originally forgot to use 'tmp'
            reg in 'initializePicBase_ppc.fetchPC'
          - mark 'ForeignTarget's as implicitly using 'PicBase' register
            (see comment for details)
          - add 64-bit MO_Sub and test on alloclimit3/4 regtests
          - fix dynamic label offsets to match with .LCTOC1 offset
      Signed-off-by: Sergei Trofimovich <siarheit@google.com>
      
      Test Plan: validate passes equal amount of vanilla/dyn tests
      
      Reviewers: simonmar, erikd, austin
      
      Reviewed By: erikd, austin
      
      Subscribers: carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D560
      
      GHC Trac Issues: #8024, #9831
  14. 30 Nov, 2014 1 commit
  15. 19 Nov, 2014 1 commit
  16. 12 Nov, 2014 1 commit
  17. 04 Nov, 2014 1 commit
  18. 20 Oct, 2014 1 commit
  19. 19 Oct, 2014 1 commit
  20. 18 Oct, 2014 1 commit
    • Implement optimized NCG `MO_Ctz W64` op for i386 (#9340) · 612f3d12
      Herbert Valerio Riedel authored
      Summary:
      This is an optimization to the CTZ primops introduced for #9340
      
      Previously we called out to `hs_ctz64`, but we can actually generate
      better hand-tuned code while avoiding the FFI ccall.
      
      With this patch, the code
      
        {-# LANGUAGE MagicHash #-}
        module TestClz0 where
        import GHC.Prim
        ctz64 :: Word64# -> Word#
        ctz64 x = ctz64# x
      
      results in the following assembler generated by NCG on i386:
      
        TestClz.ctz64_info:
            movl (%ebp),%eax
            movl 4(%ebp),%ecx
            movl %ecx,%edx
            orl %eax,%edx
            movl $64,%edx
            je _nAO
      
            bsf %ecx,%ecx
            addl $32,%ecx
            bsf %eax,%eax
            cmovne %eax,%ecx
            movl %ecx,%edx
        _nAO:
            movl %edx,%esi
            addl $8,%ebp
            jmp *(%ebp)
      
      For comparison, here's what LLVM 3.4 currently generates:
      
        000000fc <TestClzz_ctzz64_info>:
          fc:   0f bc 45 04             bsf    0x4(%ebp),%eax
         100:   b9 20 00 00 00          mov    $0x20,%ecx
         105:   0f 45 c8                cmovne %eax,%ecx
         108:   83 c1 20                add    $0x20,%ecx
         10b:   8b 45 00                mov    0x0(%ebp),%eax
         10e:   8b 55 08                mov    0x8(%ebp),%edx
         111:   0f bc f0                bsf    %eax,%esi
         114:   85 c0                   test   %eax,%eax
         116:   0f 44 f1                cmove  %ecx,%esi
         119:   83 c5 08                add    $0x8,%ebp
         11c:   ff e2                   jmp    *%edx
      
      Reviewed By: austin
      
      Auditors: simonmar
      
      Differential Revision: https://phabricator.haskell.org/D163
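
The NCG sequence above handles a 64-bit value as two 32-bit halves. A portable model of that logic, written with `Data.Bits` rather than primops, might look like this; `ctz64Via32` is a hypothetical name for the sketch and is not part of the patch.

```haskell
import Data.Bits (countTrailingZeros, shiftR)
import Data.Word (Word32, Word64)

-- | Count trailing zeros of a 64-bit word using only 32-bit scans,
-- following the structure of the i386 code: return 64 for zero,
-- otherwise scan the low half and fall back to 32 + scan of the high half.
ctz64Via32 :: Word64 -> Int
ctz64Via32 w
  | lo == 0 && hi == 0 = 64
  | lo /= 0            = countTrailingZeros lo
  | otherwise          = 32 + countTrailingZeros hi
  where
    lo = fromIntegral w :: Word32               -- low 32 bits
    hi = fromIntegral (w `shiftR` 32) :: Word32 -- high 32 bits
```

The explicit zero check mirrors the `je` in the assembler: `bsf` leaves its destination undefined for a zero input, so that case has to be handled separately.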
  21. 07 Oct, 2014 1 commit
    • Code size micro-optimizations in the X86 backend · bdb0c43c
      rwbarton authored
      Summary:
      Carter Schonwald suggested looking for opportunities to replace
      instructions in GHC's output by equivalent ones that are shorter,
      as recommended by the Intel optimization manuals.
      
      This patch reduces the module sizes as reported by nofib
      by about 1.5% on x86_64.
      
      Test Plan:
      Built an i386 cross-compiler and ran the test suite; the same
      (rather large) set of tests failed before and after this commit.
      Will let Harbormaster validate on x86_64.
      
      Reviewers: austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D320
  22. 02 Oct, 2014 1 commit
    • Place static closures in their own section. · b23ba2a7
      Edward Z. Yang authored
      Summary:
      The primary reason for doing this is assisting debuggability:
      if static closures are all in the same section, they are
      guaranteed to be adjacent to one another.  This will help
      later when we add some code that takes section start/end and
      uses this to sanity-check the sections.
      
      Part of the remove HEAP_ALLOCED patch set (#8199)
      Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D263
      
      GHC Trac Issues: #8199
  23. 27 Sep, 2014 1 commit
    • Stop exporting, and stop using, functions marked as deprecated · 51aa2fa3
      thomie authored
      Don't export `getUs` and `getUniqueUs`. `UniqSM` has a `MonadUnique` instance:
      
          instance MonadUnique UniqSM where
              getUniqueSupplyM = getUs
              getUniqueM  = getUniqueUs
              getUniquesM = getUniquesUs
      
      Commandline-fu used:
      
          git grep -l 'getUs\>' |
              grep -v compiler/basicTypes/UniqSupply.lhs |
              xargs sed -i 's/getUs/getUniqueSupplyM/g'
      
          git grep -l 'getUniqueUs\>' |
              grep -v compiler/basicTypes/UniqSupply.lhs |
              xargs sed -i 's/getUniqueUs/getUniqueM/g'
      
      Follow up on b522d3a3
      
      Reviewed By: austin, hvr
      
      Differential Revision: https://phabricator.haskell.org/D220
  24. 26 Sep, 2014 1 commit
  25. 09 Sep, 2014 1 commit
    • Make Applicative a superclass of Monad · d94de872
      Austin Seipp authored
      
      
      Summary:
      This includes pretty much all the changes needed to make `Applicative`
      a superclass of `Monad` finally. There's mostly reshuffling in the
      interest of avoiding orphans and boot files, but luckily we can resolve
      all of them, pretty much. The only catch was that
      Alternative/MonadPlus also had to go into Prelude to avoid this.
      
      As a result, we must update the hsc2hs and haddock submodules.
      Signed-off-by: Austin Seipp <austin@well-typed.com>
      
      Test Plan: Build things, they might not explode horribly.
      
      Reviewers: hvr, simonmar
      
      Subscribers: simonmar
      
      Differential Revision: https://phabricator.haskell.org/D13
  26. 31 Aug, 2014 1 commit
  27. 23 Aug, 2014 1 commit
    • Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend · cfd08a99
      rwbarton authored
      Summary:
      These MachOps are used by addIntC# and subIntC#, which in turn are
      used in integer-gmp when adding or subtracting small Integers. The
      following benchmark shows a ~6% speedup after this commit on x86_64
      (building GHC with BuildFlavour=perf).
      
          {-# LANGUAGE MagicHash #-}
      
          import GHC.Exts
          import Criterion.Main
      
          count :: Int -> Integer
          count (I# n#) = go n# 0
            where go :: Int# -> Integer -> Integer
                  go 0# acc = acc
                  go n# acc = go (n# -# 1#) $! acc + 1
      
          main = defaultMain [bgroup "count"
                                [bench "100" $ whnf count 100]]
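
For readers unfamiliar with `addIntC#`, here is a small sketch of how the primop reports overflow. `addChecked` is a hypothetical wrapper written for this example, not code from integer-gmp.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), addIntC#)

-- | Add two Ints, returning Nothing if the true sum does not fit.
-- addIntC# yields the truncated sum plus a flag that is zero exactly
-- when no overflow occurred.
addChecked :: Int -> Int -> Maybe Int
addChecked (I# a) (I# b) =
  case addIntC# a b of
    (# r, 0# #) -> Just (I# r)
    _           -> Nothing
```

Small-Integer addition can follow the same pattern: stay on the fast machine-word path while the flag is zero and switch to the big-number representation otherwise.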
      
      Differential Revision: https://phabricator.haskell.org/D140
  28. 14 Aug, 2014 1 commit
    • Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
      
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      
      which provide efficient implementations of the popular
      count-leading-zero and count-trailing-zero respectively
      (see testcase for a pure Haskell reference implementation).
      
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
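
A bit-by-bit reference implementation makes the intended zero-case behaviour explicit. The following is a sketch in the spirit of a reference implementation, not a copy of the one in the testcase; `clzRef` and `ctzRef` are names assumed for this example.

```haskell
import Data.Bits (FiniteBits, finiteBitSize, testBit)
import Data.Word (Word32, Word64, Word8)

-- | Count leading zeros by scanning from the most significant bit.
-- Returns the bit width for a zero input.
clzRef :: FiniteBits a => a -> Int
clzRef x = length (takeWhile (not . testBit x) [n - 1, n - 2 .. 0])
  where
    n = finiteBitSize x

-- | Count trailing zeros by scanning from the least significant bit.
-- Returns the bit width for a zero input.
ctzRef :: FiniteBits a => a -> Int
ctzRef x = length (takeWhile (not . testBit x) [0 .. finiteBitSize x - 1])

-- Example inputs at the widths covered by the new primops.
examples :: (Int, Int, Int)
examples = (clzRef (1 :: Word32), ctzRef (8 :: Word8), ctzRef (0 :: Word64))
```

Because the scan simply runs off the end when no bit is set, a zero input yields the full bit width, which is the result BSF/BSR cannot deliver without the extra logic.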
      
      Test Plan: validate and successful tests on i686 and amd64
      
      Reviewers: rwbarton, simonmar, ezyang, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D144
      
      GHC Trac Issues: #9340
  29. 12 Aug, 2014 3 commits
    • x86: zero extend the result of 16-bit popcnt instructions (#9435) · 64151913
      rwbarton authored
      Summary:
      The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
      the destination register, so we have to zero-extend the result to
      a full word as popCnt16# is supposed to return a Word#.
      
      For popCnt8# we could instead zero-extend the input to 32 bits
      and then do a 32-bit popcnt, and not have to zero-extend the result.
      LLVM produces the 16-bit popcnt sequence with two zero extensions,
      though, and who am I to argue?
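
The problem can be modelled in plain Haskell by treating the destination register as a 64-bit word of which a 16-bit instruction only overwrites the low 16 bits. All names below (`writeLow16`, `popcnt16Raw`, `popCnt16Fixed`) are hypothetical and exist only for this sketch.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- | Model a 16-bit write: the upper 48 bits of the register are kept.
writeLow16 :: Word64 -> Word16 -> Word64
writeLow16 old new = (old .&. complement 0xFFFF) .|. fromIntegral new

-- | 16-bit popcnt without zero extension: stale upper bits leak through.
popcnt16Raw :: Word64 -> Word16 -> Word64
popcnt16Raw destBefore src = writeLow16 destBefore (fromIntegral (popCount src))

-- | The fix: zero-extend the 16-bit result to a full word.
popCnt16Fixed :: Word64 -> Word16 -> Word64
popCnt16Fixed destBefore src = popcnt16Raw destBefore src .&. 0xFFFF
```

If the destination previously held `0xDEAD0000`, the raw variant returns `0xDEAD0008` for the input `0x00FF`, while the zero-extended variant returns `8`.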
      
      Test Plan:
       - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
       - then ran again adding "WAY=optasm", and verified that
         the popcnt sequences we generate match the ones produced
         by LLVM for its @llvm.ctpop.* intrinsics
      
      Reviewers: austin, hvr, tibbe
      
      Reviewed By: austin, hvr, tibbe
      
      Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D147
      
      GHC Trac Issues: #9435
    • Add CMOVcc insns to x86 NCG · 9f285fa4
      Herbert Valerio Riedel authored
      This is a pre-requisite for implementing count-{leading,trailing}-zero
      prim-ops (re #9340) and may be useful to NCG to help turn some code into
      branch-less code sequences.
      
      Test Plan: Compiles and validates in combination with clz/ctz primop impl
      
      Reviewers: ezyang, rwbarton, simonmar, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D141
    • Add bit scan {forward,reverse} insns to x86 NCG · 3669b60c
      Herbert Valerio Riedel authored
      This is a pre-requisite for implementing count-{leading,trailing}-zero
      prim-ops (re #9340)
      
      Reviewers: ezyang, rwbarton, simonmar, austin
      
      Subscribers: simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D141
  30. 11 Aug, 2014 1 commit
  31. 10 Aug, 2014 1 commit
    • Eliminate some code duplication in x86 backend (genCCall32/64) · c80d2381
      rwbarton authored
      Summary:
      No functional changes except in panic messages.
      
      These functions were identical except for
      - x87 operations in genCCall32
      - the fallback to genCCall32'/64'
      - "32" vs "64" in panic messages (one case was wrong!)
      - minor syntactic or otherwise non-functional differences.
      
      Test Plan:
      Ran "validate --no-dph --slow" before and after the change.
      The only differences were two tests that failed before the change but not
      after; further investigation revealed that those tests are in fact erratic.
      
      Reviewers: simonmar, austin
      
      Reviewed By: austin
      
      Subscribers: phaskell, simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D139
  32. 01 Aug, 2014 1 commit
  33. 31 Jul, 2014 1 commit
    • Allow multiple entry points when allocating recursive groups (#9303) · da70f9ef
      Simon Marlow authored
      Summary:
      In this example we ended up with some code that was only reachable via
      an info table, because a branch had been optimised away by the native
      code generator.  The register allocator then got confused because it
      was only considering the first block of the proc to be an entry point,
      when actually any of the info tables are entry points.
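
The underlying issue is a reachability computation that starts from too few roots. A minimal sketch, assuming a control-flow graph given as an association list of block ids to successors (the `Block` type and `reachable` are hypothetical, not the register allocator's data structures):

```haskell
import Data.List (foldl', sort)

type Block = Int

-- | All blocks reachable from any of the given entry points,
-- found by depth-first search and returned in sorted order.
reachable :: [(Block, [Block])] -> [Block] -> [Block]
reachable graph entries = sort (foldl' visit [] entries)
  where
    visit seen b
      | b `elem` seen = seen
      | otherwise     = foldl' visit (b : seen) (maybe [] id (lookup b graph))
```

For the graph `[(0,[1]),(1,[]),(2,[3]),(3,[])]`, starting only from block 0 misses blocks 2 and 3; passing both 0 and 2 as entry points, as one would for a block that is only reachable via its info table, finds all four.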
      
      Test Plan: validate
      
      Reviewers: simonpj, austin
      
      Subscribers: simonmar, relrod, carter
      
      Differential Revision: https://phabricator.haskell.org/D88