1. 07 Oct, 2015 1 commit
  2. 02 Oct, 2015 1 commit
    • Peter Trommler's avatar
      nativeGen PPC: fix > 16 bit offsets in stack handling · b29f20ed
      Peter Trommler authored
      Implement access to spill slots at offsets larger than 16 bits.
      Also allocation and deallocation of spill slots was restricted to
      16 bit offsets. Now 32 bit offsets are supported on all PowerPC
      platforms.
      
      The implementation of 32 bit offsets requires more than one instruction
      but the native code generator wants one instruction. So we implement
      pseudo-instructions that are pretty printed into multiple assembly
      instructions.
      
      With pseudo-instructions for spill slot allocation and deallocation
      we can also implement handling of the back chain pointer according
      to the ELF ABIs.
      
      Test Plan: validate (especially on powerpc (32 bit))
      
      Reviewers: bgamari, austin, erikd
      
      Reviewed By: erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1296
      
      GHC Trac Issues: #7830
      b29f20ed
  3. 25 Sep, 2015 2 commits
  4. 23 Sep, 2015 1 commit
    • Simon Marlow's avatar
      Annotate CmmBranch with an optional likely target · 939a7d63
      Simon Marlow authored
      Summary:
      This allows the code generator to give hints to later code generation
      steps about which branch is most likely to be taken.  Right now it
      is only taken into account in one place: a special case in
      CmmContFlowOpt that swapped branches over to maximise the chance of
      fallthrough, which is now disabled when there is a likelihood setting.
      
      Test Plan: validate
      
      Reviewers: austin, simonpj, bgamari, ezyang, tibbe
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1273
      939a7d63
  5. 12 Sep, 2015 1 commit
  6. 29 Aug, 2015 3 commits
  7. 21 Aug, 2015 2 commits
    • thomie's avatar
      Refactor: delete most of the module FastTypes · 2f29ebbb
      thomie authored
      This reverses some of the work done in #1405, and goes back to the
      assumption that the bootstrap compiler understands GHC-haskell.
      
      In particular:
        * use MagicHash instead of _ILIT and _CLIT
        * pattern matching on I# if possible, instead of using iUnbox
          unnecessarily
        * use Int#/Char#/Addr# instead of the following type synonyms:
          - type FastInt   = Int#
          - type FastChar  = Char#
          - type FastPtr a = Addr#
        * inline the following functions:
          - iBox           = I#
          - cBox           = C#
          - fastChr        = chr#
          - fastOrd        = ord#
          - eqFastChar     = eqChar#
          - shiftLFastInt  = uncheckedIShiftL#
          - shiftR_FastInt = uncheckedIShiftRL#
          - shiftRLFastInt = uncheckedIShiftRL#
        * delete the following unused functions:
          - minFastInt
          - maxFastInt
          - uncheckedIShiftRA#
          - castFastPtr
          - panicDocFastInt and pprPanicFastInt
        * rename panicFastInt back to panic#
      
      These functions remain, since they actually do something:
        * iUnbox
        * bitAndFastInt
        * bitOrFastInt
      
      Test Plan: validate
      
      Reviewers: austin, bgamari
      
      Subscribers: rwbarton
      
      Differential Revision: https://phabricator.haskell.org/D1141
      
      GHC Trac Issues: #1405
      2f29ebbb
    • thomie's avatar
      Delete FastBool · 3452473b
      thomie authored
      This reverses some of the work done in Trac #1405, and assumes GHC is
      smart enough to do its own unboxing of booleans now.
      
      I would like to do some more performance measurements, but the code
      changes can be reviewed already.
      
      Test Plan:
      With a perf build:
      ./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp
      +RTS -t --machine-readable
      
      before:
      ```
        [("bytes allocated", "1300744864")
        ,("num_GCs", "302")
        ,("average_bytes_used", "8811118")
        ,("max_bytes_used", "24477464")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.833")
        ,("mutator_wall_seconds", "4.283")
        ,("GC_cpu_seconds", "0.960")
        ,("GC_wall_seconds", "0.961")
        ]
      ```
      
      after:
      ```
        [("bytes allocated", "1301088064")
        ,("num_GCs", "310")
        ,("average_bytes_used", "8820253")
        ,("max_bytes_used", "24539904")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.876")
        ,("mutator_wall_seconds", "4.474")
        ,("GC_cpu_seconds", "0.965")
        ,("GC_wall_seconds", "0.979")
        ]
      ```
      
      CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time
      measurements are rather noisy.
      
      Reviewers: austin, bgamari, rwbarton
      
      Subscribers: nomeata
      
      Differential Revision: https://phabricator.haskell.org/D1143
      
      GHC Trac Issues: #1405
      3452473b
  8. 30 Jul, 2015 1 commit
  9. 07 Jul, 2015 1 commit
  10. 03 Jul, 2015 1 commit
    • Peter Trommler's avatar
      Implement PowerPC 64-bit native code backend for Linux · d3c1dda6
      Peter Trommler authored
      Extend the PowerPC 32-bit native code generator for "64-bit
      PowerPC ELF Application Binary Interface Supplement 1.9" by
      Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification --
      OpenPOWER ABI for Linux Supplement" by IBM.
      The latter ABI is mainly used on POWER7/7+ and POWER8
      Linux systems running in little-endian mode. The code generator
      supports both static and dynamic linking. PowerPC 64-bit
      code for ELF ABI 1.9 and 2 is mostly position independent
      anyway, and thus so is all the code emitted by the code
      generator. In other words, -fPIC does not make a difference.
      
      rts/stg/SMP.h support is implemented.
      
      Following the spirit of the introductory comment in
      PPC/CodeGen.hs, the rest of the code is a straightforward
      extension of the 32-bit implementation.
      
      Limitations:
      * Code is generated only in the medium code model, which
        is also gcc's default
      * Local symbols are not accessed directly, which seems to
        also be the case for 32-bit
      * LLVM does not work, but this does not work on 32-bit either
      * Must use the system runtime linker in GHCi, because the
        GHC linker for "static" object files (rts/Linker.c) for
        PPC 64-bit is not implemented. The system runtime
        (dynamic) linker works.
      * The handling of the system stack (register 1) is not ELF-
        compliant so stack traces break. Instead of allocating a new
        stack frame, spill code should use the "official" spill area
        in the current stack frame and deallocation code should restore
        the back chain
      * DWARF support is missing
      
      Fixes #9863
      
      Test Plan: validate (on powerpc, too)
      
      Reviewers: simonmar, trofi, erikd, austin
      
      Reviewed By: trofi
      
      Subscribers: bgamari, arnons1, kgardas, thomie
      
      Differential Revision: https://phabricator.haskell.org/D629
      
      GHC Trac Issues: #9863
      d3c1dda6
  11. 16 Jun, 2015 1 commit
  12. 11 Jun, 2015 1 commit
    • Peter Wortmann's avatar
      Fix DWARF generation for MinGW (#10468) · a66ef356
      Peter Wortmann authored
      Fortunately this is relatively straightforward - all we need to do is
      switch to a non-ELF-specific way of specifying object file sections and
      make sure that section-relative addresses work correctly. This is enough
      to make "gdb" work on MinGW builds.
      a66ef356
  13. 16 May, 2015 1 commit
  14. 03 Apr, 2015 1 commit
  15. 30 Mar, 2015 1 commit
    • Joachim Breitner's avatar
      Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining this changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the hoople data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
      de1160be
  16. 10 Feb, 2015 1 commit
  17. 13 Jan, 2015 1 commit
    • Peter Wortmann's avatar
      Dwarf generation fixed pt 2 · 36df0988
      Peter Wortmann authored
      - Don't bracket HsTick expression uneccessarily
      - Generate debug information in UTF8
      - Reduce amount of information generated - we do not currently need
        block information, for example.
      
      Special thanks to slyfox for the reports!
      36df0988
  18. 06 Jan, 2015 1 commit
  19. 19 Dec, 2014 1 commit
    • Peter Wortmann's avatar
      Some Dwarf generation fixes · f85db756
      Peter Wortmann authored
      - Make abbrev offset absolute on Non-Mac systems
      - Add another termination byte at the end of the abbrev section
        (readelf complains)
      - Scope combination was wrong for the simpler cases
      - Shouldn't have a "global/" in front of all scopes
      f85db756
  20. 17 Dec, 2014 3 commits
    • Peter Wortmann's avatar
      Generate DWARF unwind information · edd6d676
      Peter Wortmann authored
      This tells debuggers such as GDB how to "unwind" a program state,
      which allows them to walk the stack up.
      
      Notes:
      
      * The code is quite general, perhaps unnecessarily so. Unless we get
        more unwind information, only the first case of pprSetUnwind will
        get used - and pprUnwindExpr and pprUndefUnwind will never be
        called. It just so happens that this is a point where we can get a
        lot of features cheaply, even if we don't use them.
      
      * When determining what location to show for a return address, most
        debuggers check the map for "rip-1", assuming that's where the
        "call" instruction is. For tables-next-to-code, that happens to
        always be the end of an info table. We therefore cheat a bit here by
        shifting .debug_frame information so it covers the end of the info
        table, as well as generating a .loc directive for the info table
        data.
      
        Debuggers will still show the wrong label for the return address,
        though.  Haven't found a way around that one yet.
      
      (From Phabricator D396)
      edd6d676
    • Peter Wortmann's avatar
      Generate DWARF info section · cc481ec8
      Peter Wortmann authored
      This is where we actually make GHC emit DWARF code. The info section
      contains all the general meta information bits as well as an entry for
      every block of native code.
      
      Notes:
      
      * We need quite a few new labels in order to properly address starts
        and ends of blocks.
      
      * Thanks to Nathan Howell for taking the iniative to get our own Haskell
        language ID for DWARF!
      
      (From Phabricator D396)
      cc481ec8
    • Peter Wortmann's avatar
      Generate .loc/.file directives from source ticks · 64678e9e
      Peter Wortmann authored
      This generates DWARF, albeit indirectly using the assembler. This is
      the easiest (and, apparently, quite standard) method of generating the
      .debug_line DWARF section.
      
      Notes:
      
      * Note we have to make sure that .file directives appear correctly
        before the respective .loc. Right now we ppr them manually, which makes
        them absent from dumps. Fixing this would require .file to become a
        native instruction.
      
      * We have to pass a lot of things around the native code generator. I
        know Ian did quite a bit of refactoring already, but having one common
        monad could *really* simplify things here...
      
      * To support SplitObjcs, we need to emit/reset all DWARF data at every
        split. We use the occassion to move split marker generation to
        cmmNativeGenStream as well, so debug data extraction doesn't have to
        choke on it.
      
      (From Phabricator D396)
      64678e9e
  21. 16 Dec, 2014 4 commits
    • Peter Wortmann's avatar
      Debug data extraction (NCG support) · f46aa733
      Peter Wortmann authored
      The purpose of the Debug module is to collect all required information
      to generate debug information (DWARF etc.) in the back-ends. Our main
      data structure is the "debug block", which carries all information we have
      about a block of code that is going to get produced.
      
      Notes:
      
      * Debug blocks are arranged into a tree according to tick scopes. This
        makes it easier to reason about inheritance rules. Note however that
        tick scopes are not guaranteed to form a tree, which requires us to
        "copy" ticks to not lose them.
      
      * This is also where we decide what source location we regard as
        representing a code block the "best". The heuristic is basically that
        we want the most specific source reference that comes from the same file
        we are currently compiling. This seems to be the most useful choice in
        my experience.
      
      * We are careful to not be too lazy so we don't end up breaking streaming.
        Debug data will be kept alive until the end of codegen, after all.
      
      * We change native assembler dumps to happen right away for every Cmm group.
        This simplifies the code somewhat and is consistent with how pretty much
        all of GHC handles dumps with respect to streamed code.
      
      (From Phabricator D169)
      f46aa733
    • Peter Wortmann's avatar
      Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        On the other hand, when you're writing Cmm by hand you really ought to
        know what you're doing.
      
      (From Phabricator D169)
      711a51ad
    • Peter Wortmann's avatar
      Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      
      Notes:
      
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
        later.
      
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things too be too
        coarse-grained.
      
      (From Phabricator D169)
      5fecd767
    • Peter Wortmann's avatar
      Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatosever.
      
      Notes:
      
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      
      (From Phabricator D169)
      7ceaf96f
  22. 14 Dec, 2014 1 commit
    • Sergei Trofimovich's avatar
      powerpc: fix and enable shared libraries by default on linux · fa31e8f4
      Sergei Trofimovich authored
      Summary:
      And fix things all the way down to it. Namely:
          - remove 'r30' from free registers, it's an .LCTOC1 register
            for gcc. generated .plt stubs expect it to be initialised.
          - fix PicBase computation, which originally forgot to use 'tmp'
            reg in 'initializePicBase_ppc.fetchPC'
          - mark 'ForeighTarget's as implicitly using 'PicBase' register
            (see comment for details)
          - add 64-bit MO_Sub and test on alloclimit3/4 regtests
          - fix dynamic label offsets to match with .LCTOC1 offset
      Signed-off-by: default avatarSergei Trofimovich <siarheit@google.com>
      
      Test Plan: validate passes equal amount of vanilla/dyn tests
      
      Reviewers: simonmar, erikd, austin
      
      Reviewed By: erikd, austin
      
      Subscribers: carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D560
      
      GHC Trac Issues: #8024, #9831
      fa31e8f4
  23. 30 Nov, 2014 1 commit
  24. 19 Nov, 2014 1 commit
  25. 12 Nov, 2014 1 commit
  26. 04 Nov, 2014 1 commit
  27. 20 Oct, 2014 1 commit
  28. 19 Oct, 2014 1 commit
  29. 18 Oct, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Implement optimized NCG `MO_Ctz W64` op for i386 (#9340) · 612f3d12
      Herbert Valerio Riedel authored
      Summary:
      This is an optimization to the CTZ primops introduced for #9340
      
      Previously we called out to `hs_ctz64`, but we can actually generate
      better hand-tuned code while avoiding the FFI ccall.
      
      With this patch, the code
      
        {-# LANGUAGE MagicHash #-}
        module TestClz0 where
        import GHC.Prim
        ctz64 :: Word64# -> Word#
        ctz64 x = ctz64# x
      
      results in the following assembler generated by NCG on i386:
      
        TestClz.ctz64_info:
            movl (%ebp),%eax
            movl 4(%ebp),%ecx
            movl %ecx,%edx
            orl %eax,%edx
            movl $64,%edx
            je _nAO
      
            bsf %ecx,%ecx
            addl $32,%ecx
            bsf %eax,%eax
            cmovne %eax,%ecx
            movl %ecx,%edx
        _nAO:
            movl %edx,%esi
            addl $8,%ebp
            jmp *(%ebp)
      
      For comparision, here's what LLVM 3.4 currently generates:
      
        000000fc <TestClzz_ctzz64_info>:
          fc:   0f bc 45 04             bsf    0x4(%ebp),%eax
         100:   b9 20 00 00 00          mov    $0x20,%ecx
         105:   0f 45 c8                cmovne %eax,%ecx
         108:   83 c1 20                add    $0x20,%ecx
         10b:   8b 45 00                mov    0x0(%ebp),%eax
         10e:   8b 55 08                mov    0x8(%ebp),%edx
         111:   0f bc f0                bsf    %eax,%esi
         114:   85 c0                   test   %eax,%eax
         116:   0f 44 f1                cmove  %ecx,%esi
         119:   83 c5 08                add    $0x8,%ebp
         11c:   ff e2                   jmp    *%edx
      
      Reviewed By: austin
      
      Auditors: simonmar
      
      Differential Revision: https://phabricator.haskell.org/D163
      612f3d12
  30. 07 Oct, 2014 1 commit
    • rwbarton's avatar
      Code size micro-optimizations in the X86 backend · bdb0c43c
      rwbarton authored
      Summary:
      Carter Schonwald suggested looking for opportunities to replace
      instructions in GHC's output by equivalent ones that are shorter,
      as recommended by the Intel optimization manuals.
      
      This patch reduces the module sizes as reported by nofib
      by about 1.5% on x86_64.
      
      Test Plan:
      Built an i386 cross-compiler and ran the test suite; the same
      (rather large) set of tests failed before and after this commit.
      Will let Harbormaster validate on x86_64.
      
      Reviewers: austin
      
      Subscribers: thomie, carter, ezyang, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D320
      bdb0c43c
  31. 02 Oct, 2014 1 commit
    • Edward Z. Yang's avatar
      Place static closures in their own section. · b23ba2a7
      Edward Z. Yang authored
      Summary:
      The primary reason for doing this is assisting debuggability:
      if static closures are all in the same section, they are
      guaranteed to be adjacent to one another.  This will help
      later when we add some code that takes section start/end and
      uses this to sanity-check the sections.
      
      Part of remove HEAP_ALLOCED patch set (#8199)
      Signed-off-by: Edward Z. Yang's avatarEdward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D263
      
      GHC Trac Issues: #8199
      b23ba2a7