1. 17 Dec, 2014 1 commit
    • Generate .loc/.file directives from source ticks · 64678e9e
      Peter Wortmann authored
      This generates DWARF, albeit indirectly using the assembler. This is
      the easiest (and, apparently, quite standard) method of generating the
      .debug_line DWARF section.
      * Note we have to make sure that .file directives appear correctly
        before the respective .loc. Right now we ppr them manually, which makes
        them absent from dumps. Fixing this would require .file to become a
        native instruction.
      * We have to pass a lot of things around the native code generator. I
        know Ian did quite a bit of refactoring already, but having one common
        monad could *really* simplify things here...
      * To support SplitObjs, we need to emit/reset all DWARF data at every
        split. We use the occasion to move split marker generation to
        cmmNativeGenStream as well, so debug data extraction doesn't have to
        choke on it.
      (From Phabricator D396)
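
The ordering constraint described above (each `.file` must precede any `.loc` that refers to it) can be sketched as follows. This is a minimal illustration, not GHC's implementation: `SrcTick` and `renderTicks` are hypothetical names, and only the GNU assembler directive shapes `.file N "name"` and `.loc N line column` are assumed.

```haskell
-- Hypothetical model of a source tick; GHC's real types differ.
data SrcTick = SrcTick
  { tickFile :: FilePath
  , tickLine :: Int
  , tickCol  :: Int
  }

-- Render ticks as assembler directives. A .file directive is emitted the
-- first time a file is seen, so it always precedes every .loc that uses
-- its file number.
renderTicks :: [SrcTick] -> [String]
renderTicks = go []
  where
    go _ [] = []
    go seen (SrcTick f l c : rest) =
      case lookup f seen of
        Just n  -> loc n l c : go seen rest
        Nothing ->
          let n = length seen + 1
          in fileDir n f : loc n l c : go ((f, n) : seen) rest

    fileDir n f = "\t.file " ++ show n ++ " " ++ show f
    loc n l c   = "\t.loc " ++ unwords (map show [n, l, c])
```

Calling `mapM_ putStrLn (renderTicks ticks)` prints one directive per line; a file that appears again later reuses its earlier number instead of getting a second `.file`.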
  2. 16 Dec, 2014 3 commits
    • Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On th...
    • Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things too be too
      (From Phabricator D169)
    • Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatsoever.
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      (From Phabricator D169)
  3. 12 Nov, 2014 1 commit
  4. 18 Oct, 2014 1 commit
    • Implement optimized NCG `MO_Ctz W64` op for i386 (#9340) · 612f3d12
      Herbert Valerio Riedel authored
      This is an optimization to the CTZ primops introduced for #9340.
      Previously we called out to `hs_ctz64`, but we can actually generate
      better hand-tuned code while avoiding the FFI ccall.
      With this patch, the code
        {-# LANGUAGE MagicHash #-}
        module TestClz0 where
        import GHC.Prim
        ctz64 :: Word64# -> Word#
        ctz64 x = ctz64# x
      results in the following assembler generated by NCG on i386:
            movl (%ebp),%eax
            movl 4(%ebp),%ecx
            movl %ecx,%edx
            orl %eax,%edx
            movl $64,%edx
            je _nAO
            bsf %ecx,%ecx
            addl $32,%ecx
            bsf %eax,%eax
            cmovne %eax,%ecx
            movl %ecx,%edx
        _nAO:
            movl %edx,%esi
            addl $8,%ebp
            jmp *(%ebp)
      For comparison, here's what LLVM 3.4 currently generates:
        000000fc <TestClzz_ctzz64_info>:
          fc:   0f bc 45 04             bsf    0x4(%ebp),%eax
         100:   b9 20 00 00 00          mov    $0x20,%ecx
         105:   0f 45 c8                cmovne %eax,%ecx
         108:   83 c1 20                add    $0x20,%ecx
         10b:   8b 45 00                mov    0x0(%ebp),%eax
         10e:   8b 55 08                mov    0x8(%ebp),%edx
         111:   0f bc f0                bsf    %eax,%esi
         114:   85 c0                   test   %eax,%eax
         116:   0f 44 f1                cmove  %ecx,%esi
         119:   83 c5 08                add    $0x8,%ebp
         11c:   ff e2                   jmp    *%edx
      Reviewed By: austin
      Auditors: simonmar
      Differential Revision: https://phabricator.haskell.org/D163
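
The logic of the NCG sequence above can be modelled in plain Haskell: treat the 64-bit value as two 32-bit halves, return 64 when both are zero, and otherwise prefer the low half. This is only a sketch for reading the assembly; `ctz64ViaHalves` is a made-up name and it uses `countTrailingZeros` from `Data.Bits` rather than any primop.

```haskell
import Data.Bits (countTrailingZeros, shiftR)
import Data.Word (Word32, Word64)

-- Count trailing zeros of a 64-bit word using only 32-bit scans,
-- mirroring the structure of the i386 code: a zero test on both halves,
-- then a scan of the low half, falling back to 32 + scan of the high half.
ctz64ViaHalves :: Word64 -> Int
ctz64ViaHalves x
  | lo == 0 && hi == 0 = 64
  | lo /= 0            = countTrailingZeros lo
  | otherwise          = 32 + countTrailingZeros hi
  where
    lo = fromIntegral x :: Word32
    hi = fromIntegral (x `shiftR` 32) :: Word32
```

For any input the result should agree with `countTrailingZeros` applied directly to the `Word64`, which makes the model easy to check against the library function.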
  5. 23 Aug, 2014 1 commit
    • Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend · cfd08a99
      rwbarton authored
      These MachOps are used by addIntC# and subIntC#, which in turn are
      used in integer-gmp when adding or subtracting small Integers. The
      following benchmark shows a ~6% speedup after this commit on x86_64
      (building GHC with BuildFlavour=perf).
          {-# LANGUAGE MagicHash #-}
          import GHC.Exts
          import Criterion.Main
          count :: Int -> Integer
          count (I# n#) = go n# 0
            where go :: Int# -> Integer -> Integer
                  go 0# acc = acc
                  go n# acc = go (n# -# 1#) $! acc + 1
          main = defaultMain [bgroup "count"
                                [bench "100" $ whnf count 100]]
      Differential Revision: https://phabricator.haskell.org/D140
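
For readers unfamiliar with the primops involved, here is a small sketch of what `addIntC#` and `subIntC#` report: the wrapped result plus a flag that is non-zero on overflow. The wrappers `addChecked` and `subChecked` are illustrative names and are not part of integer-gmp.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Int (I#), addIntC#, subIntC#)

-- Add two Ints, returning Nothing if the true sum does not fit in an Int.
-- The second component of the primop's result is zero when no overflow
-- occurred.
addChecked :: Int -> Int -> Maybe Int
addChecked (I# a) (I# b) =
  case addIntC# a b of
    (# r, 0# #) -> Just (I# r)
    _           -> Nothing

-- Same idea for subtraction.
subChecked :: Int -> Int -> Maybe Int
subChecked (I# a) (I# b) =
  case subIntC# a b of
    (# r, 0# #) -> Just (I# r)
    _           -> Nothing
```

An arbitrary-precision addition can take the `Just` path for small values and switch to a big-number representation only when the flag is set, which is why a cheaper overflow check shows up in a benchmark like the one above.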
  6. 14 Aug, 2014 1 commit
    • Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      which provide efficient implementations of the popular
      count-leading-zeros and count-trailing-zeros operations, respectively
      (see testcase for a pure Haskell reference implementation).
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
      Test Plan: validate and successful tests on i686 and amd64
      Reviewers: rwbarton, simonmar, ezyang, austin
      Subscribers: simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D144
      GHC Trac Issues: #9340
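
A naive reference implementation of the two operations might look like the sketch below. It is not the testcase mentioned in the commit message, only an assumption about what such a reference could look like; it makes the zero case explicit by returning the bit width. The names `clzRef`, `ctzRef`, `clz32Ref` and `ctz32Ref` are hypothetical.

```haskell
import Data.Bits (FiniteBits, finiteBitSize, testBit)
import Data.Word (Word32)

-- Count leading zeros by scanning from the most significant bit down.
-- For an all-zero input every bit is skipped, so the result is the width.
clzRef :: FiniteBits a => a -> Int
clzRef x = length (takeWhile (not . testBit x) [w - 1, w - 2 .. 0])
  where w = finiteBitSize x

-- Count trailing zeros by scanning from the least significant bit up.
ctzRef :: FiniteBits a => a -> Int
ctzRef x = length (takeWhile (not . testBit x) [0 .. w - 1])
  where w = finiteBitSize x

-- 32-bit specialisations, corresponding in spirit to clz32# and ctz32#.
clz32Ref, ctz32Ref :: Word32 -> Int
clz32Ref = clzRef
ctz32Ref = ctzRef
```

The hardware BSF/BSR instructions leave the destination undefined for a zero input, so a reference like this is useful precisely for pinning down the expected result in that case.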
  7. 12 Aug, 2014 1 commit
    • x86: zero extend the result of 16-bit popcnt instructions (#9435) · 64151913
      rwbarton authored
      The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
      the destination register, so we have to zero-extend the result to
      a full word as popCnt16# is supposed to return a Word#.
      For popCnt8# we could instead zero-extend the input to 32 bits
      and then do a 32-bit popcnt, and not have to zero-extend the result.
      LLVM produces the 16-bit popcnt sequence with two zero extensions,
      though, and who am I to argue?
      Test Plan:
       - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
       - then ran again adding "WAY=optasm", and verified that
         the popcnt sequences we generate match the ones produced
         by LLVM for its @llvm.ctpop.* intrinsics
      Reviewers: austin, hvr, tibbe
      Reviewed By: austin, hvr, tibbe
      Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D147
      GHC Trac Issues: #9435
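
The problem being fixed can be modelled with ordinary word arithmetic. In this sketch a 32-bit register is a `Word32`, and `popcnt16Into` imitates an instruction that writes only the low 16 bits of its destination; all three function names are invented for the illustration.

```haskell
import Data.Bits (popCount, (.&.), (.|.))
import Data.Word (Word16, Word32)

-- Model of 'popcnt r16, r/m16': only the low 16 bits of the destination
-- register are overwritten, the upper bits keep whatever was there before.
popcnt16Into :: Word32 -> Word16 -> Word32
popcnt16Into dst src = (dst .&. 0xFFFF0000) .|. fromIntegral (popCount src)

-- Zero-extend the low 16 bits to the full register width.
zeroExtend16 :: Word32 -> Word32
zeroExtend16 r = r .&. 0x0000FFFF

-- The repaired sequence: 16-bit popcnt followed by a zero extension, so
-- stale upper bits never leak into the result.
popCnt16Model :: Word32 -> Word16 -> Word32
popCnt16Model stale = zeroExtend16 . popcnt16Into stale
```

With stale contents `0xDEAD0000` and input `0x00FF`, the raw 16-bit write leaves `0xDEAD0008` in the register, while the zero-extended version yields `8`.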
  8. 11 Aug, 2014 1 commit
  9. 10 Aug, 2014 1 commit
    • Eliminate some code duplication in x86 backend (genCCall32/64) · c80d2381
      rwbarton authored
      No functional changes except in panic messages.
      These functions were identical except for
      - x87 operations in genCCall32
      - the fallback to genCCall32'/64'
      - "32" vs "64" in panic messages (one case was wrong!)
      - minor syntactic or otherwise non-functional differences.
      Test Plan:
      Ran "validate --no-dph --slow" before and after the change.
      The only differences were two tests that failed before the change but not
      after; further investigation revealed that those tests are in fact erratic.
      Reviewers: simonmar, austin
      Reviewed By: austin
      Subscribers: phaskell, simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D139
  10. 23 Jul, 2014 2 commits
  11. 21 Jul, 2014 1 commit
    • Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId. · 4bebab25
      Edward Z. Yang authored
      Previously, both Cabal and GHC defined the type PackageId, and we expected
      them to be roughly equivalent (but represented differently).  This refactoring
      separates these two notions.
      A package ID is a user-visible identifier; it's the thing you write in a
      Cabal file, e.g. containers-0.9.  The components of this ID are semantically
      meaningful, and decompose into a package name and a package version.
      A package key is an opaque identifier used by GHC to generate linking symbols.
      Presently, it just consists of a package name and a package version, but
      pursuant to #9265 we are planning to extend it to record other information.
      Within a single executable, it uniquely identifies a package.  It is *not* an
      InstalledPackageId, as the choice of a package key affects the ABI of a package
      (whereas an InstalledPackageId is computed after compilation.)  Cabal computes
      a package key for the package and passes it to GHC using -package-name (now
      *extremely* misnamed).
      As an added bonus, we don't have to worry about shadowing anymore.
      As a follow on, we should introduce -current-package-key having the same role as
      -package-name, and deprecate the old flag.  This commit is just renaming.
      The haddock submodule needed to be updated.
      Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
      Test Plan: validate
      Reviewers: simonpj, simonmar, hvr, austin
      Subscribers: simonmar, relrod, carter
      Differential Revision: https://phabricator.haskell.org/D79
  12. 30 Jun, 2014 1 commit
    • Re-add more primops for atomic ops on byte arrays · 4ee4ab01
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
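
As a usage sketch for some of the primops listed above, the following wraps a single machine word in a mutable byte array and exposes it as an atomic counter. `Counter`, `newCounter`, `addCounter` and `readCounter` are hypothetical helpers, and the primop signatures are the ones exported by `GHC.Exts` in recent GHC releases; older compilers may differ.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Int (I#), MutableByteArray#, RealWorld
  , atomicReadIntArray#, atomicWriteIntArray#
  , fetchAddIntArray#, newByteArray# )
import GHC.IO (IO (..))

-- A counter stored in element 0 of a mutable byte array.
data Counter = Counter (MutableByteArray# RealWorld)

-- Allocate 8 bytes (enough for one Int on 32- and 64-bit platforms)
-- and store the initial value atomically.
newCounter :: Int -> IO Counter
newCounter (I# v) = IO (\s0 ->
  case newByteArray# 8# s0 of
    (# s1, arr #) ->
      case atomicWriteIntArray# arr 0# v s1 of
        s2 -> (# s2, Counter arr #))

-- Atomically add a delta; the value returned by the primop is ignored.
addCounter :: Counter -> Int -> IO ()
addCounter (Counter arr) (I# d) = IO (\s0 ->
  case fetchAddIntArray# arr 0# d s0 of
    (# s1, _ #) -> (# s1, () #))

-- Atomically read the current value.
readCounter :: Counter -> IO Int
readCounter (Counter arr) = IO (\s0 ->
  case atomicReadIntArray# arr 0# s0 of
    (# s1, v #) -> (# s1, I# v #))
```

Several threads can call `addCounter` on the same `Counter` without a lock. The other fetch-and-modify primops (`fetchSubIntArray#`, `fetchOrIntArray#` and so on) take the same arguments as `fetchAddIntArray#` and can be wrapped in the same way.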
  13. 26 Jun, 2014 1 commit
  14. 24 Jun, 2014 1 commit
    • Add more primops for atomic ops on byte arrays · d8abf85f
      tibbe authored
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#
  15. 10 Jun, 2014 1 commit
  16. 15 May, 2014 1 commit
    • Add LANGUAGE pragmas to compiler/ source files · 23892440
      Herbert Valerio Riedel authored
      In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
      reorganized, while following the convention, to
      - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
        any `{-# OPTIONS_GHC #-}`-lines.
      - Moreover, if the list of language extensions fits into a single
        `{-# LANGUAGE ... #-}`-line (shorter than 80 characters), keep it on one
        line. Otherwise split into `{-# LANGUAGE ... #-}`-lines for each
        individual language extension. In both cases, try to keep the
        enumeration alphabetically ordered.
        (The latter layout is preferable as it's more diff-friendly)
      While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
      occurrences by `{-# OPTIONS_GHC ... #-}` pragmas.
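
A file header following the convention described above might look like this. The extensions, the warning flag and the function are placeholders chosen only so that the example compiles; they are not taken from the GHC source tree.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -Wall #-}

-- LANGUAGE pragmas come first, one extension per line in alphabetical
-- order, followed by any OPTIONS_GHC lines. The one-line alternative for
-- a short list would be:
--   {-# LANGUAGE BangPatterns, ScopedTypeVariables #-}

-- Strict left-to-right sum; uses both extensions enabled above.
sumStrict :: forall a. Num a => [a] -> a
sumStrict = go 0
  where
    go :: a -> [a] -> a
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```

With one extension per line, adding or removing an extension touches exactly one line in a diff, which is the "diff-friendly" property the commit message refers to.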
  17. 26 Mar, 2014 1 commit
    • Add flags to control memcpy and memset inlining · 11b31c3c
      tibbe authored
      This adds -fmax-inline-memcpy-insns and -fmax-inline-memset-insns.
      These flags control when we inline calls to memcpy/memset with
      statically known arguments. The flag naming style is taken from GCC
      and the same limit is used by both GCC and LLVM.
  18. 02 Oct, 2013 1 commit
  19. 23 Sep, 2013 1 commit
    • SIMD primops are now generated using schemas that are polymorphic in · 16b350a4
      gmainlan@microsoft.com authored
      width and element type.
      SIMD primops are now polymorphic in vector size and element type, but
      only internally to the compiler. More specifically, utils/genprimopcode
      has been extended so that it "knows" about SIMD vectors. This allows us
      to, for example, write a single definition for the "add two vectors"
      primop in primops.txt.pp and have it instantiated at many vector types.
      This generates a primop in GHC.Prim for each vector type at which "add
      two vectors" is instantiated, but only one data constructor for the
      PrimOp data type, so the code generator is much, much simpler.
  20. 17 Jul, 2013 1 commit
  21. 19 Jun, 2013 1 commit
  22. 11 Jun, 2013 1 commit
  23. 09 Jun, 2013 1 commit
    • Add support for byte endian swapping for Word 16/32/64. · 1c5b0511
      ian@well-typed.com authored
      * Exposes bSwap{,16,32,64}# primops
      * Add a new MachOp, MO_BSwap
      * Use a Stg implementation (hs_bswap{16,32,64}) for the other
        implementations in the NCG.
      * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
        instead of using xchg.
      * Generate llvm.bswap intrinsics in llvm codegen.
      Patch from Vincent Hanquez.
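
The 16-bit strategy mentioned above (a 32-bit bswap followed by a right shift) can be checked against the library byte-swap functions. This sketch uses `byteSwap16` and `byteSwap32` from `Data.Word` rather than the primops, and `bswap16ViaBswap32` is an illustrative name.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word16, Word32, byteSwap16, byteSwap32)

-- Swap the two bytes of a 16-bit value by widening to 32 bits, reversing
-- all four bytes, and shifting the interesting half back down.
bswap16ViaBswap32 :: Word16 -> Word16
bswap16ViaBswap32 x = fromIntegral (swapped `shiftR` 16)
  where
    widened = fromIntegral x :: Word32
    swapped = byteSwap32 widened
```

After widening, the two payload bytes sit in the low half of the register; the 32-bit swap moves them, reversed, into the high half, and the shift by 16 brings them back down. The result should equal `byteSwap16 x` for every input.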
  24. 13 May, 2013 1 commit
  25. 23 Feb, 2013 1 commit
  26. 01 Feb, 2013 4 commits
  27. 30 Jan, 2013 1 commit
  28. 10 Jan, 2013 1 commit
    • Add preprocessor defines when SSE is enabled · bab8dc79
      tibbe authored
      This will add the following preprocessor defines when Haskell source
      files are compiled:
       * __SSE__ - If any version of SSE is enabled
       * __SSE2__ - If SSE2 or greater is enabled
       * __SSE4_2__ - If SSE4.2 is enabled
      Note that SSE2 is enabled by default on x86-64.
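
Haskell code compiled with the CPP extension could branch on these defines roughly as follows. The sketch assumes the SSE4.2 macro is spelled `__SSE4_2__`, following the GCC convention, and `sseLevel` is a made-up name.

```haskell
{-# LANGUAGE CPP #-}

-- Report the most specific SSE level visible to the preprocessor.
-- The checks go from most to least specific, because enabling a newer
-- SSE version also defines the macros for the older ones.
sseLevel :: String
#if defined(__SSE4_2__)
sseLevel = "SSE4.2"
#elif defined(__SSE2__)
sseLevel = "SSE2"
#elif defined(__SSE__)
sseLevel = "SSE"
#else
sseLevel = "none"
#endif
```

Which branch is taken depends on the target platform and on the flags passed to the compiler, so the value is fixed at compile time rather than detected at run time.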
  29. 13 Dec, 2012 1 commit
  30. 12 Nov, 2012 1 commit
    • Remove OldCmm, convert backends to consume new Cmm · d92bd17f
      Simon Marlow authored
      This removes the OldCmm data type and the CmmCvt pass that converts
      new Cmm to OldCmm.  The backends (NCGs, LLVM and C) have all been
      converted to consume new Cmm.
      The main difference between the two data types is that conditional
      branches in new Cmm have both true/false successors, whereas in OldCmm
      the false case was a fallthrough.  To generate slightly better code we
      occasionally need to invert a conditional to ensure that the
      branch-not-taken becomes a fallthrough; this was previously done in
      CmmCvt, and it is now done in CmmContFlowOpt.
      We could go further and use the Hoopl Block representation for native
      code, which would mean that we could use Hoopl's postorderDfs and
      analyses for native code, but for now I've left it as is, using the
      old ListGraph representation for native code.
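
The conditional inversion described above can be illustrated with a toy model. None of these types are GHC's: `Cond`, `Branch`, `invertCond` and `preferFallthrough` are hypothetical, and the sketch only shows the idea of swapping successors so that the not-taken path is the block laid out next.

```haskell
type Label = String

-- A tiny set of comparison conditions with obvious negations.
data Cond = CondEq | CondNe | CondLt | CondGe
  deriving (Eq, Show)

-- A two-way conditional branch: condition, true target, false target.
data Branch = CondBranch Cond Label Label
  deriving (Eq, Show)

invertCond :: Cond -> Cond
invertCond CondEq = CondNe
invertCond CondNe = CondEq
invertCond CondLt = CondGe
invertCond CondGe = CondLt

-- If the true target is the block that follows immediately, negate the
-- condition and swap the targets so that the false target becomes the
-- fallthrough. Otherwise leave the branch unchanged.
preferFallthrough :: Label -> Branch -> Branch
preferFallthrough next br@(CondBranch c t f)
  | t == next = CondBranch (invertCond c) f t
  | otherwise = br
```

In a representation where the false case is a fallthrough, the branch produced here needs only one explicit jump. In a representation with two explicit successors, the same transformation is purely an optimisation.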
  31. 01 Nov, 2012 1 commit
  32. 30 Oct, 2012 2 commits
  33. 16 Oct, 2012 1 commit
    • Some alpha renaming · cd33eefd
      ian@well-typed.com authored
      Mostly d -> g (matching DynFlag -> GeneralFlag).
      Also renamed if* to when*, matching the Haskell if/when names.