1. 20 Jun, 2019 1 commit
    • Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdirectory within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good effect too.
  2. 11 Apr, 2019 1 commit
    • removing x87 register support from native code gen · 42504f4a
      Carter Schonwald authored
      * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 constructors
      * makes -msse2 assumed/default for x86 platforms, fixing a long-standing nondeterminism in rounding
      behavior in 32-bit Haskell code (see the illustration below)
      * removes the 80-bit floating point representation from the supported float sizes
      * there's still one tiny bit of x87 support needed,
      for handling float and double return values in FFI calls wrt the C ABI on x86_32,
      but this one piece does not leak into the rest of the NCG.
      * Lots of code that's not been touched in a long time got deleted as a
      consequence of all of this
      
      All in all, this change paves the way towards a lot of further
      improvements in how GHC handles floating point computations, along with
      making the native code gen more accessible to a larger pool of contributors.
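      The nondeterminism mentioned above comes from x87 keeping intermediates in
      80-bit registers. As a minimal illustration (not from the patch; the exact
      outcome depends on optimisation level and register spills), the following
      program prints 0.0 with plain 64-bit double arithmetic, but can print 1.0
      when the intermediate sum stays in an 80-bit x87 register, because
      1.0e16 + 1 is representable in the wider format:
      
      ```
      main :: IO ()
      main = print ((1.0e16 + 1.0) - 1.0e16 :: Double)
      ```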
  3. 09 Feb, 2019 1 commit
  4. 08 Feb, 2019 1 commit
    • Allow resizing the stack for the graph allocator. · 03b7abc1
      Andreas Klebinger authored
      The graph allocator now dynamically resizes the number of stack
      slots when running into the limit.
      
      This fixes #8657.
      
      Loop membership of basic blocks is now also available
      to the register allocator for cost heuristics.
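      A rough sketch (not the allocator's actual code) of the resizing idea:
      keep retrying the colouring pass with more spill slots until it succeeds
      or hits a safety cap.
      
      ```
      -- Hypothetical driver; 'run' stands in for one allocation attempt
      -- with the given number of spill slots.
      allocateWithResize :: Int -> (Int -> Either String a) -> Either String a
      allocateWithResize slots run =
        case run slots of
          Left _ | slots < maxSlots -> allocateWithResize (slots * 2) run
          result                    -> result
        where
          maxSlots = 65536  -- arbitrary cap for the sketch
      ```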
  5. 30 Jan, 2019 3 commits
  6. 17 Nov, 2018 1 commit
    • NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly by CPU/machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall depending on
      flags/machine; performance in other benchmarks improved significantly.
      
      Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      While the magnitude of the gains differed, three different CPUs were tested,
      with all getting faster although to differing degrees. I tested: Sandy Bridge (Xeon),
      Haswell and Skylake.
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as the speedup from GHC itself being
      compiled with the new layout outweighs the overhead introduced by running
      the new layout algorithm.
      Things this patch does:
      
      * Move code responsible for block layout into its own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this CFG to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old code layout.
      * Assign weights to the edges in the CFG based on edge type and limited static
        analysis; these weights are then used for block layout (see the sketch after this list).
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular, turn a sequence of:
            jne .foo
            jmp .bar
          foo:
        into
            je .bar
          foo:
            ..
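      Below is a rough sketch (not the patch's implementation) of the chaining
      idea referenced in the list above: process CFG edges in order of
      decreasing weight and glue chains together so that the hottest edges
      become fall-throughs.
      
      ```
      import Data.List (find, foldl', sortBy)
      import Data.Ord (Down (..), comparing)
      
      type BlockId = Int
      type Chain   = [BlockId]
      
      layout :: [((BlockId, BlockId), Int)] -> [BlockId] -> [BlockId]
      layout edges blocks = concat (foldl' step chains0 hotFirst)
        where
          chains0  = [[b] | b <- blocks]
          hotFirst = map fst (sortBy (comparing (Down . snd)) edges)
      
          -- Merge two chains when the edge's source ends one chain and its
          -- target starts another, making the edge a fall-through.
          step chains (from, to)
            | Just c1 <- find ((== from) . last) chains
            , Just c2 <- find ((== to)   . head) chains
            , c1 /= c2
            = (c1 ++ c2) : filter (\c -> c /= c1 && c /= c2) chains
            | otherwise = chains
      ```
      
      For example, layout [((0,1),100),((1,3),90),((0,2),10)] [0,1,2,3] yields
      [0,1,3,2]: the two hot edges become fall-throughs and block 2 is appended
      at the end.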
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
  7. 02 Nov, 2018 1 commit
    • Add Int8# and Word8# · 2c959a18
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
          primops.txt.pp gets two new sections for two new primitive types for
          signed and unsigned 8-bit integers (Int8# and Word8# respectively) along
          with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
          two new constructors for them. All of the primops translate into the
          existing MachOps.
      
          For CmmCalls the codegen will now zero-extend the values at the call
          site (so that they can be moved to the right register) and then truncate
          them back to their original width (see the sketch below).
      
          x86 native codegen needed some updates, since it wasn't able to deal
          with the new widths, but all the changes are quite localized. LLVM
          backend seems to just work.
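      As a small sketch (not the codegen itself) of the zero-extend/truncate
      step mentioned above, here is the same trick expressed on ordinary boxed
      types:
      
      ```
      import Data.Bits ((.&.))
      import Data.Word (Word64, Word8)
      
      -- Widen an 8-bit value to a full machine word so it can travel in an
      -- ordinary register, then narrow it back after the call.
      zeroExtend8 :: Word8 -> Word64
      zeroExtend8 = fromIntegral
      
      truncateTo8 :: Word64 -> Word8
      truncateTo8 w = fromIntegral (w .&. 0xff)
      ```
      
      Round-tripping preserves the value: truncateTo8 (zeroExtend8 x) == x.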
      
      This is the second attempt at merging this, after the first attempt in
      D4475 had to be backed out due to regressions on i386.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate (on both x86-{32,64})
      
      Reviewers: bgamari, hvr, goldfire, simonmar
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5258
  8. 09 Oct, 2018 1 commit
    • Revert "Add Int8# and Word8#" · d728c3c5
      Ben Gamari authored
      This unfortunately broke i386 support since it introduced references to
      byte-sized registers that don't exist on that architecture.
      
      Reverts binary submodule
      
      This reverts commit 5d5307f9.
  9. 07 Oct, 2018 1 commit
    • Add Int8# and Word8# · 5d5307f9
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
      - `primops.txt.pp` gets two new sections for two new primitive types
        for signed and unsigned 8-bit integers (`Int8#` and `Word8#`
        respectively) along with basic arithmetic and comparison
        operations. `PrimRep`/`RuntimeRep` get two new constructors for
        them. All of the primops translate into the existing `MachOp`s.
      
      - For `CmmCall`s the codegen will now zero-extend the values at the call
        site (so that they can be moved to the right register) and then
        truncate them back to their original width.
      
      - x86 native codegen needed some updates, since it wasn't able to deal
        with the new widths, but all the changes are quite localized. LLVM
        backend seems to just work.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate with new tests
      
      Reviewers: hvr, goldfire, bgamari, simonmar
      
      Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4475
  10. 18 Jul, 2018 1 commit
    • stack: fix stack allocations on Windows · d0bbe1bf
      Tamar Christina authored
      Summary:
      On Windows one is not allowed to drop the stack by more than a page size.
      The reason for this is that the OS only allocates enough stack up to what
      the TEB specifies. After that a guard page is placed and the rest of the
      virtual address space is unmapped.
      
      The intention is that doing stack allocations will cause you to hit the
      guard which will then map the next page in and move the guard.  This is
      done to prevent what in the Linux world is known as stack clash
      vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364.
      
      There are modules in GHC for which the liveness analysis thinks the
      reserved 8KB of spill slots isn't enough.  One is DynFlags and the
      other is Cabal.
      
      Though I think the Cabal one is likely a bug:
      
      ```
        4d6544:       81 ec 00 46 00 00       sub    $0x4600,%esp
        4d654a:       8d 85 94 fe ff ff       lea    -0x16c(%ebp),%eax
        4d6550:       3b 83 1c 03 00 00       cmp    0x31c(%ebx),%eax
        4d6556:       0f 82 de 8d 02 00       jb     4ff33a <_cLpg_info+0x7a>
        4d655c:       c7 45 fc 14 3d 50 00    movl   $0x503d14,-0x4(%ebp)
        4d6563:       8b 75 0c                mov    0xc(%ebp),%esi
        4d6566:       83 c5 fc                add    $0xfffffffc,%ebp
        4d6569:       66 f7 c6 03 00          test   $0x3,%si
        4d656e:       0f 85 a6 d7 02 00       jne    503d1a <_cLpb_info+0x6>
        4d6574:       81 c4 00 46 00 00       add    $0x4600,%esp
      ```
      
      It allocates nearly 18KB of spill slots for a simple 4-line function
      and doesn't even use them.  Note that this doesn't happen on x64 or
      when making a validate build, only when making a build without
      validate and build.mk.
      
      This and the allocation in DynFlags means the stack allocation will jump
      over the guard page into unmapped memory areas and GHC or an end program
      segfaults.
      
      The page size on x86 Windows is 4KB, which means we hit it very easily for
      these two modules, which explains the total DOA of GHC 32-bit for the past
      3 releases and the "random" segfaults on Windows.
      
      ```
      0:000> bp 00503d29
      0:000> gn
      Breakpoint 0 hit
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d29:
      00503d29 89442440        mov     dword ptr [esp+40h],eax ss:002b:013e973c=????????
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013eb000
      ```
      
      This doesn't fix the liveness analysis but does fix the allocations, by
      emitting a function call to `__chkstk_ms` when doing allocations larger
      than a page. This will make sure the stack is probed every page so the kernel
      maps in the next page.
      
      `__chkstk_ms` is provided by `libGCC`, which is under the
      `GCC Runtime Library Exception`, so it's safe to link against it, even for
      proprietary code. (Technically we already do since we link compiled C code in.)
      
      For allocations smaller than a page we drop the stack and probe the new address.
      This avoids the function call and still makes sure we hit the guard if needed.
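      A rough sketch (hypothetical code, not GHC's NCG) of the decision
      described above, emitting illustrative x86 assembly strings:
      
      ```
      pageSize :: Int
      pageSize = 4096  -- x86 Windows page size mentioned above
      
      -- For large allocations, call the libgcc probe routine (size passed in
      -- %eax in this sketch) so every page is touched; for small ones, just
      -- drop %esp and probe the new top of stack.
      allocStack :: Int -> [String]
      allocStack bytes
        | bytes > pageSize =
            [ "movl $" ++ show bytes ++ ", %eax"
            , "call ___chkstk_ms"
            , "subl %eax, %esp"
            ]
        | otherwise =
            [ "subl $" ++ show bytes ++ ", %esp"
            , "testl %eax, (%esp)"
            ]
      ```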
      
      PS: In case anyone is wondering why we didn't notice this before, it's because we
      only test x86_64 and on Windows 10.  On x86_64 the page size is 8KB and the
      kernel is also a bit more lenient on Windows 10, in that it seems to catch the segfault
      and resize the stack if it was unmapped:
      
      ```
      0:000> t
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d2d:
      00503d2d 8b461b          mov     eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013e9000
      ```
      
      Likely Windows 10 has a guard page larger than previous versions.
      
      This fixes the stack allocations, and as soon as I get the time I will look at
      the liveness analysis. I find it highly unlikely that a simple Cabal function
      requires ~2200 spill slots.
      
      Test Plan: ./validate
      
      Reviewers: simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: AndreasK, rwbarton, thomie, carter
      
      GHC Trac Issues: #15154
      
      Differential Revision: https://phabricator.haskell.org/D4917
  11. 31 May, 2018 1 commit
    • Change jump targets in JMP_TBL from blocks to X86.JumpDest. · 5748c79e
      Andreas Klebinger authored
      Jump tables always point to blocks when we first generate them.  However
      there are rare situations where we can shortcut one of these blocks to a
      static address during the asm shortcutting pass.
      
      While we already updated the data section accordingly, this patch also
      extends this to the references stored in JMP_TBL.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie, carter
      
      GHC Trac Issues: #15104
      
      Differential Revision: https://phabricator.haskell.org/D4595
  12. 16 May, 2018 1 commit
    • Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
  13. 13 Apr, 2018 1 commit
    • Update JMP_TBL targets during shortcutting in X86 NCG. · 120a2617
      Andreas Klebinger authored
      Without updating the JMP_TBL information the block list in
      JMP_TBL contained blocks which were eliminated in some circumstances.
      
      The actual assembly generation doesn't look at these fields so this
      didn't cause any bugs yet. However as long as we carry this information
      around we should make an effort to keep it correct.
      
      Especially since it's useful for debugging purposes and can be used
      for passes near the end of the codegen pipeline.
      In particular it's used by jumpDestsOfInstr which without these changes
      returns the wrong destinations.
      
      Test Plan: ci
      
      Reviewers: bgamari
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4566
  14. 21 Jan, 2018 1 commit
    • Add new mbmi and mbmi2 compiler flags · f8557696
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Implement x86 code generator for pdep and pext.  Properly initialise
      bmiVersion field.
      
      pdep and pext test cases
      
      Fix pattern match for pdep and pext instructions
      
      Fix build of pdep and pext code for 32-bit architectures
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, angerman
      
      Reviewed By: bgamari
      
      Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4236
  15. 28 Nov, 2017 1 commit
    • cmm: Use LocalBlockLabel instead of AsmTempLabel to represent blocks · 048a9138
      Ben Gamari authored
      blockLbl was originally changed in 8b007abb to
      use mkTempAsmLabel to fix an inconsistency resulting in #14221. However, this
      breaks the C code generator, which doesn't support AsmTempLabels (#14454).
      
      Instead let's try going the other direction: use a new CLabel variety,
      LocalBlockLabel. Then we can teach the C code generator to deal with
      these as well.
  16. 22 Nov, 2017 1 commit
  17. 15 Nov, 2017 1 commit
    • Add new mbmi and mbmi2 compiler flags · f5dc8ccc
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4063
  18. 19 Sep, 2017 1 commit
    • compiler: introduce custom "GhcPrelude" Prelude · f63bc730
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude, and an `import GhcPrelude` is inserted in all
      modules.
      
      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))`, which would cause lots of name clashes in every
      module which also imports `Outputable`.
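      As a minimal sketch, a compiler/ module under this scheme starts roughly
      like this (module name chosen for illustration; it only compiles inside
      the GHC tree where GhcPrelude and Outputable exist):
      
      ```
      {-# LANGUAGE NoImplicitPrelude #-}
      module SomeGhcModule where
      
      import GhcPrelude
      import Outputable
      ```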
      
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D3989
  19. 23 Jun, 2017 1 commit
    • Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId`, but different from
        GHC's `CLabel`
      - Hoopl has `Unique` which is different from GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different from GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior).
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
  20. 28 Apr, 2017 1 commit
  21. 14 Feb, 2017 1 commit
    • Debug: Use local symbols for unwind points (#13278) · 2d6e91ea
      Ben Gamari authored
      While this apparently didn't matter on Linux, the OS X toolchain seems
      to treat local and external symbols differently during linking. Namely,
      the linker assumes that an external symbol marks the beginning of a new,
      unused procedure, and consequently drops it.
      
      Fixes regression introduced in D2741.
      
      Test Plan: `debug` testcase on OS X
      
      Reviewers: austin, simonmar, rwbarton
      
      Reviewed By: rwbarton
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3135
  22. 08 Feb, 2017 1 commit
    • Generalize CmmUnwind and pass unwind information through NCG · 3eb737ee
      Ben Gamari authored
      As discussed in D1532, Trac #11337, and Trac #11338, the stack
      unwinding information produced by GHC is currently quite approximate.
      Essentially we assume that register values do not change at all within a
      basic block. While this is somewhat true in normal Haskell code, blocks
      containing foreign calls often break this assumption. This results in
      unreliable call stacks, especially in the code containing foreign calls.
      This is worse than it sounds as unreliable unwinding information can at
      times result in segmentation faults.
      
      This patch set attempts to improve this situation by tracking unwinding
      information with finer granularity. By dispensing with the assumption of
      one unwinding table per block, we allow the compiler to accurately
      represent the areas surrounding foreign calls.
      
      Towards this end we generalize the representation of unwind information
      in the backend in three ways,
      
       * Multiple CmmUnwind nodes can occur per block
      
       * CmmUnwind nodes can now carry unwind information for multiple
         registers (while not strictly necessary, this makes emitting
         unwinding information a bit more convenient in the compiler)
      
       * The NCG backend is given an opportunity to modify the unwinding
         records since it may need to make adjustments due to, for instance,
         native calling convention requirements for foreign calls (see
         #11353).
      
      This sets the stage for resolving #11337 and #11338.
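      A minimal sketch (illustrative types only, not GHC's real Cmm
      representation) of the shape this generalisation gives the unwind data:
      
      ```
      import qualified Data.Map.Strict as M
      
      type Register   = String  -- e.g. "Sp"      (illustrative)
      type UnwindExpr = String  -- e.g. "Sp + 16"  (illustrative)
      
      -- One unwind point: a position within a block plus recovery rules
      -- for several registers at once.
      data UnwindPoint = UnwindPoint
        { unwindOffset :: Int
        , unwindRegs   :: M.Map Register UnwindExpr
        }
      
      -- A block may now carry any number of unwind points.
      type BlockUnwinding = [UnwindPoint]
      ```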
      
      Test Plan: Validate
      
      Reviewers: scpmw, simonmar, austin, erikd
      
      Subscribers: qnikst, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2741
  23. 08 Dec, 2016 1 commit
  24. 21 Aug, 2015 1 commit
    • Delete FastBool · 3452473b
      thomie authored
      This reverses some of the work done in Trac #1405, and assumes GHC is
      smart enough to do its own unboxing of booleans now.
      
      I would like to do some more performance measurements, but the code
      changes can be reviewed already.
      
      Test Plan:
      With a perf build:
      ./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp
      +RTS -t --machine-readable
      
      before:
      ```
        [("bytes allocated", "1300744864")
        ,("num_GCs", "302")
        ,("average_bytes_used", "8811118")
        ,("max_bytes_used", "24477464")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.833")
        ,("mutator_wall_seconds", "4.283")
        ,("GC_cpu_seconds", "0.960")
        ,("GC_wall_seconds", "0.961")
        ]
      ```
      
      after:
      ```
        [("bytes allocated", "1301088064")
        ,("num_GCs", "310")
        ,("average_bytes_used", "8820253")
        ,("max_bytes_used", "24539904")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.876")
        ,("mutator_wall_seconds", "4.474")
        ,("GC_cpu_seconds", "0.965")
        ,("GC_wall_seconds", "0.979")
        ]
      ```
      
      CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time
      measurements are rather noisy.
      
      Reviewers: austin, bgamari, rwbarton
      
      Subscribers: nomeata
      
      Differential Revision: https://phabricator.haskell.org/D1143
      
      GHC Trac Issues: #1405
  25. 07 Jul, 2015 1 commit
  26. 17 Dec, 2014 1 commit
    • Generate .loc/.file directives from source ticks · 64678e9e
      Peter Wortmann authored
      This generates DWARF, albeit indirectly using the assembler. This is
      the easiest (and, apparently, quite standard) method of generating the
      .debug_line DWARF section.
      
      Notes:
      
      * Note we have to make sure that .file directives appear correctly
        before the respective .loc. Right now we ppr them manually, which makes
        them absent from dumps. Fixing this would require .file to become a
        native instruction.
      
      * We have to pass a lot of things around the native code generator. I
        know Ian did quite a bit of refactoring already, but having one common
        monad could *really* simplify things here...
      
      * To support SplitObjs, we need to emit/reset all DWARF data at every
        split. We use the occasion to move split marker generation to
        cmmNativeGenStream as well, so debug data extraction doesn't have to
        choke on it.
      
      (From Phabricator D396)
  27. 12 Nov, 2014 1 commit
  28. 27 Sep, 2014 1 commit
    • Stop exporting, and stop using, functions marked as deprecated · 51aa2fa3
      thomie authored
      Don't export `getUs` and `getUniqueUs`. `UniqSM` has a `MonadUnique` instance:
      
          instance MonadUnique UniqSM where
              getUniqueSupplyM = getUs
              getUniqueM  = getUniqueUs
              getUniquesM = getUniquesUs
      
      Commandline-fu used:
      
          git grep -l 'getUs\>' |
              grep -v compiler/basicTypes/UniqSupply.lhs |
              xargs sed -i 's/getUs/getUniqueSupplyM/g'
      
          git grep -l 'getUniqueUs\>' |
              grep -v compiler/basicTypes/UniqSupply.lhs |
              xargs sed -i 's/getUniqueUs/getUniqueM/g'
      
      Follow up on b522d3a3
      
      Reviewed By: austin, hvr
      
      Differential Revision: https://phabricator.haskell.org/D220
  29. 31 Aug, 2014 1 commit
  30. 23 Aug, 2014 1 commit
    • Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend · cfd08a99
      rwbarton authored
      Summary:
      These MachOps are used by addIntC# and subIntC#, which in turn are
      used in integer-gmp when adding or subtracting small Integers. The
      following benchmark shows a ~6% speedup after this commit on x86_64
      (building GHC with BuildFlavour=perf).
      
          {-# LANGUAGE MagicHash #-}
      
          import GHC.Exts
          import Criterion.Main
      
          count :: Int -> Integer
          count (I# n#) = go n# 0
            where go :: Int# -> Integer -> Integer
                  go 0# acc = acc
                  go n# acc = go (n# -# 1#) $! acc + 1
      
          main = defaultMain [bgroup "count"
                                [bench "100" $ whnf count 100]]
      
      Differential Revision: https://phabricator.haskell.org/D140
  31. 12 Aug, 2014 2 commits
  32. 11 Aug, 2014 1 commit
  33. 23 Jul, 2014 2 commits
  34. 30 Jun, 2014 1 commit
    • Re-add more primops for atomic ops on byte arrays · 4ee4ab01
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#
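       A hedged usage sketch of the inline fetch-add on a one-element Int array
       (primop signatures as exposed by GHC.Exts around this era; whether the
       returned value is the contents before or after the addition has differed
       between GHC versions, and the byte count below assumes a 64-bit Int):
       
       ```
       {-# LANGUAGE MagicHash, UnboxedTuples #-}
       import GHC.Exts
       import GHC.IO (IO (..))
       
       main :: IO ()
       main = do
         prev <- IO $ \s0 ->
           case newByteArray# 8# s0 of            -- one Int slot
             (# s1, mba #) ->
               case writeIntArray# mba 0# 0# s1 of  -- initialise to 0
                 s2 -> case fetchAddIntArray# mba 0# 5# s2 of
                   (# s3, old #) -> (# s3, I# old #)
         print prev
       ```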
  35. 26 Jun, 2014 1 commit
  36. 24 Jun, 2014 1 commit
    • Add more primops for atomic ops on byte arrays · d8abf85f
      tibbe authored
      Summary:
      Add more primops for atomic ops on byte arrays
      
      Adds the following primops:
      
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      
      Makes these pre-existing out-of-line primops inline:
      
       * fetchAddIntArray#
       * casIntArray#