1. 19 Sep, 2017 1 commit
  2. 22 Aug, 2017 1 commit
  3. 01 Aug, 2017 1 commit
    • Drop GHC 7.10 compatibility · c13720c8
      Ryan Scott authored
      GHC 8.2.1 is out, so now GHC's support window only extends back to GHC
      8.0. This means we can delete gobs of code that was only used for GHC
      7.10 support. Hooray!
      Test Plan: ./validate
      Reviewers: hvr, bgamari, austin, goldfire, simonmar
      Reviewed By: bgamari
      Subscribers: Phyx, rwbarton, thomie
      Differential Revision: https://phabricator.haskell.org/D3781
  4. 23 Jun, 2017 1 commit
    • Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to make invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior).
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      Test Plan: ./validate
      Reviewers: austin, bgamari, simonmar
      Reviewed By: bgamari, simonmar
      Subscribers: simonpj, kavon, rwbarton, thomie
      Differential Revision: https://phabricator.haskell.org/D3616
  5. 28 Apr, 2017 1 commit
  6. 10 Mar, 2017 1 commit
  7. 07 Mar, 2017 1 commit
  8. 23 Feb, 2017 1 commit
  9. 14 Feb, 2017 1 commit
    • Debug: Use local symbols for unwind points (#13278) · 2d6e91ea
      Ben Gamari authored
      While this apparently didn't matter on Linux, the OS X toolchain seems
      to treat local and external symbols differently during linking. Namely,
      the linker assumes that an external symbol marks the beginning of a new,
      unused procedure, and consequently drops it.
      Fixes a regression introduced in D2741.
      Test Plan: `debug` testcase on OS X
      Reviewers: austin, simonmar, rwbarton
      Reviewed By: rwbarton
      Subscribers: rwbarton, thomie
      Differential Revision: https://phabricator.haskell.org/D3135
  10. 08 Feb, 2017 2 commits
    • Cmm: Add support for undefined unwinding statements · 3328ddb8
      Ben Gamari authored
      And use them to mark `stg_stack_underflow_frame`, for which we are unable
      to determine a caller.
      To simplify parsing, for the moment we steal the `return` keyword to
      indicate an undefined unwind value. Perhaps this should be revisited.
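A minimal sketch of how an "undefined" unwind value might be represented, assuming a toy model in which `Nothing` plays the role of the `return` keyword. `Reg`, `UnwindExpr`, `renderRule` and `renderStmt` are hypothetical names, not GHC's API. DWARF does have a `DW_CFA_undefined` rule for registers whose value cannot be recovered, but the textual output below is only illustrative.

```haskell
-- Hypothetical, simplified model of a Cmm unwind statement (not GHC's types).
data Reg = Sp | Ip deriving (Eq, Show)

-- An unwind value: "base register + byte offset".
data UnwindExpr = RegOff Reg Int deriving (Eq, Show)

-- Nothing marks a register whose value cannot be recovered (no caller frame),
-- i.e. what the `return` keyword denotes in the Cmm parser after this commit.
type UnwindStmt = [(Reg, Maybe UnwindExpr)]

-- Render each entry as a textual, DWARF-flavoured rule (for illustration only).
renderRule :: (Reg, Maybe UnwindExpr) -> String
renderRule (r, Nothing)                = "DW_CFA_undefined " ++ show r
renderRule (r, Just (RegOff base off)) = show r ++ " = " ++ show base ++ " + " ++ show off

renderStmt :: UnwindStmt -> [String]
renderStmt = map renderRule
```

For example, `renderStmt [(Sp, Just (RegOff Sp 16)), (Ip, Nothing)]` gives `["Sp = Sp + 16", "DW_CFA_undefined Ip"]`.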
      Reviewers: scpmw, simonmar, austin, erikd
      Subscribers: dfeuer, thomie
      Differential Revision: https://phabricator.haskell.org/D2738
    • Generalize CmmUnwind and pass unwind information through NCG · 3eb737ee
      Ben Gamari authored
      As discussed in D1532, Trac #11337, and Trac #11338, the stack
      unwinding information produced by GHC is currently quite approximate.
      Essentially we assume that register values do not change at all within a
      basic block. While this is somewhat true in normal Haskell code, blocks
      containing foreign calls often break this assumption. This results in
      unreliable call stacks, especially in the code containing foreign calls.
      This is worse than it sounds as unreliable unwinding information can at
      times result in segmentation faults.
      This patch set attempts to improve this situation by tracking unwinding
      information with finer granularity. By dispensing with the assumption of
      one unwinding table per block, we allow the compiler to accurately
      represent the areas surrounding foreign calls.
      Towards this end we generalize the representation of unwind information
      in the backend in three ways:
       * Multiple CmmUnwind nodes can occur per block
       * CmmUnwind nodes can now carry unwind information for multiple
         registers (while not strictly necessary, this makes emitting
         unwinding information a bit more convenient in the compiler)
       * The NCG backend is given an opportunity to modify the unwinding
         records since it may need to make adjustments due to, for instance,
         native calling convention requirements for foreign calls (see
      This sets the stage for resolving #11337 and #11338.
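The difference between one unwind table per block and several unwind nodes per block can be sketched with a toy instruction stream. Everything here (`Instr`, `tablesPerInstr`, plain `Int` offsets as unwind values) is a hypothetical simplification: each unwind node updates only the registers it mentions, so the table in force can change between two instructions of the same block, for example around a foreign call.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical registers and instructions (not GHC's types).
data Reg = Sp | Ip deriving (Eq, Ord, Show)

data Instr
  = Unwind [(Reg, Maybe Int)]  -- updates several registers; Nothing = undefined
  | Other String               -- any ordinary instruction
  deriving Show

-- Unwind value per register (an abstract Int offset here).
type UnwindTable = Map.Map Reg (Maybe Int)

-- Walk a block and pair every ordinary instruction with the unwind table in
-- force at that point. Each Unwind node overrides only the registers it names
-- (Map.union is left-biased, so the new entries win).
tablesPerInstr :: UnwindTable -> [Instr] -> [(String, UnwindTable)]
tablesPerInstr _ [] = []
tablesPerInstr tbl (Unwind regs : rest) =
  tablesPerInstr (Map.union (Map.fromList regs) tbl) rest
tablesPerInstr tbl (Other i : rest) = (i, tbl) : tablesPerInstr tbl rest
```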
      Test Plan: Validate
      Reviewers: scpmw, simonmar, austin, erikd
      Subscribers: qnikst, thomie
      Differential Revision: https://phabricator.haskell.org/D2741
  11. 01 Oct, 2016 1 commit
    • CodeGen X86: fix unsafe foreign calls wrt inlining · b61b7c24
      Sylvain HENRY authored
      Foreign calls (unsafe and safe) interact badly with inlining and
      register passing ABIs (see #11792 and #12614):
      the inlined code to compute a parameter of the call may overwrite a
      register already set to pass a preceding parameter.
      With this patch, we compute all parameters which are not simple
      expressions before assigning them to fixed registers required by the
      Test Plan:
         - Add test (test both reg and stack parameters)
         - Validate
      Reviewers: osa1, bgamari, austin, simonmar
      Reviewed By: simonmar
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D2263
      GHC Trac Issues: #11792, #12614
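The clobbering problem and the fix can be reproduced on a toy register file. In this hypothetical model (`Arg`, `evalArg`, `assignNaive`, `assignFixed` and the registers `r1`/`r2` are all made up), evaluating a `Complex` argument uses `r1` as scratch, much as inlined code might. The naive version loses the first argument; the fixed version evaluates the non-simple arguments first and only then fills the argument registers.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy machine: register name -> value.
type Regs = Map.Map String Int

data Arg
  = Simple Int   -- e.g. a literal: computing it needs no scratch register
  | Complex Int  -- stands for inlined code that uses "r1" as a scratch register
  deriving Show

-- Evaluate one argument; a Complex one clobbers "r1" as a side effect.
evalArg :: Regs -> Arg -> (Int, Regs)
evalArg rs (Simple v)  = (v, rs)
evalArg rs (Complex v) = (v, Map.insert "r1" 0xDEAD rs)

-- Fixed argument registers of the (made-up) calling convention.
argRegs :: [String]
argRegs = ["r1", "r2"]

-- Buggy: evaluate each argument straight into its register, in order.
assignNaive :: [Arg] -> Regs
assignNaive args = foldl step Map.empty (zip argRegs args)
  where
    step rs (r, a) = let (v, rs') = evalArg rs a in Map.insert r v rs'

-- Fixed: evaluate every non-simple argument first (keeping its value as a
-- temporary), then move all values into the fixed registers.
assignFixed :: [Arg] -> Regs
assignFixed args = foldl place rsAfterPrep (zip argRegs prepared)
  where
    (prepared, rsAfterPrep) = prep Map.empty args
    prep rs [] = ([], rs)
    prep rs (Complex v : rest) =
      let (v', rs')  = evalArg rs (Complex v)
          (ps, rs'') = prep rs' rest
      in (Simple v' : ps, rs'')
    prep rs (a : rest) =
      let (ps, rs') = prep rs rest in (a : ps, rs')
    place rs (r, a) = let (v, rs') = evalArg rs a in Map.insert r v rs'
```

With `[Simple 1, Complex 2]`, `assignNaive` leaves `0xDEAD` in `r1`, while `assignFixed` produces `r1 = 1, r2 = 2`.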
  12. 15 Sep, 2016 1 commit
    • Fix codegen bug in PIC version of genSwitch (#12433) · 86836a2e
      Simon Marlow authored
      * getNonClobberedReg instead of getSomeReg, because the reg needs to
        survive across t_code
      * Use a new reg for the table offset calculation instead of clobbering
        the reg returned by expr (this was the bug affecting #12433)
      Test Plan: New unit test; validate
      Reviewers: rwbarton, bgamari, austin, erikd
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D2529
      GHC Trac Issues: #12433
  13. 19 Aug, 2016 1 commit
  14. 05 Aug, 2016 1 commit
    • codeGen: Remove binutils<2.17 hack, fixes T11758 · e3e2e49a
      avd authored
      There was a complication on the x86_64 platform, where pointers were 64
      bits, but the tools didn't support 64-bit relative relocations. This
      was the case before binutils 2.17, which is now quite standard (even
      CentOS 5 ships with 2.17).
      The hacks were removed from the x86 genSwitch and the asm pretty printer.
      The [x86-64-relative] note was also dropped from
      includes/rts/storage/InfoTables.h, as it is no longer referenced anywhere.
      Reviewers: austin, simonmar, rwbarton, erikd, bgamari
      Reviewed By: simonmar, erikd, bgamari
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D2426
  15. 10 Apr, 2016 1 commit
    • Reduce default for -fmax-pmcheck-iterations from 1e7 to 2e6 · d2e05c6b
      Herbert Valerio Riedel authored
      Commit 28f951ed introduced the
      `-fmax-pmcheck-iterations` flag and set the default limit to 1e7.
      However, this value is still high enough that it can cause GHC to
      exhibit memory spikes beyond 1 GiB of RAM usage (the heap profile showed
      several `(:)`s, as well as `THUNK_2_0`, and `PmCon` during the memory
      A value of 2e6 seems to be a safer upper bound which still lets the
      checker stay below the limit in most cases.
      Test Plan: Validate, try building a few Hackage packages
      Reviewers: austin, gkaracha, bgamari
      Reviewed By: bgamari
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D2095
  16. 12 Nov, 2015 1 commit
    • Implement function-sections for Haskell code, #8405 · 4a32bf92
      olsner authored
      This adds a flag -split-sections that does similar things to
      -split-objs, but using sections in single object files instead of
      relying on the Satanic Splitter and other abominations. This is very
      similar to the GCC flags -ffunction-sections and -fdata-sections.
      The --gc-sections linker flag, which allows unused sections to actually
      be removed, is added to all link commands (if the linker supports it) so
      that space savings from having base compiled with sections can be
      Supported by both LLVM and the native code generator, in theory for all
      architectures, but only really tested on x86.
      In the GHC build, a new SplitSections variable enables -split-sections
      for relevant parts of the build.
      Test Plan: validate with both settings of SplitSections
      Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
      Reviewed By: simonmar, thomie, bgamari
      Subscribers: hsyl20, erikd, kgardas, thomie
      Differential Revision: https://phabricator.haskell.org/D1242
      GHC Trac Issues: #8405
  17. 31 Oct, 2015 1 commit
  18. 15 Oct, 2015 1 commit
  19. 23 Sep, 2015 1 commit
    • Annotate CmmBranch with an optional likely target · 939a7d63
      Simon Marlow authored
      This allows the code generator to give hints to later code generation
      steps about which branch is most likely to be taken.  Right now it
      is only taken into account in one place: a special case in
      CmmContFlowOpt that swapped branches over to maximise the chance of
      fallthrough, which is now disabled when there is a likelihood setting.
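A sketch of the interaction described above, using a hypothetical record loosely modelled on a conditional branch with an optional likelihood (`Maybe Bool`). The swap turns `if c goto next else goto other` into `if !c goto other else goto next`, so that `next` can be reached by fallthrough; it is skipped whenever a likelihood has been given, so the code generator's hint is not overridden. `CondBranch` and `swapForFallthrough` are illustrative names, not the actual CmmContFlowOpt code.

```haskell
-- Hypothetical, simplified conditional branch with an optional likelihood.
type Label = String

data CondBranch = CondBranch
  { cond   :: String      -- the condition, kept abstract as text
  , trueL  :: Label
  , falseL :: Label
  , likely :: Maybe Bool  -- Just True: the true branch is expected to be taken
  } deriving (Eq, Show)

-- If the true target is the next block, negate the condition and swap the
-- targets so that `next` becomes the fall-through (false) target. Leave the
-- branch alone when a likelihood has been given.
swapForFallthrough :: Label -> CondBranch -> CondBranch
swapForFallthrough next br
  | Nothing <- likely br
  , trueL br == next
  = br { cond = "!(" ++ cond br ++ ")", trueL = falseL br, falseL = trueL br }
  | otherwise = br
```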
      Test Plan: validate
      Reviewers: austin, simonpj, bgamari, ezyang, tibbe
      Subscribers: thomie
      Differential Revision: https://phabricator.haskell.org/D1273
  20. 12 Sep, 2015 1 commit
  21. 21 Aug, 2015 1 commit
    • Delete FastBool · 3452473b
      thomie authored
      This reverses some of the work done in Trac #1405, and assumes GHC is
      smart enough to do its own unboxing of booleans now.
      I would like to do some more performance measurements, but the code
      changes can be reviewed already.
      Test Plan:
      With a perf build:
      ./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp
      +RTS -t --machine-readable
        [("bytes allocated", "1300744864")
        ,("num_GCs", "302")
        ,("average_bytes_used", "8811118")
        ,("max_bytes_used", "24477464")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.833")
        ,("mutator_wall_seconds", "4.283")
        ,("GC_cpu_seconds", "0.960")
        ,("GC_wall_seconds", "0.961")
        ]
        [("bytes allocated", "1301088064")
        ,("num_GCs", "310")
        ,("average_bytes_used", "8820253")
        ,("max_bytes_used", "24539904")
        ,("num_byte_usage_samples", "9")
        ,("peak_megabytes_allocated", "64")
        ,("init_cpu_seconds", "0.001")
        ,("init_wall_seconds", "0.001")
        ,("mutator_cpu_seconds", "2.876")
        ,("mutator_wall_seconds", "4.474")
        ,("GC_cpu_seconds", "0.965")
        ,("GC_wall_seconds", "0.979")
        ]
      CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time
      measurements are rather noisy.
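To put numbers on "up a bit", here is the relative change for a few of the figures above, assuming the first dump is the baseline and the second is with the patch (the dumps themselves are not labelled). `relChange`, `stats` and `report` are illustrative helpers; with these inputs the mutator CPU time comes out roughly 1.5% higher and bytes allocated about 0.03% higher.

```haskell
-- Relative change in percent from a baseline value to a new value.
relChange :: Double -> Double -> Double
relChange before after = (after - before) / before * 100

-- A few figures copied from the two runs above: (name, first run, second run).
stats :: [(String, Double, Double)]
stats =
  [ ("bytes allocated",     1300744864, 1301088064)
  , ("mutator_cpu_seconds", 2.833,      2.876)
  , ("GC_cpu_seconds",      0.960,      0.965)
  ]

-- Percentage change for each figure, treating the first run as the baseline.
report :: [(String, Double)]
report = [ (name, relChange b a) | (name, b, a) <- stats ]
```

In GHCi, `mapM_ print report` lists the three changes.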
      Reviewers: austin, bgamari, rwbarton
      Subscribers: nomeata
      Differential Revision: https://phabricator.haskell.org/D1143
      GHC Trac Issues: #1405
  22. 07 Jul, 2015 1 commit
  23. 16 Jun, 2015 1 commit
  24. 30 Mar, 2015 1 commit
    • Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining these changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
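One way to picture building larger consecutive jump tables is a pass that groups the sorted case values into dense runs and gives each large enough run its own table. This is a simplified, hypothetical heuristic (the thresholds `minTableSize` and `maxGap`, and the `Plan` type, are made up for illustration), not the algorithm in `CmmSwitch` itself; in a real backend the `Compare` groups would become part of an if-then-else tree.

```haskell
import Data.List (sort)

-- Illustrative thresholds (made up; CmmSwitch has its own heuristics).
minTableSize :: Int
minTableSize = 5  -- runs with fewer values are compiled to comparisons

maxGap :: Int
maxGap = 6        -- largest difference allowed between neighbours in one table

data Plan
  = JumpTable Int Int  -- one table indexed by the values lo .. hi
  | Compare [Int]      -- a few values tested by comparisons
  deriving (Eq, Show)

-- Sort the case values, cut them into runs wherever two neighbours are more
-- than maxGap apart, and give every run of at least minTableSize values its
-- own jump table. Values are assumed to be distinct.
planSwitch :: [Int] -> [Plan]
planSwitch vals = map toPlan (runs (sort vals))
  where
    runs :: [Int] -> [[Int]]
    runs [] = []
    runs (x : xs) = let (run, rest) = grab x xs in (x : run) : runs rest

    -- Take values while each is within maxGap of the previous one.
    grab :: Int -> [Int] -> ([Int], [Int])
    grab prev (y : ys)
      | y - prev <= maxGap = let (run, rest) = grab y ys in (y : run, rest)
    grab _ ys = ([], ys)

    toPlan :: [Int] -> Plan
    toPlan grp@(lo : _)
      | length grp >= minTableSize = JumpTable lo (last grp)
    toPlan grp = Compare grp
```

For instance, `planSwitch [1, 2, 3, 4, 5, 100, 200]` yields one table for `1 .. 5` and two comparisons.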
      nofib shows little overall improvement in runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      Compilation time increases slightly:
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the Hoopl data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      Differential Revision: https://phabricator.haskell.org/D720
  25. 17 Dec, 2014 1 commit
    • Generate .loc/.file directives from source ticks · 64678e9e
      Peter Wortmann authored
      This generates DWARF, albeit indirectly using the assembler. This is
      the easiest (and, apparently, quite standard) method of generating the
      .debug_line DWARF section.
      * Note we have to make sure that .file directives appear correctly
        before the respective .loc. Right now we ppr them manually, which makes
        them absent from dumps. Fixing this would require .file to become a
        native instruction.
      * We have to pass a lot of things around the native code generator. I
        know Ian did quite a bit of refactoring already, but having one common
        monad could *really* simplify things here...
      * To support SplitObjs, we need to emit/reset all DWARF data at every
        split. We use the occasion to move split marker generation to
        cmmNativeGenStream as well, so debug data extraction doesn't have to
        choke on it.
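The ordering constraint from the first bullet (every `.file` before the `.loc` that uses it) can be sketched as a small pass over source positions. `SrcPos` and `emitLocs` are hypothetical; the directive shapes `.file <n> "<name>"` and `.loc <n> <line> <column>` follow the GNU assembler, but this is not how GHC's native code generator is structured.

```haskell
import qualified Data.Map.Strict as Map

-- A source position attached to generated code: file, line, column (hypothetical).
data SrcPos = SrcPos FilePath Int Int

-- Emit GNU-assembler style debug directives, giving each file a number and
-- printing its `.file` directive before the first `.loc` that refers to it.
emitLocs :: [SrcPos] -> [String]
emitLocs = go Map.empty
  where
    go :: Map.Map FilePath Int -> [SrcPos] -> [String]
    go _ [] = []
    go seen (SrcPos f l c : rest) =
      case Map.lookup f seen of
        Just n  -> loc n : go seen rest
        Nothing ->
          let n = Map.size seen + 1
          in (".file " ++ show n ++ " " ++ show f) : loc n : go (Map.insert f n seen) rest
      where
        loc n = ".loc " ++ show n ++ " " ++ show l ++ " " ++ show c
```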
      (From Phabricator D396)
  26. 16 Dec, 2014 3 commits
    • Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        On the other hand, when you're writing Cmm by hand you really ought to
        know what you're doing.
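The simplification step mentioned in the last bullet, where the back-end reduces a full expression to something DWARF can encode and may fail, might look roughly like this. This is a hypothetical sketch: `Expr` is a cut-down stand-in for `CmmExpr`, the target form is simply "register plus constant offset", and returning `Nothing` corresponds to the case where the native code generator would panic.

```haskell
-- Cut-down stand-in for a Cmm expression (hypothetical).
data Reg = Sp | Hp | R1 deriving (Eq, Show)

data Expr
  = RegE Reg
  | Lit Int
  | Add Expr Expr
  | Sub Expr Expr
  | Load Expr  -- memory load: cannot be written as "register + offset"
  deriving Show

-- Normalise to an optional base register plus a constant offset.
linear :: Expr -> Maybe (Maybe Reg, Int)
linear (RegE r) = Just (Just r, 0)
linear (Lit n)  = Just (Nothing, n)
linear (Add a b) = do
  (ra, na) <- linear a
  (rb, nb) <- linear b
  base <- case (ra, rb) of
    (Nothing, r) -> Just r
    (r, Nothing) -> Just r
    _            -> Nothing  -- two registers: no simple rule
  pure (base, na + nb)
linear (Sub a b) = do
  (ra, na) <- linear a
  (rb, nb) <- linear b
  case rb of
    Nothing -> pure (ra, na - nb)
    Just _  -> Nothing       -- subtracting a register: give up
linear (Load _) = Nothing

-- The form a simple DWARF rule can encode; Nothing means "cannot encode".
toRegOffset :: Expr -> Maybe (Reg, Int)
toRegOffset e = case linear e of
  Just (Just r, n) -> Just (r, n)
  _                -> Nothing
```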
      (From Phabricator D169)
    • Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled.
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things to be too
      (From Phabricator D169)
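A toy version of sub-scopes and merged scopes, with a test for whether one scope is nested inside another. The `Scope` type and `isSubScope` are hypothetical and only loosely follow the description above: a combined scope counts as nested in `t` only if both of its parts are, and a scope is nested in a combined scope if it is nested in either part. Numbers stand in for unique scope identifiers.

```haskell
-- Hypothetical tick scopes: the global scope, a numbered sub-scope of a
-- parent, or the combination of two scopes (e.g. after merging code).
data Scope
  = Global
  | Sub Int Scope
  | Combined Scope Scope
  deriving (Eq, Show)

-- Is the first scope nested inside (or equal to) the second?
isSubScope :: Scope -> Scope -> Bool
isSubScope _ Global = True
isSubScope Global _ = False
isSubScope (Combined a b) t = isSubScope a t && isSubScope b t
isSubScope s (Combined a b) = isSubScope s a || isSubScope s b
isSubScope (Sub u parent) t@(Sub u' _) = u == u' || isSubScope parent t
```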
    • Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straightforward, but also not very useful, as many blocks will simply
      end up with no annotations whatsoever.
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do; the worst part is that it complicates the CmmParse
        implementation a bit.
      (From Phabricator D169)
  27. 12 Nov, 2014 1 commit
  28. 18 Oct, 2014 1 commit
    • Implement optimized NCG `MO_Ctz W64` op for i386 (#9340) · 612f3d12
      Herbert Valerio Riedel authored
      This is an optimization to the CTZ primops introduced for #9340.
      Previously we called out to `hs_ctz64`, but we can actually generate
      better hand-tuned code while avoiding the FFI ccall.
      With this patch, the code
        {-# LANGUAGE MagicHash #-}
        module TestClz0 where
        import GHC.Prim
        ctz64 :: Word64# -> Word#
        ctz64 x = ctz64# x
      results in the following assembler generated by NCG on i386:
            movl (%ebp),%eax
            movl 4(%ebp),%ecx
            movl %ecx,%edx
            orl %eax,%edx
            movl $64,%edx
            je _nAO
            bsf %ecx,%ecx
            addl $32,%ecx
            bsf %eax,%eax
            cmovne %eax,%ecx
            movl %ecx,%edx
            movl %edx,%esi
            addl $8,%ebp
            jmp *(%ebp)
      For comparison, here's what LLVM 3.4 currently generates:
        000000fc <TestClzz_ctzz64_info>:
          fc:   0f bc 45 04             bsf    0x4(%ebp),%eax
         100:   b9 20 00 00 00          mov    $0x20,%ecx
         105:   0f 45 c8                cmovne %eax,%ecx
         108:   83 c1 20                add    $0x20,%ecx
         10b:   8b 45 00                mov    0x0(%ebp),%eax
         10e:   8b 55 08                mov    0x8(%ebp),%edx
         111:   0f bc f0                bsf    %eax,%esi
         114:   85 c0                   test   %eax,%eax
         116:   0f 44 f1                cmove  %ecx,%esi
         119:   83 c5 08                add    $0x8,%ebp
         11c:   ff e2                   jmp    *%edx
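The i386 sequence above combines two 32-bit `bsf` results. The same case analysis in Haskell, as an illustrative sketch (`ctz64Via32` is a made-up name, and `countTrailingZeros` from `Data.Bits` stands in for `bsf` on each half): a zero input yields 64, a non-zero low half decides the answer on its own, and otherwise the answer is 32 plus the trailing-zero count of the high half.

```haskell
import Data.Bits (countTrailingZeros, shiftR)
import Data.Word (Word32, Word64)

-- 64-bit count-trailing-zeros built from two 32-bit halves, mirroring the
-- structure of the i386 code above.
ctz64Via32 :: Word64 -> Int
ctz64Via32 x
  | x == 0    = 64                            -- the `je` path: all bits zero
  | lo /= 0   = countTrailingZeros lo         -- `bsf` on the low word wins
  | otherwise = 32 + countTrailingZeros hi    -- `bsf` on the high word, plus 32
  where
    lo = fromIntegral x :: Word32
    hi = fromIntegral (x `shiftR` 32) :: Word32
```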
      Reviewed By: austin
      Auditors: simonmar
      Differential Revision: https://phabricator.haskell.org/D163
  29. 23 Aug, 2014 1 commit
    • Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend · cfd08a99
      rwbarton authored
      These MachOps are used by addIntC# and subIntC#, which in turn are
      used in integer-gmp when adding or subtracting small Integers. The
      following benchmark shows a ~6% speedup after this commit on x86_64
      (building GHC with BuildFlavour=perf).
          {-# LANGUAGE MagicHash #-}
          import GHC.Exts
          import Criterion.Main
          count :: Int -> Integer
          count (I# n#) = go n# 0
            where go :: Int# -> Integer -> Integer
                  go 0# acc = acc
                  go n# acc = go (n# -# 1#) $! acc + 1
          main = defaultMain [bgroup "count"
                                [bench "100" $ whnf count 100]]
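As a reference for what these MachOps compute: `addIntC#` returns the wrapped sum together with a flag that is non-zero when the true sum does not fit in an `Int#`, and `subIntC#` does likewise for subtraction. A boxed sketch using sign-bit arithmetic, assuming GHC's wrap-around `Int` semantics; `addIntC` and `subIntC` here are illustrative helpers, not the X86 backend's code generation.

```haskell
import Data.Bits (xor, (.&.))

-- Boxed sketch of addIntC#: the wrapped sum plus a flag that is 1 when the
-- true sum does not fit in an Int and 0 otherwise. Relies on GHC's Int
-- arithmetic wrapping around on overflow.
addIntC :: Int -> Int -> (Int, Int)
addIntC a b = (s, if overflow then 1 else 0)
  where
    s = a + b
    -- Overflow iff a and b share a sign and s has the opposite sign.
    overflow = (a `xor` s) .&. (b `xor` s) < 0

-- The same for subtraction.
subIntC :: Int -> Int -> (Int, Int)
subIntC a b = (d, if overflow then 1 else 0)
  where
    d = a - b
    -- Overflow iff a and b differ in sign and d's sign differs from a's.
    overflow = (a `xor` b) .&. (a `xor` d) < 0
```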
      Differential Revision: https://phabricator.haskell.org/D140
  30. 14 Aug, 2014 1 commit
    • Implement new CLZ and CTZ primops (re #9340) · e0c1767d
      Herbert Valerio Riedel authored
      This implements the new primops
        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#
      which provide efficient implementations of the popular
      count-leading-zero and count-trailing-zero operations, respectively
      (see testcase for a pure Haskell reference implementation).
      On x86, NCG as well as LLVM generates code based on the BSF/BSR
      instructions (which need extra logic to make the 0-case well-defined).
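The reference semantics can be written naively by scanning bits. This is an illustrative sketch, not the testcase the commit refers to: `clzRef` and `ctzRef` count from the top and bottom bit respectively and return the full bit width for a zero input, which is exactly the case that needs extra logic around BSF/BSR, whose destination is undefined for a zero source. `agreesWithDataBits` cross-checks them against `Data.Bits`.

```haskell
import Data.Bits (FiniteBits, countLeadingZeros, countTrailingZeros, finiteBitSize, testBit)
import Data.Word (Word8)

-- Count leading zeros by scanning from the most significant bit downwards.
-- A zero input gives the full bit width.
clzRef :: FiniteBits a => a -> Int
clzRef x = length (takeWhile (not . testBit x) [w - 1, w - 2 .. 0])
  where w = finiteBitSize x

-- Count trailing zeros by scanning from the least significant bit upwards.
ctzRef :: FiniteBits a => a -> Int
ctzRef x = length (takeWhile (not . testBit x) [0 .. finiteBitSize x - 1])

-- Cross-check both against Data.Bits for one 8-bit value.
agreesWithDataBits :: Word8 -> Bool
agreesWithDataBits v = clzRef v == countLeadingZeros v && ctzRef v == countTrailingZeros v
```

Usage: `all agreesWithDataBits [minBound .. maxBound]` checks every 8-bit value.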
      Test Plan: validate and successful tests on i686 and amd64
      Reviewers: rwbarton, simonmar, ezyang, austin
      Subscribers: simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D144
      GHC Trac Issues: #9340
  31. 12 Aug, 2014 1 commit
    • x86: zero extend the result of 16-bit popcnt instructions (#9435) · 64151913
      rwbarton authored
      The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
      the destination register, so we have to zero-extend the result to
      a full word as popCnt16# is supposed to return a Word#.
      For popCnt8# we could instead zero-extend the input to 32 bits
      and then do a 32-bit popcnt, and not have to zero-extend the result.
      LLVM produces the 16-bit popcnt sequence with two zero extensions,
      though, and who am I to argue?
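The problem is easiest to see by modelling the destination as a full 64-bit register whose upper bits hold leftover data. `popcnt16Raw` (a hypothetical model) writes only the low 16 bits, as the 16-bit form of the instruction does; `popcnt16ZeroExt` adds the zero-extension that popCnt16# needs in order to return a proper Word#.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word64)

-- Model of the 16-bit popcnt form on a 64-bit register: only the low 16 bits
-- of the destination are written, the upper 48 bits keep their old contents.
popcnt16Raw :: Word64 -> Word64 -> Word64
popcnt16Raw dest src =
  (dest .&. complement 0xFFFF) .|. fromIntegral (popCount (src .&. 0xFFFF))

-- What popCnt16# has to return: the count zero-extended to a full word.
popcnt16ZeroExt :: Word64 -> Word64 -> Word64
popcnt16ZeroExt dest src = popcnt16Raw dest src .&. 0xFFFF
```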
      Test Plan:
       - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
       - then ran again adding "WAY=optasm", and verified that
         the popcnt sequences we generate match the ones produced
         by LLVM for its @llvm.ctpop.* intrinsics
      Reviewers: austin, hvr, tibbe
      Reviewed By: austin, hvr, tibbe
      Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D147
      GHC Trac Issues: #9435
  32. 11 Aug, 2014 1 commit
  33. 10 Aug, 2014 1 commit
    • Eliminate some code duplication in x86 backend (genCCall32/64) · c80d2381
      rwbarton authored
      No functional changes except in panic messages.
      These functions were identical except for
      - x87 operations in genCCall32
      - the fallback to genCCall32'/64'
      - "32" vs "64" in panic messages (one case was wrong!)
      - minor syntactic or otherwise non-functional differences.
      Test Plan:
      Ran "validate --no-dph --slow" before and after the change.
      The only differences were two tests that failed before the change but not after;
      further investigation revealed that those tests are in fact erratic.
      Reviewers: simonmar, austin
      Reviewed By: austin
      Subscribers: phaskell, simonmar, relrod, ezyang, carter
      Differential Revision: https://phabricator.haskell.org/D139
  34. 23 Jul, 2014 2 commits
  35. 21 Jul, 2014 1 commit
    • Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId. · 4bebab25
      Edward Z. Yang authored
      Previously, both Cabal and GHC defined the type PackageId, and we expected
      them to be roughly equivalent (but represented differently).  This refactoring
      separates these two notions.
      A package ID is a user-visible identifier; it's the thing you write in a
      Cabal file, e.g. containers-0.9.  The components of this ID are semantically
      meaningful, and decompose into a package name and a package version.
      A package key is an opaque identifier used by GHC to generate linking symbols.
      Presently, it just consists of a package name and a package version, but
      pursuant to #9265 we are planning to extend it to record other information.
      Within a single executable, it uniquely identifies a package.  It is *not* an
      InstalledPackageId, as the choice of a package key affects the ABI of a package
      (whereas an InstalledPackageId is computed after compilation).  Cabal computes
      a package key for the package and passes it to GHC using -package-name (now
      *extremely* misnamed).
      As an added bonus, we don't have to worry about shadowing anymore.
      As a follow-on, we should introduce -current-package-key having the same role as
      -package-name, and deprecate the old flag.  This commit is just renaming.
      The haddock submodule needed to be updated.
      Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
      Test Plan: validate
      Reviewers: simonpj, simonmar, hvr, austin
      Subscribers: simonmar, relrod, carter
      Differential Revision: https://phabricator.haskell.org/D79
  36. 30 Jun, 2014 1 commit
    • Re-add more primops for atomic ops on byte arrays · 4ee4ab01
      tibbe authored
      This is the second attempt to add this functionality. The first
      attempt was reverted in 950fcae4, due
      to register allocator failure on x86. Given how the register
      allocator currently works, we don't have enough registers on x86 to
      support cmpxchg using complicated addressing modes. Instead we fall
      back to a simpler addressing mode on x86.
      Adds the following primops:
       * atomicReadIntArray#
       * atomicWriteIntArray#
       * fetchSubIntArray#
       * fetchOrIntArray#
       * fetchXorIntArray#
       * fetchAndIntArray#
      Makes these pre-existing out-of-line primops inline:
       * fetchAddIntArray#
       * casIntArray#